An Efficient Breast Cancer Segmentation System based on Deep Learning Techniques

Breast cancer is one of the major health threats facing women around the world. Its detection and diagnosis in the early stages can greatly improve care efficiency and reduce the mortality rate. Early detection allows medical professionals to use less intrusive treatments, such as lumpectomies or targeted medicines, improving survival rates and lowering morbidity. This study developed a breast cancer segmentation system based on an improved version of the U-Net 3+ neural network. Various optimizations were applied to this architecture to improve localization and segmentation performance. Different state-of-the-art networks were evaluated to improve the performance of the proposed breast cancer diagnosis system. Experiments were carried out on the INbreast Full-Field Digital Mammographic dataset (INbreast FFDM). The proposed model achieved a dice score of 98.47%, a new state-of-the-art segmentation result, showcasing its efficiency in detecting breast cancer from mammography images and its potential for use in real applications.


I. INTRODUCTION
Breast cancer is the most common type of cancer among women. Early detection of this malignancy can significantly improve survival rates and lower the cost of treatment. Advances in 3D mammography, Computed Tomography (CT), histopathological imaging, and Magnetic Resonance Imaging (MRI) are just some examples of the numerous breakthroughs in radiographic imaging. Radiologists and pathologists can detect breast cancer early using any of these imaging types. However, this manual approach is not only costly but also has a high error rate. According to the 2020 Cancer Report from the International Agency for Research on Cancer (IARC), cancer is the first or second leading cause of death in the 30 to 69 age range. Lung cancer is the leading cause of cancer death in both men and women. Globally, breast and prostate cancer are among the leading causes of death for women and men, respectively. According to the IARC, breast cancer affected more than 2 million women and caused 685,000 deaths worldwide in 2020.

Cancer involves an abnormal division of cells that leads to the appearance of a mass called a tumor. There are two categories of tumors: malignant (cancerous) and benign (not cancerous). Breast cancer thus manifests itself as a tumor that forms in breast cells. Accurate and early diagnosis through Positron Emission Tomography (PET), CT, ultrasound, histology, MRI, thermography, and mammography can significantly improve patients' quality of life. Among these imaging modalities, mammogram images are the best choice for diagnosing breast cancer due to their reliability and cost-effectiveness. Manual analysis of mammogram images has various disadvantages: it is costly and time-consuming, and it can lead to misdiagnoses and high false positive rates. Mammography is the most popular technique for diagnosing breast cancer in women who do not have symptoms of the disease. Mammograms are low-energy X-ray images of the breast. The use of mammography has led to significant advancements in the detection of micro-calcifications and calcification clusters, owing to its high sensitivity to these features [1]. It has been shown that mammography screening reduces the death rate in general populations [2]. Mammogram images can be used for routine screening, as they have proven to be technically more suitable for this purpose [3].
During the last few years, deep learning-based architectures have shown great success in various fields, including indoor object detection [4], indoor wayfinding assistance navigation [5], face recognition [6], pedestrian detection [7], traffic sign detection [8], and medical image processing [9]. Deep learning techniques have been widely applied to new early diagnosis systems for breast cancer. Such systems can help save a large number of patient lives. It is important to closely observe the incidence of breast cancer-related deaths and their subsequent reduction after effective treatment. Breast cancer research has seen significant advances during the last decade [10][11]. Several non-invasive diagnostic and therapeutic tools are used to see into the human body. Any of the imaging techniques can detect breast cancer in its early stages, but they cannot demonstrate malignancy on their own [14]. Cancer cells tend to accumulate in interstitial tissue veins or fluid, and their malignancy is usually detected by microscopic examination of cancer tissues [15]. Biopsy procedures, such as surgical incisions or needle paths, may accelerate the spread of malignancy by dragging cancer cells along [16]. In mammography for breast cancer diagnosis, the pectoral muscle is often removed during preprocessing to improve detection rates [17]. Excluding the pectoral muscle and the surrounding background regions restricts the analysis to anomalies within the breast region.
This study aims to investigate a novel method for the detection and diagnosis of breast cancer. An enhanced version of the U-Net 3+ [18] neural network served as the foundation for the proposed system. Training and testing were conducted on the INbreast FFDM dataset [19], yielding new state-of-the-art results. The motivation behind using a modified U-Net 3+ architecture lies in its ability to address specific challenges and requirements of medical imaging. U-Net 3+ is an extension of the original U-Net architecture, which was designed for semantic segmentation tasks in various domains, including medical imaging. Medical images often come in high resolutions and may contain fine details that are crucial for accurate diagnosis and treatment planning. The U-Net 3+ architecture is well suited for multiscale segmentation tasks, as it captures both global and local features in images, which is important for identifying anatomical structures and abnormalities of varying sizes. Medical image datasets are typically smaller and more difficult to acquire than those in other image domains. The U-Net 3+ architecture is known for its effectiveness in learning from limited data, making it a suitable choice for medical applications where large datasets are often unavailable due to privacy and ethical constraints. In medical image segmentation, the distribution of foreground (abnormality) and background (normal tissue) pixels is often highly imbalanced. When properly configured, U-Net 3+ can handle class imbalance and help produce accurate segmentation maps by focusing on the areas of interest. Together, these properties help to obtain more accurate and clinically relevant segmentation results. The main contributions of this study are:
• Proposing a breast cancer detection system based on mammography image segmentation.
• Proposing an image segmentation network that modifies the U-Net 3+ model to extract more relevant semantic features.
• Evaluating the proposed network on a realistic dataset and achieving high performance.

II. RELATED WORKS
Segmentation of breast cancer is essential to save lives by facilitating early detection, accurate diagnosis, treatment planning, disease monitoring, surgical guidance, and supporting research and development in breast cancer care. Early breast cancer diagnosis has been investigated in many studies. Mammography images are widely used in modern medical procedures to identify breast cancer [20]. In [21], Conditional Random Field (CRF) and Structured Support Vector Machine (SSVM) were introduced to classify mass mammography, using deep convolution and prospective operations based on belief networks. However, the SSVM model exhibited lower performance than the CRF model in terms of training and inference durations. In [22], a Full-resolution Convolutional Network (FrCN) was proposed, using X-ray mammograms from the INbreast dataset [19] and four-fold cross-validation. The FrCN model achieved a Matthews Correlation Coefficient (MCC) of 98.96%, an F1-score of 99.24%, and an accuracy of 95.66%. In [23], the BDR-CNN-GCN was proposed, which combined an 8-layer CNN, including dropout and batch normalization layers, with a Graph Convolutional Network (GCN). This model was evaluated on the MIAS dataset. By combining the two-layer GCN with the CNN in the final model, the system attained an accuracy of 96.10%.
In [24], the YOLOv5 network was adapted to recognize and classify breast cancers by testing various parameter values. The upgraded YOLOv5 model outperformed Faster RCNN and YOLOv3, with an accuracy of 96.50% and an MCC of 93.50%. In [25], the IRMA mammography dataset was used to evaluate a model for categorizing mammograms as normal or abnormal based on a variety of variables. In [26], the Lifting Wavelet Transform (LWT) method was used to extract information from breast mammograms. When the size of the feature vectors was reduced using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), the accuracy rates for the classification of the MIAS and DDSM datasets using Extreme Learning Machine (ELM) and moth flame optimization were 95.80% and 98.76%, respectively. In [27], the CNN Inception-v3 model was trained on a dataset consisting of 316 images. The model achieved an Area Under the Curve (AUC) value of 0.946, a specificity of 0.87, and a sensitivity of 0.88. In [28], a Convolutional Neural Network (CNN) with Transfer Learning (TL) was proposed to assess the effectiveness of eight improved pre-trained models. In [29], MobileNet, ResNet50, and AlexNet were combined to produce a hybrid classification model with an accuracy of 95.6%.
Low contrast and typical changes in breast tissue density make it particularly difficult to accurately detect and classify breast masses on mammograms. Various Computer-Aided Diagnosis (CAD) systems are intended to assist radiologists in properly classifying breast abnormalities. In [30], a new breast cancer mass detection system was proposed, which showed very encouraging performance. A multiclass SVM model and K-means clustering were coupled with Deep Convolutional Neural Network (DCNN) algorithms to improve the accuracy in classifying breast cancers from mammography images. In [31], a breast cancer segmentation system based on EnsembleNet was proposed, achieving a classification accuracy of 96.72%. In [32], the strengths of deep learning with a pre-trained ResNet50V2 model and ensemble-based machine learning approaches were combined to propose a reliable hybrid breast cancer diagnosis method. Deep learning allows the method to learn and retrieve obscure breast cancer patterns, while the machine learning algorithms contribute interpretability and generalizability. Extensive studies were performed using a publicly accessible Invasive Ductal Carcinoma (IDC) breast histopathology imaging dataset with samples of varying sizes. The experimental results supported the stability and performance of the proposed hybrid model, but it was computationally expensive and had a complicated training paradigm. Because different deep learning models achieve different accuracy on the same database, additional aspects such as preprocessing, data augmentation, and transfer learning can affect a model's ability to attain improved accuracy. In [33], a deep learning algorithm was proposed to create a fully automated model that would preprocess, segment, and categorize the severity of cancer spread from images collected from patients.

III. THE PROPOSED ARCHITECTURE
Automatic segmentation of human organs is crucial for the early detection and diagnosis of cancer. CNNs have recently shown excellent performance in segmentation tasks. The U-Net model, known for its encoder-decoder design, is often used for medical image segmentation. Skip connections link the high-level semantic feature maps of the decoder with the corresponding low-level detailed feature maps of the encoder. U-Net++ added nested and dense skip connections to improve these links, with the intent to bridge the semantic gap between the encoder and decoder. This approach aimed to mitigate the blending of semantically dissimilar features that results from the plain skip connections in the U-Net architecture. Despite producing respectable results, this method still cannot exploit enough information from all scales. To achieve improved segmentation results, this study was based on the U-Net 3+ neural network, which has the following advantages:
• Enhanced feature extraction: U-Net 3+ incorporates a deep feature extraction pathway that captures both low- and high-level features of the input image. This allows the network to learn more meaningful representations and to make accurate predictions.
• Multiscale context aggregation: The U-Net 3+ network uses skip connections to concatenate features from different resolution levels. This enables it to capture both local and global context information, leading to improved segmentation results, especially for objects of varying scales.
• Integration of dense skip connections: In addition to traditional skip connections, U-Net 3+ introduces dense skip connections that connect each encoder layer to all decoder layers. This dense connectivity helps information flow across different levels of the network, facilitating better feature reuse and gradient propagation.
• Efficient parameter utilization: U-Net 3+ incorporates dense convolutions and squeeze-and-excitation blocks, which enable efficient parameter utilization. These mechanisms help reduce the number of parameters in the network while maintaining or even improving its segmentation performance.
• Robustness to limited training data: U-Net 3+ has been shown to perform robustly even with limited training data. The network's ability to capture both local and global context information and its dense skip connections help in data scarcity scenarios.
• Versatility across different segmentation tasks: The U-Net 3+ architecture is highly versatile and can be adapted to various image segmentation tasks, such as semantic segmentation, instance segmentation, and medical image segmentation. It has demonstrated competitive performance in different domains.
Although U-Net 3+ provides various advantages, it also has disadvantages: the additional pathways for information flow, including dense skip connections and feature concatenation, increase the computational complexity of the network, which requires more memory and computational resources during training and inference. U-Net 3+ makes full use of multiscale features through three mechanisms: full-scale skip connections, which combine low-level details with high-level semantics from full-scale feature maps using fewer parameters; deep supervision, which learns hierarchical representations from the full-scale aggregated feature maps; and a hybrid loss function, which sharpens the organ boundary. Additionally, a classification-guided module is introduced to mitigate over-segmentation in non-organ images. U-Net 3+ thus provides three main improvements, which are analyzed below.

A. Full-Scale Skip Connection
Full-scale skip connections change both the interconnection between the encoder and decoder and the intra-connection between the decoder subnetworks. To fully capture fine-grained details and coarse-grained semantics, each decoder layer in U-Net 3+ simultaneously uses the smaller- and same-scale feature maps generated by the encoder and the larger-scale feature maps from the decoder. U-Net 3+ redesigned the skip connections and added full-scale deep supervision to integrate multiscale features, using fewer parameters but producing a more precise, position-aware, and boundary-enhanced segmentation map than U-Net and U-Net++. For the third decoder layer, the feature map from the third encoder block is received directly by the decoder. The low-level detailed information from the smaller-scale encoder layers is delivered through a chain of inter encoder-decoder skip connections. In contrast, a chain of intra-decoder skip connections transmits the high-level semantic information from the larger-scale decoder layers using bilinear interpolation. Various optimizations were applied to the original U-Net 3+ architecture to exploit all the information provided in the input images. The max-pooling layers were replaced by dense dilated convolution layers with various dilation rates. To effectively integrate both shallow and semantic information, a feature aggregation step was performed on the concatenated feature map obtained from 5 different scales. This aggregation uses 320 filters of size 3×3, followed by batch normalization and a ReLU activation function. Each decoder feature map in the U-Net 3+ network is derived from N scales. Figure 1 shows the precise architectural layout of the third decoder layer.

Fig. 1. Third decoder construction architecture.
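As a rough illustration of the full-scale skip connections described in this subsection, the following Python sketch lists, for a given decoder stage, which feature maps are brought to that stage's resolution and by what factor. The function name `full_scale_plan`, the factor-of-two spacing between scales, and the 64 channels per branch are assumptions made for illustration (the text only states 5 scales and 320 filters); this is not the authors' implementation.

```python
def full_scale_plan(decoder_level, num_levels=5, branch_channels=64):
    """Describe the inputs concatenated at one decoder stage.

    Levels are numbered from 1 (full resolution) to num_levels (deepest).
    Assumption: each level halves the spatial resolution of the previous one.
    """
    plan = []
    for source in range(1, num_levels + 1):
        if source < decoder_level:
            # Shallower encoder maps are larger and must be downsampled.
            factor = 2 ** (decoder_level - source)
            plan.append((f"encoder{source}", "downsample", factor))
        elif source == decoder_level:
            # The same-scale encoder map is passed through directly.
            plan.append((f"encoder{source}", "identity", 1))
        else:
            # Deeper maps are smaller and must be upsampled; the deepest
            # stage is the encoder bottleneck, the others are decoder maps.
            factor = 2 ** (source - decoder_level)
            name = f"encoder{source}" if source == num_levels else f"decoder{source}"
            plan.append((name, "upsample", factor))
    # Every branch contributes the same number of channels before concatenation.
    concatenated_channels = branch_channels * num_levels
    return plan, concatenated_channels


plan, channels = full_scale_plan(decoder_level=3)
for name, operation, factor in plan:
    print(f"{name}: {operation} x{factor}")
print("channels after concatenation:", channels)
```

Under these assumptions, the third decoder stage receives two downsampled encoder maps, one same-scale encoder map, and two upsampled deeper maps, and the concatenated tensor has 5 × 64 = 320 channels, which matches the 320 filters mentioned in the text.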

B. Full-Scale Deep Supervision
Full-scale deep supervision is additionally implemented in U-Net 3+ to learn hierarchical representations from the full-scale aggregated feature maps. As a second optimization of the U-Net 3+ network, transpose convolution was employed in place of bilinear interpolation. Transpose convolution offers several benefits, including learnable upsampling, parameter sharing, end-to-end learning, and reconstructive capabilities. To achieve deep supervision, the final layer of each decoder stage is passed through a simple 3×3 convolution layer, followed by a transpose convolution and a sigmoid function. To improve the definition of organ boundaries, the multiscale structural similarity index (MS-SSIM) loss function [33] was used. This loss assigns higher weights to indistinct or fuzzy boundaries: a greater regional distribution difference corresponds to a higher MS-SSIM value, so the network is driven to attend to the fuzzy border. This loss function can be calculated as:

$$\ell_{\text{ms-ssim}} = 1 - \prod_{m=1}^{M} \left( \frac{2\mu_p \mu_g + C_1}{\mu_p^2 + \mu_g^2 + C_1} \right)^{\beta_m} \left( \frac{2\sigma_{pg} + C_2}{\sigma_p^2 + \sigma_g^2 + C_2} \right)^{\gamma_m}$$

where $\beta_m$ and $\gamma_m$ define the relative importance of the two components at each scale, $M$ is the total number of scales, $\mu_p$, $\mu_g$ and $\sigma_p$, $\sigma_g$ are the means and standard deviations of the segmentation result $p$ and the ground truth $g$, and $\sigma_{pg}$ is their covariance. $C_1$ and $C_2$ are two constants set to $C_1 = 0.01^2$ and $C_2 = 0.03^2$.
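The MS-SSIM loss described in this subsection can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: it assumes each scale is supplied as one pair of flattened patches (prediction and ground truth) with values in [0, 1], and computes the statistics over the whole patch rather than over sliding windows. The names `ssim_components` and `ms_ssim_loss` are hypothetical, and the two stabilizing constants are called `C1` and `C2` in the code.

```python
C1 = 0.01 ** 2  # stabilizing constant for the mean term
C2 = 0.03 ** 2  # stabilizing constant for the variance/covariance term


def _mean(values):
    return sum(values) / len(values)


def ssim_components(p, g):
    """Return the two SSIM factors for a predicted patch p and a ground-truth patch g."""
    mu_p, mu_g = _mean(p), _mean(g)
    var_p = _mean([(x - mu_p) ** 2 for x in p])
    var_g = _mean([(y - mu_g) ** 2 for y in g])
    cov_pg = _mean([(x - mu_p) * (y - mu_g) for x, y in zip(p, g)])
    mean_term = (2 * mu_p * mu_g + C1) / (mu_p ** 2 + mu_g ** 2 + C1)
    covariance_term = (2 * cov_pg + C2) / (var_p + var_g + C2)
    return mean_term, covariance_term


def ms_ssim_loss(patches_per_scale, betas, gammas):
    """Compute 1 minus the product over scales of the weighted SSIM factors."""
    product = 1.0
    for (p, g), beta, gamma in zip(patches_per_scale, betas, gammas):
        mean_term, covariance_term = ssim_components(p, g)
        # Clamp at zero so that fractional exponents stay real-valued
        # (an illustrative safeguard, not part of the formula itself).
        covariance_term = max(covariance_term, 0.0)
        product *= (mean_term ** beta) * (covariance_term ** gamma)
    return 1.0 - product


prediction = [0.0, 1.0, 1.0, 0.0]
ground_truth = [0.0, 1.0, 1.0, 0.0]
print(ms_ssim_loss([(prediction, ground_truth)], betas=[1.0], gammas=[1.0]))
```

A prediction that matches the ground truth gives a loss close to zero, while a prediction whose structure disagrees with the ground truth drives the covariance term down and the loss up.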

C. Classification-Guided Module (CGM)
False positives in non-organ images are a common occurrence in most medical image segmentation tasks. They are most likely caused by noisy background information that remains in the shallower layers, which leads to over-segmentation. This architecture overcomes the problem by introducing an additional classification task that determines whether or not the input image contains an organ, in order to produce a more accurate segmentation. As seen in Figure 1, the deepest-level feature map passes through a series of operations (dropout, convolution, transpose convolution, and sigmoid) to produce a two-dimensional tensor whose entries represent the probability of the image containing or not containing organs. Using the richest semantic information, the classification result then guides each segmentation side output in two steps. To further optimize the proposed model, the adaptive max-pooling layer in the original U-Net 3+ architecture was replaced by an adaptive average-pooling layer, which offers the following advantages: flexibility in input size, consistent output size, preservation of spatial information, robustness to noise and outliers, and reduction of computational complexity.
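One plausible reading of the two-step guidance described above is sketched below: the two class probabilities are first reduced to a single 0/1 decision, and each segmentation side output is then multiplied by that decision. The function name `classification_guided_output` and the ordering of the probability pair (no organ first, organ second) are assumptions made for illustration; the text does not specify them.

```python
def classification_guided_output(side_output, class_probs):
    """Suppress a segmentation side output when the classifier sees no organ.

    side_output: 2D list of per-pixel probabilities.
    class_probs: (p_no_organ, p_organ), an assumed ordering.
    """
    # Step 1: turn the two probabilities into a single 0/1 decision.
    has_organ = 1 if class_probs[1] > class_probs[0] else 0
    # Step 2: multiply every pixel of the side output by that decision.
    return [[pixel * has_organ for pixel in row] for row in side_output]


side_output = [[0.9, 0.1], [0.4, 0.7]]
print(classification_guided_output(side_output, (0.8, 0.2)))  # map is zeroed
print(classification_guided_output(side_output, (0.1, 0.9)))  # map is kept
```

The effect is that an image classified as containing no organ yields an empty mask, which removes the false positives that the segmentation branch would otherwise produce on it.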

A. Dataset and Materials
INbreast [19] is a publicly available database specifically designed for breast cancer research and related applications. It consists of a collection of digital mammography images and associated clinical data. This study focused on Full-Field Digital Mammography (FFDM), a commonly used imaging technology for the detection of breast cancer. The collection comprises 115 mammographic cases from 115 distinct patients. Each case generally includes four mammographic images, covering craniocaudal (CC) and mediolateral oblique (MLO) views of both breasts. The mammographic images in the dataset have a dimension of 2816×3328 pixels. The dataset offers ground-truth annotations for various abnormalities seen in mammograms, such as masses and microcalcifications. These annotations can be used for tasks like computer-aided detection or classification. The dataset contains additional patient data, including age and breast density, whose impact on breast cancer diagnosis can be studied. Figure 3 provides some examples of images in the INbreast dataset. The dataset was used to examine the effectiveness and reliability of the proposed method in detecting breast cancer. The INbreast dataset has 336 mammography images, 269 of which are abnormal and 69 normal. Of the abnormal images, 49 are malignant and 220 are benign. Tables I and II provide statistics from the INbreast dataset in terms of normal and abnormal, and malignant and benign images. During the experiments, various parameter values were adopted. Table III provides all the experimental settings used. The initial learning rate was fixed at 0.01 and the Adam optimizer was used for training. The batch size was kept at 8 due to limitations in graphics memory.

B. Evaluation Metrics
Accuracy, sensitivity, specificity, and F1-score are among the evaluation criteria used to assess the effectiveness of the proposed method. The dice score is commonly used in image segmentation and evaluation tasks in medical imaging. In this study, data augmentation based on rotation and flipping was used to mitigate the risk of overfitting caused by the limited amount of training data.
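The rotation and flipping augmentation mentioned above can be sketched as follows. This is a minimal illustration on small list-of-lists images; the restriction to 90-degree rotations, the choice of horizontal and vertical flips, and the function names are assumptions, since the text does not give the exact angles or flip axes. The key point is that each transform is applied identically to the image and its segmentation mask.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(row[::-1]) for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def augment_pair(image, mask):
    """Return the original pair plus rotated and flipped copies.

    The same transform is applied to the image and to its mask so that
    the annotation stays aligned with the pixels.
    """
    pairs = [(image, mask)]
    rotated_image, rotated_mask = image, mask
    for _ in range(3):  # 90, 180, and 270 degrees
        rotated_image = rotate90(rotated_image)
        rotated_mask = rotate90(rotated_mask)
        pairs.append((rotated_image, rotated_mask))
    pairs.append((flip_horizontal(image), flip_horizontal(mask)))
    pairs.append((flip_vertical(image), flip_vertical(mask)))
    return pairs


image = [[1, 2], [3, 4]]
mask = [[0, 1], [0, 0]]
for augmented_image, augmented_mask in augment_pair(image, mask):
    print(augmented_image, augmented_mask)
```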

C. Results and Discussion
Table IV shows the confusion matrix of the proposed optimized version of the U-Net 3+ network on the INbreast dataset. The optimized U-Net 3+ correctly identified 98% of actual tumors as tumors (TP). Additionally, it correctly recognized 95% of non-tumors as non-tumors (TN).
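For reference, the evaluation criteria named in this section follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical. The counts are chosen only so that the true positive and true negative rates equal the 98% and 95% reported above, and they are not the counts behind Table IV.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 100 tumor and 100 non-tumor samples.
metrics = classification_metrics(tp=98, fp=5, tn=95, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```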

Model                     Dice score (%)
Connected U-Net [34]      94.45
Connected-SegNets [35]    96.34
U-Net 3+                  97.29
Optimized U-Net 3+        98.47

The proposed optimized U-Net 3+ network produced the best segmentation results compared to other recent methods. Figure 4 shows the segmentation results produced by the proposed model and the SegNet [35]. The segmentation maps created by the optimized U-Net 3+ model were of higher quality and included fewer errors.

Fig. 3. Examples of breast tumor segmentation results.

D. Ablation Study
In recent years, several DL models have been developed and applied to the segmentation of breast tumors. These DL models have segmented breast tumors on mammograms with remarkable success, but have high rates of false positive and false negative findings. To increase performance, various modules and layers were changed in the U-Net 3+ network. As a first optimization, transpose convolution was used instead of bilinear interpolation. The transpose convolution operation increases the spatial dimensions of the input feature map while preserving some characteristics of the original image; it can be used to upsample images or to create higher-resolution images from lower-resolution ones.
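To make the idea of learnable upsampling concrete, the following one-dimensional sketch implements a transpose convolution without padding: every input value is multiplied by the kernel and the scaled copies are accumulated at strided positions in the output. The function name and the example kernels are assumptions for illustration; in a trained network the kernel weights are learned rather than fixed.

```python
def conv_transpose_1d(x, kernel, stride=2):
    """1D transpose convolution with no padding.

    The output length is (len(x) - 1) * stride + len(kernel).
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            # Each input value spreads a scaled copy of the kernel
            # starting at position i * stride.
            out[i * stride + k] += value * weight
    return out


signal = [1.0, 2.0]
# A kernel of ones reproduces nearest-neighbour upsampling.
print(conv_transpose_1d(signal, [1.0, 1.0], stride=2))
# A triangular kernel blends neighbouring values, similar to linear interpolation.
print(conv_transpose_1d(signal, [0.5, 1.0, 0.5], stride=2))
```

Fixed interpolation corresponds to one particular choice of kernel, whereas transpose convolution lets the network learn the kernel that best reconstructs the segmentation map.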
As a second optimization, max-pooling layers were replaced by dense dilated convolution. Dilated convolution, also known as atrous convolution, has properties that make it well suited to segmentation tasks. It can capture larger receptive fields without adding more parameters or lowering the feature map resolution, which is a considerable advantage. This matters in segmentation because the network must understand the context of objects and the interactions between them. By incorporating dilations, dense dilated convolution enables the network to gather information from a larger area and model long-range dependencies, making it easier to include global context in the segmentation process.
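The receptive-field argument above can be illustrated with a small one-dimensional sketch. A kernel of size k with dilation d covers d × (k − 1) + 1 input positions while still using only k weights, and with zero padding the output keeps the input length. The function names and the zero-padding scheme are assumptions for illustration, not the layer configuration used in the study.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv_1d(x, kernel, dilation=1):
    """1D dilated convolution with zero padding that preserves the input length."""
    span = dilation * (len(kernel) - 1)
    pad_left = span // 2
    padded = [0.0] * pad_left + list(x) + [0.0] * (span - pad_left)
    return [
        sum(weight * padded[i + j * dilation] for j, weight in enumerate(kernel))
        for i in range(len(x))
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]  # three weights in both cases
print(effective_kernel_size(3, dilation=1), dilated_conv_1d(signal, kernel, dilation=1))
print(effective_kernel_size(3, dilation=2), dilated_conv_1d(signal, kernel, dilation=2))
```

With the same three weights, the dilated version combines samples that are two positions apart, so each output value sees a wider context while the output resolution is unchanged.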
The third optimization was replacing the adaptive max-pooling layer with an adaptive average-pooling layer. The ability of adaptive average-pooling layers to handle input data of various sizes is a significant benefit. Adaptive average-pooling layers dynamically adjust their pooling regions to the input size, in contrast to classic pooling layers that require fixed kernel sizes. This behavior allows the network to accept inputs of arbitrary spatial dimensions, making it more adaptable to various image sizes. As shown in Table VII, when changing the bilinear interpolation to transpose convolution, a slight increase was observed in both the F1-score and the number of parameters. When replacing the max-pooling layers with dilated convolution layers, the F1-score increased compared to the original U-Net 3+ network, while the number of parameters increased by around 1.5 M. When changing the adaptive max-pooling layer to an adaptive average-pooling layer, the segmentation efficiency increased while the number of parameters remained almost the same, as pooling is a parameter-free operation. Combining the three techniques in the proposed optimized U-Net 3+ achieved the best dice score of 98.47%, with a slight increase in the number of parameters compared to the original U-Net 3+ network.
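The input-size flexibility of adaptive average pooling can be seen in a small one-dimensional sketch: the output length is fixed in advance and the pooling regions are derived from the input length. The bin boundaries used here (floor for the start, ceiling for the end) follow a common convention and are an assumption for illustration, as is the function name.

```python
def adaptive_avg_pool_1d(x, output_size):
    """Average-pool a sequence of any length down to output_size values."""
    n = len(x)
    pooled = []
    for i in range(output_size):
        start = (i * n) // output_size                          # floor(i * n / output_size)
        end = ((i + 1) * n + output_size - 1) // output_size    # ceil((i + 1) * n / output_size)
        window = x[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


# Inputs of different lengths produce outputs of the same length.
print(adaptive_avg_pool_1d([1, 2, 3, 4, 5, 6], output_size=2))
print(adaptive_avg_pool_1d([1, 2, 3, 4, 5], output_size=2))
```

No kernel size or stride is specified by the caller, and the operation has no trainable weights, which is consistent with the observation that this replacement leaves the parameter count almost unchanged.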

V. CONCLUSION
Segmentation of breast cancer is of paramount importance, as it helps early detection, accurate diagnosis, treatment planning, quantitative analysis, personalized medicine, research, and prognosis. Using segmentation techniques, healthcare professionals can improve patient outcomes, optimize treatment strategies, and contribute to ongoing advances in breast cancer care. This study proposed a breast cancer segmentation system based on an optimized version of U-Net 3+. Various optimizations were applied to the U-Net 3+ neural network to improve spatial visibility and segmentation performance. Extensive experiments were conducted on the INbreast dataset. The results showed the efficiency and superior performance of the proposed method over known state-of-the-art methods, with the highest dice score of 98.47%. The main limitation of the proposed model is that it may struggle with anatomical variations between patients, such as differences in organ shapes and sizes. The model may require substantial adaptation or patient-specific fine-tuning to handle such variability effectively. Future research will mainly focus on integrating novel deep learning methods for cancer identification and classification, with the ultimate goal of automating the breast cancer diagnostic process.
The dice score is calculated as follows:

$$\text{Dice score} = \frac{2\,|A \cap B|}{|A| + |B|} \quad (6)$$

where A is the first set or binary image, B is the second set or binary image, |A| represents the cardinality (number of elements) of set A, |B| represents the cardinality of set B, and |A∩B| represents the cardinality of the intersection of sets A and B, i.e., the number of elements common to both sets. Insufficient data is an often-encountered challenge in deep learning models, which may lead to overfitting.
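The dice score definition above translates directly into code for binary masks. The sketch below is a minimal plain-Python version; the function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(prediction, target):
    """Dice score between two binary masks given as 2D lists of 0/1 values."""
    pred_flat = [pixel for row in prediction for pixel in row]
    target_flat = [pixel for row in target for pixel in row]
    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    size_a = sum(1 for p in pred_flat if p)    # |A|
    size_b = sum(1 for t in target_flat if t)  # |B|
    if size_a + size_b == 0:
        # Both masks are empty; treated here as perfect agreement (an assumption).
        return 1.0
    return 2 * intersection / (size_a + size_b)


prediction = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(dice_score(prediction, target))
```

In this example the masks share one foreground pixel, with two foreground pixels in the prediction and one in the target, so the score is 2 × 1 / (2 + 1).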

TABLE IV. CONFUSION MATRIX RESULTS OF THE PROPOSED OPTIMIZED U-NET 3+ ON THE INBREAST DATASET

TABLE V
Table VII provides the segmentation performance obtained for all optimizations.

TABLE VII. IMPACT OF VARIOUS OPTIMIZATIONS ON THE SEGMENTATION PERFORMANCE