WomanLife: Deep Learning for the detection and classification of breast cancer
Introduction
Breast cancer is the most common type of cancer in women and one of the leading causes of death, according to the WHO (WHO, 2020).
Early detection is the single most important factor in lowering cancer treatment costs and mortality. It requires medical ultrasound images and specialists who can interpret them. A shortage of either creates a gap in access to early treatment in low- and middle-income countries, where specialized diagnostic services are scarce or insufficient.
Today, deep learning makes it possible to automate the classification of these images, so a tool of this kind can help medical personnel classify and detect breast cancer early. In this way, women suffering from the disease could be diagnosed automatically, in time to start treatment.
Description of the problem
Our project addresses the detection and classification of breast cancer in women between 25 and 75 years old. It does so with an AI model trained on images obtained with ultrasound scanners, which classifies each image according to the type of finding it shows.
Objective
To allow women suffering from breast cancer to be diagnosed automatically with a deep learning model, so that they can start treatment early and safely, reducing costs and the mortality rate. To meet this objective, we propose a tool that uses artificial intelligence to speed up the process through self-diagnosis with ultrasound images.
Model selection
The project works with ultrasound images of three types, labeled as normal, benign and malignant, so the model selected is a convolutional neural network (CNN) built with TensorFlow Keras.
Datasets
The dataset was collected at Baheya Hospital for Early Detection and Treatment of Women’s Cancer in Cairo, Egypt. It contains 780 breast ultrasound images of women between 25 and 75 years old (133 normal, 437 benign and 210 malignant), with an average image size of 500 x 500 pixels. Some of them are shown below.
The original dataset also includes mask images, which do not provide meaningful information to the model we developed. For this reason, shell commands were used to remove them from the dataset we are using.
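The same cleanup can be sketched in Python. This is a minimal sketch and not the shell commands actually used; it assumes that mask files are PNGs whose names contain `_mask` (for example `benign (1)_mask.png`), and `remove_mask_images` is a hypothetical helper name.

```python
from pathlib import Path


def remove_mask_images(dataset_dir):
    """Delete every PNG whose file name contains '_mask'.

    Assumption: masks are stored next to the ultrasound images and are
    distinguished only by the '_mask' marker in their file name.
    Returns the list of removed paths.
    """
    removed = []
    # sorted() materializes the file list before anything is deleted.
    for path in sorted(Path(dataset_dir).rglob("*.png")):
        if "_mask" in path.name:
            path.unlink()
            removed.append(path)
    return removed


# Example call (hypothetical folder name):
# remove_mask_images("Dataset_BUSI_with_GT")
```

Since the function deletes files in place, it is safer to run it on a copy of the dataset.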
Implemented techniques
Public datasets of breast cancer ultrasound images are still scarce, and this shortage limits the performance of the algorithms. Because of this, the authors who published the dataset we used recommend augmenting the data with GANs.
Our project developed a GAN for each class in order to obtain more accurate results, training each one for 150 epochs.
However, the GANs failed to produce usable images, so we decided not to use this technique. The remaining challenge is to train them for more epochs and with a better network configuration to obtain more realistic images.
Network definition
Among the design patterns available in Keras, we chose subclassing, which gives access to the low-level Keras APIs. The following article covers these patterns in more detail:
https://towardsdatascience.com/3-keras-design-patterns-every-ml-engineer-should-know-cae87618c7e3
The structure of the network consists of:
- Preprocessing layer: Resizing, Rescaling and Normalization
- Conv2D: 32 filters, strides of 4, ‘same’ padding and ReLU activation
- MaxPooling2D: pool_size of (3,3), ‘same’ padding and strides of 2
- Flatten
- Dense: 512 neurons and ReLU activation
- Dropout (0.4)
- Dense: 3 neurons and SoftMax activation
The design is based on the AlexNet architecture, with adjustments to the number of neurons, the fully connected layers and the dropout values.
We use the Adam optimizer with a learning rate of 0.0001, the Sparse Categorical Crossentropy loss function and the Sparse Categorical Accuracy metric.
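The network and its training configuration could be written with Keras subclassing roughly as follows. This is a sketch under stated assumptions, not the code from the notebook: the target image size (224 x 224), the convolution kernel size (11, borrowed from AlexNet's first layer) and the normalization statistics are not given in the text and are placeholders, and the class name `BreastUltrasoundCNN` is hypothetical.

```python
import tensorflow as tf

layers = tf.keras.layers


class BreastUltrasoundCNN(tf.keras.Model):
    """Subclassed CNN following the layer list described above."""

    def __init__(self, image_size=(224, 224), num_classes=3):
        super().__init__()
        # Preprocessing: Resizing, Rescaling and Normalization.
        self.resize = layers.Resizing(*image_size)  # assumed target size
        self.rescale = layers.Rescaling(1.0 / 255)
        # Placeholder statistics; in practice, adapt() the layer on the
        # training images or pass the dataset mean and variance.
        self.normalize = layers.Normalization(mean=0.5, variance=0.25)
        # kernel_size=11 is an assumption (not stated in the article).
        self.conv1 = layers.Conv2D(
            32, kernel_size=11, strides=4, padding="same", activation="relu"
        )
        self.pool1 = layers.MaxPooling2D(pool_size=(3, 3), strides=2, padding="same")
        self.flatten = layers.Flatten()
        self.fc1 = layers.Dense(512, activation="relu")
        self.drop1 = layers.Dropout(0.4)
        self.out = layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=False):
        x = self.resize(inputs)
        x = self.rescale(x)
        x = self.normalize(x)
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.drop1(x, training=training)  # dropout is active only in training
        return self.out(x)


model = BreastUltrasoundCNN()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
```

Because the last layer already applies softmax, the loss is left with its default `from_logits=False`, and the labels are expected as integer class indices rather than one-hot vectors.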
Training
We used TensorBoard to observe the accuracy and loss values in real time. It provides useful graphs for analyzing the results, along with many controls for manipulating them.
EarlyStopping
We use EarlyStopping as a form of regularization to avoid overfitting when training the model. For example, if the loss value stops decreasing, training stops even if not all epochs have been completed.
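Both training aids are available as Keras callbacks. The sketch below shows one way to set them up; the monitored quantity (`val_loss`), the patience of 5 epochs and the log directory are assumptions, since the article does not state the values used.

```python
import tensorflow as tf

# Stop when the monitored loss has not improved for `patience` epochs
# and roll back to the best weights seen so far (assumed settings).
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Write accuracy and loss curves for TensorBoard (assumed log directory).
tensorboard = tf.keras.callbacks.TensorBoard(log_dir="logs/fit")

callbacks = [early_stopping, tensorboard]

# The list is then passed to training, for example:
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```

The curves can then be viewed by running `tensorboard --logdir logs/fit` while training is in progress.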
Conclusions and future work
WomanLife is intended to be an easy-to-access, low-cost medical diagnostic tool.
This AI is not only beneficial for women who use it but also has the potential to become a medical assistant. We want to clarify that WomanLife does not intend to replace medical specialists but to provide a tool that facilitates their work.
Going forward, we intend to improve the model by using a GAN to obtain more accurate results and by applying techniques that find the right parameters for training the model (hyperparameter tuning).
Our project also developed an application that takes an image scanned with the camera or selected from the gallery, passes it through the trained network and returns the probability of each class.
The model was developed in pure TensorFlow, then converted, saved and exported to TensorFlow Lite.
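Turning the returned probabilities into a readable result might look like the sketch below. The class order in `CLASS_NAMES` is an assumption (alphabetical, which is how Keras directory loaders order class folders), and `top_prediction` is a hypothetical helper, not code taken from the application.

```python
CLASS_NAMES = ("benign", "malignant", "normal")  # assumed class order


def top_prediction(probabilities, class_names=CLASS_NAMES):
    """Return the most likely class name and its probability."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    # Index of the largest probability.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


label, confidence = top_prediction([0.10, 0.70, 0.20])
print(f"{label}: {confidence:.0%}")
```

Showing the full list of probabilities alongside the top class keeps the uncertainty of the model visible to the person reading the result.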
Sources
You can access the notebook and the mobile application through my GitHub repositories below:
https://github.com/edcalderin/BreastCancerDetection_CNN
https://github.com/edcalderin/BreastCancerDetection_app
There you can also find more projects related to Data Science and Machine Learning. In short, it contains all my work so far. Any reply or comment is always welcome.
About the authors
Erick Calderin Morales
Systems engineer with experience in software development, master’s student in systems engineering and master’s degree in data science with an affinity for artificial intelligence.
LinkedIn: https://www.linkedin.com/in/erick-calderin-5bb6963b/
Sharon Maygua Mendiola
Mechatronics engineering student with a degree in physics.
LinkedIn: https://www.linkedin.com/in/sharon-sarai-maygua-mendiola-22288019a/
References
- Al-Dhabyani, W., Gomaa, M., Khaled, H., & Aly, F. (2019). Deep learning approaches for data augmentation and classification of breast masses using ultrasound images. Int. J. Adv. Comput. Sci. Appl, 10(5), 1–11.
- Al-Dhabyani, W., Gomaa, M., Khaled, H., & Fahmy, A. (2020). Dataset of breast ultrasound images. Data in brief, 28, 104863.
- TensorFlow. 2021. Deep Convolutional Generative Adversarial Network | TensorFlow Core. [online] Available at: <https://www.tensorflow.org/tutorials/generative/dcgan>.
- Gowda, R., 2019. DCGAN Implementation in Keras explained. [online] Medium. Available at: <https://medium.com/@ramyahrgowda/dcgan-implementation-in-keras-explained-e1918fc930ea>.
- Poulopoulos, D., 2021. 3 Keras Design Patterns Every ML Engineer Should Know. [online] Medium. Available at: <https://towardsdatascience.com/3-keras-design-patterns-every-ml-engineer-should-know-cae87618c7e3>.