Dermatological disease classification using consumer-grade images

Krishnam Gupta
Published in mfine-technology
Dec 15, 2020

Research on skin images has largely focused on skin cancer detection. Although that is a very important field of research, dermatology has many other problems that Artificial Intelligence (AI) can help solve.

Beyond detecting skin cancer or classifying a tumour as benign or malignant, we can infer much more from skin images. And now, with the rise of telemedicine, there is an added need to augment a doctor’s diagnostic capabilities with AI. Especially in dermatology, which covers 400+ conditions, the need for computer-aided, image-based differential diagnosis cannot be stressed enough.

The rise of telemedicine has given us an advantage in developing robust machine learning models: we now have a constant feedback mechanism for the model in the form of input from the doctor. Because dermatology covers a large number of conditions that vary widely in rarity, rare diseases are easy to miss, which in turn increases the chance of misdiagnosis. A ranked list of the top-10 conditions can help the doctor make a more informed decision.

Based on user-captured images of the affected region, coupled with knowledge of the observed symptoms, the model predicts a differential diagnosis for the case. This is presented to the doctor alongside the original data. The final diagnosis entered by the doctor provides a feedback signal for the AI model to learn from and constantly improve its effectiveness. In effect, the model learns from all the dermatologists on our platform.

The modality of choice for imaging skin conditions in a clinical setting is a dermatoscope. Thus, the vast majority of computer vision research on skin condition localisation and classification has focused on computer-aided diagnosis based on dermoscopic images. Images acquired using a dermatoscope are high resolution, less sensitive to lighting conditions, and have the skin condition of interest enhanced and centred in the field of view. We show an example below of how a lesion looks under a dermatoscope versus when it is captured by a phone camera.

The top-left image [10] is a consumer-grade image of a skin lesion; the bottom-left is the same image zoomed in on the lesion. The image on the right shows the same lesion observed through a dermatoscope.

However, problems with the quality of care and the limited reach of current healthcare systems have led patients to opt for online consultation. By giving patients access to the best available care across borders and boundaries, online consultation is becoming a new norm. This has invariably resulted in the use of mobile technology, such as video conferencing and data sharing over high-speed internet, for efficient consultation. Dermatology has followed the online trend more than any other field of medicine barring teleradiology. Consultations typically involve patients describing their condition and uploading images of the skin condition of interest acquired using mobile phones or handheld digital cameras. Such images, acquired by users on a non-clinical mobile imaging device, are popularly referred to as consumer-grade images.

Automatically localising, classifying and quantifying skin conditions in consumer-grade images is challenging because:

  1. The boundaries of skin conditions are diffuse, irregular and fuzzy.
  2. The contrast between the lesion and the surrounding skin is relatively low.
  3. Fragmentation or variegated colouring can occur inside the lesion.
  4. The skin condition of interest usually occupies a relatively small area of the image.

Research in this domain has only recently picked up momentum, and some research groups have released large datasets to boost work on skin disease recognition in consumer-grade images. SD-198 is one such dataset, made publicly available by Sun et al. [1], comprising 6,000+ consumer-grade images across 198 different skin conditions.

7-point checklist for dermatological criteria

Recent efforts have incorporated comprehensive medical criteria such as the ABCD rule [2] and the 7-point checklist [3] to improve the classification of skin diseases. Yang et al. [4] incorporate these dermatological criteria and improve classification accuracy on consumer-grade images. Attempts are also underway to solve the class imbalance problem in skin disease classification datasets. A novel Self-Paced Balanced Learning method [5] combats class imbalance by introducing a comprehensive metric, termed the complexity of the image category, that combines both sample count and recognition difficulty. This approach further improves classification accuracy to 67.8% on SD-198 and is the current state-of-the-art method.

What we are doing

Overview of our method

We observed that even though an average smartphone can capture high-resolution images, we must resize them to a much lower resolution before feeding them to a deep learning model, due to constraints on model size. Feeding a full-resolution image to the network is not practical, but resizing it to a smaller size discards a lot of information. That trade-off is not viable for our problem, where the region of interest for the skin lesion is already very small for many diseases.

There are patch-based algorithms [6] that tackle such problems by dividing the full-resolution image into grid-based patches and feeding those patches to the network. But this loses full-image information, such as global cues like the location of the lesion, which are also crucial for some diseases. We need to strike a balance between learning global and local cues without increasing the model size considerably.

So, we propose a novel dual-stream deep network that is trained in a multi-phase manner. It employs both global and local cues about the skin conditions of interest to perform skin condition recognition with implicit segmentation. To take advantage of both global and local features, this architecture employs a two-level image pyramid: a down-sampled input image is first segmented by a weakly supervised segmentation algorithm, and the resulting segmentation masks are used to crop out the regions of interest from the original image, which are then used to classify the skin condition of interest. While the first stream of the proposed dual-stream network learns global cues from the input image, the second stream learns local cues from within the segmentation masks.
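The two-level pyramid described above can be sketched end-to-end as follows. Every function here is a toy stand-in: the names, the brightness-based “mask”, and the fixed downsampling factor are illustrative assumptions, not our actual components.

```python
import numpy as np

def downsample(image, factor=8):
    """Cheap low-resolution view of an H x W x C image (level 1 of the pyramid)."""
    return image[::factor, ::factor]

def weak_segment(small):
    """Toy stand-in for the weakly supervised segmenter: 'lesion' = brighter pixels."""
    return small.mean(axis=-1) > small.mean()

def crop_roi(full, mask, factor=8):
    """Map the low-res mask back to full resolution and crop its bounding box."""
    ys, xs = np.nonzero(mask)
    return full[ys.min() * factor:(ys.max() + 1) * factor,
                xs.min() * factor:(xs.max() + 1) * factor]

def classify(full, patch):
    """Stand-in for the dual-stream network that sees both global and local views."""
    return "ranked differential diagnosis"

full = np.random.rand(1600, 1200, 3)   # high-resolution consumer-grade image
small = downsample(full)               # global, low-resolution view
mask = weak_segment(small)             # weakly supervised segmentation mask
patch = crop_roi(full, mask)           # high-resolution RoI patch
prediction = classify(full, patch)     # both views feed the classifier
```

The key point the sketch captures is that the segmenter only ever sees the small image, while the crop is taken from the untouched full-resolution image, so no pixel detail inside the RoI is lost.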

We also propose a novel optimisation strategy to help the network efficiently learn both independent and complementary information from the two streams. This Selective Optimisation technique performs better than optimising the dual-stream network naively. The strategy, coupled with the effectiveness of the dual-stream network, results in a 5% increase in accuracy over the current state-of-the-art methods on the SD-198 dataset.

We now give a brief overview of our methodology:

A. Constructing RoI-cropped patches using weakly supervised segmentation

Generating RoI-cropped patches of skin disease area using weakly supervised segmentation.

We create a Region of Interest (RoI) patch from the input image using weakly supervised segmentation, which refers to generating segmentation maps of an image without actual annotated segmentation maps. The ground-truth labels for the classification task are used to infer the important regions of the image, which we refer to as the RoI. Upon prediction from the classifier network, we mark the regions of the image that contribute the most to the corresponding classification label. These serve as a proxy for actual annotated regions.

We explore methods like GradCam [7], FullGrad [8] and ACoL [9] to generate the segmentation heatmaps, and select ACoL as the backbone based on performance. The heatmaps are further processed to convert them into a bounding box, as shown in the image above. We then crop the corresponding bounding box from the high-resolution input image, giving a high-resolution segmented RoI.
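As a rough sketch of the heatmap-to-bounding-box step: threshold the activation map, take the tight box around the surviving pixels, and rescale that box to full-image coordinates. The 0.5 threshold and the coordinate scaling here are illustrative assumptions, not our tuned settings.

```python
import numpy as np

def heatmap_to_bbox(heatmap, threshold=0.5):
    """Threshold a class-activation heatmap (values in [0, 1]) and return the
    tight bounding box (y0, x0, y1, x1) around all above-threshold pixels."""
    ys, xs = np.nonzero(heatmap >= threshold)
    if ys.size == 0:                       # nothing activated: fall back to the full image
        return 0, 0, heatmap.shape[0], heatmap.shape[1]
    return ys.min(), xs.min(), ys.max() + 1, xs.max() + 1

def crop_full_res(full_image, bbox, heatmap_shape):
    """Rescale a bbox from heatmap coordinates to the full-resolution image and crop."""
    sy = full_image.shape[0] / heatmap_shape[0]
    sx = full_image.shape[1] / heatmap_shape[1]
    y0, x0, y1, x1 = bbox
    return full_image[int(y0 * sy):int(y1 * sy), int(x0 * sx):int(x1 * sx)]

# Toy example: a 7x7 heatmap with one hot region, cropped from a 700x700 image.
hm = np.zeros((7, 7))
hm[2:5, 3:6] = 0.9
roi = crop_full_res(np.random.rand(700, 700, 3), heatmap_to_bbox(hm), hm.shape)
print(roi.shape)   # (300, 300, 3)
```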

B. Dual Stream Network Architecture

Dual Stream Network Architecture

The dual-stream network D consists of an image stream and a patch stream. The image stream processes the full image, while the patch stream processes the corresponding RoI-cropped patch. In addition, a third combiner sub-network combines the output features of the two streams and learns from their correlation. The complete dual-stream network D can thus be defined as a composition of three sub-networks Si, Sp and Sc, with model weights wi, wp and wc:

D(w, x) = Sc(wc, (Si(wi, x_image), Sp(wp, x_patch)))

There are three Fully Connected (FC) layers in D, one for each sub-network. A softmax + argmax layer is added on top of each FC layer to obtain the predicted labels, and the ground-truth labels are used to compute a loss for each output. There are three losses in total: Li for the image stream, Lp for the patch stream and Lc for the combiner sub-network. Since the image and patch streams are independent, we define a common Stream loss Ls = Li + Lp. A Total loss Lt combines the Stream loss and the Combiner loss as follows:

Lt = Lc + β·Ls

Here, β is a hyper-parameter called the Loss Ratio. It balances learning independent features from the streams against learning combined features. We chose the value of β empirically.
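To make the notation concrete, here is a toy numpy version of the forward composition and the losses. The linear “sub-networks”, the feature dimensions and the β value are all illustrative assumptions, not our actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label] + 1e-12)

n_classes, feat = 5, 16
w_i = rng.normal(size=(feat, 8))          # image-stream weights (toy linear layer)
w_p = rng.normal(size=(feat, 8))          # patch-stream weights
w_c = rng.normal(size=(16, n_classes))    # combiner weights over concatenated features
fc_i = rng.normal(size=(8, n_classes))    # per-stream FC heads
fc_p = rng.normal(size=(8, n_classes))

x_image = rng.normal(size=feat)           # global-view features
x_patch = rng.normal(size=feat)           # RoI-patch features
label = 2
beta = 0.5                                # Loss Ratio (chosen empirically in practice)

f_i = x_image @ w_i                       # Si(wi, x_image)
f_p = x_patch @ w_p                       # Sp(wp, x_patch)
f_c = np.concatenate([f_i, f_p]) @ w_c    # Sc combines both streams

L_i = cross_entropy(softmax(f_i @ fc_i), label)
L_p = cross_entropy(softmax(f_p @ fc_p), label)
L_s = L_i + L_p                           # Stream loss
L_c = cross_entropy(softmax(f_c), label)  # Combiner loss
L_t = L_c + beta * L_s                    # Total loss: Lt = Lc + β·Ls
```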

C. Optimization Strategy

The dual-stream network is optimised in an alternating manner over multiple phases. In the first phase, the network is optimised using the Stream loss Ls, which helps it learn independent features from each stream; only the weights of the two streams are updated during this phase. When the training loss stops decreasing, we switch to the Total loss Lt. In this second phase, the network learns combined features from both streams with the help of the combiner sub-network, since Lt includes the combiner’s loss; all model weights, including those of the combiner sub-network, are updated. When the training loss stops decreasing in the second phase, we switch back to optimising Ls.

Thus, training keeps alternating between Lt and Ls until the network stops learning and the training loss no longer decreases. Alternating between the Stream loss and the Total loss strikes a balance between learning independent stream features and learning from their correlation. Better-learnt independent features lead to better combined features, and we also hypothesise that learning better combined features in turn induces better independent features.

What’s Next?

With a powerful ML model, we will soon be able to quantify a skin lesion and track it over time. With this ability, we can assess the effectiveness of treatment plans in a more quantified manner and aid doctors by additionally recommending more personalised treatment plans. The doctor still makes the final call, and the model keeps learning from it, improving over time.

References:

[1] Sun, Xiaoxiao, et al. “A benchmark for automatic visual classification of clinical skin disease images.” European Conference on Computer Vision. Springer, Cham, 2016.

[2] Stolz, W. “ABCD rule of dermatoscopy: a new practical method for early recognition of malignant melanoma.” Eur. J. Dermatol. 4 (1994): 521–527.

[3] Argenziano, Giuseppe, et al. “Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis.” Archives of Dermatology 134.12 (1998): 1563–1570.

[4] Yang, Jufeng, et al. “Clinical skin lesion diagnosis using representations inspired by dermatologist criteria.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[5] Yang, Jufeng, et al. “Self-paced balance learning for clinical skin disease recognition.” IEEE Transactions on Neural Networks and Learning Systems (2019).

[6] Gessert, Nils, et al. “Skin lesion classification using cnns with patch-based attention and diagnosis-guided loss weighting.” IEEE Transactions on Biomedical Engineering 67.2 (2019): 495–503.

[7] Selvaraju, Ramprasaath R., et al. “Grad-cam: Visual explanations from deep networks via gradient-based localization.” Proceedings of the IEEE international conference on computer vision. 2017.

[8] Srinivas, Suraj, and François Fleuret. “Full-gradient representation for neural network visualization.” Advances in Neural Information Processing Systems. 2019.

[9] Zhang, Xiaolin, et al. “Adversarial complementary learning for weakly supervised object localization.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[10] https://www.guidelinesinpractice.co.uk/skin-and-wound-care/top-tips-getting-started-with-dermoscopy/455038.article
