
Melanoma Image Datasets

I prepared the input dataset using the HAM10000 (HAM10K) dataset (Tschandl, P. et al., 2018). For melanoma detection, I selected two groups of images: one group contains images of melanoma, and the other contains images of moles without melanoma. Together, these two groups form an imbalanced dataset. Medical image datasets often contain only a small number of images or have unbalanced classes, and data augmentation is one way to balance them. The Kaggle platform (Scarlat, A., 2018) hosts a balanced dataset derived from HAM10K, and I use this balanced dataset to train my model. The article (Scarlat, A., 2019) describes how the HAM10K dataset was balanced by data augmentation and, at the same time, normalized “in terms of luminosity, colors, resolution” (Scarlat, A., 2019). The paper (Mikołajczyk, A. et al., 2018) explains why data augmentation is used in deep learning and how it improves the robustness of image classification.

I chose HAM10K because it is the most frequently referenced dataset in papers on melanoma detection. For example, the paper (Rasul, M.F. et al., 2020), which applies a CNN without deep pre-processing, uses the HAM10K dataset for classification. The paper (Naeem, A. et al., 2020) reviews public datasets of melanoma images, and the paper (Gupta, A. et al., 2020) notes that most datasets available before HAM10K contained only a small number of melanoma images. In addition, the ISIC Challenge online resource (ISIC challenge, 2022) provides datasets of lesion images; for instance, the ISIC 2017 dataset (ISIC 2017 challenge, 2017) contains 2000 images. Another database, the PH2 dataset (ADDI Project, 2012), contains 200 lesion images. Most of the lesion datasets mentioned in melanoma papers contain only a small number of images.
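The sketch below illustrates, under stated assumptions, the two steps described above: selecting a melanoma-versus-mole subset from HAM10K-style metadata and balancing the classes by oversampling the minority class with augmented copies. It assumes the HAM10000 diagnosis codes `mel` (melanoma) and `nv` (melanocytic nevi, used here for "moles without melanoma") and represents images as small nested lists of grayscale values. The metadata rows, the helper names (`select_binary`, `balance_by_augmentation`) and the choice of flips and 90° rotations are hypothetical illustrations, not the procedure used by Scarlat (2019) or in this project.

```python
from collections import Counter
from itertools import cycle

# Hypothetical HAM10000-style metadata rows: (image_id, dx).
# "mel" = melanoma, "nv" = melanocytic nevus (mole); other codes are ignored.
# These IDs and counts are toy values, not the real dataset.
METADATA = [
    ("img_0001", "nv"), ("img_0002", "nv"), ("img_0003", "mel"),
    ("img_0004", "nv"), ("img_0005", "bkl"), ("img_0006", "nv"),
    ("img_0007", "mel"), ("img_0008", "nv"),
]


def select_binary(rows):
    """Keep only melanoma (label 1) and benign moles (label 0)."""
    mapping = {"mel": 1, "nv": 0}
    return [(image_id, mapping[dx]) for image_id, dx in rows if dx in mapping]


# Simple geometric augmentations on an image stored as a list of rows.
def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return img[::-1]


def rotate_90(img):
    # Clockwise rotation: transpose, then reverse each row.
    return [list(row)[::-1] for row in zip(*img)]


AUGMENTATIONS = [flip_horizontal, flip_vertical, rotate_90]


def balance_by_augmentation(images_by_label):
    """Oversample every smaller class with augmented copies until it has as
    many images as the largest class. Originals are kept unchanged."""
    target = max(len(imgs) for imgs in images_by_label.values())
    balanced = {}
    for label, imgs in images_by_label.items():
        if not imgs:
            raise ValueError(f"class {label!r} has no images to augment")
        out = list(imgs)
        # Apply each transform to every original before moving to the next
        # transform; after all combinations are used, the cycle repeats them.
        sources = cycle([(img, aug) for aug in AUGMENTATIONS for img in imgs])
        while len(out) < target:
            img, aug = next(sources)
            out.append(aug(img))
        balanced[label] = out
    return balanced


if __name__ == "__main__":
    pairs = select_binary(METADATA)
    print(Counter(label for _, label in pairs))  # Counter({0: 5, 1: 2})

    toy_images = {
        0: [[[1, 2], [3, 4]]] * 5,                   # 5 toy mole images
        1: [[[5, 6], [7, 8]], [[9, 10], [11, 12]]],  # 2 toy melanoma images
    }
    balanced = balance_by_augmentation(toy_images)
    print({label: len(imgs) for label, imgs in balanced.items()})  # {0: 5, 1: 5}
```

In practice, augmented copies should be generated from training images only, so that near-duplicates of test images do not end up in the training set. Because flips and rotations reuse the same originals, balancing this way adds variety in orientation but not new lesions.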

Table 1 shows some of the publicly available online datasets containing images of moles and melanoma.

References

Tschandl, P., Rosendahl, C. and Kittler, H., 2018. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5(1), pp.1–9.

[Accessed 01 May 2022]

Mikołajczyk, A. and Grochowski, M., 2018, May. Data augmentation for improving deep learning in image classification problem. In 2018 International Interdisciplinary PhD Workshop (IIPhDW) (pp. 117–122). IEEE.

[Accessed 25 May 2022]

Rasul, M.F., Dey, N.K. and Hashem, M.M.A., 2020, June. A comparative study of neural network architectures for lesion segmentation and melanoma detection. In 2020 IEEE Region 10 Symposium (TENSYMP) (pp. 1572–1575). IEEE.

[Accessed 25 May 2022]

Naeem, A., Farooq, M.S., Khelifi, A. and Abid, A., 2020. Malignant melanoma classification using deep learning: datasets, performance measurements, challenges and opportunities. IEEE Access, 8, pp.110575–110597.

[Accessed 25 May 2022]

Gupta, A., Thakur, S. and Rana, A., 2020, June. Study of melanoma detection and classification techniques. In 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) (pp. 1345–1350). IEEE.

[Accessed 25 May 2022]

ISIC challenge, 2022. ISIC Challenge Datasets. Available at: https://challenge.isic-archive.com/data/

[Accessed 25 May 2022]

ISIC 2018 challenge, 2018. ISIC 2018 Challenge. Available at: https://challenge.isic-archive.com/data/#2018

[Accessed 27 May 2022]

ISIC 2017 challenge, 2017. ISIC 2017 Challenge. Available at: https://challenge.isic-archive.com/data/#2017

[Accessed 27 May 2022]

ADDI Project, 2012. PH2 Database. Available at: https://www.fc.up.pt/addi/ph2%20database.html

[Accessed 27 May 2022]

arXiv BCN20000, 2019. BCN20000: Dermoscopic Lesions in the Wild. Available at: https://arxiv.org/abs/1908.02288

[Accessed 29 May 2022]
