Machine Learning Software Component Preparation


Trained Model Preparation

Machine learning model training can be implemented in two different ways: offline training and online training. With reference to (ml-ops.org, 2022), offline training, also called batch or static learning, uses already collected data; in my product I use a prepared dataset of mole images. Online training, in contrast, uses streams of data for model training.

The result of ML model preparation, or offline training, is an artifact: a CNN model, which I use in my ML model serving service.

Figure 1 illustrates the high-level flow of my mole classification for melanoma prediction. I use the HAM10000 dataset as input data for training, validation, and testing. The dataset is balanced using data augmentation. To group the images into the three sets, I shuffled the dataset randomly and then copied the images into separate folders for training, validation, and testing. For the training process, I use pretrained Convolutional Neural Network (CNN) architectures, which expect images of size (224, 224, 3); therefore, the images are preprocessed by resizing them from their original size to (224, 224, 3), and the pixel values are min-max normalized into the range [0, 1] by rescaling with a factor of 1/255. In addition to using CNN architectures pretrained on the ImageNet dataset, I use an adaptive learning rate during model training.

Figure 1 High-level flow of mole classification for melanoma prediction
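
As a minimal sketch of this preprocessing step (assuming TensorFlow/Keras and an illustrative directory layout with train/, val/ and test/ folders containing one sub-folder per class), the images can be loaded, resized to (224, 224, 3) and rescaled by 1/255 as follows:

```python
# Minimal sketch of the input pipeline, assuming TensorFlow/Keras and that the
# shuffled images were already copied into train/, val/ and test/ folders
# (one sub-folder per class); all paths and names are illustrative.
import tensorflow as tf

IMAGE_SIZE = (224, 224)   # input size expected by the pretrained CNN architectures
BATCH_SIZE = 32

def load_split(directory):
    # Reads images from disk, resizes them to 224x224 and labels them
    # according to the class sub-folder (melanoma / not_melanoma).
    return tf.keras.utils.image_dataset_from_directory(
        directory,
        image_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        label_mode="binary",
    )

train_ds = load_split("data/train")
val_ds = load_split("data/val")
test_ds = load_split("data/test")

# Min-max normalization of the pixel values into [0, 1] by rescaling with a factor of 1/255.
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))
val_ds = val_ds.map(lambda x, y: (rescale(x), y))
test_ds = test_ds.map(lambda x, y: (rescale(x), y))
```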

Using pretrained CNN models is known as transfer learning or domain adaptation.

With reference to (cs231n.github.io, 2022), almost no one trains Convolutional Neural Network architectures from a random initialization of weights and biases. CNN models have millions of parameters, and the training process requires a lot of resources and time. Instead, most CNN architectures are used with models pretrained on other datasets. I use CNN models pretrained on the ImageNet dataset; however, ImageNet is not similar to the HAM10000 dataset. Moreover, I have only two classes, melanoma and not melanoma, whereas ImageNet has many classes. My project contains a small dataset of mole images, and the images are not similar to those in ImageNet, which is why, according to (cs231n.github.io, 2022), “it might not be best to train the classifier form the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.”

Since I use pretrained Convolutional Neural Network models, it is also relevant that (cs231n.github.io, 2022) notes: “It’s common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights”.
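
A minimal sketch of this approach, assuming TensorFlow/Keras and continuing the input-pipeline sketch above (train_ds, val_ds); MobileNetV2 is used here only as an example of an ImageNet-pretrained architecture expecting (224, 224, 3) inputs, and the learning rates and epoch counts are illustrative:

```python
# Transfer learning sketch: reuse an ImageNet-pretrained base, train a new binary
# head, then fine-tune the whole network with a much smaller learning rate.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,      # drop the ImageNet-specific classification layers
    weights="imagenet",     # reuse weights pretrained on ImageNet
)
base.trainable = False      # first train only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # melanoma / not melanoma
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=10)

# Fine-tuning: unfreeze the pretrained weights and keep training with a much
# smaller learning rate, as recommended by (cs231n.github.io, 2022).
base.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=5)
```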

Convolutional Neural Network models include two parts: feature extraction and classification. The classification part consists of the final fully connected layers, whose size depends on the number of classes, while the feature extraction part consists of a sequence of convolution and pooling layers. Training is carried out by the backpropagation process.

Since I have two classes, melanoma and not melanoma, my trained model has a binary outcome. As mentioned in (Liao and Chin, 2007), “Logistic regression is a standard method for building prediction models for a binary outcome”.
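
In practice, the final sigmoid unit of such a network is exactly a logistic regression over the features extracted by the convolutional layers. A tiny illustrative example (the feature, weight and bias values below are made up):

```python
# Illustrative only: the last Dense(1, activation="sigmoid") layer computes a
# logistic regression over the CNN-extracted feature vector.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

features = np.array([0.2, -1.3, 0.7])   # hypothetical output of the feature extractor
weights = np.array([0.5, 0.1, -0.4])    # learned by the final fully connected layer
bias = 0.05

p_melanoma = sigmoid(features @ weights + bias)  # probability of the "melanoma" class
print(p_melanoma)   # values above 0.5 are classified as melanoma
```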

ML Model Serving

The saved CNN model is used for prediction in the production environment. Figure 2 illustrates how the saved model is used for ML model serving via a REST API in my product.

Figure 2 CNN model artifact between offline training and the production environment
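
Continuing the transfer-learning sketch above, the hand-over shown in Figure 2 corresponds to exporting the trained model as a single artifact file; the file name below is illustrative:

```python
# Export the trained CNN as one artifact file (illustrative name); this file is
# the hand-over point between offline training and the serving service.
model.save("melanoma_cnn.h5")
```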

There are two main ways to deploy machine learning functionality in production. The first is the ML feature as an embedded feature of the application; in this case, the trained model is packaged as a resource inside the application. The other is ML model serving, i.e. API access to the ML functionality. My product uses API access to the prediction functionality, which makes it possible to support the functionality in phone applications, web browsers, and desktop applications.
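
A minimal sketch of such an ML model serving endpoint, assuming Flask for the REST API and the Keras artifact saved above; the route, file name and response field are illustrative:

```python
# Minimal REST API serving sketch: load the saved CNN artifact once and expose
# a /predict endpoint that accepts an uploaded mole image.
import io

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("melanoma_cnn.h5")  # artifact from offline training

@app.route("/predict", methods=["POST"])
def predict():
    # The client (phone app, web browser or desktop application) uploads the image.
    image = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    image = image.resize((224, 224))                       # same size as in training
    batch = np.expand_dims(np.asarray(image) / 255.0, 0)   # rescale and add batch axis
    probability = float(model.predict(batch)[0][0])        # sigmoid output in [0, 1]
    return jsonify({"melanoma_probability": probability})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```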

References

ml-ops.org, 2022. Three Levels of ML Software. Available at: https://ml-ops.org/content/three-levels-of-ml-software [Accessed 23 July 2022].

cs231n.github.io, 2022. Transfer Learning. Available at: https://cs231n.github.io/transfer-learning/ [Accessed 28 May 2022].

Liao, J.G. and Chin, K.V., 2007. Logistic regression for disease classification using microarray data: model selection in a large p and small n case. Bioinformatics, 23(15), pp.1945–1951.
