AIN 311 Weekly Blog 5: Transfer learning with VGG-16 as feature extractor

Galip and Halil
AIN311 Fall 2023 Projects
3 min read · Dec 28, 2023

Last week we published our project progress report, in which we explained our method for solving this classification problem. This week we will continue to explain our implementation further. Our initial goal is to detect plants in general and see how our method performs.

So our objective is to generate a semantic segmentation, going from this:

Example sample

To this:

Label sample

Semantic Segmentation

Semantic segmentation is a powerful computer vision method that goes beyond recognizing objects in images: it assigns a specific label to every pixel. This pixel-level understanding, typically predicted by neural networks such as CNNs, enables computers to distinguish fine details in images. It is a key technology in applications like autonomous vehicles, where it helps identify and interpret the surroundings, and in medical imaging, where it enables precise organ segmentation. Semantic segmentation transforms images into detailed maps, offering a rich comprehension of scenes, and its applications continue to expand across industries that require pixel-level image analysis for improved decision-making and automation.

Methodology

To segment images, we employed VGG-16, which we explained in previous blogs, as a feature extractor. After initializing VGG-16 and freezing its layers, we crop the model before its first pooling layer so that it can be used as a feature extractor.

from tensorflow.keras.models import Model

# VGG_model is the pretrained VGG-16 initialized earlier; freeze its weights
for layer in VGG_model.layers:
    layer.trainable = False

# Cut the network at block1_conv2, before the first pooling layer
new_model = Model(inputs=VGG_model.input, outputs=VGG_model.get_layer('block1_conv2').output)
new_model.summary()

After forward passing an image through the model we get 64 feature maps of the same size as the input. These 64 maps represent per-pixel features. Therefore, when we reshape the output as tabular data, the pixels are the data points (rows) and the intensity values of the feature maps are the features (columns). When we look at the feature images, they capture really fine details of the input.
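A minimal sketch of this step, assuming image is a single preprocessed input array (the exact preprocessing is not shown in this post):

import numpy as np

# Forward pass one image through the cropped VGG-16; output shape is (H, W, 64)
features = new_model.predict(np.expand_dims(image, axis=0))[0]

# Reshape to tabular form: one row per pixel, one column per feature map
X = features.reshape(-1, features.shape[-1])  # shape (H*W, 64)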

After converting these images to tabular data, we tried fitting different machine learning models: a Random Forest with 50 estimators, Gaussian Naive Bayes, and Logistic Regression with a maximum of 10,000 iterations. As our metric we used MIoU.
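A sketch of how these models can be fit with scikit-learn, assuming X is the pixel-feature table from above and y is the flattened ground-truth mask (one label per pixel; both names are placeholders):

from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

models = {
    'Random Forest': RandomForestClassifier(n_estimators=50),
    'Gaussian Naive Bayes': GaussianNB(),
    'Logistic Regression': LogisticRegression(max_iter=10000),
}
for name, model in models.items():
    model.fit(X, y)  # y: flattened label mask, one entry per pixel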

MIoU, or Mean Intersection over Union, is a metric commonly used to evaluate the accuracy of image segmentation algorithms. For each class it measures the overlap between the predicted and true segmentation masks by dividing the area of their intersection by the area of their union, and then averages over classes. MIoU provides a single value that represents the quality of the segmentation results, with higher values indicating better performance.
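For reference, scikit-learn's jaccard_score computes this per-class IoU, and averaging over classes gives MIoU (the flattened prediction and mask variable names below are assumptions):

from sklearn.metrics import jaccard_score

# y_true, y_pred: flattened ground-truth and predicted pixel labels
miou = jaccard_score(y_true, y_pred, average='macro')  # mean IoU over classes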

Results

We got a 73% MIoU score with Random Forest, 71% with Gaussian Naive Bayes, and 36% with Logistic Regression. While Random Forest has the best score, MIoU may not be the best metric for our problem because of the way the labels in the dataset were made. Let me explain with an example.

The image on the left shows our predicted labels, the image in the middle shows the human-made labels, and the image on the right is the original image.

The image may not be high quality, but it can be seen that the plant in the bottom left is labeled better in our prediction. Labeling some plants like this one by one is probably really hard and time consuming, so they were labeled roughly.

Thanks for reading!

Mehmet Galip Tezcan and Kazım Halil Kesmük
