Towards Advanced Accommodation: Deep Learning for Photos Classification — Part 1

Arie Pratama Sutiono
Published in Airy ♥ Science · Sep 26, 2019

The Problem

We wanted to automatically classify our images into several categories: bedroom, bathroom, reception, etc. Initially this was done by humans, but as the company grew, the stream of photos became too large for manual annotation to keep up. The data team was aware of this problem and decided to explore the use of deep learning to classify images into these categories.

Deep learning has emerged as the go-to approach for image problems because it has proven to reduce the feature engineering effort on image data [7]. A chain of neural network layers "learns" the features itself, without a human directly telling the model what to extract. Hence its popularity for unstructured data nowadays.

Transfer Learning

There are many neural network architectures that produce good image classification results even without much feature engineering, such as Inception, VGG, and DenseNet (the PyTorch documentation also benchmarks these models [6]). This makes our task easier because the learned weights can be transferred to solve a similar problem in another domain [1], such as ours.

We ran our current experiments using transfer learning for the image classification task. The results reached > 80% accuracy on both training and validation. However, transfer learning alone, without any preprocessing, was not enough to get this kind of result. Therefore we will walk readers through some of the things that can be done to tackle image classification.

Looking at Current Data Distribution

We decided to sample our data and manually annotate the images. We currently have 3,000 hotel images as our sample. We crowdsourced the annotation effort to all members of the data team, and afterward conducted a workshop using Jupyter notebooks and PyTorch. We wrote tutorials using Keras as well.

Now let’s look at our current data distribution.

We know that our model could be biased toward the majority classes: bedroom and amenities. How should we overcome that?

Oversampling

One of the simplest things that can be done, before feeding the data into the model, to help it correctly predict minority classes is oversampling [2][4]. This approach is common in machine learning. Here we will explain some of the techniques that can be used.

Naive Random Oversampling

The basic idea of oversampling is to generate enough data from the minority classes. The simplest idea that comes to mind is, "Why don't we randomly duplicate the minority data?". We can use the imblearn library [3] to randomly resample our data. By default, this duplicates each minority class until it is as large as the majority class.

Sample Code to use Random Oversampling with imblearn

SMOTE

SMOTE stands for Synthetic Minority Over-sampling TEchnique[5]. The basic idea is to generate synthetic data using the nearest neighbor method. This could be easily done using imblearn library.

Sample Code to use SMOTE with imblearn

Using SMOTE on our dataset needed a little extra work. In the sample code above, we had to resize the images to prevent a memory explosion; only then could we apply SMOTE safely.

By default, SMOTE generates synthetic samples until each minority class matches the size of the majority class.

What’s interesting is the result of doing SMOTE on image dataset.

SMOTE Results for Facade Class
SMOTE Results for Reception Class

As you might have guessed, SMOTE generates synthetic data by blending two or more images that lie near each other in a cluster of the data.

Illustration of How SMOTE Works

Data Augmentation

There are many ways to make a neural network learn better without overfitting, and data augmentation is one of the go-to techniques [8]. The idea of data augmentation is to mimic how our brain behaves. Say you look at an object, a dog for example. Your brain can recognize that dog whether it walks, sits, turns left or right, is near or far, at night or at noon, etc. The same idea applies to our neural network! You can flip, rotate, crop, brighten, darken, and so on. Here we will briefly mention the three data augmentations we used in our classifier: random cropping, random horizontal flipping, and random rotation.

Sample Code to Reproduce Data Augmentation Described In This Post

Results

Here we will show the results of using oversampling and data augmentation. We used 10 epochs, cross-entropy loss, the Adam optimizer, and a DenseNet121 model.

Using Only Oversampling (SMOTE)

Looking at the benchmark on the imblearn website [9], we believed that SMOTE could result in a more robust model. Note: we will use "Normal" to describe the scenario without oversampling.

Oversampling (SMOTE) + Data Augmentation

Next, does data augmentation improve our performance?

Current Conclusion

We have described a transfer learning technique in conjunction with oversampling and data augmentation, and experimented with oversampling alone and oversampling + data augmentation. In our setting, oversampling, especially with SMOTE, resulted in below-average validation accuracy, and adding data augmentation made model performance drop significantly. However, what should be analyzed further is the per-class f1-score: looking solely at accuracy can reward a biased model that prefers to guess the majority class instead of correctly classifying each class.

More To Be Explored (and Explained)

In this post, we briefly described the effect of oversampling and data augmentation in conjunction with transfer learning, using a DenseNet121 model trained for 10 epochs. In the next post, we will explain more advanced techniques for increasing our model's performance on this problem.

There are a few questions that readers might want to explore:

  1. Are there more robust oversampling techniques? Perhaps we do not even need oversampling?
  2. Why would SMOTE make our model performance worse?
  3. Is the number of epochs too small?
  4. We can observe that the performance of DenseNet121 had already begun to plateau at 10 epochs, and it seems to perform worse when data augmentation is applied. How do we solve this? Is the network not flexible enough?

Check out the next post about optimization used to tune this model!

Closing Remarks

At Airy, especially in Engineering, we embrace three big values, BIC: Bold, Innovative, and Customer-Centric. We strive to bring state-of-the-art technology to the accommodation space in the spirit of BIC. If you share our passion, come and join us!

Acknowledgment

I want to thank Samsu Sempena and Ali Akbar for reviewing this blog post.

References

[1] Yosinski, J., Clune, J., Bengio, Y., Lipson, H. (2014). How transferable are features in deep neural networks? NIPS

[2] https://en.wikipedia.org/wiki/Oversampling.

[3] https://imbalanced-learn.readthedocs.io/en/stable/

[4] https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis

[5] Chawla, N., Bowyer, K., Hall, L., Kegelmeyer W., (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research

[6] https://pytorch.org/docs/stable/torchvision/models.html

[7] Chollet, F. Deep Learning with Python. (2018).

[8] https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html.

[9] https://imbalanced-learn.readthedocs.io/en/stable/auto_examples/applications/plot_over_sampling_benchmark_lfw.html#sphx-glr-auto-examples-applications-plot-over-sampling-benchmark-lfw-py.
