CriticNet: Scaling the business with Deep Learning — Part 2

Manos Loukadakis · Plum Guide
Published in Marvel Vision · Aug 6, 2021 · 8 min read

Introduction

In the previous article, we explained what CriticNet is, how it grades homes in terms of quality, and how it helps us significantly scale up home acquisition.

CriticNet scores a home from four input images, one per room type (bathroom, bedroom, living room, kitchen). Unfortunately, the dataset we had was far from perfect, and several problems had to be addressed before we could reliably end up with the best image per room type.

In this blog post, I will walk through the steps needed to clean and preprocess the dataset and bring it into the format CriticNet expects.

Dataset Challenges

The dataset we are assessing is a list of homes hosted on external platforms. It is supplied by a third-party provider, and you can see what it looks like in Figure 1. Each row contains a unique home identifier and the URLs of the home's images.

Figure 1: Dataset in Snowflake

As you can see, the image links are not classified into room types, and each home contains ~40 images on average. Figure 2 shows the images of The Whaler’s Cottage, and there are a few challenges we need to overcome.

First of all, there are images of outdoor spaces, which are not useful for grading the home, so they are noise we need to remove. In addition, indoor images are not always useful either: many don't capture enough of the room, making it difficult for the grader to judge whether the room meets the Plum standards.

Figure 2: The Whaler’s Cottage

To clean and preprocess the dataset, we need to divide and conquer the problem. More specifically, we need to tackle the following challenges:

  • Room classification: classify each image into one of five classes (living room, bedroom, bathroom, kitchen, or outdoor space).
  • Best-image selection: for each room type, pick the image that best represents the room. This is the harder of the two problems.

Room Classification

The first task is to build an image classifier that predicts the room type of a given image. This problem doesn’t seem hard, as we can use state-of-the-art deep learning models. Let’s take a look at the dataset first.

Dataset

Unfortunately, at Plum Guide we don't classify images into room types, so there was no in-house dataset we could use. Instead, we built a crawler that feeds the keywords living room, bathroom, kitchen, bedroom, and outdoors into Google search and fetches the resulting images from the web. In that way, we created a decent dataset (see Figure 3). The dataset is roughly balanced across the 5 classes (see Figure 4), so we can start developing and training our model. A minimal sketch of such a crawler is shown after the figures below.

Figure 3: “Living room” results in Google search
Figure 4: Example of how the dataset looks like (left) and the distribution of the classes (right)
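As an illustration, a crawler along these lines could be put together with the open-source icrawler package. The package choice, keywords, and per-class image count here are assumptions for the sketch, not a description of the exact tool we built.

```python
# Illustrative keyword-based image crawler using the open-source `icrawler`
# package (an assumption; not necessarily the tool we built). It downloads
# images returned by Google image search into one folder per class.
from icrawler.builtin import GoogleImageCrawler

ROOM_TYPES = ["living room", "bedroom", "bathroom", "kitchen", "outdoors"]

for keyword in ROOM_TYPES:
    crawler = GoogleImageCrawler(
        downloader_threads=4,
        storage={"root_dir": f"dataset/{keyword.replace(' ', '_')}"},
    )
    # The per-class image count is an illustrative figure.
    crawler.crawl(keyword=keyword, max_num=1000)
```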

Training and Evaluation

The dataset is not big enough to train a deep learning model from scratch, so we applied transfer learning and fine-tuned the weights of a pre-trained neural network. The pre-trained model we picked is EfficientNetB7, which achieves top ImageNet accuracy with a relatively small number of parameters. The steps we followed to train the model are:

  1. We kept the pre-trained convolutional layers frozen and trained only the new 5-class softmax layer with a relatively high learning rate. This lets the randomly initialized softmax weights adapt to the features produced by the frozen, pre-trained layers.
  2. We then trained the whole network with a small learning rate. The small learning rate adapts the weights to our dataset while preserving most of the information in the pre-trained model; this is the final tweak of the weights, so we don't want big changes. A minimal sketch of this two-phase procedure is shown below.
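Below is a minimal sketch of this two-phase fine-tuning recipe in Keras. The learning rates, epoch counts, and the `train_ds`/`val_ds` datasets are illustrative assumptions, not the exact values and pipelines we used.

```python
# Sketch of the two-phase fine-tuning recipe with Keras' EfficientNetB7.
# Learning rates, epochs and the train_ds/val_ds tf.data datasets are
# illustrative assumptions.
import tensorflow as tf

NUM_CLASSES = 5  # living room, bedroom, bathroom, kitchen, outdoors

base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", pooling="avg"
)

inputs = tf.keras.Input(shape=(600, 600, 3))   # EfficientNetB7's native resolution
x = base(inputs, training=False)               # keep BatchNorm in inference mode
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Phase 1: freeze the pre-trained convolutional backbone and train only the
# new softmax head with a relatively high learning rate.
base.trainable = False
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Phase 2: unfreeze everything and fine-tune end to end with a much smaller
# learning rate so the pre-trained weights only shift slightly.
base.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```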

The model achieves 96.3% precision and 95% recall. Figure 5 depicts precision and recall at different threshold values; the optimal threshold appears to be around 0.5.

Figure 5: Precision-Recall Curve

Let's dig deeper into the evaluation results. In Figure 6, we plot the confusion matrix, which gives us a clear picture of the classifier's performance on each class. A short scikit-learn sketch of how these metrics can be computed appears after the figures below. A few key observations:

  • The model classifies every outdoor-space photo in the test set correctly. However, because outdoor spaces can contain almost anything in the real world, we should expect some misclassifications on new, unseen data once the model runs in production.
  • The living room class has the lowest performance, with a precision of 91% and recall of 88%. Figure 7 shows the false positives and false negatives for the living room class; we notice that images which don't capture enough of the room are very difficult to classify.
Figure 7: Living room false negatives (left) and false positives (right)
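For completeness, here is a small scikit-learn sketch of how metrics like these can be computed. The `y_true`/`y_prob` arrays stand in for the held-out labels and the model's softmax outputs; random placeholder values are used only so the snippet runs end to end, and the exact way we produced the plots in Figures 5 and 6 may differ.

```python
# Evaluation sketch with scikit-learn. y_true/y_prob are placeholders for the
# held-out labels and softmax outputs; random values are used only so the
# snippet runs end to end.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_curve

CLASS_NAMES = ["living room", "bedroom", "bathroom", "kitchen", "outdoors"]

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=500)           # placeholder ground-truth labels
y_prob = rng.dirichlet(np.ones(5), size=500)    # placeholder softmax outputs
y_pred = np.argmax(y_prob, axis=1)

# Per-class precision/recall (e.g. the living-room scores discussed above).
print(classification_report(y_true, y_pred, labels=list(range(5)), target_names=CLASS_NAMES))

# Confusion matrix (Figure 6): rows are true classes, columns are predictions.
print(confusion_matrix(y_true, y_pred, labels=list(range(5))))

# One-vs-rest precision-recall curve for a single class, one way to produce a
# threshold plot like Figure 5.
precision, recall, thresholds = precision_recall_curve(
    (y_true == 0).astype(int), y_prob[:, 0]
)
```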

Robojack

The second challenge we need to tackle is to figure out how to pick the best image per room type. First of all, we need to define what a good image means.

Imagine you are a home critic and you need to decide if a home is of high quality only from the photos. You need to have photos that capture as much of the room as possible. This is what we mean when we talk about photo composition.

Dataset

Our brand-quality team had compiled a dataset of home photos, each labelled with one of two categories: the photography either meets the Plum standards or it doesn't (see Figures 8 and 9).

Figure 8: High-quality images
Figure 9: Low-quality images

Training

This is a binary classification task, because the target variable is binary (meets Plum standards or not). In terms of modeling, we will again retrain and fine-tune a pre-trained deep learning model. There are two options here:

  1. First classify the images into room types and then train four deep learning models, one per room type. In that way, the training data would be more homogeneous and the classifier's task easier.
  2. Simply train one model for all the images. This task is harder, but it makes the development of the whole pipeline simpler and cheaper.

We decided to go with the second option. However, in the future, we plan to try the first approach too.

The model we used is again EfficientNetB7, and we followed the same training approach as before; a minimal sketch of this binary variant is shown below.
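This sketch assumes the same Keras setup as the room classifier; only the output layer changes from a 5-way softmax to a single sigmoid unit, and the learning rate shown is illustrative.

```python
# Robojack sketch: same backbone and fine-tuning recipe as the room
# classifier, but with a single sigmoid output. The learning rate is
# illustrative.
import tensorflow as tf

base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", pooling="avg"
)
inputs = tf.keras.Input(shape=(600, 600, 3))
x = base(inputs, training=False)
# Probability that a photo's composition meets the Plum standards.
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
robojack = tf.keras.Model(inputs, outputs)

robojack.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```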

Evaluation

The model achieved 88.1% precision and 88.3% recall. From the confusion matrix (see Figure 10), we can see that the false negative rate is 13%, meaning the classifier wrongly predicts that a genuinely high-quality image doesn't meet the Plum standards, and the false positive rate is 10%, meaning it wrongly predicts that a low-quality image is high-quality.

Figure 10: Confusion matrix
Figure 11: Robojack false positive and false negative results

CriticNet Architecture

So far, we have shown how we clean and preprocess our dataset: the room classifier assigns each photo of a home to a room type, and Robojack classifies photos into high-quality and low-quality photography. Now we will describe how these models are assembled to grade homes automatically.

  1. The first step is to classify all the raw home images into room types. We do this because we want to discard the outdoor-space images and pick the best image per room type to feed to CriticNet.
  2. In the next step, we filter out images with poor photo composition. More specifically, we want to keep the images that depict as much of the room as possible, ideally the best ones. For that reason, we feed the images to Robojack. It is a binary classifier, so its output is a score expressing the probability that a given image is high-quality. From the set of images of the same room type, we pick the image with the highest probability of being high-quality. In that way, we end up with four images per home and are ready to make predictions with CriticNet (see Figure 12). A sketch of this selection step is shown after the figure below.
Figure 12: Automatic Grading Architecture
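Below is a hedged sketch of this selection logic. The `room_classifier`, `robojack`, and `critic_net` names stand in for the trained models, and the assumed class-index ordering (0-3 for the indoor room types, 4 for outdoors) is purely illustrative; this is not our production code.

```python
# Hedged sketch of the selection pipeline in Figure 12. The models and the
# class-index ordering (0-3 = indoor room types, 4 = outdoors) are
# illustrative assumptions, not our production code.
import numpy as np

ROOM_TYPES = ["living room", "bedroom", "bathroom", "kitchen"]  # outdoor images are discarded

def select_best_images(images, room_classifier, robojack):
    """Pick the highest-scoring image per room type for a single home.

    `images` is an array of preprocessed photos of one home. Returns a dict
    {room_type: image} with at most one image per room type.
    """
    room_probs = room_classifier.predict(images)   # shape (n, 5), softmax scores
    room_labels = np.argmax(room_probs, axis=1)
    quality = robojack.predict(images).ravel()     # shape (n,), P(high quality)

    best = {}
    for idx, room in enumerate(ROOM_TYPES):        # indices 0-3: indoor rooms only
        candidates = np.where(room_labels == idx)[0]
        if candidates.size:
            best[room] = images[candidates[np.argmax(quality[candidates])]]
    return best

# Usage (hypothetical names):
# best = select_best_images(home_images, room_classifier, robojack)
# grade = critic_net.predict(np.stack([best[r] for r in ROOM_TYPES]))
```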

Conclusion

In this blog post, we described how we use deep learning models to clean and preprocess the original dataset for consumption by CriticNet. The room classifier and Robojack are not perfect, and their misclassification errors propagate to CriticNet. However, CriticNet has shown exceptional performance at filtering out very low-quality homes, which has significantly helped the business scale home acquisition.

As you may have noticed, we use only one image per room type, yet homes can have multiple rooms of the same type, a case CriticNet currently doesn't handle. Picking a single best image per room type is one of the reasons that CriticNet's precision is relatively low (~66%).

If you like our work and are interested in solving challenging problems, we are hiring engineers to work with the data science team on rebuilding the search experience. You can apply here!
