How we used “Mandarins🍊” to Enhance AWS Sagemaker Object Detection

Yuxuan Lin
Published in carsales-dev
8 min read · Jul 2, 2019

Object detection is an AI technique used to locate objects in an image. It has been widely adopted across domains, from detecting human faces and cars to medical examinations such as detecting tumours.

Building Our Own Object Detection Model

The carsales retail portal facilitates buying and selling cars. There are approximately 250,000 cars on our platform, and about 5,000 new ads are submitted each day. To keep the quality of the ads high, the customer support team (CST) used to review each ad manually. Last year carsales deployed an AI technology called Tessa, which helps automate the approval process. This reduced the manual workload for CST: processing now takes only 7 seconds, a significant decrease from the previous process, which took approximately 3.5 hours.

Tessa has many rules in place for approving an ad, and one of them is to make sure that there is at least one photo with a visible rego (registration) number. Existing rego recognition services do not cope well with challenging conditions, such as when the rego plate is at a steep angle or the lighting is poor, which results in many missed detections. To overcome this issue, we had to build our own rego detection AI. 💪

We used AWS SageMaker Object Detection to build this AI model. Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. It is a fully managed service that covers the entire machine learning workflow: labelling and preparing your data, choosing an algorithm, training the model, tuning and optimizing it for deployment, making predictions, and taking action.

Data First

The data required for this project consists of car images and a related CSV file describing their metadata. We collected over 11,000 images from our internal image server and split them into a training set of 10,000 images and a test set of 1,400 images.
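A split like this can be sketched in a few lines of Python. This is only an illustration, not the script we actually ran: the function name `split_dataset`, the fixed seed, and the image file names are all assumptions made for the example.

```python
import random


def split_dataset(image_ids, test_size, seed=42):
    """Shuffle image ids reproducibly and hold out `test_size` of them for testing."""
    if not 0 <= test_size <= len(image_ids):
        raise ValueError("test_size must be between 0 and the number of images")
    shuffled = list(image_ids)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    return shuffled[test_size:], shuffled[:test_size]  # (train, test)


# Hypothetical ids standing in for the real image file names.
ids = [f"img_{i:05d}.jpg" for i in range(11_400)]
train_ids, test_ids = split_dataset(ids, test_size=1_400)
```

Seeding the shuffle keeps the test set stable between runs, so results from different training attempts stay comparable.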

Dataset Samples

To clean and label the images we need to:

  1. Get rid of images with car angles that do not show a rego plate, e.g. dashboard, GPS infotainment, wheel, boot, passenger seat, side, etc.
  2. Label the rest of the images with a bounding box location for the rego plate.

How do we do this job efficiently, without manually viewing each image? This is where our other AI technology, Cyclops, comes to the rescue.

Cyclops Tech

Cyclops can classify car images into 27 position categories like the boot, passenger seat, side mirror, dashboard, full rear, and full front with 97.2% accuracy.

After categorizing each image, we filter out all images that are unlikely to contain a rego by excluding every image not categorized as full front or full rear, for example dashboard or wheel.

In the end, we successfully split the dataset into two groups: a “no rego” group containing 9,840 images and a “with rego” group containing 1,564 images. Next came the unavoidable job of manually labelling the 1,564 images containing regos, although the workload had already been dramatically reduced.
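The filtering step boils down to a simple partition on the predicted category. Here is a minimal sketch, assuming each image already carries a category label from a classifier; the `Classified` record and the exact label strings are hypothetical and do not reflect the real Cyclops API.

```python
from collections import namedtuple

# Hypothetical record: an image id plus the category a classifier assigned to it.
Classified = namedtuple("Classified", ["image_id", "category"])

# Assumed label names for the two positions where a rego plate is expected.
REGO_CATEGORIES = {"full front", "full rear"}


def split_by_rego_visibility(classified_images):
    """Partition image ids into (with_rego, no_rego) by predicted category."""
    with_rego, no_rego = [], []
    for item in classified_images:
        target = with_rego if item.category in REGO_CATEGORIES else no_rego
        target.append(item.image_id)
    return with_rego, no_rego


sample = [
    Classified("a.jpg", "full front"),
    Classified("b.jpg", "dashboard"),
    Classified("c.jpg", "full rear"),
    Classified("d.jpg", "wheel"),
]
with_rego, no_rego = split_by_rego_visibility(sample)
```

Only the `with_rego` group then needs manual bounding-box labelling.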

SageMaker Details

Here comes the exciting part: using AWS SageMaker to train the model now that the data is prepared. We picked the Object Detection algorithm, which accepts images and corresponding CSV files describing metadata such as the bounding box location of the rego in each image. After training, this algorithm produces a model ready for inference. You can pass in an image and get back a list of predicted rego locations along with their confidence scores. Normally the prediction with the highest score is the one you need.
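Picking the highest-scoring prediction can be sketched as below. The example assumes the JSON response layout of the built-in Object Detection algorithm, where each prediction is `[class_id, score, xmin, ymin, xmax, ymax]` with coordinates normalized to the 0..1 range. The helper name `best_detection`, the `min_score` parameter, and the sample numbers are made up for illustration.

```python
import json


def best_detection(response_body, image_width, image_height, min_score=0.0):
    """Return the highest-scoring prediction with its box in pixels, or None."""
    predictions = json.loads(response_body)["prediction"]
    candidates = [p for p in predictions if p[1] >= min_score]
    if not candidates:
        return None
    class_id, score, xmin, ymin, xmax, ymax = max(candidates, key=lambda p: p[1])
    return {
        "class_id": int(class_id),
        "score": score,
        # Scale the normalized corners back to pixel coordinates.
        "box": (
            round(xmin * image_width),
            round(ymin * image_height),
            round(xmax * image_width),
            round(ymax * image_height),
        ),
    }


# Made-up response body with two candidate boxes for an 800x600 image.
example = json.dumps({
    "prediction": [
        [0.0, 0.31, 0.10, 0.20, 0.30, 0.40],
        [0.0, 0.92, 0.40, 0.60, 0.65, 0.70],
    ]
})
best = best_detection(example, image_width=800, image_height=600)
```

Passing a `min_score` lets the caller treat an image as having no confident detection when every score is too low.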

So here are the high-level steps to use SageMaker:

#Step 1: Create Jupyter Instance

The core component in SageMaker is a notebook instance: a compute instance (an EC2 server) dedicated to running Jupyter, which serves as the fundamental IDE for data scientists.

“The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.”

Go to the Notebook: Notebook instances section and create a notebook instance. In this case, we chose the ml.t2.medium size. Click “Start”, then “Open JupyterLab”, and we are set to embark on the machine learning journey.

The Notebook Instances section

#Step 2: Monitor the Training

Once the code in Jupyter has run and the training is triggered, the created training job can be monitored in the Training: Training jobs section. There you can also find more details about the training job, such as Input data configuration, Metrics, Output data configuration and Hyperparameters, for further analysis.

Training jobs panel

#Step 3: Inference and Assess Your Model

After a few hours of tweaking and waiting for training to finish, the first model was ready for assessment. You can go to the Inference: Models section to check the full information about all of your created models.

Note: This is not where the model is physically stored. It is a place that describes the meta information associated with your model, such as the training job, creation time and model data, as well as the model’s location, which is often an S3 link.

Model list and details

Failed Trial 😰

However, things started to become a bit of a mystery from here. The assessment result showed that:

The model did a good job of recognizing regos in the images, but it made a ton of mistakes by reporting a rego where there was none.

In other words, the recall was acceptable but the precision was not. We then tried hyperparameter tuning, cleaning the dataset (again) and retraining with different numbers of epochs, but none of these could save our poorly behaved model. We seemed to be lost in the mist.
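For readers less familiar with the two metrics: precision is the share of reported regos that were real, and recall is the share of real regos that were found. The sketch below uses invented counts purely to show the pattern we saw, namely many false positives and few misses. They are not our actual evaluation numbers.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute (precision, recall) from raw detection counts."""
    predicted = true_positives + false_positives   # everything the model reported
    actual = true_positives + false_negatives      # everything that was really there
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Invented counts: most real regos are found, but many reports are false alarms.
precision, recall = precision_recall(
    true_positives=90, false_positives=60, false_negatives=10
)
```

With these counts the recall is high while the precision is much lower, which is the signature of a model that reports regos where there are none.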

Inference Mistakes

Something must be wrong. We calmed down, stopped the training process and went back to examine our training set, comparing it against the mistakes made by the model. Suddenly we realized that every image in our training set was a positive sample, that is, an image containing a rego. This could explain why the model was poor at recognizing when there was no rego in the image, which led to the poor precision. If we only feed in images with regos, how could the machine learn that something that looks like a rego actually is not one, for example an odometer counter on a dashboard? It is like educating a child by teaching only what is good without ever saying what is bad, which is a very inefficient way to educate. People learn from mistakes and know what not to repeat. It should be the same when training our baby machine!

Positive and negative data sample

Thus we have a hypothesis that:

The model only learns what is a rego but doesn’t learn what is NOT a rego.

Detouring Around the Negative Sample Hindrance

After the failure, we tried to ignore the restriction and feed in images with no rego, along with their metadata, but ended up with an error from SageMaker, which requires every metadata entry to have a bounding box.

So what to do? While we were banging our heads in despair, we saw a mandarin sitting in the corner of our desk. We started thinking: instead of forcing the model to handle images with no rego, why not shift its attention to detecting something completely different? Like a mandarin🍊?

The SageMaker Object Detection algorithm allows training with multiple classes, so we decided to train with two classes: Rego Plate and Mandarin. When an image has no rego, we put a mandarin in it. Now every image has a bounding box and SageMaker is happy.

When there is no rego, there is a 🍊!

We wrote a script that puts a randomly scaled mandarin emoji at a random location in each image that has no rego.
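The placement logic of such a script might look roughly like the sketch below. It only covers choosing the random size and position and recording the matching bounding box. The class ids, the scale range, and the annotation field names are assumptions for illustration, and the actual pixel pasting is left out.

```python
import random

REGO_CLASS_ID = 0       # assumed class numbering
MANDARIN_CLASS_ID = 1


def random_mandarin_box(image_width, image_height, rng,
                        min_scale=0.05, max_scale=0.25):
    """Pick a random square (left, top, size) in pixels that fits inside the image."""
    shorter_side = min(image_width, image_height)
    size = max(1, int(shorter_side * rng.uniform(min_scale, max_scale)))
    left = rng.randint(0, image_width - size)    # randint is inclusive on both ends
    top = rng.randint(0, image_height - size)
    return left, top, size


def mandarin_annotation(image_id, image_width, image_height, rng):
    """Build the bounding-box record for the mandarin placed on a no-rego image."""
    left, top, size = random_mandarin_box(image_width, image_height, rng)
    return {
        "image_id": image_id,
        "class_id": MANDARIN_CLASS_ID,
        "left": left,
        "top": top,
        "width": size,
        "height": size,
    }


rng = random.Random(7)  # seeded so the example is repeatable
annotation = mandarin_annotation("dashboard_001.jpg", 1024, 768, rng)
```

With an imaging library such as Pillow, the emoji would then be resized to `size` by `size` pixels and pasted at `(left, top)`, so that the saved annotation matches what is drawn on the image.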

With very high hopes, we restarted the training and got the new model out. We ran the test again and the assessment result was surprisingly good! What a great achievement! Both precision and recall satisfied our requirements. Now this model can not only correctly identify the rego in an image and detect its location, but can also ignore images without a rego. We successfully delivered our model and quickly created a deployable product.

The End

Machine learning and AI are fun, but the road to success won’t always be flat and joyful. While climbing the mountain, removing a blocking boulder may be possible, but seeking a detour may be better. Be brave and confident about your hypotheses, run experiments to test them, and hold on when things do not go according to plan.

At carsales we are always looking for fun and talented people to work with. If you liked what you read and are interested in joining, please check out what positions we have available in our careers section.
