Deep image understanding at Carousell

Matt Henderson
Oct 31, 2017 · 3 min read

In the past five years, Carousell has led the way in mobile classifieds and is one of the fastest growing mobile marketplaces in Southeast Asia. We’re in 19 cities across 7 countries. We’re looking at ways to leverage machine learning to enhance the user experience.

At Carousell, my team develops machine learning features that help our users list, sell, and buy items more easily. We train our models on Carousell’s sizeable internal datasets of items for sale and user interactions. Our first feature powered by machine learning suggests titles and categories for your listing, based on the images that you upload. This is available in Singapore on the Android app, and is in the process of rolling out on iOS and in other countries.

Suggested categories and titles based on the image. The network’s third suggestion “Yamaha Keyboard” is correct.

We train deep convolutional neural networks on our database of tens of millions of listings, to classify images into their categories. This classifier is used to provide category suggestions in the app.
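To make the step from classifier to suggestions concrete, here is a minimal sketch of how raw scores could be turned into a short list of suggested categories. The category names, the scores, and the `suggest_categories` helper are made up for illustration; the real network and category list are not shown in this post.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def suggest_categories(logits, category_names, k=3):
    """Return the k most probable categories as (name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(category_names, probs),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical categories and scores, for illustration only.
categories = ["Music & Media", "Electronics", "Furniture", "Women's Fashion"]
logits = [2.5, 1.0, -0.5, -1.0]
suggestions = suggest_categories(logits, categories, k=3)
```

Showing the top few categories rather than a single guess gives the user a useful choice even when the network's first guess is wrong.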

However, treating title prediction as a categorisation task like this would not work well, as there are so many different titles in our data. Instead, we trained a ranking model that takes an image and attempts to select the correct title out of a pool of candidate titles.

The neural network for ranking titles has two halves. One half looks at the image using deep convolutional layers; the other looks at potential titles, processing the words and phrases using embeddings and a deep neural structure.

The two halves map images and titles to a shared high-dimensional vector space, and vector similarity is then used for ranking.
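As a rough illustration of ranking in a shared space, the sketch below scores a few candidate titles against an image vector. It assumes cosine similarity as the similarity measure (the post only says "vector similarity"), and the three-dimensional vectors are invented; real embeddings would come from the two halves of the network and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_titles(image_vector, title_vectors):
    """Sort candidate titles by similarity to the image, best first."""
    scored = [(title, cosine_similarity(image_vector, vec))
              for title, vec in title_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up vectors standing in for the outputs of the image and title halves.
image_vector = [0.9, 0.1, 0.2]
title_vectors = {
    "Yamaha Keyboard": [0.8, 0.2, 0.1],
    "IKEA cushion": [0.1, 0.9, 0.3],
    "iPhone case": [0.2, 0.1, 0.9],
}
ranking = rank_titles(image_vector, title_vectors)
```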

Our network is trained jointly from scratch with a single ranking loss function. Because the two halves interact only through the final vector similarity, much of the computation can be done ahead of time, in both training and inference.
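The post does not say which ranking loss is used. One common choice, shown here purely as an assumption, is a softmax over the candidate titles: the loss is the negative log-probability that the correct title wins. The scores would be the image-title similarities.

```python
import math

def softmax_ranking_loss(scores, correct_index):
    """Negative log-probability of the correct title among the candidates.

    `scores` holds one similarity score per candidate title; the loss is
    small when the correct title scores well above the others.
    """
    peak = max(scores)  # log-sum-exp trick for numerical stability
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_total - scores[correct_index]

# The correct title is candidate 0 in both cases.
good = softmax_ranking_loss([5.0, 1.0, 0.5], correct_index=0)  # ranked first
bad = softmax_ranking_loss([1.0, 5.0, 0.5], correct_index=0)   # ranked second
```

Minimising a loss of this kind pushes each image towards its own title and away from the other candidates, which is what shapes the shared space.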

When a new image is uploaded to Carousell, the model ranks a list of titles derived from millions of listings to find good suggestions in under 100 milliseconds.
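One way to picture the fast lookup: the title vectors do not depend on the uploaded image, so they can be computed and normalised once, leaving only the image encoding and a set of dot products at request time. The `TitleIndex` class below is a hypothetical toy with made-up vectors; a system ranking titles from millions of listings would presumably use vectorised or approximate search, which is not described in this post.

```python
import heapq
import math

def normalise(vec):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

class TitleIndex:
    """Holds unit-length title vectors that are computed once, ahead of time."""

    def __init__(self, title_vectors):
        self.titles = list(title_vectors)
        self.vectors = [normalise(title_vectors[t]) for t in self.titles]

    def suggest(self, image_vector, k=3):
        """Return the k titles whose vectors are most similar to the image."""
        query = normalise(image_vector)
        scores = (sum(q * v for q, v in zip(query, vec))
                  for vec in self.vectors)
        best = heapq.nlargest(k, zip(scores, self.titles))
        return [title for _, title in best]

# Built once, offline (vectors are invented for illustration).
index = TitleIndex({
    "Yamaha Keyboard": [0.8, 0.2, 0.1],
    "IKEA cushion": [0.1, 0.9, 0.3],
    "iPhone case": [0.2, 0.1, 0.9],
})
# At upload time, only the new image needs to be encoded.
top_two = index.suggest([0.9, 0.1, 0.2], k=2)
```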

The shared image and title space learned by the deep neural network. The network has learned to place images near their corresponding titles, and has learned implicit clusters such as clothes, games and electronics. It still makes some mistakes: for example, it put the title “IKEA cushion” too close to the image of the Hermes handbag, and it did not learn to identify the “Sketch Drawing” with high confidence. The high-dimensional space is projected down to 2 dimensions for the visualisation.

The difference between the deep vector representation of the red phone case and the grey one gives a semantic ‘red’ direction in the vector space. Adding this ‘red’ vector to the representations of other images allows us to ‘turn them red’.
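The arithmetic behind the ‘red’ direction can be sketched with toy vectors. Everything here is invented for illustration: the three-dimensional vectors, the item names, and the use of Euclidean distance to find the closest item afterwards.

```python
def subtract(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise sum a + b."""
    return [x + y for x, y in zip(a, b)]

def nearest(query, catalogue):
    """Name of the catalogue item closest to the query (squared distance)."""
    def squared_distance(vec):
        return sum((q - v) ** 2 for q, v in zip(query, vec))
    return min(catalogue, key=lambda name: squared_distance(catalogue[name]))

# Toy representations; the first coordinate loosely plays the role of "redness".
catalogue = {
    "red phone case": [0.9, 0.1, 0.5],
    "grey phone case": [0.1, 0.1, 0.5],
    "red bag": [0.9, 0.7, 0.2],
    "grey bag": [0.1, 0.7, 0.2],
}

red_direction = subtract(catalogue["red phone case"],
                         catalogue["grey phone case"])
reddened_bag = add(catalogue["grey bag"], red_direction)
closest = nearest(reddened_bag, catalogue)
```

In this toy setup, moving the grey bag along the ‘red’ direction lands it next to the red bag, which is the effect the figure illustrates with real images.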

We train our models across multiple GPU machines in parallel for hundreds of millions of steps, while keeping training time down to a couple of days so that we can iterate quickly.

Our best network is a joint model that predicts the category and ranks titles using a shared deep representation of the image.
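A joint model of this kind can be pictured as one image representation feeding two heads, with the two losses added together. The sketch below is only an assumption about how that might look: the post does not describe how the objectives are combined, and the weights, vectors, and `category_weight` parameter are hypothetical.

```python
import math

def linear(weights, vec):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def softmax_loss(scores, correct_index):
    """Negative log-probability of the correct entry under a softmax."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_total - scores[correct_index]

def joint_loss(image_repr, category_weights, title_vectors,
               category_index, title_index, category_weight=1.0):
    """Category loss plus title-ranking loss on one shared representation."""
    category_scores = linear(category_weights, image_repr)
    # Dot product of the image with each candidate title vector.
    title_scores = linear(title_vectors, image_repr)
    return (category_weight * softmax_loss(category_scores, category_index)
            + softmax_loss(title_scores, title_index))

# Toy values: a 2-d shared representation, 2 categories, 3 candidate titles.
shared = [1.0, 0.0]
category_weights = [[2.0, 0.0], [0.0, 2.0]]
title_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

loss_correct = joint_loss(shared, category_weights, title_vectors, 0, 0)
loss_wrong = joint_loss(shared, category_weights, title_vectors, 1, 2)
```

Because both heads read the same representation, gradients from either task update the shared image layers.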

A larger sample of the vector space learned by the network, showing only images. Some well-defined clusters include women’s shoes at the top, clothes at the bottom, and mobile phones to the left.

If you’re already using these new machine-learning-powered features on our marketplace, thank you for trying them. If you haven’t yet, we hope you’ll give them a try soon.

We expect the real learning to come from your interactions with the features (for example, which suggestions you click on) as more and more people use them.

We are currently hiring data scientists and machine learning engineers to join us in building more features like this.

Written by Matt Henderson (machine learning for natural language understanding)

Carousell Insider

What's going on under the hood at one of the world's largest and fastest-growing classifieds marketplaces. We're on a mission to inspire everyone in the world to start selling.
