Introducing Image Search & Price Suggestions
In the past seven years, Carousell has led the way in mobile classifieds and is one of the fastest-growing mobile marketplaces in Southeast Asia. We are currently in 19 cities across 7 countries, and we’re always looking at ways to leverage machine learning to enhance our users’ experience.
Today, we are excited to launch two new features to help our buyers and sellers: Image Search & Price Suggestion. In this post, we will introduce these features and share how we use AI to power them.
Shopping on Carousell has always been a visually engaging experience. Just think of all the time we spend scrolling through the feed. We know that when it comes to buying, you really have to see something before you even consider it.
The inspiration to buy can come at unpredictable times, whether you are out window shopping with friends or browsing social media. In these situations, we often resist the urge to buy on impulse and make a mental note to search for it later. But when we do remember to come back to it, we might find it difficult to describe exactly what we saw in words, much less in search terms.
With Image Search, users now have the ability to use pictures to easily find things on Carousell. Instead of typing keywords into the search bar, simply select an image from your camera roll to search for items of similar colour, style and design.
As a two-sided marketplace, we serve buyers and sellers. While we build features that help buyers find what they want on the platform, we also make it a point to solve common problems sellers face.
What we’ve found is that sellers often struggle to price their products to get the best value. Price too high and you won’t receive any chats and offers. Price too low, and you lose out on potential earnings.
To figure out an appropriate price, sellers usually search for similar listings to get a sense of market prices before actually listing their product. This can be quite tedious and time-consuming.
We introduced Price Suggestions to directly address this cumbersome process. Given the listing’s image and title, the feature uses AI and our transaction database to suggest an appropriate price for your listing, based on the prices of similar products that were successfully sold. Below the suggested price, we also display a set of similar listings that were recently sold.
How it works
Image Search and Price Suggestions each address one side of the marketplace: the former helps buyers, the latter helps sellers. Under the hood, however, both features share similar mechanisms.
We search for listings of similar attributes, with features extracted using AI-based computer vision and natural language processing. In the case of Price Suggestion, we also compute the price statistics on top of the retrieved results to suggest appropriate price ranges.
Learning feature embeddings
Traditionally, extracting features from images required extensive hand-engineering with classical computer vision algorithms. Instead, we developed a neural network with deep convolutional layers, which we train to map the raw image and the listing title into a shared high-dimensional vector space.
Given an image and a bag-of-ngrams representation of a title, our model learns embeddings for both of them, and maximises the dot product between matching image and title pairs.
The learned vector space is where semantically similar titles and visually similar images are located closer to each other. This results in implicit clusters that capture meaningful information about how listings relate to one another.
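To make the idea concrete, here is a toy sketch in plain NumPy of the two ingredients described above: a bag-of-ngrams title representation and the dot product the model maximises for matching pairs. The `title_ngrams` helper and the embedding vectors are made up for illustration, not the production model:

```python
import numpy as np

def title_ngrams(title, n=2):
    """Hypothetical bag-of-ngrams featuriser: word unigrams plus bigrams."""
    words = title.lower().split()
    grams = set(words)
    grams |= {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return grams

print(title_ngrams("sling bag"))  # {'sling', 'bag', 'sling bag'}

# Toy vectors standing in for the model's learned embeddings.
image_emb = np.array([0.9, 0.1, 0.3])
matching_title_emb = np.array([0.8, 0.2, 0.4])
unrelated_title_emb = np.array([-0.5, 0.9, 0.0])

# Training drives the dot product of a matching image/title pair above
# that of a mismatched pair, so similarity reduces to a dot product.
print(image_emb @ matching_title_emb > image_emb @ unrelated_title_emb)  # True
```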
Nearest Neighbour Search
To find similar listings for a given image, we generate feature embeddings from the trained model and perform a nearest neighbour search over a collection of listings embedded in the same vector space. Retrieved results are then ranked by closest distance between the feature embeddings.
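As a baseline sketch of what this retrieval step does, here is a brute-force exact nearest neighbour search in NumPy. The two-dimensional embeddings are toy values; the real vectors are far higher-dimensional:

```python
import numpy as np

def exact_nearest_neighbours(query, collection, k=5):
    """Brute-force nearest neighbour search: rank every listing in the
    collection by Euclidean distance to the query embedding.
    Exact, but O(N * D) work per query."""
    dists = np.linalg.norm(collection - query, axis=1)
    return np.argsort(dists)[:k]

collection = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]])
query = np.array([0.0, 0.1])
print(exact_nearest_neighbours(query, collection, k=2))  # [0 2]
```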
However, an exact nearest neighbour search incurs relatively high latency given the size of the collection and the high dimensionality of the vectors. An additional consideration is that loading the collection of pre-computed embeddings from disk into memory takes hundreds of seconds.
Hence, we surveyed several Approximate Nearest Neighbour (ANN) search solutions, and settled on Spotify’s Annoy library. Annoy builds binary trees through random projections (splitting along the hyperplane equidistant from two randomly chosen points), recursing until at most K items remain in each node.
In this way, Annoy drastically reduces the search latency and memory footprint while preserving precision. The indices built with Annoy can also be mapped into memory quickly, reducing load time.
At inference time, we generate feature embeddings for a given listing and use Annoy to perform an approximate nearest neighbour search over a pre-computed collection of indexed listings. The results are then surfaced to the user in their search.
Price Suggestions uses a similar mechanism to Image Search. Using a concatenated set of image and title embeddings, we found that the nearest neighbour search yielded more accurate results than using image embeddings alone, especially for listings whose brands cannot be reliably inferred from the image.
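A minimal sketch of the concatenation step, assuming each half is L2-normalised so that neither modality dominates the distance (the exact weighting is a detail of the production system):

```python
import numpy as np

def listing_embedding(image_emb, title_emb):
    """Concatenate image and title embeddings into one search vector,
    L2-normalising each half so both modalities contribute equally."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = title_emb / np.linalg.norm(title_emb)
    return np.concatenate([img, txt])

vec = listing_embedding(np.ones(4), np.ones(4))
print(vec.shape)  # (8,) -- twice the per-modality dimensionality
```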
We compared the performance of the neural network approach to a baseline full-text search approach through an internal A/B evaluation tool. Staff members were presented a test set of listings, with two sets of results computed — one from the neural network, the other through full-text search.
We found that results from the neural network approach significantly outperformed the full-text search, and were picked nearly five times as often.
In the process of evaluation, we made several key observations on each approach. Generally, full-text search fell short with generic titles such as “sling bag”, which produced a variety of irrelevant listings. In contrast, the neural network was able to infer the specific type of sling bag from the image, and return results that were more similar to the original sling bag.
However, the neural network would occasionally fail to take into account certain important text features such as brands, which would heavily influence the price of an item, e.g. “Zara Striped Crop Top”.
To address the shortcomings of the neural network approach, we applied a final sort on top of the retrieved results based on title match. By computing price percentiles from samples weighted towards results with better title matches, we were able to further improve the accuracy of the suggested price ranges.
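One simple way to realise such a weighted percentile is to resample the sold prices of the retrieved neighbours with probability proportional to a title-match score, then read off a percentile band. This is a sketch under assumed scores; the production weighting scheme is more involved:

```python
import numpy as np

def suggested_price_range(prices, title_scores, lo=25, hi=75):
    """Weighted price percentiles: resample sold prices with probability
    proportional to title-match score, then take a percentile band."""
    prices = np.asarray(prices, dtype=float)
    weights = np.asarray(title_scores, dtype=float)
    weights = weights / weights.sum()
    rng = np.random.default_rng(0)
    sample = rng.choice(prices, size=10_000, p=weights)  # biased resample
    return np.percentile(sample, [lo, hi])

# Sold prices of retrieved neighbours and how well their titles match.
prices = [10, 12, 15, 40, 11]
scores = [0.9, 0.8, 0.7, 0.1, 0.85]  # the $40 outlier barely matches
low, high = suggested_price_range(prices, scores)
print(f"suggested range: {low:.2f} to {high:.2f}")
```

Because the poorly matching $40 listing carries little weight, it barely shifts the suggested band, which is exactly the robustness the final sort is meant to provide.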
Image Search and Price Suggestion are two of our latest features that we are proud to introduce to our community. We hope that these machine learning powered features can enhance your experience on our marketplace. If you haven’t tried them out, give it a go and you might just surprise yourself!
Happy buying and selling!
Special thanks to Jethro, Jay, Johnny, Kan, Hoang and many other teammates who have made invaluable contributions to bring these features to life for our users.
We are always looking at ways to leverage machine learning to enhance the user experience and are currently hiring data scientists and machine learning engineers to join us in building more features like this.