Building a Real-World Pipeline for Image Classification — Part I
One of the most important elements of a classifieds website is its images. They are probably your customer’s first interaction with your product.
Most likely, they are part of your landing page, where users spend most of their time.
UX is one of our cornerstones at heycar, so we strive for the best possible experience for our users.
This is how our search-results page is supposed to look:
As mentioned in a previous article, at heycar we are tightly bound to the market we operate in, and thus limited from simply requesting raw images of cars from our data providers. In reality, given the images we receive, the car tile looks more like this:
As you can see, there are multiple issues that hurt our core values. Let’s dive a bit deeper into each one.
Banners
The API through which we receive data from our providers was created for a market where dealerships compete with end-users trying to sell their cars.
The majority of dealerships feel the need to highlight remarks about their cars, as well as to build “brand trust” in their dealership networks over cars sold by end-users (people selling their own cars).
In spite of the visual noise, as you can see in the example images before and in the case above, the banners are a way the dealers found to send unstructured data, e.g. energy consumption, monthly prices, insurance, warranty, and anything else that can somehow grab the user’s attention.
Most, if not all, of those attributes are already supported by our APIs, so we can receive the structured data and display it properly.
By getting rid of the banners we hope to reduce the distractions and provide a fair baseline of comparison to our users.
Besides, it also allows us to use the raw “listing” images everywhere, since they carry no dealership branding, e.g. in images for paid social ads.
Position & Order
Another issue is conformity: the order and position of the images.
Our brains are addicted to patterns, so it is more pleasant to provide a consistent experience. Besides, knowing the position also helps us use the semantic information both to improve the UX and to score and rank listings, e.g. listings that provide at least one picture of each part of the car.
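To make the scoring idea concrete, here is a minimal sketch of how classified image positions could feed a listing score. The view labels and function names are illustrative, not heycar’s actual schema:

```python
# Hypothetical set of views we would want at least one picture of per listing.
REQUIRED_VIEWS = {"front", "rear", "side", "interior"}

def coverage_score(image_tags):
    """Fraction of required views covered by a listing's classified images."""
    seen = REQUIRED_VIEWS & set(image_tags)
    return len(seen) / len(REQUIRED_VIEWS)
```

A listing tagged `["front", "rear", "side", "interior", "banner"]` would score 1.0, while one with only a front shot would score 0.25, and that score can then be one input into ranking.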
What to do?
Ultimately, we need to understand the context of every image on our platform in order to have structured data to deal with those issues in an elegant way.
Manual Classification
The obvious way would be to have people manually tagging the images as banner, no banner, front, interior, … After all, we are really good at cognitive pattern recognition. Yet, it’s not that easy…
Besides its heavy impact on the time-to-market of our listings, the problem with manual classification is that it wouldn’t scale to the number of images we have. Roughly:
500k cars * ~12 images per car = 6M images
That’s only the start: we would also have daily deltas to classify, since about 5–10% of our inventory changes every day.
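The back-of-the-envelope numbers above can be spelled out:

```python
cars = 500_000
images_per_car = 12
total_images = cars * images_per_car          # 6,000,000 images up front

# A daily churn of 5-10% of the inventory means a sizeable daily delta too.
daily_delta_low = int(total_images * 0.05)    # ~300,000 images/day
daily_delta_high = int(total_images * 0.10)   # ~600,000 images/day
```

Even at the low end, that is hundreds of thousands of new images to classify every single day.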
There was the idea of using a third-party tool like Amazon’s Mechanical Turk, yet that brings us back to time-to-market, since we can’t control how long it would take for the images to be tagged. A difference of a couple of hours relative to our competitors can be crucial for lead generation, since our users would receive the data later than our competitors’ users.
We also haven’t even covered the cost of manually classifying all those images. So, manual classification was not feasible.
Automatic classification
With the manual solution out of the way, we started investigating ways of automating the tagging of the images.
Our brains make vision seem easy. After years of exposure and learning, it doesn’t take any effort for us to tell a car from a truck, read a sign, or recognize a face. These are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images.
Machine learning can help us with that. It is a solution that can work, but it requires research and time to develop both the detection models and the infrastructure to make sure it runs fast enough and keeps up with the constant stream of images.
The basic idea was to figure out a way of building an image classification model with Convolutional Neural Networks, and to our benefit Google has built a lot of open-source tools on that front, like Inception:
The use of Inception was an intuitive choice: take something that is already built, in this case a well-established neural network optimised for image recognition tasks, and retrain it with our images. This is known as transfer learning, and for us it proved to be a time- and cost-effective way to quickly implement an image classifier.
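A minimal sketch of that transfer-learning setup in Keras (our actual training code differed; `weights=None` keeps the sketch download-free, whereas in practice you would load `weights="imagenet"` to start from the pretrained features):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# Pretrained convolutional base without the original classification head.
# In a real run: weights="imagenet"; None here only avoids the weight download.
base = InceptionV3(weights=None, include_top=False, pooling="avg",
                   input_shape=(299, 299, 3))
base.trainable = False  # freeze the base; only the new head gets trained

# New head for our two classes: banner / no-banner.
model = models.Sequential([
    base,
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Only the small dense head is trained on our labeled images; the frozen Inception base supplies the generic visual features, which is what makes this so much cheaper than training from scratch.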
We quickly discovered a downside to the Inception model: in our image classification pipeline we found ourselves dealing with a classification bottleneck in a model that was unnecessarily heavy for this task. We needed a home-grown solution.
Considering how easy the tooling made it to experiment, we decided to give TensorFlow a try and built a proof of concept.
Proof of concept
First, we decided to implement something quite small, but that could still bring value to our users, as a proof of concept.
The idea was to create a model that identifies whether an image contains a banner, enabling us to filter banners out and find one banner-free main image per car, so that the search results page looks more like the mock-up.
For that, we manually gathered approximately a thousand images for each “class”. Yes, we went through our data and kept copying images into folders until we had “enough” of them for a first try.
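For the record, the layout such manual gathering typically ends up with is simply one directory per class (folder names here are illustrative), which is exactly the structure most image-classification tooling expects:

```
training_data/
├── banner/
│   ├── 0001.jpg
│   └── ...
└── no_banner/
    ├── 0001.jpg
    └── ...
```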
Here is a video explaining the whole idea in depth:
As the video shows, it is easy to start and get fairly good results. TensorFlow’s developers say that you could start with about 100 images of each class. However, in our experience that was not enough for production usage, where we have to cover a much wider range of images. Of course, it varies depending on context.
Either way, our goal was to prove that it was possible to use it, and it was.
Moving forward
Once the concept was proven, we gained confidence that the technology would be an enabler, and that it would scale to our throughput and precision expectations.
The next step was to split our efforts in two: creating a strong model and building the infrastructure to classify, store, and serve the classification data.
Building the model — we need data?
As mentioned, the number of images needed for our use case was bigger than we first thought, so we had to gather a reasonable amount of manually labeled images to improve the model’s accuracy across our wide inventory.
We started with a simple “banner/no-banner” front-end application that would read from a database of images, show one to a user, and ask for a manual classification.
Unfortunately, I couldn’t find screenshots of that one, only of the subsequent update, which introduced the concept of positioning.
Either way, the concept is the same: collecting manually labeled data.
We ran this application across the whole company, asking people to classify images from our inventory according to the rules we had stipulated in a document.
Even so, there were clearly mistakes, so we advise you to use some sort of consensus logic around the final conclusion of a manually labeled tag, e.g.: “if at least 5 people classified this as an ‘engine’, then it is an engine”. This guards against polluting your models’ class data.
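That consensus rule fits in a few lines; a sketch using only the standard library (the threshold of 5 matches the example above, but is something you would tune):

```python
from collections import Counter

def consensus_label(votes, min_votes=5):
    """Return the winning label only if enough people agreed on it.

    `votes` is the list of labels different people assigned to one image,
    e.g. ["engine", "engine", "front", "engine", "engine", "engine"].
    Returns None when no label reaches the threshold, so the image can be
    sent back for more labeling instead of polluting the training set.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else None
```

Images that never reach consensus are worth a second look anyway: they are often exactly the ambiguous cases the model will struggle with too.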
Building the model — brick and mortar
Now that we have our dataset of images, it’s a matter of putting it to good use. Our transfer learning attempt with the Inception model was a little heavy, so we decided to build our own model using our favourite framework… Keras.
We will cover the creation of this model in more detail in another post (coming soon), but the end result was a small, efficient model capable of distinguishing images that contain banners from those that don’t.
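To give a rough idea of scale, a banner classifier along those lines can be just a few small convolutional blocks with a binary head. This is an illustrative sketch, not our production architecture:

```python
from tensorflow.keras import layers, models

def build_banner_model(input_shape=(128, 128, 3)):
    """A deliberately small CNN: cheap to train and to serve at high throughput."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # banner vs. no-banner
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_banner_model()
```

A network this size has orders of magnitude fewer parameters than Inception, which is what removes the classification bottleneck mentioned earlier.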
Here is an example of the model’s results, i.e. how the image of a car is seen by the model after extensive training and tweaking:
More tips on building the model will be presented in the follow-up article.
Building the Infrastructure
While our model was being prepared, on the Platform Engineering side, we had to create infrastructure to support thousands of images being processed every minute. Our partners aren’t easy on us when it comes to sending data.
There were several challenges along the way, and the draft below covers only the first Production implementation:
We receive data from our providers, and it goes into our normal ingestion process. The relevant part for this pipeline, the images, constantly report changes to the “image-stream”, for which we currently use AWS Kinesis.
The image-classification-worker is an internal piece of code that picks up new images, triggers the classification on TensorFlow Serving, caches the result, and posts the data into another database for consumption.
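The interesting part of the worker is the call to TensorFlow Serving. Its REST API accepts base64-encoded inputs under a `"b64"` key (provided the model signature takes string tensors); here is a stdlib-only sketch, where the host and model name are placeholders, not our real endpoints:

```python
import base64
import json
from urllib.request import Request, urlopen

# Placeholder endpoint: TF Serving's REST predict URL for a model
# hypothetically named "banner_classifier".
SERVING_URL = "http://tensorflow-serving:8501/v1/models/banner_classifier:predict"

def build_predict_request(image_bytes):
    """Wrap raw image bytes in the JSON body TF Serving's REST API expects."""
    payload = {"instances": [{"b64": base64.b64encode(image_bytes).decode("ascii")}]}
    return json.dumps(payload).encode("utf-8")

def classify(image_bytes):
    """POST one image to TF Serving and return the first prediction."""
    req = Request(SERVING_URL, data=build_predict_request(image_bytes),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["predictions"][0]
```

In the real worker this sits in a loop that consumes the Kinesis image-stream, calls `classify` per image, and writes the result to the cache and the downstream database.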
Among the challenges were storage & caching of classification data, fan-out, real-timeliness and its impact, error reporting and, of course, budget.
You will learn more about how we have been dealing with those in a follow-up article. We’ll explain in detail how we implemented the architecture above, the tricks and limitations, and how we evolved it into what we have now. Spoiler: it grew a lot.
First Results
Enough about implementation, let’s check our first results in production.
After creating a strong model and building the infrastructure, we started rolling out the models to production, initially with partner-integration feeds, which can’t contain banners for legal reasons.
Here is an example of a car-tile, with a much better user experience, banner free!
The results were not perfect, but they were quite satisfactory.
Our rule was to filter the images until we found the first “banner-free” one. That sometimes gives us weird-looking “first images” of a car, e.g.:
It is indeed the first banner-free image of that listing; however, it is not the ideal one to use in the integration feed, nor on our own website. Therefore, our next step was to work on the positioning of the car. Combined, both attributes can create quite pretty home pages.
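The selection rule itself is tiny; a sketch, assuming `is_banner` wraps a call to the classifier:

```python
def first_banner_free(image_urls, is_banner):
    """Pick the first image the model considers banner-free.

    Falls back to the very first image when every picture has a banner,
    so a listing never ends up without a main image.
    """
    for url in image_urls:
        if not is_banner(url):
            return url
    return image_urls[0] if image_urls else None
```

This is also why the occasional odd “first image” appears: the rule only knows about banners, not about which view of the car it is, which is exactly what the positioning work addresses.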
We don’t know yet if it will be possible to block, hide or even down-rank images based on their attributes, due to legal reasons; nevertheless, we know that having this information will come in handy soon enough.
Impact on numbers
As mentioned before, we haven’t released this widely so far, but this is a quote from marketing:
“we’ve started the first ad-campaigns on Facebook with banner-free images, it is tremendous success: Leads increased by ~500% last week” — Marketing Dep.
We have been running A/B tests on our website with the banner/no-banner images, and we’ll update this post as soon as we have more results.
Thanks for reading this far. If you liked the whole concept, you can dive deeper into each topic by checking the individual articles; the links will be available here as soon as they get published.
By the way, if you want to work with infrastructure, machine-learning or any related topics, take a look at our careers page.
** Update 14th of November **
Jonathan Greve and I have been to the Predictive Analytics World conference in Berlin this year, talking about the same topic. Here are the slides:
Further readings and resources used for the proof of concept are available here:
Open Source
Things that we’ve used for this project:
- https://github.com/tensorflow/tensorflow — Train Classification Models
- https://github.com/tensorflow/serving — Serve Trained Models
- https://github.com/hey-car/tensorflow-model-server — Our docker-image for tensorflow serving
- https://keras.io — Framework for Deep-Learning
Readings
Interesting links and articles related to image-classification and Tensorflow: