The most important thing about a technology is how it changes people’s lives — Unknown author

Building a Real-World Pipeline for Image Classification — Part I

Marcelo Boeira · heycar
10 min read · Sep 26, 2018


One of the most important parts of a classifieds website is its images. They are probably your customer's first interaction with your product.

Most likely, they are part of your landing page, where users spend most of their time.

UX is one of our cornerstones at heycar, so we strive to give our users the best possible experience.

This is how our search-results page is supposed to look:

How our search-results page design should look — with regard to images

As mentioned in a previous article, at heycar we are tightly bound to the market we operate in, which limits our ability to simply request raw images of the cars from our data providers. In reality, given the images we receive, the car tile looks more like this:

Real images on our search-results page — Highlighted in Red: UX issues

As you can see, there are multiple issues that hurt our core values. Let us dive a bit more into each.

Banners

The API through which we receive data from our providers was created for a market where dealerships compete with end users trying to sell their own cars.

Most dealerships feel the need to highlight remarks about their cars, as well as to convey the "brand trust" of their dealership networks over cars sold by end users (people selling their own cars).

banner examples — unstructured information e.g.: price, financing rates, energy consumption, dealership branding…

Beyond the visual noise you can see in the example images above, the banners are a way dealers found to send unstructured data, e.g. energy consumption, monthly prices, insurance, warranty, and anything else that might grab the user's attention.

Most, if not all, of those attributes are already supported by our APIs, so we already receive the structured data and can display it properly.

By getting rid of the banners, we hope to reduce distractions and provide a fair baseline of comparison for our users.

It also allows us to use the raw "listing" images everywhere, since they carry no dealership branding, e.g. in images for paid social ads.

Position & Order

Another issue is conformity: the order and position of the images.

Our brains are addicted to patterns, so it's more pleasant to provide a consistent experience. Knowing the position also lets us use the semantic information both to improve the UX and to score and rank listings, e.g. favouring listings that provide at least one picture of each part of the car.

What to do?

Ultimately, we need to understand the context of every image on our platform in order to have structured data to deal with those issues in an elegant way.

Manual Classification

The obvious way would be to have people manually tag the images as banner, no banner, front, interior, … After all, we are really good at cognitive pattern recognition. Yet, it's not that easy…

Besides heavily impacting the time-to-market of our listings, the problem with manual classification is that it wouldn't scale to the amount of images we have. Roughly:

500k cars * ~12 images per car = 6M images

That's only the start: we would also have a daily delta load to classify, since about 5–10% of our inventory changes every day.

There was the idea of using a third-party tool like Amazon's Mechanical Turk, yet it comes back to time-to-market, since we can't control how long it would take for the images to be tagged. A difference of a couple of hours compared to our competitors can be crucial for lead generation, since our users would receive the data later than our competitors' users.

And we haven't even covered the cost of manually classifying all those images. So, manual classification was not feasible.

Automatic classification

With the manual solution out of the way, we started investigating ways of automating the tagging of the images.

Our brains make vision seem easy. After years of exposure and learning, it doesn’t take any effort for us to tell apart a car and a truck, read a sign, or recognize a face. These are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images.

Machine learning can help us with that. It's a solution that can work, but it requires research and time to develop both the detection models and the infrastructure that makes sure everything runs fast enough and keeps up with the constant stream of images.

The basic idea was to find a way to build an image classification model with convolutional neural networks, and to our benefit Google has built a lot of open-source tools on that end, like Inception:

Inception V3

The use of Inception was an intuitive choice: take something that is already built, in this case a well-established neural network optimised for image-recognition tasks, and retrain it with our images. This is known as transfer learning, and for us it proved to be a time- and cost-effective way to quickly implement an image classifier.
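To make that concrete, here is a minimal transfer-learning sketch with Keras and a pre-trained InceptionV3. The folder paths, class count and hyper-parameters are illustrative, not our production setup:

```python
# Minimal transfer-learning sketch: reuse InceptionV3's pre-trained features,
# train only a small new classification head on our own labelled images.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load InceptionV3 pre-trained on ImageNet, without its original classification head.
base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the pre-trained convolutional layers

# Attach a small head for our own classes, e.g. "banner" vs "no_banner".
x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation="relu")(x)
predictions = Dense(2, activation="softmax")(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# One folder per class, e.g. data/train/banner and data/train/no_banner (placeholder paths).
train_data = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(299, 299), batch_size=32
)
model.fit(train_data, epochs=5)
```

Freezing the base network and training only the new head is what makes retraining cheap: the pre-trained convolutional layers already know how to extract generic visual features.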

We quickly discovered a downside to the Inception model: in our image-classification pipeline we found ourselves dealing with a classification bottleneck on a model that was unnecessarily heavy for this task. We needed a home-grown solution.

Since the tooling was fairly easy to experiment with, we decided to give TensorFlow a try and built a proof of concept.

Proof of concept

First, we decided to implement something quite small, but that could already bring value to our users, as a proof of concept.

The idea was to create a model that identifies whether an image contains a banner, enabling us to filter those images out and find one main picture of the car, so that the search-results page looks more like the mock-up.

For that, we manually gathered approximately a thousand images for each "class". Yes, we went through our data and kept copying images into folders until we had "enough" of them for the first try.
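In practice, the PoC dataset was nothing fancier than one folder per class. A quick sanity check over that layout could look like this (the folder names are illustrative):

```python
# Count the hand-gathered images per class; the dataset is just one folder per class,
# e.g. poc_dataset/banner/*.jpg and poc_dataset/no_banner/*.jpg (illustrative paths).
from pathlib import Path

DATASET = Path("poc_dataset")

for class_dir in sorted(p for p in DATASET.iterdir() if p.is_dir()):
    count = len(list(class_dir.glob("*.jpg")))
    print(f"{class_dir.name}: {count} images")  # aiming for roughly a thousand per class
```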

Here is a video explaining the whole idea in depth:

TensorFlow Image-Classification Example

As the video shows, it is easy to start and get fairly good results. TensorFlow's developers say that about 100 images per class can be enough. However, in our experience that was not suitable for production usage, where we have to cover a much wider range of images. Of course, it varies depending on the context.

Either way, our goal was to prove that it could work, and it did.

Moving forward

Once the concept was proven, we gained confidence that the technology would be an enabler and that it would scale to our throughput and precision expectations.

The next step was to split our efforts in two: creating a strong model, and building the infrastructure to classify, store, and serve the classification data.

Building the model — we need data?

As mentioned, the number of images for our use case was bigger than we first thought, so we had to gather a reasonable amount of manually labeled images to improve the model's accuracy across our wide inventory.

mobile-version of the image-classification app

We started with a simple "banner/no-banner" front-end application that reads from a database of images, shows each one to a user, and asks for a manual classification.

Unfortunately, I couldn't find screenshots of that version, only of the subsequent update, which introduced the concept of positioning.

Either way, the concept is the same: collecting manually labeled data.

We ran this application across the whole company, asking people to classify images from our inventory according to the rules we’ve stipulated on a document.

Even so, there were clearly mistakes, so we advise using some sort of consensus logic around the final decision for a manually labeled tag, e.g. "if at least 5 people classified this as an 'engine', then it is an engine". This helps avoid polluting your model's class data.
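As a rough illustration of such a consensus rule (the threshold and labels are just examples):

```python
# Illustrative consensus rule: only accept a label once enough people agree on it.
from collections import Counter

MIN_VOTES = 5  # e.g. "if at least 5 people classified this as an 'engine', it is an engine"

def consensus_label(votes):
    """Return the most common label if it has enough votes, otherwise None (discard the image)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= MIN_VOTES else None

# Seven people labelled the same image; the majority wins once it has enough votes.
print(consensus_label(["engine", "engine", "front", "engine", "engine", "engine", "engine"]))
# -> "engine"
```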

Building the model — brick and mortar

Now that we had our dataset of images, it was a matter of putting it to good use. Our transfer-learning attempt with the Inception model was a little heavy, so we decided to build our own model using our favourite framework… Keras.

Confusion matrix of predictions

We will cover the creation of this model in more detail in another post (coming soon), but the end result was a small, efficient model capable of telling apart images that contain banners from those that don't.
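To give a feel for what a small binary classifier looks like in Keras, here is an illustrative sketch; the layer sizes and input resolution are assumptions for this post, not our actual production architecture:

```python
# Illustrative small CNN for banner / no-banner classification (not our exact production model).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability that the image contains a banner
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

A network of roughly this size is far smaller than Inception, which is exactly what a narrow task like banner detection calls for.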

Here is an example of the model's results: how the image of a car is seen by the model after extensive training and tweaking:

Example banner-classification neurons — for illustration purposes

More tips on building the model will be presented in the follow-up article focused on it.

Building the Infrastructure

While our model was being prepared, on the Platform Engineering side, we had to create infrastructure to support thousands of images being processed every minute. Our partners aren’t easy on us when it comes to sending data.

There were several challenges along the way, and the draft below covers only the first Production implementation:

Draft of the Architecture 1.0 — Simplified*

We receive data from our providers, which goes into our normal ingestion process. The relevant part for this pipeline, the images, constantly report changes to the "image-stream", for which we currently use AWS Kinesis.

The image-classification-worker is an internal piece of code that fetches new images, triggers the classification on TensorFlow Serving, caches the result and posts the data into another database for consumption.
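A heavily simplified sketch of such a worker, assuming a Kinesis consumer and TensorFlow Serving's REST predict endpoint; the stream handling, model name and helper functions are placeholders, not our actual code:

```python
# Simplified, illustrative worker loop: endpoint, model name and helpers are placeholders.
import boto3
import requests

kinesis = boto3.client("kinesis")
TF_SERVING_URL = "http://tensorflow-serving:8501/v1/models/banner_classifier:predict"

def preprocess(raw_bytes):
    """Placeholder: download/decode the image and resize it to the model's input shape."""
    raise NotImplementedError

def store_classification(record, prediction):
    """Placeholder: cache the result and write it to the classification database."""
    raise NotImplementedError

def classify(image_tensor):
    """Ask TensorFlow Serving for a prediction on an already pre-processed image."""
    response = requests.post(TF_SERVING_URL, json={"instances": [image_tensor]})
    response.raise_for_status()
    return response.json()["predictions"][0]

def process_shard(shard_iterator):
    """Read image events from the stream, classify them and persist the result."""
    while shard_iterator:
        batch = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
        for record in batch["Records"]:
            image_tensor = preprocess(record["Data"])
            prediction = classify(image_tensor)
            store_classification(record, prediction)
        shard_iterator = batch.get("NextShardIterator")
```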

Among the challenges were storage and caching of classification data, fan-out, real-time requirements and their impact, error reporting and, of course, budget.

You will learn more about how we have been dealing with those in a follow-up article. We'll explain in detail how we implemented the architecture above, its tricks and limitations, and how we evolved it into what we have now. Spoiler: it grew a lot.

First Results

Enough about implementation; let's check our first results in production.

After creating a strong model and building the infrastructure, we started rolling out the models to production, initially for partner-integration feeds, which can't have banners for legal reasons.

Here is an example of a car tile with a much better user experience: banner-free!

Partner-integration feed — no more banners

The results were not perfect, but they were quite satisfactory.

Our rule was to filter a listing's images until we found the first banner-free one. That sometimes gives us odd-looking "first images" of a car, e.g.:

Example of a non-ideal highlight-image of a car — partner feed

It is indeed the first banner-free image of that listing; however, it's not the ideal one to use in the integration feed, nor on our own website. Therefore, our next step was to work on the positioning of the car. Combined, both attributes can create quite pretty home pages.
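For illustration, once each image carries its classification the selection rule itself is trivial; the field names below are hypothetical:

```python
# Illustrative selection rule: the highlight image is the first banner-free image of a listing.
# The field names ("has_banner", "url") are hypothetical.
def pick_highlight_image(images):
    """Return the first banner-free image, or fall back to the very first image."""
    for image in images:  # images are assumed to be in the order the dealer sent them
        if not image["has_banner"]:
            return image
    return images[0] if images else None

listing_images = [
    {"url": "1.jpg", "has_banner": True},
    {"url": "2.jpg", "has_banner": False},  # first banner-free image wins, even if it's the trunk
]
print(pick_highlight_image(listing_images)["url"])  # -> "2.jpg"
```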

Due to legal reasons, we don't know yet whether it will be possible to block, hide or even down-rank images based on their attributes; nevertheless, we know that having this information will come in handy soon enough.

Impact on numbers

As mentioned before, we haven't released this widely yet, but here is a quote from marketing:

“we’ve started the first ad-campaigns on Facebook with banner-free images, it is tremendous success: Leads increased by ~500% last week” — Marketing Dep.

We have been running A/B tests on our website with the banner/no-banner images, and we'll update this post as soon as we have more results.

Thanks for reading this far. If you liked the overall concept, you can dive deeper into each topic by checking the individual articles; the links will be added here as soon as they get published.

By the way, if you want to work with infrastructure, machine learning or any related topics, take a look at our careers page.
