Similar Images: API

Vladimir Iglovikov
5 min read · Sep 21, 2022


Short version.

I have implemented an API and a Python library that perform image search. The input is an image or a URL to an image; the output is a set of URLs to similar images.

The API is free. You get up to 20 similar images per request. The limit is 100 requests per day. If you need more, message me at LINK.

There are 18 million images in the database. I hope to add another 50M in the next few weeks.

  • API: LINK
  • Python library: LINK
  • Demo (no need to write code): LINK. You can upload an image or search with a text query. You can also click on the search results to use them as a new search request. It is an interesting question how many steps it takes to get from something innocent to porn, or at least some nudity.
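To give a sense of the interface, here is a minimal sketch of a request using plain `requests`. The endpoint URL, parameter names, and response shape below are illustrative assumptions, not the real contract; see the API link above for the actual documentation.

```python
import requests

# Hypothetical endpoint and parameters -- check the API docs for the real contract.
API_URL = "https://example.com/api/v1/similar"  # placeholder, not the real endpoint

# Search by image URL (the API also accepts an uploaded image or a text query).
response = requests.post(
    API_URL,
    json={"image_url": "https://example.com/photo.jpg", "num_results": 20},
    timeout=30,
)
response.raise_for_status()

# Assumed response shape: a list of URLs to similar images.
for url in response.json().get("results", []):
    print(url)
```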

Long version.

In the previous blog post, I described how I started collecting feedback for the open-source library Albumentations.AI and ended up with the idea of implementing an image search service.

The problem of image search is not new. People have worked on it for years, and many services do it.

The typical use case for a normal person: you write a blog post or create a website and need high-quality images => go to Pexels, Unsplash, or Getty Images.

ML engineers think about images differently. They need images to train neural networks.

The standard story:

  • You have 20 million images.
  • 20k of them are labeled.
  • You train a network on these 20k.
  • The network is not as accurate as you need.
  • You have a budget to label 10k.

How to pick 10k out of millions?

When we say, “The model is not accurate,” we mean, “In cases A, B, and C, the model is accurate, but in cases D, E, and F it is not.”

The model is not performing well on D, E, and F because these are hard cases, or because we did not have enough examples of them in the training set.

For example, in Self Driving, we have a lot of data about cars, but police cars are a rare class.

But! In production, i.e., on the road, we should pay close attention to the emergency vehicles around us.

How do we identify photos with police cars?

The standard way to do it is called Active Learning.

We take our model, which does not detect police cars particularly well but does detect some of them.

We run inference on the millions of unlabeled images and pick those where the model identified a police car but is not confident, for example, with a probability in [0.3–0.7].

We send these images for labeling, add them to the training set, and retrain the model. If the resulting quality is not high enough, we repeat the procedure.
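As a rough illustration of this selection step, here is a minimal sketch in Python. The `predict_proba` callable and the labeling budget are placeholders; the only part taken from the text is the [0.3–0.7] uncertainty band.

```python
import numpy as np

def select_uncertain(image_paths, predict_proba, low=0.3, high=0.7, budget=10_000):
    """Pick images whose predicted 'police car' probability falls in [low, high].

    predict_proba: any callable that maps a list of image paths to a
    NumPy array of probabilities (a placeholder for your trained model).
    """
    probs = predict_proba(image_paths)        # shape: (num_images,)
    mask = (probs >= low) & (probs <= high)   # "detected, but not confident"
    uncertain = np.array(image_paths)[mask]
    # Send at most `budget` images for labeling, most uncertain first
    # (closest to 0.5 for a binary label).
    order = np.argsort(np.abs(probs[mask] - 0.5))
    return uncertain[order][:budget].tolist()
```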

I think every ML Engineer has done something like this in the past.

The method is so widespread that we can call it “The Best Industry Practice,” although it has limitations.

  1. If you do not have enough examples of the target class, the model trained on them will be rather weak and will not find much in the unlabeled data.
  2. Inference on large volumes of data is slow and expensive. And if you do a lot of iterations, it is VERY slow and VERY expensive.
  3. The model only improves on the kinds of cases it can already detect. If the model cannot catch some corner cases, it will not detect them even after many iterations.

Comment: Active Learning has a sibling called Pseudo Labeling. You pick the most confident [0.9–1] predictions and add them to the training set as if they were ground truth. Of course, the fact that your model predicts a police car with high confidence does not mean there actually is one. Still, this technique works surprisingly well, especially if you add Test Time Augmentation, post-processing, ensembles, soft labels, and other techniques from ML competitions.
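The pseudo-labeling counterpart is symmetric: keep only the most confident predictions and treat them as ground truth. A minimal sketch, reusing the same placeholder `predict_proba` idea as above:

```python
import numpy as np

def pseudo_label(image_paths, predict_proba, threshold=0.9):
    """Return (path, label) pairs for predictions confident enough to trust.

    A probability above `threshold` is taken as a positive pseudo label; in
    practice you would average predictions over test-time augmentations or
    an ensemble before thresholding.
    """
    probs = predict_proba(image_paths)
    return [(path, 1) for path, p in zip(image_paths, probs) if p >= threshold]
```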

There is another way.

We can define “similarity” in another way. We can define it semantically.

Here is an example of my service’s output for “woman on a bicycle.” The images are different, but you can see that all of them are relevant to the search query.

The implementation is straightforward:

  1. You take the image database.
  2. Extract embedding from every image.
  3. During the search, you compare the embedding of the query image with the embeddings in the database. The smaller the distance, the higher the similarity (see the sketch below).
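Here is a minimal sketch of the search step with NumPy, assuming the embeddings are already extracted and L2-normalized, so cosine similarity is just a dot product:

```python
import numpy as np

def search(query_embedding, database_embeddings, database_urls, top_k=20):
    """Return the URLs of the `top_k` most similar images.

    Assumes every embedding is L2-normalized, so the dot product equals
    cosine similarity (higher score = more similar, i.e. smaller distance).
    """
    scores = database_embeddings @ query_embedding   # shape: (num_images,)
    best = np.argsort(-scores)[:top_k]               # indices of the top matches
    return [database_urls[i] for i in best]
```

At 18 million images, a brute-force dot product is still feasible; for larger databases, an approximate nearest neighbor index such as Faiss or Annoy is the usual choice.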

Every image search works like this, whether it is Pexels or Face ID on your iPhone.

The advantage of the method is that you need to extract embeddings only once. After this, you can look for police cars, women on bicycles, or toy pandas.

The disadvantage is that no one knows how to create good embeddings, and the difficulty is scientific rather than engineering. The standard approach of taking a ResNet, EfficientNet, or InceptionNet trained on ImageNet and using its features works, but not as well as we would like.
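For reference, that standard approach looks roughly like this with a recent torchvision: take an ImageNet-pretrained ResNet, drop the classification head, and use the pooled features as the embedding. This is only a baseline sketch, not how my service computes its embeddings.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ImageNet-pretrained ResNet-50 with the classification head removed.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()   # output: 2048-d feature vector
backbone.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(image_path: str) -> torch.Tensor:
    image = Image.open(image_path).convert("RGB")
    features = backbone(preprocess(image).unsqueeze(0))[0]
    return features / features.norm()   # L2-normalize for cosine search
```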

There is a competition on Kaggle where Google is looking for new ideas for this task.

Q: What do I have now?

  • API. You can send a request with an image, a URL to an image, or a text query, and get back URLs of up to 20 similar images.
  • Python library that wraps the API.
  • Demo where you can check the quality of the search.

Q: What is next?

Right now, it is a technology in search of a product. I need use cases.

It looks like my next steps are to implement plugins for data analysis and data labeling tools, plus blog posts and tutorials that show the value.

If you have better ideas, I am all ears. Feel free to message me on LinkedIn.

P.S. Russian version of the text.
