How to set up an image recognition task properly? (2018 Update)

The article is also published on the Ximilar AI Blog.

Are you new to image recognition? These are the best tips & tricks for you or your developers building an image recognition solution with a small dataset.

Machine learning is still young, but it becomes more accessible every day. Apple Vision SDK, Google TensorFlow or the Vize image classifier make it easy to train custom recognition models. Running a vision platform, I get many questions, and we deal with the limitations of the technology ourselves. This post is for people who are building an image classifier, to help them define the task so that they get the most out of today’s technology.


I had a conversation with a client who said that Google Vision did not work and returned irrelevant tags. He had hired a few students to do the tagging by hand. The problem was not the API but the approach to it. After we showed him our custom approach and shared some tips, we were able to start testing image classification in 10 minutes.

The Basic Rules

  • Binary classification (striped/not striped): get 50–100 images per label
  • Up to 20 labels (hard-to-recognize labels): get ~100 images per label
  • Up to 100 labels (well-defined labels): get 100–200 images per label
  • Pattern recognition (structures, X-ray images): get 50–100 images per label
  • Abstract labels (up to 20 categories): get ~100 images per label
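The rules above can be turned into a quick check before training. The following is a minimal sketch: the dictionary keys, the function name and the example labels are made up for illustration, and the thresholds are simply the lower ends of the ranges listed above.

```python
from collections import Counter

# Rule-of-thumb minimum images per label, taken from the list above.
# The keys are hypothetical names chosen for this sketch.
MIN_IMAGES_PER_LABEL = {
    "binary": 50,         # binary classification
    "hard_labels": 100,   # up to 20 hard-to-recognize labels
    "well_defined": 100,  # up to 100 well-defined labels
    "pattern": 50,        # pattern recognition
    "abstract": 100,      # abstract labels, up to 20 categories
}

def labels_below_minimum(labels, task_type):
    """Return {label: image_count} for labels that have too few images."""
    minimum = MIN_IMAGES_PER_LABEL[task_type]
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum}

# Example: one label per image in a striped/not striped task.
labels = ["striped"] * 60 + ["not_striped"] * 35
short = labels_below_minimum(labels, "binary")
print(short)  # {'not_striped': 35}
```

Any label that shows up in the result needs more images before the numbers above can be expected to hold.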

What Does Not Work

  • Multiple labels with a small dataset: tasks with over 20 labels need at least 100 images per label to achieve solid results.

Reliability of the Results

Every client is looking for reliability, which in practice means accuracy. Keep it simple if you aim for high accuracy. The technology is still pretty dumb. Building an image classifier with a limited number of training images currently takes an iterative approach. I recommend following the rules below.

  • Break your task into simple decisions (yes or no)
  • Make categories smaller & connect them in some logical manner
  • Use general models for general categories
  • Each label should have a similar number of images
  • Always collect images to extend your dataset
  • Merge very close classes together
  • Use UI/human feedback to improve the data
  • Maintain the quality of your dataset
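Two of these rules, keeping labels balanced and merging very close classes, are easy to check in code. This is only a sketch: the function names and the shoe labels are hypothetical, and the ratio of the largest to the smallest label is just one simple way to measure imbalance.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Largest label count divided by smallest label count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def merge_labels(labels, merge_map):
    """Relabel images so that very close classes share one label."""
    return [merge_map.get(label, label) for label in labels]

# Example dataset with one label per image.
labels = ["sneaker"] * 120 + ["trainer"] * 30 + ["boot"] * 100
print(imbalance_ratio(labels))  # 4.0

# Treat "trainer" and "sneaker" as one class.
merged = merge_labels(labels, {"trainer": "sneaker"})
print(imbalance_ratio(merged))  # 1.5
```

In this example the merge also improves the balance, but that will not always be the case, so it is worth re-checking the ratio after every change to the label set.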

Testing & Production Difference

We allow users of Vize to train tasks with a minimum of 20 images per label. Once your data is divided into a training set and a test set, Vize uses the training set to learn the optimal parameters for the classifier. During training we augment these images in several ways to extend the set automatically. The test set is used to compute the accuracy of the classifier, which is the accuracy you can see in the Vize app on the Task screen.
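To make the idea concrete, here is a minimal sketch of a per-label (stratified) train/test split and one simple augmentation, a horizontal flip. It is not how Vize implements these steps: the function names, the 20% test fraction and the file names are assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_split(items, test_fraction=0.2, seed=0):
    """Split (path, label) pairs so every label appears in the test set."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)
        n_test = max(1, round(len(paths) * test_fraction))
        test += [(p, label) for p in paths[:n_test]]
        train += [(p, label) for p in paths[n_test:]]
    return train, test

def flip_horizontal(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]

# 20 images per label, the minimum mentioned above (hypothetical file names).
items = [(f"img_{label}_{i}.jpg", label)
         for label in ("striped", "not_striped") for i in range(20)]
train, test = stratified_split(items)
print(len(train), len(test))  # 32 8
```

With only 20 images per label, the test set here holds 4 images per label, which is one reason why accuracy measured on such a small set should be read with caution.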

Remember that 20 images per label is the lower bound, and it usually gives the worst results and low accuracy. 20 images might be enough for testing, but not for production. Most of the time the accuracy reported in Vize can be pretty high for small datasets, easily over 80%. However, as is common in machine learning, you should use more images to get stable and reliable results in production. Some tasks need hundreds or even thousands of images per label for the production model to perform well.
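One way to see why a high accuracy on a small test set is not yet a reliable number is to put a confidence interval around it. The sketch below uses the Wilson score interval, a standard formula for a proportion; the test-set sizes are made-up examples, and the calculation is an illustration rather than something the Vize app reports.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for an accuracy of correct/total (z=1.96 is ~95%)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total ** 2)) / denom
    return center - half, center + half

small = wilson_interval(7, 8)      # 87.5% accuracy measured on 8 test images
large = wilson_interval(350, 400)  # the same 87.5% measured on 400 test images
print(small)
print(large)
```

With 7 of 8 test images correct, the interval runs from roughly 0.53 to 0.98, while the same accuracy measured on 400 images is pinned down to within a few percentage points. The reported accuracy is identical in both cases; only the larger set tells you what to expect in production.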

Best Practices

Start with Fewer Categories

If you are building an app that lets people recognise shoes, I recommend starting with ~50 shoe types. With 100 images of each shoe, this is an easy task to train. Let users add and upload new shoes in the user interface. Also, let them give you feedback on your classifications. This way you can get an amazing dataset of real images in one month and then update your app.

Use Tasks with Fewer Categories

If you are building a classifier for plane types with a small training dataset, separate your images into “in the air” and “on the ground” images. Build two different models, one for air and one for ground, and get better overall results for both. You can even merge similar planes into one class and train another recogniser to tell them apart. Once you have more images, you can merge these categories together.
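The air/ground setup amounts to a two-stage classifier: a first model picks the scene, and a scene-specific model picks the plane type. Here is a minimal sketch of that routing. The three models are stand-in functions so that the example runs; in practice each would be a trained classifier, and all names and labels here are hypothetical.

```python
def make_router(scene_model, models_by_scene):
    """Build a two-stage classifier: scene first, then a per-scene model."""
    def classify(image):
        scene = scene_model(image)
        plane_type = models_by_scene[scene](image)
        return scene, plane_type
    return classify

# Stand-in models; real ones would be trained image classifiers.
def scene_model(image):
    return "air" if image["sky_ratio"] > 0.5 else "ground"

def air_model(image):
    return "airliner"

def ground_model(image):
    return "glider"

classify = make_router(scene_model, {"air": air_model, "ground": ground_model})
print(classify({"sky_ratio": 0.8}))  # ('air', 'airliner')
print(classify({"sky_ratio": 0.2}))  # ('ground', 'glider')
```

Each second-stage model only has to separate plane types within one kind of scene, which is a simpler task to learn from a small dataset.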

Use Binary Classifiers for Important Classes

Creating captions for images in e-commerce? Build a custom task for each tag. One model will classify “rounded” vs. “not rounded”, another will handle the next tag, and so on. This way you get a very reliable, specialised classifier for each tag.
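Combining the per-tag classifiers into a caption can be sketched as follows. The classifiers here are stand-ins that return fixed scores, and the tag names, the 0.5 threshold and the function name are assumptions made for this example.

```python
def tag_image(image, binary_classifiers, threshold=0.5):
    """Run one yes/no classifier per tag and keep the tags that pass."""
    tags = []
    for tag, classifier in binary_classifiers.items():
        probability = classifier(image)  # probability that the tag applies
        if probability >= threshold:
            tags.append(tag)
    return tags

# Stand-in classifiers returning fixed scores; real ones would be trained models.
binary_classifiers = {
    "rounded": lambda image: 0.92,
    "striped": lambda image: 0.10,
    "leather": lambda image: 0.75,
}
print(tag_image("shoe.jpg", binary_classifiers))  # ['rounded', 'leather']
```

Because every tag has its own model, a tag can be retrained or given a stricter threshold without touching the others.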

Don’t Mix the Input Images

Machine learning performs better if the training images and the images evaluated in production come from the same distribution. This means you need to train on the same kind of images as the ones you are going to evaluate. You can work around this by using internet images in the beginning, but you should start gathering user imagery as soon as possible. These rules are going to make your model robust in the future.
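A simple way to notice such a mismatch is to measure accuracy separately for each image source. The sketch below assumes you keep a record of where each evaluated image came from; the source names, the labels and the function name are hypothetical.

```python
from collections import defaultdict

def accuracy_by_source(records):
    """records: iterable of (source, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, truth, predicted in records:
        total[source] += 1
        correct[source] += (truth == predicted)  # True counts as 1
    return {source: correct[source] / total[source] for source in total}

records = [
    ("internet", "boot", "boot"),
    ("internet", "sneaker", "sneaker"),
    ("user", "boot", "sneaker"),
    ("user", "sneaker", "sneaker"),
]
print(accuracy_by_source(records))  # {'internet': 1.0, 'user': 0.5}
```

A clear gap between the two numbers suggests that the model has learned the internet images rather than the images your users actually send.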


Building an image classifier is hard not only because of the deep learning itself but also because it needs a good task definition and a good dataset. If the size of the dataset is a challenge, start simple and iterate towards your goal. If you have any questions, feel free to text me or comment below.


Víťa Válka

User interface designer who convinced his family to switch from a house to a travel trailer. #digitalnomad