Best Practices for Custom Models in Watson Visual Recognition

Kevin Gong
IBM watsonx Assistant
5 min read · Nov 13, 2017

Since the launch of the Watson Visual Recognition API, we’ve seen users help California save water, perform infrastructure inspections with drones, and even find Pokemon. Powering many of these use cases are custom classifiers, a feature within Visual Recognition that allows users to train Watson on almost any visual content.

To create custom classifiers, users define categories they want to identify and upload example images for those categories. For example, a user wishing to identify different dog breeds may create 4 classes (golden retrievers, huskies, dalmatians, and beagles) and upload training images for each class. You can find this exact example in the Watson Visual Recognition demo or explore other tutorials on custom classifiers.

Custom classifiers can be highly powerful but require careful training and content considerations to be properly optimized. Through our user conversations, we’ve assembled a best practices guide below to help you get the most out of your custom classifiers.

How training can increase Watson Visual Recognition’s quality

The accuracy you will see from your custom classifier depends directly on the quality of the training you perform. Clients who closely controlled their training processes have observed greater than 98% accuracy for their use cases. Accuracy, unlike the confidence score, is measured against a ground truth for a particular classification problem and a particular data set.
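
To make the distinction concrete, here is a minimal sketch of measuring accuracy against a ground truth. The image names, labels, and the `accuracy` helper are hypothetical and not part of the Visual Recognition API; the predictions are assumed to be the top-scoring class for each test image.

```python
def accuracy(predictions, ground_truth):
    """Fraction of ground-truth images whose predicted label matches."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(
        1
        for image_id, label in ground_truth.items()
        if predictions.get(image_id) == label
    )
    return correct / len(ground_truth)


# Hypothetical human-verified labels for four test images.
ground_truth = {
    "img1.jpg": "husky",
    "img2.jpg": "beagle",
    "img3.jpg": "dalmatian",
    "img4.jpg": "husky",
}
# Hypothetical top-scoring class returned for each image.
predictions = {
    "img1.jpg": "husky",
    "img2.jpg": "beagle",
    "img3.jpg": "husky",
    "img4.jpg": "husky",
}

print(accuracy(predictions, ground_truth))  # 0.75
```

The result only describes this classifier on this test set; a different data set or ground truth can give a different number.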

“Clients who closely control their image training processes observed greater than 98% accuracy.”

As a best practice, clients often create a ground truth to benchmark against human classification. Note that humans often make classification mistakes due to fatigue, repetition, carelessness, or other problems of the human condition.

On a basic level, images in training and testing sets should resemble each other. Significant visual differences between the training and testing groups will result in poor performance.
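
One simple way to keep the two sets visually similar is to draw both from the same pool of images. The sketch below assumes you have a single list of image paths per class; `split_train_test`, its parameters, and the file names are hypothetical.

```python
import random


def split_train_test(image_paths, test_fraction=0.2, seed=0):
    """Shuffle one pool of images and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    paths = sorted(image_paths)          # fixed starting order for repeatability
    random.Random(seed).shuffle(paths)   # seeded shuffle, same split every run
    n_test = max(1, round(len(paths) * test_fraction))
    return paths[n_test:], paths[:n_test]


# Hypothetical pool of 20 husky images captured under the same conditions.
pool = [f"huskies/img{i:02d}.jpg" for i in range(20)]
train, test = split_train_test(pool, test_fraction=0.2)
print(len(train), len(test))  # 16 4
```

Because both lists come from the same pool, they share lighting, angle, and distance characteristics. If production images look different from this pool, the pool itself needs to change.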

There are a number of additional factors that will impact the quality of your training beyond the resolution of your images. Lighting, angle, focus, color, shape, distance from subject, and presence of other objects in the image will all impact your training. Please note that Watson takes a holistic approach when being trained on each image. While it will evaluate all of the elements listed above, it cannot be tasked to exclusively consider a specific element.

The API will accept as few as 10 images per class, but we strongly recommend using significantly more images to improve the performance and accuracy of your classifier. 100+ images per class is usually a good starting point for more robust levels of accuracy.
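
Before uploading, it can help to check how many images each class has. This is a rough sketch that assumes one folder per class under a common root; the folder layout, the list of file suffixes, and the helper names are assumptions for illustration. The two limits come from the guidance above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image file types
MIN_IMAGES = 10            # fewest images per class the API will accept
RECOMMENDED_IMAGES = 100   # suggested starting point per class


def count_images(root):
    """Return {class_name: image_count} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def review(counts):
    """Label each class 'too few', 'below recommended', or 'ok'."""
    report = {}
    for name, n in counts.items():
        if n < MIN_IMAGES:
            report[name] = "too few"
        elif n < RECOMMENDED_IMAGES:
            report[name] = "below recommended"
        else:
            report[name] = "ok"
    return report


# Hypothetical counts; in practice pass count_images("training_data").
print(review({"huskies": 8, "beagles": 40, "dalmatians": 150}))
# {'huskies': 'too few', 'beagles': 'below recommended', 'dalmatians': 'ok'}
```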

What is the score that I see for each tag?

Each returned tag will include a confidence score between 0 and 1. This number does not represent a percentage of accuracy, but instead indicates Watson’s confidence in the returned classification based on the training data for that classifier. The API will classify for all classes in the classifier, but you can adjust the threshold to only return results above a certain confidence score.
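
As an illustration of thresholding, the sketch below filters results on the client side. The list of `class`/`score` dictionaries is an assumed, simplified shape rather than the exact API response, and `filter_by_threshold` is a hypothetical helper.

```python
def filter_by_threshold(classes, threshold=0.5):
    """Keep classes scoring at or above the threshold, highest score first."""
    kept = [c for c in classes if c["score"] >= threshold]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


# Hypothetical scores for one image from a dog-breed classifier.
classes = [
    {"class": "beagle", "score": 0.12},
    {"class": "husky", "score": 0.91},
    {"class": "dalmatian", "score": 0.58},
]

for c in filter_by_threshold(classes, threshold=0.5):
    print(c["class"], c["score"])
# husky 0.91
# dalmatian 0.58
```

Keeping the full list and filtering afterwards lets you try several thresholds on the same results without classifying the images again.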

Scores from a custom classifier can be compared with one another to rank likelihoods. To act on a score, weigh it against the cost and benefit of being right or wrong in your use case, and then choose a threshold for action. Be aware that the nature of these numbers may change as we make changes to our system, and we will communicate these changes as they occur.
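
One way to turn the cost/benefit idea into a number is to score a small labeled validation set and total the cost of mistakes at each candidate threshold. This is only a sketch: the scores, the costs, and the helper names are hypothetical, and it treats a single class as a yes/no decision.

```python
def total_cost(scored, threshold, cost_fp, cost_fn):
    """Cost of acting on every score >= threshold, given the true labels."""
    cost = 0.0
    for score, is_positive in scored:
        acted = score >= threshold
        if acted and not is_positive:
            cost += cost_fp      # acted on something that was not the class
        elif not acted and is_positive:
            cost += cost_fn      # missed a true member of the class
    return cost


def best_threshold(scored, candidates, cost_fp, cost_fn):
    """Return the candidate with the lowest total cost (first wins ties)."""
    return min(candidates, key=lambda t: total_cost(scored, t, cost_fp, cost_fn))


# Hypothetical (score, is_really_this_class) pairs from a validation set.
scored = [
    (0.95, True), (0.80, True), (0.65, False),
    (0.55, True), (0.30, False), (0.10, False),
]
candidates = [0.2, 0.4, 0.6, 0.8]

# Misses are expensive: a lower threshold wins.
print(best_threshold(scored, candidates, cost_fp=1, cost_fn=5))  # 0.4
# False alarms are expensive: a higher threshold wins.
print(best_threshold(scored, candidates, cost_fp=5, cost_fn=1))  # 0.8
```

The same scores lead to different thresholds once the costs change, which is why the threshold is a decision about your use case and not a property of the classifier.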

Further details about scores can be found here.

Examples of difficult use cases

While Watson Visual Recognition is highly flexible, there are a number of recurring use cases where we’ve seen the API either struggle or require significant pre/post-work from the user.

  • Face Recognition: Visual Recognition is capable of face detection (detecting the presence of faces), but not face recognition (identifying individuals).
  • Detecting details: Occasionally, users want to classify an image based on a small section of the image or on details scattered within it. Because Watson analyzes the entire image when training, it may struggle with classifications that depend on small details. Some users have adopted the strategy of breaking the image into pieces or zooming into relevant parts of an image. See this guide for image pre-processing techniques.
  • Emotion: Emotion classification (whether facial emotion or contextual emotion) is not a feature currently supported by Visual Recognition. Some users have attempted to do this through custom classifiers, but this is an edge case and we cannot estimate the accuracy of this type of training.
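
For the "breaking the image into pieces" strategy mentioned under detecting details, the sketch below computes crop boxes that tile an image. It is an illustrative helper, not part of Visual Recognition; `tile_boxes` and its parameters are hypothetical. The boxes use the (left, top, right, bottom) convention, which is also what Pillow's `Image.crop` accepts if you use that library to cut the tiles.

```python
def tile_boxes(width, height, tile, overlap=0):
    """Return (left, top, right, bottom) boxes covering a width x height image.

    Tiles are tile x tile pixels, except at the right and bottom edges,
    where they are clipped to the image. Neighbouring tiles share
    `overlap` pixels so details on a tile border are not cut in half.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap
    boxes = []
    top = 0
    while True:
        bottom = min(top + tile, height)
        left = 0
        while True:
            right = min(left + tile, width)
            boxes.append((left, top, right, bottom))
            if right >= width:
                break
            left += step
        if bottom >= height:
            break
        top += step
    return boxes


# A hypothetical 1000 x 600 image split into 500-pixel tiles.
for box in tile_boxes(1000, 600, tile=500):
    print(box)
# (0, 0, 500, 500)
# (500, 0, 1000, 500)
# (0, 500, 500, 600)
# (500, 500, 1000, 600)
```

Each tile can then be classified on its own. Training images should be tiled the same way so that the training and testing tiles still resemble each other.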

Examples of good and bad training images

GOOD: The following images were used for training and testing by our partner OmniEarth. They demonstrate good training because the training and testing images resemble each other in angle, lighting, distance, size of subject, etc. See the case study OmniEarth: Combating drought with IBM Watson cognitive capabilities for more details.

Training images:

Testing image:

BAD: The following images demonstrate bad training since the training image shows a close-up shot of a single apple while the testing image shows a large group of apples taken from a distance with other visual items introduced (baskets, sign, etc.). It’s entirely possible that Watson may fail to classify the test image as ‘apples,’ especially if another class in the classifier contains training images of a large group of round objects (such as peaches, oranges, etc.).

Training image:

Testing image:

BAD: The following images demonstrate bad training since the training image shows a close-up shot of a single sofa in a well-lit, studio-like setting while the testing image shows a sofa that is partially cut off, farther away, and situated among many other objects in a real-world setting. Watson may not be able to properly classify the test image due to the number of other objects cluttering the scene.

Training image:

Testing image:

Need help or have questions?

We’re excited to see what you build with Watson Visual Recognition, and we’re happy to help you along the way. Try the custom classifiers feature, share any questions or comments you have on our developerWorks forums, and start building with Watson for free today.

Originally published at www.ibm.com on October 24, 2016.

Kevin Gong
IBM watsonx Assistant

Product manager @IBMWatson. Photographer. UX/UI designer. DIYer. Data tinkerer. Social good supporter. Formerly @McKinsey, @TEDx, @Cal, @ColumbiaSIPA