Stacking Multiple Custom Models in Watson Visual Recognition

Kevin Gong
IBM watsonx Assistant
4 min read · Dec 11, 2017

This story is part of a best practice series on Watson Visual Recognition. You can find the previous entry here and get started with Watson Visual Recognition through IBM Cloud.

The custom model within Watson Visual Recognition is one of the API service’s most popular functionalities, allowing users to train Watson to recognize virtually any custom content. We’ve already seen customers use custom models to combat drought in California, analyze unstructured social media data, and inspect cell towers using drones.

While the majority of use cases we see only need a single custom model, a nifty technique to extend the capabilities of custom models even further is to layer multiple models. With some programming on the user's side, it's possible to take an image classified by one custom model and feed it into a second custom model based on the results from the first. This can continue indefinitely (in theory) and ultimately provide the user with a set of highly specific tag results for a single image.
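To make the layering concrete, here is a minimal Python sketch of the routing logic. The classify_image helper stands in for a call to the Visual Recognition classify endpoint (via the SDK or plain HTTP), and the classifier IDs are hypothetical placeholders rather than real model IDs:

```python
# Minimal sketch of "stacking" two custom models. classify_image() stands in
# for a call to the Watson Visual Recognition classify endpoint; the
# classifier IDs below are hypothetical placeholders.

DAMAGE_TYPE_MODEL = "vehicle_damage_types_1234567890"   # layer 1
SEVERITY_MODELS = {
    # top class from layer 1  ->  layer 2 classifier to run next
    "broken_windshield": "windshield_severity_1234567890",
    "dented_body": "dent_severity_1234567890",
}

def classify_image(image_path, classifier_id):
    """Return the top class name for one image from one custom classifier.
    The actual Watson API call is omitted in this sketch."""
    raise NotImplementedError

def stacked_classify(image_path):
    # Layer 1: what type of damage does the image show?
    damage_type = classify_image(image_path, DAMAGE_TYPE_MODEL)

    # Layer 2: route the same image to the model trained for that damage type.
    severity_model = SEVERITY_MODELS.get(damage_type)
    if severity_model is None:
        return {"damage_type": damage_type, "severity": None}

    return {
        "damage_type": damage_type,
        "severity": classify_image(image_path, severity_model),
    }
```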

Let’s go through a car insurance claim use case to see how this technique can work for a real-world business application.

Layer 1 — Type of Vehicle Damage

In this example use case, our goal is to help a car insurance company automatically generate quotes from images of car damage. We’ll assume that the insurance company has already gathered additional information regarding the car, such as model, vehicle registration number, etc.

Image an insurance company receives that they must generate a quote for. (Source)

Let’s say the insurance company receives the image above. The first step would be to identify the type of vehicle damage, so the insurance company passes this image through a custom model that is trained to recognize a few different types of vehicle damage (see diagram below). This custom model, which we designate as the first layer of this workflow, identifies the type of damage as broken windshield.

As with any custom model, the insurance company gathers example images of each class it wants to identify and trains the service on them. You can find an interactive version of this particular custom model in our demo here, and we have a best practices guide on training available here.
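As a rough illustration, a classifier like this can be created with a single API call that uploads a .zip of example images per class. The endpoint URL, version date, file names, and class names below are assumptions based on the Visual Recognition v3 API and this insurance example, so verify them against the API reference before using them:

```python
import requests

# Hedged sketch: create the layer-1 "damage type" classifier by uploading one
# .zip of positive examples per class (plus optional negative examples).
# Endpoint, version date, and file names are assumptions for illustration.
url = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classifiers"
params = {"api_key": "YOUR_API_KEY", "version": "2016-05-20"}

with open("broken_windshield.zip", "rb") as windshields, \
     open("dented_body.zip", "rb") as dents, \
     open("undamaged_cars.zip", "rb") as negatives:
    response = requests.post(
        url,
        params=params,
        data={"name": "vehicle_damage_types"},
        files={
            "broken_windshield_positive_examples": windshields,
            "dented_body_positive_examples": dents,
            "negative_examples": negatives,
        },
    )

print(response.json())  # includes the classifier_id to use once training finishes
```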

Layer 2 — Severity of Damage

Now that the type of damage has been identified, the insurance company programs an interaction that checks the result from the first custom model (broken windshield) and passes the image into a second custom model that is specifically trained to assess the severity of the windshield damage.

For this particular example, the insurance company trains the second custom model to recognize three severity levels of windshield damage:

  • Light: small chip or crack
  • Medium: larger spiderweb of cracks
  • Heavy: significant portions of windshield missing
Image sources: 1, 2, 3

This second model is what allows the insurance company to achieve its goal of providing an estimate for the cost of the damage. Each severity level can be tied to a monetary range that is then returned to the user. Since the original image is categorized as light damage, the insurance company might send the user a message saying, “The cost of repairs to your windshield will be $70-$150.”
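One simple way to wire that up is a lookup table keyed on the layer-2 result. This is purely an illustration; the class names and the medium/heavy dollar ranges below are hypothetical, not real pricing data:

```python
# Hypothetical mapping from the layer-2 severity class to a repair estimate.
# Only the "light" range comes from the example above; the rest is made up.
COST_RANGES = {
    "light": (70, 150),
    "medium": (150, 400),
    "heavy": (400, 1200),
}

def quote_message(severity):
    low, high = COST_RANGES[severity]
    return "The cost of repairs to your windshield will be ${0}-${1}.".format(low, high)

print(quote_message("light"))
# The cost of repairs to your windshield will be $70-$150.
```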

More Models, More Flexibility

So, why not just put everything into a single custom model instead of spreading it across multiple models?

There are a few technical reasons why this might not work well. First, a custom model returns only one tag result for any given image (whichever class gives the highest confidence score). A user looking for multiple custom tags for an image will need to run the image through multiple custom models. Second, each custom model should have no more than 8–10 classes. As the number of classes within a custom model exceeds that level, the accuracy of the model will suffer due to increased noise.
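For reference, pulling that single top result out of a classify response might look like the snippet below. The JSON shape (images → classifiers → classes, each class carrying a score) follows the v3 response format, so treat the exact field names as an assumption to check against the API reference:

```python
# Hedged sketch: extract the single highest-confidence class from a
# Visual Recognition classify response (v3-style JSON assumed).
def top_class(classify_response):
    classes = classify_response["images"][0]["classifiers"][0]["classes"]
    best = max(classes, key=lambda c: c["score"])
    return best["class"], best["score"]

# Example with a hand-written response fragment:
sample = {"images": [{"classifiers": [{"classes": [
    {"class": "broken_windshield", "score": 0.92},
    {"class": "dented_body", "score": 0.31},
]}]}]}
print(top_class(sample))  # ('broken_windshield', 0.92)
```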

By appropriately splitting training data across multiple custom models, users not only gain a greater number of tags but can also swap models as needed. In the auto insurance example we walked through, a car insurance company that wants to explore vehicle brand, color, or other factors as part of its second custom model can simply swap in the appropriate model instead of retraining a single model multiple times. It's this level of flexibility that makes Watson Visual Recognition a powerful tool for solving challenges across many industries.

Questions? Comments? Working on your own Watson projects? Let us know in the comments below!


Kevin Gong
IBM watsonx Assistant

Product manager @IBMWatson. Photographer. UX/UI designer. DIYer. Data tinkerer. Social good supporter. Formerly @McKinsey, @TEDx, @Cal, @ColumbiaSIPA