How Maximo Visual Inspection empowers Image Segmentation

Annalise Sumpon
IBM Data Science in Practice
4 min read · Nov 15, 2023

Image segmentation (IS) is the computer vision task of locating objects and their boundaries; more specifically, it assigns every pixel in an image to a class. IS can be broken down into two distinct types:

  1. Semantic segmentation: this type of segmentation groups together all objects of the same class and treats them as a single instance.
  2. Instance segmentation: this type of segmentation does not group objects of the same class together; instead, it treats each distinct object as its own instance of that class (the sketch below makes the difference concrete).
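
To make the distinction concrete, here is a small illustrative sketch of the two kinds of per-pixel output for an image containing two cars. This uses plain NumPy and is not MVI code; it only shows the shape of the idea:

```python
import numpy as np

# A tiny 4x4 "image" containing two separate cars on a background.

# Semantic segmentation: every pixel gets a class label, and both
# cars collapse into the single class "car" (0 = background, 1 = car).
semantic = np.array([[0, 1, 1, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 0],
                     [0, 1, 1, 0]])

# Instance segmentation: each car keeps its own identity
# (0 = background, 1 = first car, 2 = second car).
instance = np.array([[0, 1, 1, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 0],
                     [0, 2, 2, 0]])

print("semantic classes:", np.unique(semantic))    # [0 1]
print("object instances:", np.unique(instance))    # [0 1 2]
```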

Image segmentation is a flexible method for distinguishing meaningful regions under a variety of circumstances (e.g., variations in lighting or positioning). The resulting regions can then be used for analysis and enable tracking, monitoring, and more for certain use cases.

Some of these use cases include, but are not limited to: traffic monitoring, medical image analysis, computer vision for autonomous vehicles, face detection and recognition, video surveillance, and satellite image analysis.

Maximo Visual Inspection, otherwise known as MVI, is IBM’s easy-to-use, no-code tool that enables technical and non-technical users to create image classification, object detection, and video detection models. Through training on large, enriched datasets, these models are able to learn complex visual patterns that are difficult for the human eye to detect.

Let’s walk through a demonstration in the MVI user interface, where we create an object detection model to segment vehicles in images.

Step 1: Creating the dataset.

In the Maximo Visual Inspection platform, the ‘Data sets’ page serves as your all-in-one management hub for your image assets. It organizes your data sets in a structure similar to a file system, and each data set is used to train a single model.

By navigating into your chosen data set, you can easily upload images via drag and drop or bulk upload from your local computer. In this case, the “Cars” data set contains several overhead shots of cars.
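
These data set operations are also available over MVI’s REST API. The sketch below is illustrative only: the host name, token, and response keys are assumptions, so verify the endpoint paths and payloads against the API documentation for your MVI version.

```python
import requests

BASE = "https://mvi.example.com/api"        # hypothetical MVI server URL
HEADERS = {"X-Auth-Token": "<your-token>"}  # token from your MVI login session

# Create a data set named "Cars" (verify the exact path and payload
# for your MVI version).
resp = requests.post(f"{BASE}/datasets", headers=HEADERS,
                     json={"name": "Cars"}, verify=False)
dataset_id = resp.json()["dataset_id"]      # response key assumed; check the docs

# Upload an image into the new data set.
with open("overhead_car.jpg", "rb") as f:
    requests.post(f"{BASE}/datasets/{dataset_id}/files",
                  headers=HEADERS, files={"files": f}, verify=False)
```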

Step 2: Labeling the data.

Since we are building an object detection model, we must label each car present in each image. It is important to label the entire object, and, as a best practice, to label consistently across all images to ensure a better-performing model. In MVI’s labeling view, these annotations are drawn as tight polygons around each object.
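
Conceptually, each polygon label is just an ordered list of (x, y) vertices tied to a class name and an image. The record below is a hypothetical sketch of that structure; the field names are illustrative and not MVI’s actual schema:

```python
# Hypothetical polygon annotation for one car in one image.
# Field names are illustrative, not MVI's internal schema.
annotation = {
    "file": "highway_001.jpg",
    "label": "car",
    # Ordered (x, y) pixel vertices tracing the car's outline; the
    # polygon should hug the object tightly and enclose all of it.
    "polygon": [(102, 240), (180, 236), (188, 310), (110, 318)],
}

# The consistency best practice in code form: every vertex is a 2-D
# point, and the same class name ("car") is used across all images.
assert all(len(pt) == 2 for pt in annotation["polygon"])
print(f"{annotation['label']}: {len(annotation['polygon'])} vertices")
```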

Step 3 (Optional): Augmenting and auto-labeling.

In MVI you can easily augment your data set to expand and enrich the training data. Augmentation options include blurring, horizontal and vertical flips, rotation, color adjustments, cropping, and more. Furthermore, you can use an already deployed object detection model to auto-label data in your new training set, saving significant labeling time.
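
To see what these transforms do to an image, here is an illustrative stand-in using Pillow. This is not MVI’s internal augmentation code; it simply reproduces the same kinds of transforms on a local file:

```python
from PIL import Image, ImageFilter, ImageOps

img = Image.open("overhead_car.jpg")  # any local training image

augmented = [
    img.filter(ImageFilter.GaussianBlur(radius=2)),  # blurring
    ImageOps.mirror(img),                            # horizontal flip
    ImageOps.flip(img),                              # vertical flip
    img.rotate(15, expand=True),                     # rotation
]

# Each variant becomes an extra training image, enriching the data set
# the same way MVI's built-in augmentation expands it.
for i, aug in enumerate(augmented):
    aug.save(f"overhead_car_aug_{i}.jpg")
```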

Step 4: Training the model.

In MVI, you can train your model by selecting the “Train model” button within your chosen data set. On the “Train model” page, select the “Object detection” option. Because MVI treats segmentation as a subset of object detection, the available models include some optimized for traditional object detection and others suited for segmentation.

In our case, you’ll choose the “Detectron2” model, which is best suited for small objects and tight, polygon-shaped objects. Under the “Advanced settings” toggle, you can adjust specific parameters, such as the number of iterations and your training and testing splits.

Once you are satisfied with your selections, begin the training process by clicking the “Train model” button. You will be redirected to a live view of the training process, which shows the total training time and performance across the current training iteration.
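
Training can also be started programmatically. The snippet below is a sketch under the same assumptions as before; in particular, the payload fields (the “usage” code for object detection) and the response’s task identifier are assumptions to confirm against your MVI API documentation:

```python
import time
import requests

BASE = "https://mvi.example.com/api"        # hypothetical MVI server URL
HEADERS = {"X-Auth-Token": "<your-token>"}
dataset_id = "<dataset-id>"                 # from the data set creation step

# Kick off a training task (payload fields are assumptions; the model
# selector for Detectron2 may differ in your MVI version).
task = requests.post(f"{BASE}/dl-tasks", headers=HEADERS, json={
    "name": "cars-detectron2",
    "dataset_id": dataset_id,
    "usage": "cod",                         # assumed: object detection
}, verify=False).json()

# Poll the task instead of watching the live training view.
while True:
    status = requests.get(f"{BASE}/dl-tasks/{task['task_id']}",
                          headers=HEADERS, verify=False).json()
    if status.get("status") in ("trained", "failed"):
        break
    time.sleep(30)
```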

Step 5: Deploying the model and using it for inference.

Once your model is trained, you have the option to deploy it for inferencing. On the trained model’s page, select the “Deploy model” button, then navigate to the “Deployed models” page, select your model, and open its inferencing page.
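
The same deploy-and-infer flow is available over the REST API. As with the earlier snippets, this is a hedged sketch: the endpoint names and response keys are assumptions to verify against your MVI documentation:

```python
import requests

BASE = "https://mvi.example.com/api"        # hypothetical MVI server URL
HEADERS = {"X-Auth-Token": "<your-token>"}

# Deploy the trained model as a web API (endpoint and payload assumed;
# verify against your MVI version's API docs).
deploy = requests.post(f"{BASE}/webapis", headers=HEADERS,
                       json={"trained_model_id": "<trained-model-id>"},
                       verify=False).json()

# Send a new image to the deployed endpoint for inference.
with open("new_highway.jpg", "rb") as f:
    result = requests.post(f"{BASE}/dlapis/{deploy['webapi_id']}",
                           headers=HEADERS, files={"files": f},
                           verify=False).json()

# Each detection carries a class label, a confidence score, and the
# geometry behind the bounding polygons shown in the examples below.
for obj in result.get("classified", []):
    print(obj.get("label"), obj.get("confidence"))
```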

Below are examples of inferenced images with the resulting bounding polygons for each object.

In this first example, we are shown a low-noise demonstration of isolating cars on a highway. The first image demonstrates the optimized segmentation around the cars on the highway: we can see the tight boundaries around each object, which reflect the polygon-labeled training data.

Another example, below, shows a busier road, where the overall image is noisier and contains many objects, including people, trees, and more. The MVI model is able to identify and distinguish the different orientations and boundaries of the cars in this busy intersection.

You’ve now successfully taken a dataset of images to a fully deployed model by leveraging IBM’s Maximo Visual Inspection.

Modeling is an iterative process, so you can keep improving your models by adding more labeled images and inferencing for insights. From here, you can refine your current models and explore other forms of image segmentation!

This article covered the user interface of Maximo Visual Inspection, but MVI can also be accessed programmatically through its REST API for batch modeling and inferencing. For more information on developer tools and tutorials for Maximo Visual Inspection, visit here. If you want to learn more about the Maximo Application Suite and other related tools, visit here.

Originally published at https://medium.com on November 15, 2023.
