Automated Image Classifier System at Scale

Published in trillo-platform, Feb 4, 2021

Introduction

Image Classification is a foundational component of almost all modern Computer Vision applications. It involves assigning labels or tags to images based on their pixels alone, without any additional metadata or information provided to the system. It has use cases in medical image processing, security and surveillance, entertainment, video processing, and large-scale information search and retrieval, to name a few. There have been major breakthroughs in Image Classification in recent years using Artificial Intelligence, especially through Deep Learning-based techniques. However, to build a comprehensive and scalable image classification system, we need not only advanced Deep Learning models trained on large data sets, but also an end-to-end system engineered for scalability that can serve input requests both in real time and in batches. Furthermore, the system needs to retrain periodically on new data to improve its accuracy over time, as image data collected under different and varied circumstances arrives.

Types of Image Classification Problems

Image Classification can be categorized into three main problem types:

1. Binary Classification

This is the basic form of image classification, where we have only two classes. Each image must be classified into exactly one of the two classes; no image can have both. For example, by analyzing a CT scan or X-ray of a human lung, we could classify it as Pneumonia or no-Pneumonia. To train a Deep Learning model for this application, each image needs to be labeled as Pneumonia or no-Pneumonia. In other words, we assign a binary label to each image.

2. Multi-Label Classification with Single Class Per Image

In this case, we have more than two possible classes, but each image can belong to only one of them. This setting is commonly called multi-class classification. For example, there may be multiple possible objects of interest such as a human, dog, cat, car, or bicycle, but we know that each image contains only one of them. An example use case could be images coming from a CCTV source scanning an aircraft door. Suppose we are interested in identifying whether the door is properly latched, open, half-open, etc. We can label images of aircraft doors with one of these possible labels and then train a model to classify a new image as one of them. Another example could be a face-based identification system for entry to and exit from a building. There, each employee would be a class, and the person standing in front of the camera would belong to only one of them. Basically, this is a generalization of binary classification. Training a Deep Learning model in this case requires labeling each image with one of the many possible labels. The output of the model at prediction time is a probability distribution across all classes, for example, the probabilities of the image showing a human, a dog, or a car, which sum to one. At prediction time, we label the image with the class that has the highest probability.
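The prediction step described above can be sketched as follows. This is a minimal illustration, not Trillo's implementation: it assumes the model emits one raw score (logit) per class, and the class names and scores are hypothetical values based on the aircraft door example.

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_single_class(logits, class_names):
    """Return the most probable class and the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs

# Hypothetical label set and scores, for illustration only.
CLASSES = ["latched", "open", "half-open"]
label, probs = predict_single_class([2.0, 0.5, -1.0], CLASSES)
print(label, [round(p, 3) for p in probs])
```

Because the probabilities compete for a total of one, a higher score for one class necessarily lowers the probability of the others, which matches the assumption that each image has exactly one class.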

3. Multi-Label Classification with Multiple Classes Per Image

This is the most advanced form of image classification, where we have multiple classes and each image can contain several of them at the same time. For example, an image may contain a human, a dog, and a car, and we need to identify all of them. An aircraft CCTV camera might need to identify whether the door is open or half-open and also whether a human or luggage is present in the scene. Training this type of model requires labeling each image with multiple labels. This is usually done through multi-hot encoding (a generalization of one-hot encoding), where we take a vector or array with one position per label and put ones at the indexes corresponding to the labels present in an image and zeros everywhere else. The output of the model is an independent probability of occurrence for each class in the image. The probabilities need not sum to one in this case. This means, for example, that the predicted probability of a human being present is not constrained by the predicted probability of a dog being present in the same image. Therefore, a probability of 0.7 for a dog, 0.8 for a human, and 0.1 for a car means that the image very likely contains a dog and a human but not a car.
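A minimal sketch of this encoding and of how independent per-class probabilities are turned into labels is shown below. The class list and the 0.5 decision threshold are assumptions made for illustration; in practice the threshold is often tuned per class.

```python
# Hypothetical class list; each position in the label vector maps to one class.
CLASSES = ["human", "dog", "car"]

def encode_labels(present, classes=CLASSES):
    """Build the 0/1 target vector: 1 where a class is present, 0 elsewhere."""
    return [1 if c in present else 0 for c in classes]

def decode_predictions(probs, classes=CLASSES, threshold=0.5):
    """Keep every class whose independent probability reaches the threshold."""
    return [c for c, p in zip(classes, probs) if p >= threshold]

target = encode_labels({"human", "dog"})         # [1, 1, 0]
predicted = decode_predictions([0.8, 0.7, 0.1])  # ['human', 'dog']
print(target, predicted)
```

The probabilities 0.8, 0.7, and 0.1 are the values from the example above; they sum to 1.6, which is fine because each class is scored independently.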

An End-to-End Image Classifier Solution

A solution to the general image classification problem that handles all three problem types requires careful design and implementation. We need to think about data sources, data output, training and retraining of the models, and how to scale the prediction application that deploys the classification models.

Trillo has created a comprehensive, end-to-end Image Classification solution that runs on the Google Cloud Platform. It can scale horizontally with the increase in traffic and can handle all types of classifications described above. It supports both batch and single image classification. An architectural overview of Trillo’s Image Classification System is as follows:

Solution Components

The solution consists of the following components:

1. Trillo Workbench

This is Trillo’s flagship service creation and orchestration engine. It runs on top of GCP Compute services and uses several other GCP services such as Cloud Storage and CloudSQL. It orchestrates the flow of the whole application and invokes multiple back-end microservices running as part of the Trillo Workbench for GCP.

2. Image Labeling Tool

This is a utility provided by the Trillo Workbench to label and re-label a set of images. It can be used to correct images that were wrongly classified by the prediction service, and also to label initially unlabeled data. It has a Web-based user interface and supports input styles ranging from simple check-boxes for small numbers of classes to text auto-completion for large numbers of labels. It is used in conjunction with model training to improve and maintain model accuracy for images captured under varying conditions and environments.

3. Microservice Front-end

This is the main Web service exposing a REST API for classification. It routes each request to the specific back-end service for further processing. It can take input images that need to be classified in two modes:

Batch Mode
Single-Image Mode

3.1 Batch Mode

In batch mode, we give the service multiple images stored in GCP Cloud Storage in bulk. The service takes the following inputs in a JSON-based request payload:

Path to an input CSV file stored in GCP Cloud Storage. This file contains the paths to the image files (also stored in GCP Cloud Storage) that need to be classified.
Path in GCP Cloud Storage where the output CSV file will be stored. The output file contains the path to each image and the labels predicted by the service for that image. JSON is also supported as a bulk output format in addition to CSV.

The service downloads the images from Google Cloud Storage, runs the prediction model on each of them asynchronously, writes the predicted labels to a CSV or JSON file, and finally uploads the output file to a Google Cloud Storage bucket.
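The batch flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: the payload key names (`inputCsvPath`, `outputPath`, `outputFormat`), the bucket paths, and the output column names are hypothetical and are not taken from the Trillo API. The `classify` function is a stand-in for the real model, and in-memory strings replace the Cloud Storage download and upload steps.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical request payload; the key names and paths are illustrative only.
request_payload = json.dumps({
    "inputCsvPath": "gs://example-bucket/input/images.csv",
    "outputPath": "gs://example-bucket/output/predictions.csv",
    "outputFormat": "csv",
})

def classify(image_path):
    """Stand-in for the prediction model; always returns a dummy label."""
    return ["unlabeled"]

def run_batch(input_csv_text, output_format="csv"):
    """Read image paths from CSV text, classify them, and return CSV or JSON text."""
    # One image path per row in the input CSV.
    paths = [row[0] for row in csv.reader(io.StringIO(input_csv_text)) if row]
    # Classify images concurrently; map() preserves the input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        labels = list(pool.map(classify, paths))
    if output_format == "json":
        return json.dumps(
            [{"image": p, "labels": l} for p, l in zip(paths, labels)]
        )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["image", "labels"])
    for p, l in zip(paths, labels):
        writer.writerow([p, "|".join(l)])  # join multiple labels in one cell
    return out.getvalue()

print(run_batch("gs://example-bucket/a.jpg\ngs://example-bucket/b.jpg\n"))
```

A thread pool is used here only to mirror the asynchronous per-image prediction mentioned above; a real deployment would more likely distribute the work across service replicas.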

3.2 Single-Image Mode

In single-image mode, the service takes the actual image file, sent in the HTTP payload as a multipart MIME attachment. The service invokes the prediction model on the image and returns a JSON-based response containing the predicted labels.
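To make the request shape concrete, here is a minimal sketch of how a client might assemble such a multipart body using only the Python standard library. The form field name `image`, the file name, and the response layout are assumptions for illustration; the actual field names expected by the service are not specified here.

```python
import json
import uuid

def build_multipart(field_name, filename, data, content_type="image/jpeg"):
    """Assemble a multipart/form-data body holding a single file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return head + data + tail, headers

# Hypothetical field name and file; the bytes stand in for real image data.
body, headers = build_multipart("image", "door.jpg", b"\xff\xd8\xff")

# Hypothetical response layout, shown only to illustrate parsing.
response_text = '{"labels": ["half-open"]}'
labels = json.loads(response_text)["labels"]
print(headers["Content-Type"], labels)
```

The returned `body` and `headers` could then be passed to any HTTP client as the body and headers of a POST request.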

4. Image Classifier Prediction Service

This is the main service that loads the Image Classifier model from Google Cloud Storage, invokes the model on each given image in prediction (inference) mode, generates labels in CSV or JSON format, and returns the output to the Microservice Front-end. Model weights are stored in Google Cloud Storage; they are downloaded once when the service starts and then kept in memory. This service runs in containers managed by Google Kubernetes Engine. It can optionally use GPUs on the underlying Google Compute Engine instances running the containers for maximum speed.
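The "download once, keep in memory" behavior can be sketched with a cached loader. This is an assumed pattern, not the service's actual code: `download_weights` is a hypothetical stand-in for the Cloud Storage download, and the path is an example value.

```python
import functools

DOWNLOAD_COUNT = 0  # counts how often the (simulated) download runs

def download_weights(path):
    """Hypothetical stand-in for fetching model weights from Cloud Storage."""
    global DOWNLOAD_COUNT
    DOWNLOAD_COUNT += 1
    return {"source": path}

@functools.lru_cache(maxsize=1)
def get_model(path):
    """Load the model on first use, then serve the cached instance."""
    return download_weights(path)

# Every request handler calls get_model(); only the first call downloads.
first = get_model("gs://example-bucket/models/classifier")
second = get_model("gs://example-bucket/models/classifier")
print(first is second, DOWNLOAD_COUNT)
```

Keeping the weights in memory avoids paying the download and deserialization cost on every request, which matters most in single-image mode.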

5. Image Classifier Training Service

This is the main training service that is invoked periodically on new data and also on predicted images whose labels have been corrected by the user using the Trillo labeling tool. This way the model is improved over time and its accuracy is maintained by re-training on new and wrongly classified data. This service runs on one or more dedicated Google Compute Engine VMs with attached GPUs.
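One way to picture how a retraining set is assembled from these two sources is the sketch below. The data layout (dictionaries mapping an image path to its labels) and the function name are assumptions made for illustration.

```python
def build_retraining_set(new_data, predictions, corrections):
    """Combine newly labeled images with predictions the user has corrected.

    All three arguments map an image path to its list of labels.
    """
    examples = dict(new_data)
    for path, corrected in corrections.items():
        # Keep only images whose predicted labels were actually changed.
        if path in predictions and predictions[path] != corrected:
            examples[path] = corrected
    return examples

# Hypothetical example data.
new_data = {"gs://example-bucket/new/1.jpg": ["open"]}
predictions = {
    "gs://example-bucket/pred/2.jpg": ["latched"],
    "gs://example-bucket/pred/3.jpg": ["open"],
}
corrections = {
    "gs://example-bucket/pred/2.jpg": ["half-open"],  # the user fixed this one
    "gs://example-bucket/pred/3.jpg": ["open"],       # confirmed, unchanged
}
print(build_retraining_set(new_data, predictions, corrections))
```

In this sketch, images whose predictions the user confirmed are left out; whether to include them as additional training data is a design choice.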

Advanced Model Optimization

The Trillo Image Classifier system is an enterprise-class solution that supports several advanced model optimization techniques to improve model accuracy. It can be fine-tuned with hyperparameter optimization, advanced data augmentation, self-supervised feature learning for the Convolutional Neural Network (CNN) backbone as a pretext task before the actual downstream classifier training, and Generative Adversarial Network (GAN) based optimization for improving the feature extractor backbone. Basic hyperparameter tuning, covering batch size, image size, learning rate, etc., is available in the basic version in the GCP marketplace. More advanced features such as complex data augmentation, self-supervised feature extraction, and GAN-based enhancements are available in customized solutions for our enterprise customers. These enhancements may significantly improve model accuracy and performance under a variety of conditions when building highly customized solutions.
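As an illustration of basic hyperparameter tuning over batch size, image size, and learning rate, the sketch below runs a plain grid search. The search strategy, the candidate values, and the `toy_evaluate` scoring function are all assumptions; a real `evaluate` would train the model with the given configuration and return its validation accuracy.

```python
import itertools

# Hypothetical search space; the candidate values are illustrative.
SEARCH_SPACE = {
    "batch_size": [16, 32],
    "image_size": [224, 299],
    "learning_rate": [1e-3, 1e-4],
}

def grid_search(evaluate, space=SEARCH_SPACE):
    """Try every combination and return the best configuration and its score."""
    names = list(space)
    best_score, best_config = float("-inf"), None
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

def toy_evaluate(config):
    """Placeholder score; a real version would train and validate a model."""
    return (-abs(config["learning_rate"] - 1e-3)
            - abs(config["batch_size"] - 32) / 1000)

best_config, best_score = grid_search(toy_evaluate)
print(best_config, best_score)
```

Grid search is the simplest option and grows quickly with the number of parameters; random or Bayesian search are common alternatives when the space is larger.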

Conclusion

Image Classification is a foundational component and a key building block of any Computer Vision system. However, a solution to the general image classification problem that handles the different problem types and scenarios requires careful design and implementation. We need an end-to-end system that can serve predictions in both batch and single-image mode, periodic retraining of the model to fine-tune it on images captured under varied conditions, a bulk labeling tool to assist in data labeling, and methods to scale prediction for large-scale, enterprise-level deployments. The Trillo Image Classifier solution provides all of these features and many advanced techniques for improving model accuracy and performance. It is available through the GCP marketplace for evaluation and deployment in many common use cases, and it can be highly customized and fine-tuned to build custom models in enterprise settings.

Written by Saqib Awan