Model Monitoring for Imaging Datasets

Subrahmanyam Ongole
4 min read · Sep 6, 2022


DKube provides an MLOps platform based on Kubeflow and MLFlow to build and deploy models on Kubernetes clusters. Deployed models degrade over time for a variety of reasons, such as variance in the input data. Alternatively, you may want to improve model performance based on the business outcomes of live predictions in production. Model monitoring is essential for measuring a model's key performance indicators and understanding any degradation in its performance metrics.

In this blog, we look at how DKube analyzes model performance for image data. The focus areas include input data drift monitoring and performance monitoring.

The Algorithm

Our approach makes no assumptions about the model. We only require access to the training data used to build the model and the prediction data from the model deployment. DKube uses KServe, available as an external add-on to Kubeflow, for deployment, and collects prediction data automatically via CloudEvent logs. The deployment doesn’t necessarily need to run on DKube. The training and prediction data can be specified via the DKube Dataset interface, which provides a connection handle to the data.

When the training data is fed in, DKube uses an untrained autoencoder to encode each training image into a one-dimensional vector (the vector size equals the image width), which is later compared with the similarly encoded prediction data, as explained below.

An autoencoder is composed of encoder and decoder components that learn to regenerate the given input; the encoder half is used for dimensionality reduction. Here the encoder is untrained: it is used with its initial weights and never learns to reconstruct the training data. Once the monitor is in the Active state, the following steps are performed periodically at the configured interval.
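The encoding step can be illustrated with a minimal sketch. DKube's actual autoencoder architecture is not published here, so a fixed random linear projection stands in for the untrained encoder; the key point it demonstrates is that fixed (untrained) weights map each H×W image to a vector whose length equals the image width:

```python
import numpy as np

def make_untrained_encoder(height, width, seed=0):
    """Return an encoder with fixed random (untrained) weights that maps
    an H x W image to a 1-D vector of length equal to the image width.
    A stand-in sketch, not DKube's actual autoencoder."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((height * width, width)) / np.sqrt(height * width)

    def encode(image):
        # Flatten the image and project it down to `width` dimensions.
        return image.reshape(-1) @ weights

    return encode

# Encode a batch of toy 28x28 "images".
encode = make_untrained_encoder(28, 28)
images = np.random.default_rng(1).random((5, 28, 28))
vectors = np.stack([encode(img) for img in images])
print(vectors.shape)  # (5, 28)
```

Because the same fixed weights are applied to both training and prediction images, the encoded vectors are directly comparable between the two datasets.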

  • Fetch new data from the Prediction dataset, i.e., the incremental data seen since the last run.
  • Run the autoencoder to reduce the dimensionality of the data.
  • Run the specified drift algorithm to calculate the drift metric.

DKube supports the Kolmogorov-Smirnov test, Wasserstein distance, and Jensen-Shannon divergence for drift detection.

  • Kolmogorov-Smirnov: The algorithm applies a 2-sample KS test to each feature, and in the case of images the resulting per-feature p-values are aggregated. The KS test is a nonparametric test of the equality of the 1-D probability distributions of the training and prediction samples.
  • Wasserstein distance: Also known as the Earth Mover’s distance, as it can be seen as the minimum amount of “work” required to transform one sample distribution u into another v. This is the default algorithm used in the product to measure the distance between two probability distributions over a given region.
  • Jensen-Shannon (JS): Uses the KL (Kullback–Leibler) divergence to calculate a symmetric, normalized score. This makes the JS score easier to interpret, as it ranges from 0 (identical distributions) to 1 (maximally different distributions) when using log base 2.
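The three metrics above can be computed on encoded vectors with `scipy`. The sketch below assumes mean aggregation across features and per-feature histograms for JS; these are illustrative choices, not necessarily the aggregation DKube uses. Note that `scipy`'s `jensenshannon` returns the JS distance (the square root of the divergence):

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 28))    # encoded training vectors
predict = rng.normal(0.3, 1.0, size=(500, 28))  # encoded prediction vectors (shifted)

# Kolmogorov-Smirnov: 2-sample test per feature, p-values aggregated by mean.
ks_p = np.mean([ks_2samp(train[:, i], predict[:, i]).pvalue
                for i in range(train.shape[1])])

# Wasserstein (Earth Mover's) distance per feature, averaged.
wd = np.mean([wasserstein_distance(train[:, i], predict[:, i])
              for i in range(train.shape[1])])

# Jensen-Shannon: compare per-feature histograms; base 2 keeps it in [0, 1].
def js_feature(a, b, bins=30):
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    return jensenshannon(p, q, base=2)

js = np.mean([js_feature(train[:, i], predict[:, i])
              for i in range(train.shape[1])])
print(f"KS p-value={ks_p:.3f}  Wasserstein={wd:.3f}  JS={js:.3f}")
```

With the shifted prediction sample above, the aggregated KS p-value comes out small (signaling drift), while the Wasserstein and JS values move away from zero.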

For drift visualization, we plot eigenimages of both the training and prediction samples, calculated using Singular Value Decomposition (SVD). We show the eigenimage with the largest singular value to visualize the dataset, rendered both as an image heatmap and as an image histogram.
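A minimal sketch of the eigenimage computation: each image is flattened into a row of a data matrix, and the leading right singular vector from the SVD is reshaped back into image form. The mean-centering step here is a common convention and an assumption on my part, not a documented DKube detail:

```python
import numpy as np

def top_eigenimage(images):
    """Compute the leading eigenimage of a stack of images via SVD.

    `images` is an (N, H, W) array; each image becomes a row of the
    data matrix, and the right singular vector with the largest
    singular value is reshaped back into an H x W image.
    """
    n, h, w = images.shape
    data = images.reshape(n, h * w)
    # full_matrices=False keeps only the components we can use.
    _, _, vt = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
    return vt[0].reshape(h, w)

imgs = np.random.default_rng(0).random((50, 16, 16))
eig = top_eigenimage(imgs)
print(eig.shape)  # (16, 16)
```

Plotting the training eigenimage next to the prediction eigenimage gives a quick visual read on whether the dominant structure of the two datasets still matches.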

When ground truth data is available and specified via the DKube Dataset interface, the above pipeline also calculates the model performance at each interval. The pipeline computes specific metrics depending on whether the model is a regression or classification model.

You can specify soft and hard thresholds for the drift and performance metrics to generate alerts and email notifications.
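Conceptually, the thresholding works like the small sketch below. This is an illustration of soft/hard threshold logic, not DKube's actual alerting code, and it assumes a metric where higher values mean more drift (for the KS p-value the comparison would be inverted):

```python
def drift_alert(metric, soft, hard):
    """Map a metric value to an alert level given soft and hard thresholds.

    Illustrative only; assumes larger values mean more drift.
    """
    if metric >= hard:
        return "hard-alert"
    if metric >= soft:
        return "soft-alert"
    return "ok"

print(drift_alert(0.12, soft=0.1, hard=0.3))  # soft-alert
print(drift_alert(0.50, soft=0.1, hard=0.3))  # hard-alert
```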

Monitoring in Action

The example described here uses chest X-ray images as input data. A TensorFlow model is trained and deployed on the DKube cluster, and a data generation script sends prediction requests to the deployment. Drift is introduced by adding noise to the images and by rotating them 90 degrees. Clean and noisy images are posted for prediction at different intervals. Soft and hard thresholds are specified, and alerts are configured for specific metrics. The prediction requests are captured using CloudEvent logs.

The example uses the KS algorithm for drift detection and shows that higher drift is detected when images are rotated. The calculated drift metric is the p-value from the KS test: the lower the p-value, the greater the drift.

The following image, plotting drift at different intervals, shows moderate drift when noise is introduced.

The following image plots different metrics when ground truth is available. The model is a classifier for pneumonia detection. The dotted lines represent the soft and hard thresholds.

Thanks for reading. We will dive into more model monitoring features offered by DKube in future articles. You can read more about DKube at https://dkube.io
