Comparison of Web and Mobile classification models based on inference results

Madhur Zanwar
Eumentis
Published Feb 8, 2024
Comparing inference on web and mobile

This post is the fourth and last in a series of articles on building an image classification model for Edge inference. The first three articles of the series can be found here:

  1. Training an image classification model in PyTorch
  2. Porting machine learning models to mobile for Edge inference
  3. Building a PyTorch based image classifier for on-device inference

In an ideal scenario, there should be no difference between the inference results if the pre-processing steps are identical in both the web and mobile code. However, in our testing, we observed variations even when the pre-processing steps were the same. Since we were aiming for no difference between the two models, we set out to investigate the cause of this difference.

We had ensured our pre-processing steps were identical, but for some images we were still getting different class predictions on web and mobile. We trained several models on the web, aiming for higher validation accuracy, ported them, and tested them on mobile, hoping that a highly accurate web model would counter the inefficiencies of the conversion process. But this wasn't the case: we still had images for which the results differed.

Then we set out to investigate the input to the model — the image tensor.

  • We saved the tensors generated during pre-processing.
  • We wrote each tensor to a text file in row-and-column format for all three channels.
  • We copy-pasted those values into an Excel sheet and applied conditional formatting based on element values.
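The dumping step above can be sketched roughly as follows. This is a hypothetical illustration, not our exact code: the file names and the random stand-in tensor are assumptions, but the idea is the same — write each channel of the CHW tensor to its own text file so the web and mobile values can be diffed side by side.

```python
import numpy as np

# Stand-in for the real pre-processed input tensor (channels, height, width)
tensor = np.random.rand(3, 224, 224).astype(np.float32)

for c, name in enumerate(("red", "green", "blue")):
    # Each row of the text file is one image row; values are tab-separated,
    # so the file pastes cleanly into an Excel sheet for conditional formatting
    np.savetxt(f"channel_{name}.txt", tensor[c], fmt="%.6f", delimiter="\t")
```

On mobile we produced an equivalent text dump from the tensor passed to the model, which is what made the element-by-element comparison in Excel possible.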

Below is what we got after the above steps for web and mobile. It was evident that the input to the models was not identical.

Image drawn from the tensor being passed to the web device
Image drawn from the tensor being passed to the mobile device

The images above highlight a clear difference between the inputs to the two models and explain the differing results for some images. We still didn't know why the tensors differed, so we shifted our focus to evaluating each pre-processing step one by one. We found that for some pre-processing steps we had, due to certain limitations, used different frameworks. This variation in frameworks was the ultimate culprit.

Let’s take image resizing, for example. In our web implementation, we utilized OpenCV’s cv2.resize function to resize the images. On mobile, we employed the PlayTorch library’s torchvision.transforms.resize for the same operation. These two functions follow different approaches to resizing an image, resulting in slightly different pixel values at the same positions in the resized image. In borderline classification cases, this can lead to differences between the web and mobile models’ results. We updated the pre-processing steps to eliminate this discrepancy and ultimately arrived at a mobile inference pipeline that produced results matching the web. This also increased our confidence in our model porting strategy — that it had not decreased the model quality.
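To see how two resize implementations can disagree, consider the coordinate-mapping convention alone. The sketch below is illustrative — it is not the actual code of either library — and implements plain bilinear resampling in NumPy under two common conventions: half-pixel centers (the default in many image libraries) versus corner alignment. Even on a tiny synthetic image, the same output position gets a different pixel value.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinearly sample a 2-D image at fractional coordinates (x, y)."""
    h, w = img.shape
    x0 = int(np.clip(np.floor(x), 0, w - 1)); x1 = min(x0 + 1, w - 1)
    y0 = int(np.clip(np.floor(y), 0, h - 1)); y1 = min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def resize(img, out_h, out_w, align_corners=False):
    """Resize with bilinear sampling under one of two coordinate conventions."""
    h, w = img.shape
    out = np.zeros((out_h, out_w))
    for j in range(out_h):
        for i in range(out_w):
            if align_corners:
                # Corner alignment: map output corners onto input corners
                x = i * (w - 1) / (out_w - 1)
                y = j * (h - 1) / (out_h - 1)
            else:
                # Half-pixel centers: treat pixels as unit squares
                x = (i + 0.5) * w / out_w - 0.5
                y = (j + 0.5) * h / out_h - 0.5
            out[j, i] = bilinear_sample(img, max(x, 0.0), max(y, 0.0))
    return out

img = np.arange(16, dtype=np.float64).reshape(4, 4)  # tiny 4x4 gradient
a = resize(img, 2, 2, align_corners=False)
b = resize(img, 2, 2, align_corners=True)
print(np.abs(a - b).max())  # → 2.5: same input, same size, different pixels
```

Interpolation method (bilinear vs. bicubic vs. area) and anti-aliasing settings introduce further differences on top of the coordinate convention, which is why pinning both sides to the same resize behavior mattered.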

Below are the results for the web and mobile inference.

  • Accepted: Input images that were of good quality and were input to the next model in the pipeline.
  • Rejected: Images that were of poor quality and shouldn’t have been sent to the next model.
  • Rejection rate: the proportion of good images that were wrongly rejected.

As we can see below, both our models produced nearly identical results.

In this series, we’ve covered the entire process of training an image classification model and porting it to mobile devices. Testing and comparing the two models also gave us valuable insights into how framework differences in pre-processing can affect inference results, and it confirmed that our model porting strategy didn’t cause a significant drop in model performance.
Thank you for your patience and we hope you’ve found this blog informative and valuable. If you have any questions or would like to explore further, please feel free to reach out. Your feedback and engagement are greatly appreciated.
