Navigating memory issues when running machine learning models on edge devices

Madhur Zanwar
Published in Eumentis
3 min read · May 3, 2024
Out of memory. (credits: Adobe Firefly)

In this article, we'll talk about the memory-usage challenges we faced while running machine learning models on edge devices.

Our task was to build an object detection model for detecting very small objects within an image. We trained the model on our custom dataset using the YOLO framework, but ran into a significant problem: the model's accuracy was unacceptably low. To improve it, we introduced a tiling step, dividing each image into 640x640-pixel tiles; a 4800x4800 image, for instance, was divided into 49 tiles. To cut processing time and the number of tiles to process, we added custom logic that reduced the tiles to 35, so 35 tiles of 640x640 were processed per image. Each tile was processed independently, and the results were merged back for the full image. This approach improved both processing time and accuracy, and it worked well on web devices.
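To make the tiling step concrete, here is a minimal sketch of how such a grid could be computed. This is a hypothetical helper, not our production code, and it assumes a plain floor-division grid (4800 / 640 → 7 tiles per axis, 7x7 = 49); a real pipeline would also need to decide how to handle any remainder at the image edges (padding, or a shifted last tile).

```javascript
// Hypothetical tiling helper (assumption: simple floor-division grid,
// edge remainders not handled here).
function tileGrid(width, height, tileSize) {
  const cols = Math.floor(width / tileSize);
  const rows = Math.floor(height / tileSize);
  const tiles = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      // Each tile is described by its top-left corner and size.
      tiles.push({ x: c * tileSize, y: r * tileSize, w: tileSize, h: tileSize });
    }
  }
  return tiles;
}
```

The custom logic that trimmed 49 tiles down to 35 was application-specific (skipping tiles where our objects could not appear), so it is not shown here.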

Our next challenge was to implement the same process on a mobile device. After replicating the tiling pipeline in a React Native app, we observed that the app crashed after processing four to five tiles. Profiling the app with Android Studio revealed that the available memory kept decreasing, eventually leading to the crash. Ideally, the memory should have been freed after each tile was processed. Instead, usage climbed to around 7 GB (on a device with 8 GB of RAM) after four to five tiles, at which point the app crashed.

Android Studio's App Profiler helped us spot the spike: its memory section showed the memory consumed after each tile was processed.

Android Studio's App Profiler

To pinpoint where the memory spiked, we commented out the code line by line and checked when the issue disappeared. Since our main operations were cropping the image, creating a blank image, and overlaying images on top of each other, we commented out these lines one by one, but without any positive result. Tweaking the code in areas we felt could be better optimized did not help either. We ran numerous trials trying to reduce memory usage or free memory, to no avail.
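What we were hoping for, conceptually, was scope-based cleanup: every intermediate buffer allocated while processing one tile should be released before the next tile starts. In TensorFlow.js this is what `tf.tidy()` provides for tensors. The sketch below mimics that idea with a hypothetical pure-JavaScript resource tracker (all names are ours, not a real API), just to make the pattern visible:

```javascript
// Hypothetical sketch of scope-based cleanup, the pattern tf.tidy()
// implements for tensors in TensorFlow.js.
const live = new Set(); // stand-in for currently allocated native buffers

function alloc(label) {
  // Pretend native allocation; tracked so we can free it later.
  const buf = { label };
  live.add(buf);
  return buf;
}

function tidy(fn) {
  const before = new Set(live); // snapshot allocations before the scope runs
  const result = fn();
  for (const buf of [...live]) {
    // Free everything created inside fn, except the value it returned.
    if (!before.has(buf) && buf !== result) live.delete(buf);
  }
  return result;
}

// Processing one tile: intermediates are freed, only the result survives.
function processTile(tile) {
  return tidy(() => {
    const cropped = alloc("crop");   // intermediate, freed on exit
    const resized = alloc("resize"); // intermediate, freed on exit
    return alloc("detections");      // returned, so it is kept alive
  });
}
```

In our app the intermediates were image tensors rather than plain objects, but the leak had the same shape: per-tile intermediates that were never released.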

Ultimately, switching the JavaScript engine to Hermes proved to be a valuable step. Hermes managed memory noticeably better, and the app could now process four to five full images without crashing. That was still unacceptable, though, given that our use case allowed users to capture 30–40 images in a single session.
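Enabling Hermes is a build-time switch. The exact flag depends on the React Native version; for the Android builds of the era we were working in, it looks roughly like this (treat this as an assumption and check the docs for your RN version; newer versions use `hermesEnabled=true` in `android/gradle.properties` instead):

```groovy
// android/app/build.gradle (React Native <= 0.69)
project.ext.react = [
    enableHermes: true  // switch the JS engine from JSC to Hermes
]
```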

We followed the same approach to find the line causing the spike. This time, we discovered it occurred while creating a blank white image from tensors: our pipeline created a blank white image for an overlay operation, in which the original image was placed on top of it. As a workaround, instead of creating a fresh blank image for every input, we bundled a single blank image with the app and cropped it to the required size dynamically. This resolved the memory issue, and the app could process around 50 images without crashing.
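The underlying idea, allocating one large blank canvas once and taking cheap views of it per image instead of materializing a fresh blank tensor every time, can be sketched with typed arrays. In the app itself the "canvas" was a blank PNG bundled as an asset that we cropped; the snippet below is a hypothetical illustration of why the reuse is cheap, not our actual code:

```javascript
// Hypothetical sketch: reuse one preallocated blank canvas instead of
// allocating a new blank image for every input.
const CANVAS_SIZE = 640;

// Allocated once at startup (stand-in for the blank image bundled with the app).
const blankCanvas = new Float32Array(CANVAS_SIZE * CANVAS_SIZE).fill(1.0); // white

// "Crop" the blank canvas to the size needed: subarray() returns a view
// over the same underlying buffer, so no new pixel memory is allocated.
function croppedBlank(width, height) {
  return blankCanvas.subarray(0, width * height);
}
```

A real two-dimensional crop needs row offsets and strides; the point here is only that a view over an existing buffer costs no new allocation, whereas creating a fresh blank tensor per image did.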

Profiling the app using Android Studio was instrumental in visualizing memory usage and making informed decisions.

I hope you have gained some valuable insights. Thank you for your time and patience!
