How you can optimize your CPU and GPU utilization


At GumGum, we use Computer Vision (CV) to leverage page visuals for our contextual targeting and brand suitability product called Verity. We process millions of images every hour, and at this rate, our long-term inference costs dwarf the upfront training costs. So, we tackled this issue head-on. In this post, I’ll benchmark and highlight the importance of multi-threading for I/O operations and batch processing for inference. Note that implementing these strategies may be overkill if your application’s scale is on the order of a few thousand images an hour.
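To make the multi-threading idea concrete, here is a minimal sketch (not GumGum's production code) of overlapping I/O-bound image downloads with a thread pool. The network call is simulated with `time.sleep`; a real implementation would fetch bytes over HTTP, e.g. with the `requests` library.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_image(url):
    """Simulate a network-bound image download (stand-in for an HTTP GET)."""
    time.sleep(0.05)  # simulated network latency
    return f"bytes-of-{url}"

urls = [f"https://example.com/img/{i}.jpg" for i in range(16)]

# Sequential: total time grows as n_images * latency.
start = time.perf_counter()
sequential = [fetch_image(u) for u in urls]
seq_time = time.perf_counter() - start

# Threaded: downloads overlap while threads wait on (simulated) I/O,
# so wall time is roughly n_images / n_workers * latency.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = list(pool.map(fetch_image, urls))
thr_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Because the Python GIL is released during blocking I/O, threads are a good fit for download-heavy stages even though they would not speed up CPU-bound work.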

Bottlenecks in a Typical Inference Workflow

Let’s look at our application components:

A typical workflow in CV inference applications.

API. The API provides an interface between the client and the CV module. At a minimum, a client request contains an image URL and a task, e.g. “check whether the image depicts violence”. Here, we assume that the API performs at the desired scale and is not a bottleneck in our application. …
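A minimal client request of this shape might look as follows; the field names here are illustrative assumptions, not Verity's actual schema.

```python
# Hypothetical minimal request payload sent from the client to the CV API.
request = {
    "image_url": "https://example.com/page/hero.jpg",  # image to analyze
    "task": "check whether the image depicts violence",  # requested CV task
}

print(request["image_url"], "->", request["task"])
```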


Rashad Moarref

Software Engineer with entrepreneurial spirit. Passionate about building Machine Learning applications at scale. PhD in ECE, Univ. Minnesota. Caltech Alumnus.
