How to Run Serverless Trueface: One Million Requests for $0.01
Here at Trueface, we are always looking for ways to help organizations automate or manage daily tasks using computer vision and AI, and to that end, we have a company policy to dedicate one day a week to research.
For my research day, I decided to draw on my experience running private and public clouds, and on my current role as an AI engineer, to explore the best way to run Trueface in the cloud. This space has evolved rapidly from the old days of BSD jails and Solaris Zones to the current hot new toy: serverless computing.
There are two common ways to do serverless on Google Cloud: Cloud Functions and Cloud Run. There is very little difference between the two approaches:
Cloud Functions: a single-purpose function that can be triggered by events or HTTP calls. The most common uses are database triggers (e.g., alerting hotel staff when a VIP arrives) and storage triggers (e.g., detecting objects when a user uploads a photo). However, a function has to finish its task in under 9 minutes, and a new instance is started once there are 10 concurrent requests.
Cloud Run: stateless, natively portable containers, with no limitation on languages, binaries, or dependencies. It is triggered only via HTTP calls (or Pub/Sub), and it suits almost any workload, such as data batching and ML inference. A Cloud Run container instance can also handle 80 concurrent requests.
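To get a feel for what the concurrency figure means, here is a minimal sketch of the ceiling division behind "how many container instances would N simultaneous requests need". The function name `instancesNeeded` is hypothetical, and the model is deliberately naive: it assumes every instance is filled to its concurrency limit, whereas the real autoscaler also weighs CPU usage and other signals.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: lower bound on the number of container instances
// needed to serve `concurrentRequests` at once, assuming each instance
// accepts at most `perInstance` concurrent requests (80 in the text above).
std::uint64_t instancesNeeded(std::uint64_t concurrentRequests,
                              std::uint64_t perInstance = 80) {
    if (perInstance == 0) {
        throw std::invalid_argument("perInstance must be positive");
    }
    // Integer ceiling division: round up whenever there is a remainder.
    return (concurrentRequests + perInstance - 1) / perInstance;
}
```

Under this assumption, 1,000 simultaneous requests would fit in 13 instances, while a platform that served one request per instance would need 1,000.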
Even though I am a Cloud Functions fan, I had to experiment with Cloud Run because it is better suited to the task at hand. The Trueface SDK is built and available as a C++ library, and we also support Python 3 as a first-class binding. I chose to build the web app in C++ instead of Python for performance, which in turn means a lower cost to run. I also picked a small library called cpp-httplib to help me.
My objective is: given two face images, measure the probability that they match. This is a common scenario in access control and identity verification. Now to the code:
The web app is very simple: it listens on port 8080 for POST requests. When it receives a request to /match, it extracts the face features from both images, calculates their similarity, and responds with the match probability as a text/plain value.
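As a side illustration of the "calculate similarity" step, here is a minimal sketch that compares two feature vectors with cosine similarity. It is not the Trueface SDK's implementation and not the web app's code: the function names `cosineSimilarity` and `toScore`, the use of `std::vector<float>` for features, and the linear mapping to a 0..1 score are all assumptions made for illustration. A real match probability would come from the SDK's own calibrated function.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: cosine similarity between two feature vectors.
// Returns a value in [-1, 1]; 1 means the vectors point the same way.
double cosineSimilarity(const std::vector<float>& a,
                        const std::vector<float>& b) {
    if (a.empty() || a.size() != b.size()) {
        throw std::invalid_argument("feature vectors must be non-empty and equal in length");
    }
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot   += static_cast<double>(a[i]) * b[i];
        normA += static_cast<double>(a[i]) * a[i];
        normB += static_cast<double>(b[i]) * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
        return 0.0;  // a zero vector carries no direction to compare
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Illustrative only: linearly rescale [-1, 1] to [0, 1].
// This is NOT a calibrated probability.
double toScore(double similarity) {
    return (similarity + 1.0) / 2.0;
}
```

Identical vectors give a similarity of 1 and a score of 1; orthogonal vectors give a similarity of 0 and a score of 0.5.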
Despite room for optimization, this web app achieves the task at hand perfectly. To run it on Cloud Run, it has to be containerized using Docker and then pushed to Google Container Registry (just like Docker Hub, but private). This service charges for storage, network egress, and vulnerability scanning. I have disabled vulnerability scanning and, in my case, network egress is free. The remaining variable is storage, which costs $0.026 per GB per month. The Trueface SDK plus the Ubuntu minimal image come to about 500 MB, which means I will pay roughly $0.01 per month (0.5 GB × $0.026 ≈ $0.013). Once you push the container, you see three options for deploying it: Google Cloud Run, GKE, and GCE. When you deploy to Google Cloud Run, you can configure capacity; I selected the minimum: 1 CPU and 256 MB of RAM. Et voilà…
To test, I launched Postman and posted two images to the newly created Serverless Trueface:
It took less than one second to upload two images from my computer in Africa to the container in US Central and return a 391 B (under half a KB) response. I was curious about the cost, so I went to the Cloud Calculator and entered the numbers I had gathered to estimate my monthly cost for one million requests:
My name is Adel Boussaken and that is how I run Serverless Trueface on Google Cloud Run for one cent a month.