Move your ML from GKE to Cloud Run! (or don’t?) — Part 2

Doreen Sacker
4 min read · Nov 23, 2022

Furry monster sitting on a cloud reading a paper on a light blue background | DALL·E 2

Héctor Otero Mediero and I migrated our machine learning services from Google Kubernetes Engine (GKE) to Cloud Run. In Part 1, you can read about our architecture before and after the migration and what to look out for.

On Cloud Run, the services and machine learning models scale up nicely when more requests come in. However, with the setup from Part 1, Cloud Run was more expensive than our previous Kubernetes cluster. The figure below shows the architecture of our services on Cloud Run after the first iteration of the migration.

In this setup, the recommendation service in the middle orchestrates all other services. As a result, many recommender container instances were running while doing nothing except waiting for responses.

How we improved our services

To find the bottlenecks, we first added more tracing to our services. Fortunately, Google Cloud offers a convenient way to add traces and measure how long individual operations take.
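The underlying idea is simple: wrap each named step of a request in a span that records its duration, then look for the slowest steps. The sketch below shows this with only the standard library. It is not the Google Cloud tracing API. In production you would export spans to a tracing backend such as Google Cloud Trace (for example via OpenTelemetry) instead of collecting them in a list. The `span` helper, `handle_request`, and the operation names are hypothetical.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory span store; a real setup would export spans to Cloud Trace.
recorded_spans: list[tuple[str, float]] = []


@contextmanager
def span(name: str):
    """Record how long the wrapped block takes, in seconds, under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded_spans.append((name, time.perf_counter() - start))


def handle_request() -> None:
    # Hypothetical request handler split into traced steps.
    with span("fetch_items_from_redis"):
        time.sleep(0.02)  # stand-in for the real Redis call
    with span("rank_items"):
        time.sleep(0.005)  # stand-in for model inference


handle_request()
# Print the slowest operation first to spot the bottleneck.
for name, seconds in sorted(recorded_spans, key=lambda s: s[1], reverse=True):
    print(f"{name}: {seconds * 1000:.1f} ms")
```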

Redis/Memorystore

It turned out that the way we accessed Redis was very inefficient. Each request needs about 2,000 items from Redis on average. Fetching them one by one takes far longer than fetching them all at once.
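The fix can be sketched as follows, assuming a redis-py style client. Instead of one `GET` per key, which costs one network round trip each, the keys are fetched with `MGET` in a few batches. Several names here are hypothetical: `fetch_items`, the batch size of 500, and the `FakeRedis` stand-in. `FakeRedis` only counts round trips so the example runs without a server. With redis-py you would pass a real `redis.Redis` client, whose `mget` accepts a list of keys.

```python
from typing import Iterable, Optional


def fetch_items(client, keys: list[str], batch_size: int = 500) -> list[Optional[bytes]]:
    """Fetch many keys with one MGET per batch instead of one GET per key."""
    values: list[Optional[bytes]] = []
    for start in range(0, len(keys), batch_size):
        # One network round trip per batch; MGET returns values in key order,
        # with None for keys that do not exist.
        values.extend(client.mget(keys[start:start + batch_size]))
    return values


class FakeRedis:
    """In-memory stand-in that counts round trips (not part of redis-py)."""

    def __init__(self, data: dict[str, bytes]):
        self.data = data
        self.round_trips = 0

    def get(self, key: str) -> Optional[bytes]:
        self.round_trips += 1
        return self.data.get(key)

    def mget(self, keys: Iterable[str]) -> list[Optional[bytes]]:
        self.round_trips += 1
        return [self.data.get(k) for k in keys]


keys = [f"item:{i}" for i in range(2000)]
client = FakeRedis({k: b"payload" for k in keys})

one_by_one = [client.get(k) for k in keys]
print("GET round trips:", client.round_trips)    # 2000

client.round_trips = 0
batched = fetch_items(client, keys)
print("MGET round trips:", client.round_trips)   # 4
assert batched == one_by_one
```

When a batch mixes different commands, a redis-py pipeline (`client.pipeline()`) gives a similar reduction in round trips.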

VPC Connector

Memorystore is not reachable from the internet because it lives inside a VPC network. Our Cloud Run services therefore have to reach it through a Serverless VPC Access connector. Looking at the connector's metrics, we realised that more traffic was going through it than it could handle, so we replaced it with a larger one. This change noticeably improved performance.

Pub/Sub

A Pub/Sub subscription has quite a few parameters to configure. We shortened the message retention duration and the acknowledgement deadline, and we made the retry policy less aggressive. As a result, fewer messages pile up in the backlog and the services carry less load.
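The sketch below builds a subscription body in the shape of the Pub/Sub REST API's `Subscription` resource. It sets a shorter retention, a shorter acknowledgement deadline, a retry backoff, and a cap on delivery attempts. The concrete values, the helper name `tuned_push_subscription`, and the topic and endpoint names are hypothetical. In Pub/Sub the retry policy only controls the backoff between redeliveries. A hard cap on delivery attempts is set through a dead-letter policy.

```python
import json


def seconds(value: int) -> str:
    """Format a duration the way protobuf JSON encodes google.protobuf.Duration."""
    return f"{value}s"


def tuned_push_subscription(
    topic: str,
    push_endpoint: str,
    dead_letter_topic: str,
    retention_s: int = 600,        # keep unacknowledged messages for 10 minutes only
    ack_deadline_s: int = 30,      # redeliver if not acknowledged within 30 s
    max_delivery_attempts: int = 5,
) -> dict:
    """Build a hypothetical Subscription body with tighter limits than the defaults."""
    if not 10 <= ack_deadline_s <= 600:
        raise ValueError("ackDeadlineSeconds must be between 10 and 600")
    if not 5 <= max_delivery_attempts <= 100:
        raise ValueError("maxDeliveryAttempts must be between 5 and 100")
    return {
        "topic": topic,
        "pushConfig": {"pushEndpoint": push_endpoint},
        "ackDeadlineSeconds": ack_deadline_s,
        "messageRetentionDuration": seconds(retention_s),
        # Exponential backoff between redeliveries of failed messages.
        "retryPolicy": {"minimumBackoff": seconds(10), "maximumBackoff": seconds(60)},
        # Stop redelivering after a few attempts and forward the message instead.
        "deadLetterPolicy": {
            "deadLetterTopic": dead_letter_topic,
            "maxDeliveryAttempts": max_delivery_attempts,
        },
    }


body = tuned_push_subscription(
    topic="projects/my-project/topics/articles",                  # hypothetical
    push_endpoint="https://encoder-xyz.a.run.app/",                 # hypothetical
    dead_letter_topic="projects/my-project/topics/articles-dlq",  # hypothetical
)
print(json.dumps(body, indent=2))
```

You can express the same settings through the Pub/Sub client libraries or `gcloud` rather than a raw REST body.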

Fast & expensive recommendations

The changes described above led to scalable, fast recommendations: we reduced the average recommendation time from 10 seconds to under 1 second. However, the number of Cloud Run instances was still unnecessarily high, and costs were four times those of the Kubernetes cluster setup.

Event Choreography

To reduce costs, we needed to rethink our architecture. The recommendation service orchestrated all other services and waited for their responses. This is called the Orchestration Pattern. To let the services work independently, we switched to an Event Choreography Pattern. In this pattern, each service reacts to an event and emits its own event when it is done, with no central coordinator.
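Here is a minimal in-process sketch of event choreography. There is no coordinator: each service subscribes to the event it needs and publishes a new event when it finishes. The `EventBus` class, the event names, and the three toy services are hypothetical stand-ins. In the setup described here, GCS, Eventarc, and Pub/Sub play the role of the bus.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny synchronous publish/subscribe bus (a stand-in for Pub/Sub)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self.handlers[event_type]:
            handler(payload)


bus = EventBus()
results: list[dict] = []


# Each service only knows which event it consumes and which one it emits.
def scraper(payload: dict) -> None:
    text = f"text of {payload['url']}"  # stand-in for real scraping
    bus.publish("article.scraped", {**payload, "text": text})


def encoder(payload: dict) -> None:
    vector = [len(word) for word in payload["text"].split()]  # toy "embedding"
    bus.publish("article.encoded", {**payload, "vector": vector})


def recommender(payload: dict) -> None:
    results.append({"url": payload["url"], "score": sum(payload["vector"])})


bus.subscribe("article.requested", scraper)
bus.subscribe("article.scraped", encoder)
bus.subscribe("article.encoded", recommender)

bus.publish("article.requested", {"url": "https://example.com/a"})
print(results)
```

This toy bus is synchronous, so the handler calls nest. With Pub/Sub, each publish returns immediately and the next service runs in its own container. That means no instance sits idle waiting for a downstream answer.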

Leverage native GCS events using Eventarc

We need the services to process their tasks one after the other. We made the Article Scraper, rather than the recommender service, the entry point. As soon as an article is scraped, its text is saved in Google Cloud Storage (GCS).

Eventarc lets you asynchronously deliver events from Google Cloud services. Once the object containing the text is uploaded, an Eventarc trigger fires and sends the object information to a Pub/Sub topic. Pub/Sub then calls the encoder service, which does its processing and saves the result to GCS again. Each service in turn processes its input and saves its intermediate result in GCS for the next service to reuse.
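A single stage of this chain can be sketched as a function that a service such as the encoder might run for each incoming event. The sketch assumes Eventarc delivers a Cloud Storage "object finalized" event to the service as an HTTP request in CloudEvents binary content mode. In that mode the event type arrives in the `ce-type` header and the JSON body contains the object's `bucket` and `name`. Several parts are hypothetical simplifications so the example runs without Google Cloud: the `encoded/` output prefix, the `encode` function, and the in-memory `fake_gcs` dictionary standing in for real downloads and uploads.

```python
import json
from typing import Optional

FINALIZED = "google.cloud.storage.object.v1.finalized"
OUTPUT_PREFIX = "encoded/"  # hypothetical prefix for this stage's output

# Hypothetical in-memory stand-in for GCS: "bucket/object" -> contents.
fake_gcs: dict[str, str] = {}


def encode(text: str) -> list[int]:
    """Hypothetical stand-in for the real encoder model."""
    return [len(word) for word in text.split()]


def handle_event(headers: dict[str, str], body: bytes) -> Optional[str]:
    """Process one Cloud Storage 'object finalized' event and store the result."""
    if headers.get("ce-type") != FINALIZED:
        return None  # not an upload event, nothing to do
    data = json.loads(body)
    bucket, name = data["bucket"], data["name"]
    if name.startswith(OUTPUT_PREFIX):
        return None  # our own output; skipping it avoids an infinite trigger loop
    text = fake_gcs[f"{bucket}/{name}"]  # a real service would download from GCS
    out_name = f"{OUTPUT_PREFIX}{name}.json"
    # Writing the result creates a new object, which the next stage reacts to.
    fake_gcs[f"{bucket}/{out_name}"] = json.dumps({"vector": encode(text)})
    return out_name


fake_gcs["articles/scraped/a.txt"] = "some scraped article text"
event_headers = {"ce-type": FINALIZED, "ce-subject": "objects/scraped/a.txt"}
event_body = json.dumps({"bucket": "articles", "name": "scraped/a.txt"}).encode()
print(handle_event(event_headers, event_body))
```

The prefix check matters if all stages share one bucket, because each stage's output triggers a new event. Using a separate bucket per stage avoids the need for that check.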

Decoupling the services removed the idle waiting, so each service now does only its own job. Adding GCS to the architecture introduced a new cost, but it is negligible. After switching to the new architecture, our recommendations are fast and scalable, and they cost half as much as the original cluster setup.

Summary

  • We managed to reduce the cost by 50% compared to the original setup.
  • We managed to improve the time to process a recommendation from 20 seconds to under 1 second.
  • Migrating from Kubernetes to Cloud Run is easy for applications that are already dockerized.
  • Cloud Run is better suited to small, asynchronous processes.
  • Make sure your services are fast and do only their dedicated work without waiting around.
  • Performance issues become very apparent when Cloud Run is used together with a Pub/Sub push subscription.
  • You may need to tweak existing services.
  • Cloud Run is under very active development, with new features added frequently, and overall it offers a nice developer experience.
