Move your ML from GKE to Cloud Run! (or don’t?) — Part 1

Doreen Sacker
5 min read · Nov 23, 2022


Furry monster sitting on a cloud reading the paper on a light blue background | DALL·E 2

Recently, we migrated our machine learning services from Google Kubernetes Engine (GKE) to Cloud Run. I’ll give you a brief overview of what we did, what our architecture looks like, and how the migration went. The goal of this article is to share our learnings and describe why the first iteration did not have the desired outcome.

In Part 2, we’ll describe how we finally reduced costs by 50% compared to the original setup and, on top of that, improved the time to process a recommendation from 20 seconds to under 1 second.

System Architecture

Working for Opinary, we engage readers by asking the right questions in news articles. As data scientists, Héctor Otero Mediero and I are responsible for integrating the best contextually fitting poll into news articles.

Imagine reading an article on your favourite news site about whether there should be a speed limit on the motorway. We aim to integrate a poll within this article asking you something like the following:

Poll asking do you think speed limits on motorways are necessary?

Recommending and integrating this question into this article is a success. But what actually happens in the background? Many services are involved in displaying this poll in your news article. The recommendation services, which we’re focusing on here, are shown below:

Recommendation infrastructure

The recommendation system starts with an article URL, which a poll should be recommended for. Let’s take an article with the title:

Petition for speed limit and car-free Sundays comes to the parliament

The article URL is pushed into a Pub/Sub queue, a message queue for processing incoming tasks. The recommender service pulls a new task from the queue in order to process it. Find out more about Pub/Sub here.

The recommendation service will then orchestrate the other services by asking them in the following order, one after the other:

  1. The Article-Scraper-Service to scrape the article text.
  2. The Encoder-Service to encode the text into text embeddings.
  3. The Brand-Safety-Service to classify whether the text is safe to integrate a poll into, i.e. that it doesn’t include descriptions of tragic events such as death, murder, or accidents.

Afterwards, with all the gathered information, the recommendation service recommends a poll for the article of interest. In the example above, for the article Petition for speed limit and car-free Sundays comes to the parliament, it will recommend a poll asking about the speed limit.
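The orchestration steps above can be sketched as a sequential pipeline. The function names and signatures here are illustrative placeholders for HTTP calls to the three services, not our actual code:

```python
# Illustrative sketch of the recommender's orchestration logic.
# scrape, encode, is_brand_safe, and pick_poll stand in for HTTP
# calls to the Article-Scraper, Encoder, and Brand-Safety services
# and the final poll-matching step.

def recommend_poll(article_url, scrape, encode, is_brand_safe, pick_poll):
    """Run the recommendation steps in order and return a poll (or None)."""
    text = scrape(article_url)          # 1. scrape the article text
    embedding = encode(text)            # 2. encode the text into embeddings
    if not is_brand_safe(text):         # 3. drop brand-unsafe articles
        return None
    return pick_poll(embedding)         # finally, recommend a poll
```

Passing the steps in as functions keeps each one independently swappable and testable; the key point is simply that the recommender blocks on each downstream service in turn.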

Before the migration, all services and a Redis database were running in a Kubernetes Cluster.

Why did we want to migrate?

Our main goal when we decided to migrate was to save costs. We only wanted to pay for the resources we actually used. With GKE, the cluster and the services run day and night, and you pay all the time. Since we have much less traffic during the night in Europe, we wanted the services to scale down when not needed and scale up when many requests come into the system. Lastly, our team has no dedicated data engineer, so the solution should be as easy to maintain as possible. Cloud Run seemed like a good fit for our use case.

What is this Cloud Run?

Cloud Run is a managed serverless platform by Google. Serverless means that the cloud provider allocates machine resources on demand and takes care of managing the servers. Cloud Run is easy to use and makes it possible to deploy a Docker application in seconds. It scales out of the box according to incoming HTTP requests, and you only pay per use.

Requirements to consider before moving

Since Cloud Run services scale with incoming HTTP requests, every service must have an HTTP endpoint to take requests. An easy setup is possible with Pub/Sub push subscriptions to deliver the requests. In addition, to scale up and down quickly, it is beneficial if the services have a fast start-up time.
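Concretely, a Pub/Sub push subscription POSTs each message to the service’s HTTP endpoint as a JSON envelope with a base64-encoded payload. The envelope fields below follow Pub/Sub’s documented push format; the handler itself is a minimal sketch, assuming the publisher puts the article URL in the message data:

```python
import base64

def extract_article_url(envelope: dict) -> str:
    """Pull the article URL out of a Pub/Sub push envelope.

    Push messages arrive as a JSON body of the form
    {"message": {"data": "<base64>", ...}, "subscription": "..."};
    here we assume the data field holds the article URL.
    """
    data = envelope["message"]["data"]
    return base64.b64decode(data).decode("utf-8")

# Returning a 2xx status from the HTTP handler acknowledges the
# message; any other status (or a timeout) makes Pub/Sub redeliver it.
```

The acknowledgement-by-HTTP-status behaviour is what imposes the response-time requirement discussed below: the service must answer before the subscription’s deadline or the message comes back.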

New Architecture — What changed?

The most crucial change was the switch from a pull subscription to a push subscription. Previously, the recommender service pulled a new task from the Pub/Sub queue whenever it was done with the last one. Now the queue pushes work to the recommender as it comes in, and a recommender container on Cloud Run has to respond within a certain timeframe.

Furthermore, we use VPC connectors to connect the Cloud Run services to a VPC network. The services and Redis no longer run together in a protected cluster. The services are now independent Cloud Run services, and we moved from a Redis instance inside GKE to Memorystore, GCP’s managed in-memory Redis service. Memorystore is shielded from the internet by a VPC network, and the services need to connect through VPC connectors to access it. Read more about the setup here.

How it went (wrong)?

After the migration, we had a nicely scalable system that automatically scales up when more requests come in. However, the recommendation service became VERY expensive, because it orchestrates, and therefore waits for, all the other services. As a result, many recommender container instances were running but essentially doing nothing except waiting. The pay-per-use policy of Cloud Run therefore led to high costs for this service.
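Back-of-the-envelope, the issue is that Cloud Run bills instance time, so an orchestrator that blocks on downstream calls accrues billable seconds while idle. With purely illustrative numbers (not our actual traffic or prices), taking the 20-seconds-vs-1-second processing times mentioned earlier:

```python
# Illustrative cost arithmetic: an orchestrator that waits 20 s per
# recommendation pays for 20 s of instance time per request, even
# though almost none of that time is its own CPU work.

def billable_instance_seconds(requests: int, seconds_per_request: float,
                              concurrency: int = 1) -> float:
    """Instance-seconds billed, assuming requests are spread evenly."""
    return requests * seconds_per_request / concurrency

# Blocking for 20 s per request vs. 1 s of actual work:
waiting = billable_instance_seconds(100_000, 20.0)  # 2,000,000 instance-seconds
working = billable_instance_seconds(100_000, 1.0)   #   100,000 instance-seconds
```

Under these assumptions the waiting orchestrator is billed for 20× the instance time, which matches the intuition that pay-per-use punishes services whose wall-clock time is dominated by waiting.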

This is not the end of the story, only the first iteration. How we made the system work nicely with Cloud Run, and which improvements we tackled along the way, you can read in Part 2.

Summary

Migrating from Kubernetes to Cloud Run is easy; however, consider the following:

  • Our costs initially increased after migrating to Cloud Run (see Part 2).
  • Cloud Run is better suited for small, asynchronous processes.
  • You may need to tweak existing services.
  • Performance problems hit harder, since you pay per instance-second.


Doreen Sacker

Transforming the meaning of words and sentences into numbers is fascinating to me, that is why NLP is my favourite field.