Load Balancing Blitz — data pipeline

Sita Lakshmi Sangameswaran
Google Cloud - Community
6 min read · Jul 29, 2024

TL;DR: We built a near real-time data pipeline in GCP for a game!

In the realm of game development, data-driven decision making is paramount. Understanding player behavior and engagement is crucial for optimizing game design and enhancing user experience. In this blog, let’s explore how we built a near real-time data pipeline to gather metrics for a demo game: Load Balancing Blitz. You will learn about:

  1. How we built the data pipeline for the game
  2. How we used Pub/Sub BigQuery subscriptions to simplify our pipeline
  3. How we built a simple metrics UI using Looker in a few minutes

If you haven’t read the introductory post, check out: Load Balancing Blitz — Introduction

So, what happens during the game?

Load Balancing Blitz is a single-player demo game that pits the player against a Google Cloud Load Balancer (GCLB) to see who handles VM traffic more efficiently.

When the player hits the start button, the backend generates equal streams of simulated HTTP traffic (a continuous series of messages) for the player and for GCLB. The player has to manually decide which of the four pre-provisioned VMs should process each incoming message, while GCLB makes this allocation automatically. The player uses physical whack-a-mole-style switches to select the VM that should process a particular incoming message.

Load Balancing Blitz game

Metrics pipeline

Various metrics are captured throughout each Load Balancing Blitz game session. For the entire duration of a single game, which defaults to 60 seconds, real-time metrics are fetched from the backend and displayed on the dashboard. These metrics determine the leaderboard scores and also provide detailed comparative insight into the player’s game.

What kind of data is collected? How is the data collected?

Initially, two dashboard mocks (as shown below) were created to visualize the game metrics requirements.

Dashboard 1: Player vs Load Balancer stats
Dashboard 2: Leaderboard for overall game stats

And then, it was time to think about how and from where the data would be ingested into the pipeline.

The backend system produced three logically grouped event message types (a rough sketch of example payloads follows the list):

  1. Game start/stop: When the player first starts the game, the player and game session information are passed to the backend. This is further condensed and passed to the data pipeline.
  2. New incoming message: When the player chooses a VM to send the incoming traffic/message to, this creates a new message event to be passed to the backend. Once the VM has finished processing the message, the backend sends the VM information and the score for the corresponding message to the data pipeline.
  3. VM health check: The backend also constantly polls each VM for its health status, retrieving information such as CPU utilization and memory utilization.
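
The exact message formats are internal to the game backend, but as a rough illustration, the payloads for these three event types could look something like the sketch below. Every field name here is an assumption for illustration, not the actual schema used by the game.

```python
import json
import time

# Hypothetical payloads for the three event types (all field names are assumptions).
game_event = {
    "event_type": "game_start",   # or "game_stop"
    "game_id": "game-1234",
    "player_name": "player-1",
    "timestamp": time.time(),
}

message_event = {
    "game_id": "game-1234",
    "vm_id": "vm-3",              # VM that processed the message
    "assigned_by": "player",      # "player" or "gclb"
    "score": 10,
    "timestamp": time.time(),
}

vm_health_event = {
    "game_id": "game-1234",
    "vm_id": "vm-3",
    "cpu_utilization": 0.42,
    "memory_utilization": 0.61,
    "timestamp": time.time(),
}

# Pub/Sub payloads are bytes, so each event would typically be JSON-encoded.
payload = json.dumps(message_event).encode("utf-8")
```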

Based on the above context, the requirements for the metrics pipeline are:

  1. Rapid data ingestion and processing: Data should be fetched from the backend, processed and written to a database within milliseconds to ensure minimal latency.
  2. Real-time Dashboard updates: The data from the database should be fetched, organized and displayed in a dashboard in real-time or near real-time.

Choosing the right tools for the job

  1. Pub/Sub
  2. BigQuery
  3. Looker dashboard

Pub/Sub is GCP’s real-time messaging service. BigQuery is a data warehouse and Looker helps build customizable dashboards and reports from the data.

The pipeline is as simple as: Publish events or messages to Pub/Sub, and send them to BigQuery. Use Looker to periodically fetch the new data from BigQuery and update the dashboards.
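
As a minimal sketch of the first hop, publishing an event to a Pub/Sub topic from Python looks roughly like this; the project ID, topic ID, and event fields are placeholders rather than the game’s actual configuration:

```python
import json

from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

PROJECT_ID = "my-project"      # placeholder
TOPIC_ID = "message-events"    # placeholder

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

event = {"game_id": "game-1234", "vm_id": "vm-3", "score": 10}

# publish() returns a future; result() blocks until Pub/Sub acknowledges the message.
future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))
print(f"Published message ID: {future.result()}")
```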

GCP provides an out-of-the-box integration to wire Pub/Sub and BigQuery together, saving you from having to do it manually. All it takes is a couple of configuration steps and your Pub/Sub to BigQuery pipeline is good to go.

What are the other technologies considered?

Firestore: Firestore, a real-time database, was considered for its excellent data synchronization with the UI (< 30 milliseconds), but was ultimately not chosen for this implementation. BigQuery was a better fit since the generated data had a well-defined schema and needed to power complex queries and Looker.

Dataflow: Dataflow was considered for data transformations but was deemed unnecessary due to the absence of transformation requirements in the Load Balancing Blitz game.

Where is the data ingested & transformed?

Data or metrics pipeline

The three event message types were set up as three different Pub/Sub topics, each with its own Pub/Sub subscription. Three BigQuery tables were created with schemas representing the different event data types.
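
For illustration, creating one such table with an explicit schema using the google-cloud-bigquery client could look like the sketch below; the dataset, table, and column names are assumptions, not the game’s actual schema:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Placeholder table ID and columns -- adapt to your own project and schema.
table_id = "my-project.blitz_metrics.message_events"
schema = [
    bigquery.SchemaField("game_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("vm_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("assigned_by", "STRING"),
    bigquery.SchemaField("score", "INTEGER"),
    bigquery.SchemaField("timestamp", "TIMESTAMP"),
]

table = client.create_table(bigquery.Table(table_id, schema=schema))
print(f"Created table {table.full_table_id}")
```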

The pipeline was completed by setting the Export to BigQuery configuration on each subscription. This creates a BigQuery subscription that writes messages directly to a BigQuery table, and it is recommended for scenarios that require low latency and where messages do not need additional transformation.
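
The same configuration can be done from the console or with gcloud; as a rough sketch, creating such a BigQuery subscription with the Python client looks something like this (project, topic, subscription, and table names are placeholders):

```python
from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

PROJECT_ID = "my-project"                  # placeholder
TOPIC_ID = "message-events"                # placeholder
SUBSCRIPTION_ID = "message-events-to-bq"   # placeholder

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

# Messages delivered through this subscription are written straight into the table.
bigquery_config = pubsub_v1.types.BigQueryConfig(
    table="my-project.blitz_metrics.message_events",  # placeholder table
    write_metadata=True,  # also store message ID, publish time, etc.
)

with subscriber:
    subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
```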

This Pub/Sub to BigQuery workflow runs continuously on the backend. When the Pub/Sub subscription receives a message, it is immediately written to the appropriate BigQuery table. There is no officially published latency for the Pub/Sub BigQuery subscription, and latency may vary for various reasons (different zones, complex Dataflow scripts, etc.). In this game, however, we observed Pub/Sub to BigQuery table writes to be on the order of milliseconds.

Pub/Sub — Export
BigQuery — Explorer

What happens to corrupted or ill-formatted messages?

To avoid issues with message formats, “dead lettering” can be set up in Pub/Sub: undeliverable or non-schema-compliant messages are routed to a dead-letter topic, which can in turn be exported to a separate BigQuery table to be reviewed later.
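
As a sketch, a dead-letter policy can be attached to an existing subscription like this (names are placeholders); a separate BigQuery subscription on the dead-letter topic can then land those messages in their own table. Note that the Pub/Sub service account also needs permission to publish to the dead-letter topic.

```python
from google.cloud import pubsub_v1
from google.protobuf import field_mask_pb2

PROJECT_ID = "my-project"                              # placeholder
SUBSCRIPTION_ID = "message-events-to-bq"               # placeholder
DEAD_LETTER_TOPIC_ID = "message-events-dead-letter"    # placeholder

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

subscription = pubsub_v1.types.Subscription(
    name=subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID),
    dead_letter_policy=pubsub_v1.types.DeadLetterPolicy(
        dead_letter_topic=publisher.topic_path(PROJECT_ID, DEAD_LETTER_TOPIC_ID),
        max_delivery_attempts=5,  # forward after 5 failed delivery attempts
    ),
)

with subscriber:
    subscriber.update_subscription(
        request={
            "subscription": subscription,
            "update_mask": field_mask_pb2.FieldMask(paths=["dead_letter_policy"]),
        }
    )
```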

Caution should also be taken to ensure that the Pub/Sub subscription does not have an inactivity expiration configured. We ran into a scenario where the subscription was set to be automatically deleted after 31 days of inactivity. The deletion disrupted the workflow and required recreating the subscriptions with the expiration set to “Never.” Once this adjustment was made, the pipeline resumed normal operation and ran smoothly.
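
A minimal sketch of that fix, assuming the Python client and placeholder names: an ExpirationPolicy with no ttl set tells Pub/Sub the subscription should never expire.

```python
from google.cloud import pubsub_v1
from google.protobuf import field_mask_pb2

PROJECT_ID = "my-project"                  # placeholder
SUBSCRIPTION_ID = "message-events-to-bq"   # placeholder

subscriber = pubsub_v1.SubscriberClient()

# An ExpirationPolicy with no ttl means the subscription never expires.
subscription = pubsub_v1.types.Subscription(
    name=subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID),
    expiration_policy=pubsub_v1.types.ExpirationPolicy(),
)

with subscriber:
    subscriber.update_subscription(
        request={
            "subscription": subscription,
            "update_mask": field_mask_pb2.FieldMask(paths=["expiration_policy"]),
        }
    )
```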

Looker dashboard

Configuring Looker with BigQuery is as easy as choosing the latter as the data source in Looker.

All three BigQuery tables were imported into Looker as data sources. The “Freshness” timer was set to 1 minute, which means the BigQuery tables are polled once every minute. The dashboard was created according to the mocks, and a few computationally inexpensive grouping operations were defined as part of the Looker dashboard.

Looker provides capabilities to create calculated or dynamic fields, which are computed at runtime from the dataset. A few operations, such as aggregations, are well suited for these fields. However, if the data transformation is frequent or complicated, then Dataflow or a custom transformation pipeline (with Cloud Functions) is better suited for the job.

Looker dashboard — Data Source

The Looker dashboard had a Freshness timer of 60 seconds, meaning it refreshes at most once every 60 seconds. This was great for our use case, but if you need truly real-time information, building a UI that fetches data directly from BigQuery is a better option.
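
As a rough sketch of that alternative, a UI backend could poll BigQuery directly with the Python client; the table and column names below are placeholders:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Placeholder table and columns -- adapt to your own schema.
query = """
    SELECT game_id, SUM(score) AS total_score
    FROM `my-project.blitz_metrics.message_events`
    WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 SECOND)
    GROUP BY game_id
    ORDER BY total_score DESC
"""

# query() starts the job; result() waits for it and returns the rows.
for row in client.query(query).result():
    print(f"{row.game_id}: {row.total_score}")
```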

More resources

Curious to know more? Here’s the source code for Load Balancing Blitz and an introductory post.

To learn more about load balancing on Google Cloud check out: Choose a load balancer. If you are interested in learning about Pub/Sub to BigQuery, see BigQuery subscriptions.

Learn the product fundamentals at:

> How to create Pub/Sub topics and subscriptions

> How to create BigQuery Tables

> What is Looker?
