Firebase Realtime Database — scale up using shards and load balancing

Published in Geek Culture · 5 min read · Jul 17, 2021

Firebase Realtime Database is a fully managed, low-latency, zero-configuration database for real-time applications. It lets users connect, write, and receive updates in real time using the Firebase SDK, and it’s also really easy to work with as a developer. If you’re not familiar with Firebase Realtime Database, I encourage you to read more, starting from here.

Over the last few years, the maximum load that a single database can handle has increased. Nevertheless, in some situations a single database is not enough: either you already have a very large user base, or you are so confident about the success of your product that you want to plan for scalability from day one. We know, we’ve been there. The Firebase team comes to the rescue (as always) and suggests using the sharding technique.

Realtime database sharding

Database sharding lets you distribute the load across multiple instances of Realtime Database: two instances roughly double the capacity, and so on. You can use up to 1000 instances at the same time, which leaves a lot of room to scale. The Firebase documentation suggests two different types of sharding techniques: by customer or by data type. In both situations, the data should be segregated by design and not overlap. But what if you are developing a collaborative platform where the data is segregated at a lower level, and the number of connections varies a lot over time?

Use case

Recently, we developed an online real-time card game for a CI/CD workshop designed by Eficode: Pipeline — The Game that Delivers!. Every user in the same game can move cards (pipeline steps) on a board to create a complete pipeline and then set an estimate on each step. In an ideal world, the database would scale up to cope with the current traffic in the application, for example by adding a new Firebase Realtime Database instance when the current instances reach a given load percentage (assuming that one game never needs more than one database, otherwise the data would no longer be segregated). The system would then redirect inbound traffic generated by new games to the least crowded database with some sort of load balancing. The Realtime Database does not have this feature yet, so you need to manually add a new instance to the system and direct the traffic at the application level.

The desired result

The approach described in this article will allow you to have a semi-automatic scaling system, and easily increase the traffic that your application can handle, adding Realtime instances that can also have Google Cloud Functions triggers.

The strategy

We use Firestore as a long-lived, non-real-time database to count all active connections on the Realtime Database (RTDB) instances. Stale games, e.g. the ones without any activity in the last 24 hours, are removed from the RTDB and copied back into Firestore. When a new game is created, or a stale one is accessed, the system assigns it to the least crowded Realtime instance and moves the data into it, redirecting the users to the selected instance.
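As a rough illustration of the assignment step, the sketch below picks the least crowded instance from a map of connection counts. The `InstanceCounts` type, the `pickLeastCrowded` function, and the instance names are hypothetical and exist only for this example; in the setup described here the counts would be read from Firestore.

```typescript
// Hypothetical shape: instance name -> number of open connections.
type InstanceCounts = Record<string, number>;

// Returns the name of the instance with the fewest open connections,
// or undefined when no instance is known. On a tie, the first one wins.
function pickLeastCrowded(counts: InstanceCounts): string | undefined {
  let best: string | undefined;
  let bestCount = Infinity;
  for (const [name, connections] of Object.entries(counts)) {
    if (connections < bestCount) {
      best = name;
      bestCount = connections;
    }
  }
  return best;
}

// Example with made-up instance names and counts.
const target = pickLeastCrowded({
  "game-rtdb-1": 120,
  "game-rtdb-2": 45,
  "game-rtdb-3": 80,
});
console.log(target); // game-rtdb-2
```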

How to keep track of the number of connected users?

The Firebase documentation states it clearly: we can use the special path “.info/connected” and listen to its value to check the connection status. So when a user joins, a new connection reference is created (that way we can also manage multiple connections of the same user, like multiple tabs open simultaneously on different games). The connection is removed when the user disconnects. The great thing about the onDisconnect method is that the operation is stored and executed on the server, which means that, if the client closes the page or even shuts down the device, the action is still handled.

Create a new connection object on connect and remove on disconnect
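A minimal sketch of this presence pattern is shown here. It mirrors the shape of the namespaced Firebase JavaScript SDK calls (`ref`, `on`, `push`, `set`, `onDisconnect().remove()`), but declares them as small local interfaces so the example stands on its own. The `games/{gameId}/connections` path, the stored `{ userId }` payload, and the `trackConnection` name are assumptions, not taken from the original code.

```typescript
// Assumed minimal subset of the SDK surface used below.
interface SnapshotLike {
  val(): unknown;
}
interface RefLike {
  on(event: "value", callback: (snapshot: SnapshotLike) => void): void;
  push(): RefLike;
  set(value: unknown): Promise<void>;
  onDisconnect(): { remove(): Promise<void> };
}
interface DatabaseLike {
  ref(path: string): RefLike;
}

// Adds one connection entry each time the client becomes connected.
// The path layout is hypothetical.
function trackConnection(db: DatabaseLike, gameId: string, userId: string): void {
  const connectionsRef = db.ref(`games/${gameId}/connections`);

  db.ref(".info/connected").on("value", (snapshot) => {
    // The value is true only while the client is connected.
    if (snapshot.val() !== true) return;

    // One child per connection, so several tabs of the same user
    // each get their own entry.
    const connectionRef = connectionsRef.push();

    // Register the server-side cleanup before writing the entry, so the
    // removal runs even if the page or device goes away abruptly.
    void connectionRef.onDisconnect().remove();
    void connectionRef.set({ userId });
  });
}
```

With the real SDK, the first argument would be the database object of the RTDB instance the game is assigned to.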

A Google Cloud Function triggered by connection changes then keeps the count of connected users up to date for each RTDB instance.
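The counting logic itself can be sketched independently of the trigger wiring. In the example below, `connectionDelta` and `applyDelta` are hypothetical helpers, and an in-memory `Map` stands in for the per-instance counters; in a real function the delta would more likely be applied to a Firestore document with an atomic increment.

```typescript
// +1 when a connection node is created, -1 when it is deleted,
// 0 when it already existed and was only updated.
function connectionDelta(existedBefore: boolean, existsAfter: boolean): number {
  if (!existedBefore && existsAfter) return 1;
  if (existedBefore && !existsAfter) return -1;
  return 0;
}

// In-memory stand-in for the counters kept in Firestore (assumption).
const counters = new Map<string, number>();

// Applies a delta to the counter of one instance and returns the new value.
// Clamping at zero is an illustrative safeguard against stray deletes.
function applyDelta(instance: string, delta: number): number {
  const next = Math.max(0, (counters.get(instance) ?? 0) + delta);
  counters.set(instance, next);
  return next;
}

// Two users connect to a made-up instance, then one disconnects.
applyDelta("game-rtdb-1", connectionDelta(false, true));
applyDelta("game-rtdb-1", connectionDelta(false, true));
applyDelta("game-rtdb-1", connectionDelta(true, false));
console.log(counters.get("game-rtdb-1")); // 1
```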

Ok, now we are able to decide which database is the most suitable for a new game (the least crowded) and whether we need to create a new instance to scale up.

A few problems

First of all, there is no automatic way to create a new database instance and scale up your system, even though there are official APIs to manage RTDB instances (which have also been extended to other regions recently). Even if you set up some fancy automation that creates a new instance in response to a usage monitoring alert (or something similar), you still face the biggest problem: Firebase functions with triggers on Realtime Database must be explicitly linked to each instance, which means that you need to add the new instance to the code and redeploy the functions to attach the triggers to the newly created instance.

CLI API and pre-deploy script to the rescue

The Firebase CLI lets us list all the available Realtime Database instances, so we use it in a pre-deploy script to inject into the functions’ code (unorthodox?🤔🤔) all the URLs needed to attach the triggers.

Add the instances of rtdb inside the functions at deploy time
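The code-generation half of such a pre-deploy step could look roughly like the sketch below. It takes a list of instance identifiers (however they were obtained from the CLI, for example via `firebase database:instances:list`) and renders a small module for the functions code to import. The `renderInstancesModule` function, the `RTDB_INSTANCES` constant, and the sample instance names are all illustrative assumptions.

```typescript
// Renders the source text of a module that exports the known RTDB instances.
// Duplicates are dropped and entries sorted so the output is stable across runs.
function renderInstancesModule(instances: string[]): string {
  const unique = Array.from(new Set(instances)).sort();
  return [
    "// AUTO-GENERATED at deploy time. Do not edit by hand.",
    `export const RTDB_INSTANCES: string[] = ${JSON.stringify(unique, null, 2)};`,
    "",
  ].join("\n");
}

// Example with made-up instance names.
const generatedSource = renderInstancesModule([
  "my-app-default-rtdb",
  "my-app-shard-2",
]);
console.log(generatedSource);
```

The generated text would then be written to a file inside the functions source folder before the deploy runs, for instance from a `predeploy` hook in `firebase.json`.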

After that, a utility exports a given Cloud Function with a Realtime Database trigger to all the available instances dynamically:

and can be used to export a function on all the available instances in this way:

Exports the function trigger on all the instances
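A utility of this kind might be sketched as follows. The `exportOnAllInstances` helper, the export naming scheme, and the instance names are hypothetical. The builder callback is kept generic so the sketch runs without any dependency; with the firebase-functions v1 API it would typically wrap `functions.database.instance(name).ref(path)` followed by a trigger such as `onWrite`.

```typescript
// Builds one entry per RTDB instance from the same trigger definition.
function exportOnAllInstances<T>(
  baseName: string,
  instances: string[],
  build: (instance: string) => T,
): Record<string, T> {
  const exported: Record<string, T> = {};
  for (const instance of instances) {
    // Replace hyphens and other symbols so that every key is a plain
    // identifier (an illustrative naming choice).
    const key = `${baseName}_${instance.replace(/[^A-Za-z0-9_]/g, "_")}`;
    exported[key] = build(instance);
  }
  return exported;
}

// Usage with a stand-in builder that only records the instance name.
const triggers = exportOnAllInstances(
  "onConnectionChange",
  ["my-app-default-rtdb", "my-app-shard-2"],
  (instance) => ({ instance }),
);
console.log(Object.keys(triggers));
// keys: onConnectionChange_my_app_default_rtdb, onConnectionChange_my_app_shard_2
```

In the functions entry point, the returned object would then be merged into the module exports so that each entry is deployed as its own function.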

Putting it all together

To sum up:

Firestore keeps a shared counter for each database instance, containing the live count of open connections (you can also use the Cloud Monitoring API if the connections do not change rapidly, since it adds some latency before metrics become visible)
A game is dynamically assigned to an RTDB instance when needed
When a game is stale, it is removed from the Realtime Database and moved back to Firestore (this means that a game is not assigned to an instance once and for all)
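The stale-game rule from the list above can be expressed as a small check. The 24-hour threshold is the example used in the strategy; the `GameActivity` shape, its `lastActivityMs` field, and both function names are hypothetical.

```typescript
// Threshold taken from the example in the text: 24 hours without activity.
const STALE_AFTER_MS = 24 * 60 * 60 * 1000;

// Hypothetical record describing when a game was last touched.
interface GameActivity {
  gameId: string;
  lastActivityMs: number; // Unix timestamp in milliseconds
}

// A game is stale when its last activity is older than the threshold.
function isStale(game: GameActivity, nowMs: number = Date.now()): boolean {
  return nowMs - game.lastActivityMs > STALE_AFTER_MS;
}

// Returns the ids of the games that should be moved back to Firestore.
function findStaleGames(games: GameActivity[], nowMs: number = Date.now()): string[] {
  return games.filter((game) => isStale(game, nowMs)).map((game) => game.gameId);
}
```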

To scale up your system:

create a new database instance using the API or the Firebase console
rerun the deploy pipeline, which will dynamically attach the function triggers to the newly created instance

This approach allowed us to build a semi-automatic, scalable platform for real-time collaboration, with the compromises described above. It would be great to have an API in the functions package that attaches a trigger to all instances, including newly created ones, automatically. That would remove the need to redeploy your platform, making the scaling process smoother.

You can have a look at the entire code on github.

Thank you for reading my article. If you have any questions, some feedback about the approach, or you want to suggest a better way to reach the same result, feel free to write here or reach out to me.