Firebase: Migrating data to Cloud Firestore using Admin SDK
Last fall Google introduced Cloud Firestore, a highly scalable database service that supports expressive queries and realtime listeners. If you are an app developer familiar with the Firebase Realtime Database and you’re interested in trying out Cloud Firestore, you might be looking for a way to migrate your existing data from Realtime Database to Firestore. This post sheds some light on the topic and demonstrates how to implement a custom data migration script for Firestore using the Firebase Admin SDK.
The first thing to remember is that Realtime Database and Firestore are based on two very different data models. The Realtime Database stores data in a JSON tree. It’s simple, and great for storing anything that can be encoded as JSON. In contrast, Cloud Firestore is a document-oriented database. It stores data in documents, which are grouped into collections. Documents consist of key-value pairs, and they can store virtually anything, including raw bytes. Moreover, each document is allowed to have sub-collections, which adds a notion of hierarchy to the Firestore data model.
Due to the differences in data models, there is no universal tool for migrating data from Realtime Database to Firestore. Developers attempting such a migration should analyze the schema of their existing data, and map the JSON tree in Realtime Database to a set of collections and documents in Firestore. Once this mapping has been determined, the Firebase Admin SDK can be used to implement the actual data migration. The Admin SDK supports both database services, so we can use it to implement an ETL (extract, transform, load) job that extracts data from one database and loads it into the other.
Let’s consider an example. Suppose the Realtime Database contains the following data from a hypothetical chat application.
In Firestore, we can save each chat room as a document in a collection named rooms. These documents can store high-level room metadata, as well as the membership information. Chat messages can be stored in a messages sub-collection created under each of the room documents. This essentially results in a Firestore schema that is amenable to the same queries our chat application would have made using the Firebase Realtime Database.
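To make the mapping concrete, here is a minimal sketch of the transform step in plain Python. The shape of the sample tree (top-level rooms and messages nodes, a members map, the field names) is an assumption made for illustration and may not match the chat data described above. The function name map_tree is hypothetical too.

```python
# Hypothetical Realtime Database tree; the node and field names are assumptions.
SAMPLE_TREE = {
    "rooms": {
        "room1": {"name": "General", "members": {"alice": True, "bob": True}},
    },
    "messages": {
        "room1": {
            "m1": {"sender": "alice", "text": "Hello", "timestamp": 1},
            "m2": {"sender": "bob", "text": "Hi there", "timestamp": 2},
        },
    },
}


def map_tree(tree):
    """Map the JSON tree to Firestore documents.

    Returns two lists of (path_segments, data) pairs: one for the room
    documents and one for the documents in each room's messages sub-collection.
    """
    room_docs = []
    for room_id, room in (tree.get("rooms") or {}).items():
        data = {
            "name": room.get("name"),
            # Store membership as a sorted list of user IDs.
            "members": sorted(room.get("members") or {}),
        }
        room_docs.append((("rooms", room_id), data))

    message_docs = []
    for room_id, messages in (tree.get("messages") or {}).items():
        for message_id, message in messages.items():
            path = ("rooms", room_id, "messages", message_id)
            message_docs.append((path, dict(message)))

    return room_docs, message_docs


room_docs, message_docs = map_tree(SAMPLE_TREE)
```

Keeping the transform free of any SDK calls makes it easy to check the mapping against a small sample before touching either database.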
Now let’s implement our data migration script. I’m going to use the Firebase Python Admin SDK for this task. The Python Admin SDK interacts with the Realtime Database via REST (as opposed to WebSockets). This makes it particularly suitable for bulk downloading data from the Realtime Database. Listing 2 shows the resulting code with the helper functions removed. Full source code including all the utilities is available here.
We start by initializing the Admin SDK with a service account credential. Once initialized, we can access both database services programmatically. The db module of the Admin SDK enables us to extract existing data from the Realtime Database as Python dictionaries. Then we perform the necessary transformations in memory, and load the resulting values to Firestore. When writing, we group up to 500 writes into the same batch operation to minimize the number of RPCs made (see the full implementation). To better enable this, we perform the Firestore upload in two phases. First we write all the chat rooms, 500 rooms at a time. Then we write all the messages, 500 at a time.
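The batching step described above can be sketched roughly as follows. This is not the author's full implementation: the helper names chunked, write_in_batches and migrate are hypothetical, and the top-level rooms and messages node names are assumptions about the source data. The Admin SDK imports sit inside migrate so that the batching helpers can be exercised without credentials.

```python
def chunked(items, size=500):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_in_batches(client, docs, batch_size=500):
    """Write (path_segments, data) pairs, committing one batch per chunk.

    `client` is expected to behave like the object returned by
    firebase_admin.firestore.client(). Returns the number of commits made.
    """
    commits = 0
    for group in chunked(docs, batch_size):
        batch = client.batch()
        for path, data in group:
            batch.set(client.document(*path), data)
        batch.commit()
        commits += 1
    return commits


def migrate(service_account_path, database_url):
    """Extract from the Realtime Database, then load Firestore in two phases."""
    import firebase_admin
    from firebase_admin import credentials, db, firestore

    cred = credentials.Certificate(service_account_path)
    firebase_admin.initialize_app(cred, {"databaseURL": database_url})

    # Assumed layout: top-level "rooms" and "messages" nodes.
    rooms = db.reference("rooms").get() or {}
    messages = db.reference("messages").get() or {}
    client = firestore.client()

    room_docs = [(("rooms", room_id), room) for room_id, room in rooms.items()]
    message_docs = [
        (("rooms", room_id, "messages", message_id), message)
        for room_id, room_messages in messages.items()
        for message_id, message in room_messages.items()
    ]

    write_in_batches(client, room_docs)      # phase 1: chat rooms
    write_in_batches(client, message_docs)   # phase 2: messages
```

With this structure, 1,201 documents would go out in three commits (500, 500 and 201 writes) rather than 1,201 separate requests.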
A custom script such as Listing 2 can easily migrate small to medium-sized datasets. However, it assumes the dataset can fit in memory. There’s also a 256 MB limit on the amount of data that can be fetched from the Realtime Database in a single read operation. Therefore, when dealing with large datasets, consider exporting the data to a JSON file from the Firebase Console. Then implement a script to stream from the exported file, and incrementally upload data to Firestore using the Admin SDK. You can take the same approach to migrate your other existing datasets (MySQL, CSV files, spreadsheets etc.) to Cloud Firestore.
Another aspect to consider when performing a data migration is cost. You do get billed for the data downloaded from the Realtime Database, and the data uploaded to Cloud Firestore. In addition to the actual long-term storage cost, you may also have to pay for the network bandwidth usage and the number of documents read/written. This is generally not an issue for small datasets. But if you are a paying Firebase user who has a lot of data, make sure you understand the cost implications of the two database services before kicking off a bulk data transfer.
I hope this post provides some practical tips on migrating data from Firebase Realtime Database to Cloud Firestore. The official Firestore documentation also discusses this subject at length, and you should definitely go through that as well. If nothing else, I hope this gave you a window into using the Firebase Admin SDK to interact with Realtime Database and Cloud Firestore from server-side code. Happy coding, and feel free to share your experiences with these two cloud database services.