NoSQL Data Modeling
After a pretty successful time building a prototype of Reciprocal.dev we decided to start building a beta that added user accounts and the ability for users to create their own user journey maps with the tool.
I decided to go with Firebase for both the authentication and data storage solution as the interoperability of both solutions made it ideal for performing asynchronous actions on changes to both documents in Firestore and user accounts in Firebase Authentication.
Moving to Firestore meant formalising the structure of the data that we had hacked together for the prototype to make it more scalable and make sure it plays well with the limits Firestore has around document sizes and nested objects.
To do this, I opted to build a number of sub-collections nested at the appropriate level for the data we stored in the prototype and to use an array of objects that had just enough data to be useful for rendering lists, but contained the ID of a document to link to for the full document.
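As a rough sketch of this shape, the types below model a parent document that holds a trimmed-down listing, with the full documents living in a sub-collection. The names (JourneyMap, JourneyMapSummary, Workspace, toSummary) and all of the fields are hypothetical, chosen for illustration rather than taken from the real Reciprocal.dev schema.

```typescript
// Hypothetical shapes; the field names are assumptions for illustration.

// Full document, stored in a sub-collection under its parent.
interface JourneyMap {
  id: string;
  name: string;
  description: string;
  steps: string[]; // potentially large, so kept out of the parent document
}

// "Just enough data" to render a list, plus the ID that links to the full document.
interface JourneyMapSummary {
  id: string;
  name: string;
}

// Parent document holding the denormalised listing.
interface Workspace {
  maps: JourneyMapSummary[];
}

// Derive the listing entry from the full document.
function toSummary(map: JourneyMap): JourneyMapSummary {
  return { id: map.id, name: map.name };
}

const full: JourneyMap = {
  id: "map-1",
  name: "Checkout",
  description: "Checkout flow",
  steps: ["cart", "pay"],
};
const workspace: Workspace = { maps: [toSummary(full)] };
```

Keeping the summary derivation in one function means there is a single place to change if the listing ever needs another field.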
With my data now split into different collections of documents, I needed the means to keep those ‘just enough data’ attributes up-to-date when I performed operations on the full document.
To do that, I needed ways to query a collection by specific attributes. Here are a few of the queries I found useful, in case they help others.
Querying a collection for an item
As the data model that I created for reciprocal.dev contains a small subset of data for use in listings, I needed a means of identifying which of these listings I need to update if I remove an item contained in the list.
I found the collectionGroup functionality useful for performing this type of query, as it allows a query to be run across every collection that shares the same collection ID. So if the same sub-collection key (collection2, say) appears under many parent documents, you can pass that key to collectionGroup and it will search across all of the collection2 collections, whichever parent they sit under.
Finding an item in an array
The majority of my denormalised data uses an array to have the parent record contain a subset of the data stored in documents in a sub-collection.
I have two different types of arrays for this purpose: a list of IDs and a list of objects. The list of IDs is pretty straightforward, as you can use the array-contains operator in a where query against the collection group.
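A collection group query combined with array-contains might look something like the sketch below. To keep it runnable without the Firebase SDK installed, the database is passed in through small stand-in interfaces (DbLike, QueryLike, DocLike) that only mirror the collectionGroup(), where() and get() calls on the real client. The itemIds field name and the findDocsReferencing helper are assumptions for illustration.

```typescript
// Structural stand-ins for the parts of the Firestore client used below, so the
// sketch compiles without the SDK. They mirror collectionGroup(), where() and get().
interface DocLike {
  ref: { path: string };
}
interface QueryLike {
  where(field: string, op: "array-contains", value: string): QueryLike;
  get(): Promise<{ docs: DocLike[] }>;
}
interface DbLike {
  collectionGroup(collectionId: string): QueryLike;
}

// Find every document in any "collection2" collection whose itemIds array
// contains the given ID. "itemIds" is a hypothetical field name.
async function findDocsReferencing(db: DbLike, itemId: string): Promise<string[]> {
  const snapshot = await db
    .collectionGroup("collection2")
    .where("itemIds", "array-contains", itemId)
    .get();
  // Return the full paths so the caller knows which parent each match lives under.
  return snapshot.docs.map((doc) => doc.ref.path);
}
```

In a real project you would pass in the Firestore instance itself. Firestore will typically ask you to create an index with collection group scope for the queried field before a query like this succeeds.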
For an array of objects, you can’t rely on the array-contains approach directly, because it only matches when you supply the entire element exactly as stored. I ended up adding a separate array of IDs to my documents specifically for performing this type of query. This meant that I could have the object array for use in my app and the array of IDs for finding the documents I need to amend if I delete an item.
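One way to stop the two arrays drifting apart is to always derive the ID array from the object array. The sketch below does that with plain functions; ParentDoc, ItemSummary, withItems and removeItem are hypothetical names, and the code works on in-memory objects rather than calling Firestore.

```typescript
interface ItemSummary {
  id: string;
  name: string;
}

// Hypothetical parent document: "items" is what the app renders,
// "itemIds" exists only so array-contains queries can find this document.
interface ParentDoc {
  items: ItemSummary[];
  itemIds: string[];
}

// Rebuild both arrays together so they cannot drift apart.
function withItems(items: ItemSummary[]): ParentDoc {
  return { items, itemIds: items.map((item) => item.id) };
}

// Remove a deleted item from the listing and from the queryable ID array.
function removeItem(doc: ParentDoc, deletedId: string): ParentDoc {
  return withItems(doc.items.filter((item) => item.id !== deletedId));
}

const original = withItems([
  { id: "a", name: "First" },
  { id: "b", name: "Second" },
]);
const updated = removeItem(original, "a");
// updated.items   -> [{ id: "b", name: "Second" }]
// updated.itemIds -> ["b"]
```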
Finding an item used as a map key
My data model doesn’t just utilise data from other documents in arrays; it also uses maps for storing information like configuration options and their values across multiple versions.
Luckily, unlike querying arrays, you can query nested maps (up to a depth of 20, I believe), but you can only perform queries checking the map’s value for a key or if the map has a key in it.
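Querying a value inside a map uses a dotted field path, along the lines of the sketch below. QueryLike is a stand-in interface that only mirrors the where() call on a real Firestore query, and the options map, its theme key, and the mapKeyPath and whereMapValue helpers are made up for illustration.

```typescript
// Stand-in for the one method of a Firestore query used here.
interface QueryLike {
  where(field: string, op: "==", value: unknown): QueryLike;
}

// Build a dotted field path such as "options.theme" for a key inside a map field.
// Assumes the key is a simple identifier; keys containing dots or other special
// characters need the SDK's FieldPath class instead.
function mapKeyPath(mapField: string, key: string): string {
  return `${mapField}.${key}`;
}

// Narrow a query to documents whose map holds the given value under the given key.
function whereMapValue(
  query: QueryLike,
  mapField: string,
  key: string,
  value: unknown
): QueryLike {
  return query.where(mapKeyPath(mapField, key), "==", value);
}

// Usage with a real query would look like:
//   whereMapValue(db.collectionGroup("collection2"), "options", "theme", "dark").get()
```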
If you find that you need to query objects at a depth of three or more, you might find it easier to break the document up into sub-collections, as you’ll be able to query for that data directly. This also gives your data model more flexibility, as you’re not dependent on such a strict structure.
Deleting sub-collections
When you delete a document that has sub-collections in Firestore, it will not delete the document’s sub-collections (or their sub-collections), which means that if not handled, you could end up with a very messy Firestore instance.
While there are scripts that will recursively delete sub-collections, I decided to utilise the Firestore triggers available in Firebase so when a document is deleted, it would delete its immediate sub-collections and have a trigger for each level of the hierarchy. The Firebase documentation contains a set of queries you can use for deleting sub-collections.
This approach meant that deleting large collections isn’t likely to hit the execution time limit of Firebase Functions (9 minutes), and I also have an easier means of retrying a deletion script.
My justification for this is that once the initial document the user removed has been deleted, there’s no need to delete the child sub-collections synchronously, as the user will never request them.
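One level of that cascade might look like the sketch below. It is written against stand-in interfaces (DocRefLike, CollectionLike) that mirror listCollections(), get() and delete() from the server-side Admin SDK, so it runs without the SDK installed; deleteImmediateSubCollections is a hypothetical helper name. It assumes each sub-collection is small enough to read in one go, so a larger collection would need paging and batched writes.

```typescript
// Stand-ins mirroring the Admin SDK calls used below.
interface DocRefLike {
  listCollections(): Promise<CollectionLike[]>;
  delete(): Promise<unknown>;
}
interface CollectionLike {
  get(): Promise<{ docs: { ref: DocRefLike }[] }>;
}

// Delete only the documents in the immediate sub-collections of a deleted document.
// Each of those deletions fires its own delete trigger, which handles the next
// level down, so no single invocation has to walk the whole tree.
async function deleteImmediateSubCollections(deleted: DocRefLike): Promise<number> {
  const collections = await deleted.listCollections();
  let count = 0;
  for (const collection of collections) {
    const snapshot = await collection.get();
    await Promise.all(snapshot.docs.map((doc) => doc.ref.delete()));
    count += snapshot.docs.length;
  }
  return count; // number of documents deleted at this level
}
```

Inside an onDelete handler, the reference of the deleted document's snapshot would be what gets passed in.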
Using triggers to keep the database in shape
As mentioned above, I use the Firestore triggers in a set of Firebase Functions to keep my documents and collections in shape.
There are a number of trigger points you can use within a Firebase Function, but the ones I settled on were:
auth.user().onCreate(): Triggered when a new user is created in the authentication side of Firebase, useful for creating an initial set of data for the user when they sign up
firestore.document().onCreate(): Triggered when a document is created in Firestore, useful for adding a newly created document to its parent or linked documents
firestore.document().onDelete(): Triggered when a document is deleted in Firestore, useful for removing sub-collections when a document is deleted and removing subsets of data for a document that are in other documents across the database
There is also a trigger for when a document is updated (onUpdate), but I’ve not needed to use this as I’m not storing data that can be changed in other documents just yet.
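As an illustration of the onCreate case, the sketch below shows the kind of logic such a handler might run to add a new document to its parent's listing. ParentDoc, ItemSummary, addItem and the document path in the comment are all hypothetical. The function is written to be idempotent, since a background trigger can be delivered more than once.

```typescript
interface ItemSummary {
  id: string;
  name: string;
}
interface ParentDoc {
  items: ItemSummary[];
  itemIds: string[];
}

// Add a newly created item to its parent's listing.
// Returns the parent unchanged if the item is already listed, so running
// the same trigger twice does not create a duplicate entry.
function addItem(parentDoc: ParentDoc, created: ItemSummary): ParentDoc {
  if (parentDoc.itemIds.indexOf(created.id) !== -1) return parentDoc;
  return {
    items: [...parentDoc.items, created],
    itemIds: [...parentDoc.itemIds, created.id],
  };
}

// Wiring sketch (firebase-functions v1 style, left as a comment so this block
// runs without the SDK; the document path is hypothetical):
//
//   export const onItemCreated = functions.firestore
//     .document("parents/{parentId}/items/{itemId}")
//     .onCreate(async (snapshot, context) => {
//       // load the parent, apply addItem(), write the result back
//     });
```

When writing straight to Firestore, FieldValue.arrayUnion gives similar add-if-missing behaviour for the ID array.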
Testing
The Firebase team have made it really easy to run tests thanks to the suite of emulators you can run locally.
All you need to do is ensure that FIRESTORE_EMULATOR_HOST is set in your environment variables to point at the emulator before a test run, and the Firebase library will point its queries at the local emulator.
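A small guard in the test setup can make that requirement explicit, so integration tests refuse to run when the variable is missing rather than risk talking to a real project. The requireEmulatorHost helper below is a hypothetical sketch, and the localhost:8080 value is only an example address.

```typescript
// Guard for test setup: refuse to run unless the Firestore emulator host is set.
function requireEmulatorHost(env: Record<string, string | undefined>): string {
  const host = env["FIRESTORE_EMULATOR_HOST"];
  if (!host) {
    throw new Error(
      "FIRESTORE_EMULATOR_HOST is not set; start the emulator and export it first"
    );
  }
  return host;
}

// In a real test setup this would be called as requireEmulatorHost(process.env).
// The address here is an example value, not a required one.
const emulatorHost = requireEmulatorHost({ FIRESTORE_EMULATOR_HOST: "localhost:8080" });
```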
This makes it really easy to create a suite of integration tests to ensure that your queries work as intended. This can also be used to test your security rules.
I’m using GitHub Actions for my CI, and I found w9jds/firebase-action really easy to use as it makes sure the emulators are installed on the GitHub Actions runner for you.
Summary
Working with a NoSQL database requires a little more work than a relational database due to the denormalised approach needed to keep reads quick, but with the right queries and tools, this maintenance doesn’t need to be a chore.
By understanding how Firestore and Firebase work, you can write simple functions to keep everything up-to-date in a scalable manner. Hopefully, the queries I’ve written about will help you achieve this.
Firebase is proving to be a really great set of services for me, and I’m hoping that it will continue to do so in the future. As it’s a Google product, I do worry that it’ll go the way of so many of its other services, but as it’s a developer offering instead of a consumer one, I’m hopeful it will stick around for some time.