Deterministic Database-Seeding using Node.js and Cloud Firestore

Generate deterministic and understandable data for better manual-testing and development phases

Did you notice that «testing» was mentioned before «development»?! 🤓

TL;DR

The main goal of this post is to motivate developers to create and maintain programs that seed their databases according to both the common use cases of the user journey and the edge cases.

Deterministic Seeding allows you to pre-fill a database with fake but understandable data. No more { "firstName": "qqqq", "lastName": "wwww", "email": "qqqq.wwww@eeee.rt" }. You will have a reproducible account list with whatever you want.

You will (re-)discover that proper database seeding is also a powerful tool for automated end-to-end testing, because it lets you automate almost the same test cases that you would run manually.

Almost all popular backend frameworks have a seeding system.
In this tutorial, you will learn how to build one on your own. But if you don’t want to, you can skip the abstract seeder part and focus on the specialized seeders.

Don’t lose time anymore

I’ve been in this situation in a lot of projects:

I follow a succinct “README” file to install and launch the project. I open the homepage, create a new account and try to reproduce the issue described in the bug ticket. But I can’t reproduce it, because it depends on data that is complicated to obtain manually.

So I ask a colleague: “Can you send me a dump of your database, please? I can’t reproduce the context and you have a lot more data than me…”. 🤦‍♂️

What is a Deterministic Seeding?

Seeding is the action of pre-filling a database with data that matches the defined types. It’s commonly used to store all the expected enumeration values for both backend and frontend. 🦩

Here we will talk about a programmatic and deterministic approach which allows us to reproduce the same dataset anywhere without hard-coding it. For this, we will use a library that generates human-readable data using a seed key.
So, if multiple processes (local or not) use the same key, they will obtain the same dataset. This key is the only value that should be manually defined.
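
To make the idea concrete, here is a minimal sketch of seed-key determinism. It does not use the library from this post: `createRandom` is a tiny hand-written generator standing in for it, and the name lists, `Account` shape and `generateAccounts` helper are assumptions made up for the illustration.

```typescript
// Stand-in for a seeded data library: a tiny linear congruential generator.
// This is an assumption for the sketch, not the library used in the post.
function createRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    // Numerical Recipes LCG constants; values stay below 2^53, so the math is exact.
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // float in [0, 1)
  };
}

// Hypothetical pools of human-readable values.
const FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus", "Margaret"];
const LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Torvalds", "Hamilton"];

interface Account {
  firstName: string;
  lastName: string;
  email: string;
}

function pick(random: () => number, values: string[]): string {
  return values[Math.floor(random() * values.length)];
}

// Builds `count` accounts; the seed key is the only manually chosen input.
function generateAccounts(seedKey: number, count: number): Account[] {
  const random = createRandom(seedKey);
  const accounts: Account[] = [];
  for (let i = 0; i < count; i++) {
    const firstName = pick(random, FIRST_NAMES);
    const lastName = pick(random, LAST_NAMES);
    const email = `${firstName}.${lastName}@example.com`.toLowerCase();
    accounts.push({ firstName, lastName, email });
  }
  return accounts;
}

// Two independent "processes" using the same key obtain the same dataset.
const datasetA = generateAccounts(42, 3);
const datasetB = generateAccounts(42, 3);
```

Here `datasetA` and `datasetB` contain exactly the same accounts, which is the property a colleague's machine or a CI job relies on.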

Make your API development frontend-driven

Did somebody say ?! 🤩

With Deterministic Seeding, you make your API more frontend-oriented, because your data reflects real use cases and makes it easier to test against your User Stories.

If you use GraphQL, for instance, you will create as much data as necessary to cover the schema specifications.

Schema example

This is the GraphQL schema we will use during the example implementation.

src/schema.graphql

As you can see, you have required, optional and enum fields.

In that case, you have to cover as many cases as you can, while keeping your data realistic.
It will probably be useless to add a User without description and another without a job. A single User without both of these fields could be enough. It depends on your frontend specifications.
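
As a rough illustration of that trade-off, here is a sketch with a hypothetical TypeScript mirror of a User. Only the optional `description` and `job` fields come from the text above; the `name` field, the sample values and both helper functions are assumptions.

```typescript
// Hypothetical TypeScript mirror of the schema's User type.
interface User {
  name: string; // required (assumed field)
  description?: string; // optional
  job?: string; // optional
}

// One fully filled User plus a single User missing both optional fields.
function buildCoverageUsers(): User[] {
  return [
    { name: "Ada Lovelace", description: "Writes about engines.", job: "Mathematician" },
    { name: "Alan Turing" }, // covers "no description" and "no job" at once
  ];
}

// Lists which optional fields are absent, handy to check edge-case coverage.
function missingOptionalFields(user: User): string[] {
  const missing: string[] = [];
  if (user.description === undefined) missing.push("description");
  if (user.job === undefined) missing.push("job");
  return missing;
}
```

Two Users are enough here to exercise the "everything filled" and "nothing optional filled" cases; add more only if your frontend renders the intermediate cases differently.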

Generated Post example

According to the previous schema, a generated Post could be:

The only non-predictable values are the id and the author because they are both Firestore IDs.

And of course, you can use your seed program to seed all your enum tables/collections in both production and development environments. In this schema, that would be postTypes. It is not necessary in our example because we use a NoSQL database.

A bit more about Testing

Just as you should write unit tests to cover newly found bugs, you should also create new deterministic data that allows you to reproduce the issue both manually and automatically.

Get into this habit and your application will be less error-prone.

Implementation

We will implement our seeding program for a Cloud Firestore database in Node.js, using Chance.js to generate the data.

In this implementation, we won’t directly use the Firestore library. We will use to avoid dealing with .

You can retrieve the entire example at the following address: 🔗

The Seeders

src/seeders/index.js

This is the ordered list of seeders. The order ensures that entities without dependencies (such as Users) are seeded before the entities that reference them (such as Posts).
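
A minimal sketch of such an ordered list could look like the following. The `SeederEntry` shape, its `dependsOn` field and the `isOrderValid` check are hypothetical additions for the illustration; only the UserSeeder and PostSeeder names come from this post.

```typescript
// Hypothetical description of a seeder and the seeders it relies on.
interface SeederEntry {
  name: string;
  dependsOn: string[]; // names of seeders whose data must already exist
}

// Order matters: Users first, then the Posts that reference them.
const seeders: SeederEntry[] = [
  { name: "UserSeeder", dependsOn: [] },
  { name: "PostSeeder", dependsOn: ["UserSeeder"] },
];

// Returns true when every seeder is listed after all of its dependencies.
function isOrderValid(list: SeederEntry[]): boolean {
  const seen: string[] = [];
  for (const entry of list) {
    for (const dependency of entry.dependsOn) {
      if (seen.indexOf(dependency) === -1) return false;
    }
    seen.push(entry.name);
  }
  return true;
}
```

A check like `isOrderValid(seeders)` is cheap to run before seeding and catches a misplaced seeder as soon as the list grows.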

The AbstractSeeder

It is the abstract class used to create the Chance.js instance and define the structure of each Seeder.

src/seeders/Seeder.ts
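
The structure could be sketched as follows. This is not the original file: `createRandom` stands in for the seeded Chance.js instance, the methods are synchronous whereas real Firestore calls would be asynchronous, and `NumberSeeder` is a made-up concrete seeder that writes to an in-memory array.

```typescript
// Stand-in for the seeded generator instance (assumption for this sketch).
function createRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // float in [0, 1)
  };
}

abstract class Seeder {
  // Created from the seed key, so each run yields the same sequence of values.
  protected readonly random: () => number;

  constructor(seedKey: number) {
    this.random = createRandom(seedKey);
  }

  // Removes the documents this seeder previously created.
  abstract clean(): void;

  // Creates the documents.
  abstract run(): void;
}

// Hypothetical concrete seeder storing values in memory instead of Firestore.
class NumberSeeder extends Seeder {
  public readonly store: number[] = [];

  clean(): void {
    this.store.length = 0;
  }

  run(): void {
    for (let i = 0; i < 3; i++) {
      this.store.push(Math.floor(this.random() * 100));
    }
  }
}

// Two instances built with the same key produce the same values.
const first = new NumberSeeder(7);
first.run();
const second = new NumberSeeder(7);
second.run();
```

Keeping the generator in the base class means a specialized seeder only has to describe its own documents in `clean` and `run`.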

The PostSeeder

The PostSeeder will create 1 Post of each PostType for each User.
There are 3 types and the UserSeeder has added 20 Users, so in the end there will be 60 Posts (20 × 3).

src/seeders/PostSeeder.ts
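
The core loop of such a seeder can be sketched like this. The PostType names, the `title` field, the word list and the `user-N` IDs are all assumptions for the example, and the Posts are returned in memory instead of being written to Firestore.

```typescript
// Stand-in for the seeded generator (assumption for this sketch).
function createRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // float in [0, 1)
  };
}

// Hypothetical values for the three PostTypes.
const POST_TYPES = ["ARTICLE", "NEWS", "TUTORIAL"];
const TITLE_WORDS = ["Seeding", "Testing", "Firestore", "Deterministic", "Data"];

interface Post {
  author: string; // ID of the owning User
  type: string;
  title: string;
}

// Creates 1 Post of each PostType for each User ID.
function seedPosts(seedKey: number, userIds: string[]): Post[] {
  const random = createRandom(seedKey);
  const posts: Post[] = [];
  for (const author of userIds) {
    for (const type of POST_TYPES) {
      const word = TITLE_WORDS[Math.floor(random() * TITLE_WORDS.length)];
      posts.push({ author, type, title: `${word} (${type.toLowerCase()})` });
    }
  }
  return posts;
}

// Stand-in for the 20 User IDs that the UserSeeder would have stored beforehand.
const userIds: string[] = [];
for (let i = 1; i <= 20; i++) {
  userIds.push(`user-${i}`);
}

// 20 Users x 3 PostTypes = 60 Posts.
const posts = seedPosts(42, userIds);
```

Because every User receives one Post of each type, any screen that filters Posts by type has data to show for every account.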

The Script

In its seed function, the script sequentially calls the clean method of each seeder before its run method. Running the seeders one after another preserves the order of the list, so each entity is created before the ones that depend on it.

src/scripts/seed.ts

A condition prevents this script from being run in the production environment.
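
Here is one way such a script could be sketched, under a few assumptions: the environment name is passed as a parameter instead of being read from the process, the seeders are synchronous fakes that only record their calls, and each seeder is cleaned right before it runs (one possible reading of the description above).

```typescript
// Hypothetical minimal contract for a seeder.
interface Seeder {
  name: string;
  clean(): void;
  run(): void;
}

// Refuses to run in production, then cleans and runs each seeder in list order.
function seed(seeders: Seeder[], environment: string): void {
  if (environment === "production") {
    throw new Error("Seeding is disabled in the production environment.");
  }
  for (const seeder of seeders) {
    seeder.clean();
    seeder.run();
  }
}

// Fake seeders that only record the calls they receive.
const calls: string[] = [];

function fakeSeeder(name: string): Seeder {
  return {
    name,
    clean: () => {
      calls.push(`${name}.clean`);
    },
    run: () => {
      calls.push(`${name}.run`);
    },
  };
}

seed([fakeSeeder("UserSeeder"), fakeSeeder("PostSeeder")], "development");
// calls is now:
// ["UserSeeder.clean", "UserSeeder.run", "PostSeeder.clean", "PostSeeder.run"]
```

Throwing before the loop guarantees that no seeder is touched when the guard fails, so an accidental production run cannot wipe real data.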

Run

According to the package.json’s scripts, you can now run the following command to start the seeding procedure:

yarn seed

That’s it!

You can re-run this command and obtain the same result.

Let’s Seed! 👨‍💻

And test, of course!
