Deterministic Database-Seeding using Node.js and Cloud Firestore

Generate deterministic and understandable data for better manual-testing and development phases

Jean Desravines
5 min read · Aug 27, 2020
Photo by Hello I’m Nik on Unsplash

Did you notice that «testing» was mentioned before «development»?! 🤓

TL;DR

The main goal of this post is to motivate developers to create and maintain programs to seed their databases according to both the common use cases of the user journey and the edge cases.

Deterministic Seeding allows you to pre-fill a database with fake but understandable data. No more { "firstName": "qqqq", "lastName": "wwww", "email": "qqqq.wwww@eeee.rt" }. You will have a reproducible account list with whatever you want.

You will (re-)discover that proper database seeding is also a powerful tool for automated end-to-end testing because it lets you run almost the same test cases that you would run manually.

Almost all popular backend frameworks have a seeding system.
In this tutorial, you will learn how to build one on your own. But if you don’t want to, you can skip the abstract seeder part and focus on the specialized seeders.

Don’t lose time anymore

I’ve been in this situation in a lot of projects:

I’ve followed a succinct “README” file to install and launch the project. I open the homepage, create a new account and try to reproduce the issue written in the bug ticket. But, I can’t reproduce it because it depends on data that are complicated to obtain manually.

So I ask a colleague: “Can you send me a dump of your database, please? I can’t reproduce the context and you have a lot more data than me…”. 🤦‍♂️

What is Deterministic Seeding?

Seeding is the action of pre-filling a database with data that matches the defined schema. It’s commonly used to store all the expected enumeration values for both backend and frontend. 🦩

Here we will talk about a programmatic and deterministic approach that allows us to reproduce the same dataset anywhere without hard-coding it. For this, we will use a library that generates human-readable data using a seed key.
So, if multiple processes (local or not) use the same key, they will obtain the same dataset. This key is the only value that should be manually defined.
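
To illustrate the idea without any dependency, here is a minimal sketch that uses a small hand-written seeded generator (mulberry32) as a stand-in for the library. The names `hashKey`, `createGenerator` and `fakeUsers` are hypothetical and are not part of the implementation described later. The only point is that two generators built from the same key yield the same dataset.

```typescript
// Stand-in for a seeded data library: NOT the Chance API, just an illustration.

// Turn a string key into a 32-bit integer (FNV-1a hash).
function hashKey(key: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// mulberry32: a tiny deterministic pseudo-random number generator in [0, 1).
function createGenerator(key: string): () => number {
  let state = hashKey(key)
  return () => {
    state = (state + 0x6d2b79f5) | 0
    let t = Math.imul(state ^ (state >>> 15), 1 | state)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

const FIRST_NAMES = ['Ada', 'Grace', 'Linus', 'Margaret', 'Alan']
const LAST_NAMES = ['Lovelace', 'Hopper', 'Torvalds', 'Hamilton', 'Turing']

// Build `count` fake users from a seed key.
function fakeUsers(key: string, count: number): { name: string; email: string }[] {
  const random = createGenerator(key)
  const pick = (values: string[]) => values[Math.floor(random() * values.length)]

  return Array.from({ length: count }, () => {
    const first = pick(FIRST_NAMES)
    const last = pick(LAST_NAMES)
    return { name: `${first} ${last}`, email: `${first}.${last}@example.com`.toLowerCase() }
  })
}

// Same key, same dataset, whichever process runs it.
const firstRun = fakeUsers('my-seed-key', 3)
const secondRun = fakeUsers('my-seed-key', 3)
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)) // prints true
```

In the actual implementation, the seeded Chance instance plays the role of `createGenerator`.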

Make your API development frontend-driven

Did somebody say GraphQL?! 🤩

Deterministic Seeding makes your API more frontend-oriented: your data reflects real use cases, which makes it easier to test against your User Stories.

If you use GraphQL, for instance, you will create as much data as necessary to cover the schema specifications.

Schema example

This is the GraphQL schema we will use during the example implementation.

enum PostType {
  ARTICLE
  STORY
  TUTORIAL
}

type User {
  id: ID!
  username: String!
  email: String!
  name: String!
  description: String
  job: String
}

type Post {
  id: ID!
  author: User!
  title: String!
  body: String!
  type: PostType!
  publishedAt: DateTime
}

As you can see, you have required, optional and enum fields.

In that case, you have to cover as many cases as you can while keeping your data realistic.
It will probably be useless to add a User without a description and another without a job. A single User without both of these fields could be enough. It depends on your front-end specifications.
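
One way to plan this coverage is to decide, for each generated User, which optional fields are present. The sketch below is only an assumption about how such a plan could look: the `UserDraft` type, the `buildUserDrafts` helper and the placeholder values are hypothetical. Most Users are complete, and exactly one has neither `description` nor `job`.

```typescript
// Mirrors the User type of the schema, minus the database-generated id.
interface UserDraft {
  username: string
  email: string
  name: string
  description?: string
  job?: string
}

// Hypothetical helper: builds `count` users, the last one without optional fields.
function buildUserDrafts(count: number): UserDraft[] {
  return Array.from({ length: count }, (_, i) => {
    const draft: UserDraft = {
      username: `user${i + 1}`,
      email: `user${i + 1}@example.com`,
      name: `User ${i + 1}`,
    }

    // The last user is the edge case: no description and no job
    const isEdgeCase = i === count - 1
    if (!isEdgeCase) {
      draft.description = `Description of user ${i + 1}`
      draft.job = `Job of user ${i + 1}`
    }

    return draft
  })
}

const drafts = buildUserDrafts(20)
const incomplete = drafts.filter((d) => d.description === undefined && d.job === undefined)
console.log(drafts.length, incomplete.length) // prints 20 1
```

In a real seeder, the placeholder strings would come from the seeded generator instead of the index.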

Generated Post example

According to the previous schema, a generated Post could be:

{
  "id": "VefJHSsz4JOK2kwqBsX3",
  "author": "1iTR5Nxi8oRghkx865WZ",
  "title": "Wezaw pi lab lefugor gafdupwu opa joworo kotcorin nipranas hod tehnes ako fi veh ulogihak.",
  "body": "Nav movteloz rusik min itijunij ti wedwiw teum ec hamin lonfo ukep so. Ug huna ner misowef lawep zuali rembecam cap na kasarmu vabkoz nemperfab fo woufwo hu hoezuva. Evu bawvur opmop loun uv isose ipzevit iteetde behlok lemse neci vozim ezipub demihil zizor.",
  "type": "ARTICLE",
  "publishedAt": 1598446958199
}

The only non-predictable values are the id and the author because they are both Firestore IDs.

And of course, you can use your seed program to seed all your enum tables/collections in production and development environments. Here: postTypes. It is not necessary in our example because we use a NoSQL database.

A bit more about Testing

Just as you should create unit tests to cover newly found bugs, you should also create new deterministic data that lets you reproduce the issue both manually and automatically.

Get into this habit and your application will be less error-prone.

Implementation

We will implement our seeding program for a Cloud Firestore database using Chance and Firebase Admin in Node.js.

In this implementation, we won’t use the Firestore library directly. We will use a Model facade to avoid dealing with Firestore’s verbosity.

The Seeders

import PostSeeder from './PostSeeder'
import Seeder from './Seeder'
import UserSeeder from './UserSeeder'

export default [new UserSeeder(), new PostSeeder()] as Seeder[]

This is the ordered list of seeders. The order ensures that the entities other seeders depend on (here, Users) are seeded before the entities that reference them (Posts).

The AbstractSeeder

It is the abstract class used to create the Chance.js instance and define the structure of each Seeder.

import Chance from 'chance'
import * as configuration from '../configuration'

export default abstract class Seeder {
  /**
   * The shared Chance instance, seeded with the configured key
   * so that every run generates the same values
   */
  static readonly generator: Chance.Chance = new Chance(
    configuration.chance.key
  )

  /**
   * Clean the Collection
   * (`abstract` and `async` cannot be combined; implementations may still be async)
   */
  abstract clean(): Promise<void>

  /**
   * Create Entities
   */
  abstract run(): Promise<void>
}

The PostSeeder

The PostSeeder will create 1 Post of each PostType for each User.
There are 3 types. So, in the end, there will be 60 Posts because the UserSeeder has added 20 Users.

import Model from '../entities/Model/model'
import { PostType } from '../entities/Post/constants'
import Post from '../entities/Post/model'
import User from '../entities/User/model'
import Seeder from './Seeder'

export default class PostSeeder extends Seeder {
  /**
   * Clean the Collection
   */
  async clean(): Promise<void> {
    return Post.deleteManyBy()
  }

  /**
   * Create 3 Posts per User: 1 of each type.
   */
  async run(): Promise<void> {
    const users: User[] = await User.findManyBy()
    const types: string[] = Object.values(PostType)

    const documents = users.flatMap((user: User) => {
      return types.map((type, i) => {
        // Only odd-indexed types get a publication date; the others stay unpublished
        const publishedDate = i % 2 ? Seeder.generator.date({ year: 2000 }) : null
        const publishedAt = (publishedDate as Date | null)?.getTime()

        return new Post({
          publishedAt,
          type: type as PostType,
          author: user.id as string,
          title: Seeder.generator.sentence(),
          body: Seeder.generator.paragraph(),
        })
      })
    })

    await Post.saveMany(documents as Model[])
  }
}
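
The arithmetic above (20 Users × 3 types = 60 Posts) and the `i % 2` publication rule can be checked in isolation, without touching the database. This is a sketch under assumptions: `POST_TYPES` is a local copy of the enum values, and `planPosts` and the `user-…` identifiers are hypothetical names that do not exist in the project.

```typescript
// Local copy of the enum values from the schema, for illustration only.
const POST_TYPES = ['ARTICLE', 'STORY', 'TUTORIAL'] as const
type PostTypeName = typeof POST_TYPES[number]

interface PostPlan {
  author: string
  type: PostTypeName
  published: boolean
}

// Hypothetical pure helper: one planned Post per (user, type) pair.
function planPosts(userIds: string[]): PostPlan[] {
  return userIds.flatMap((author) =>
    POST_TYPES.map((type, i) => ({
      author,
      type,
      // Same rule as `i % 2` in the seeder: only odd indexes are published
      published: i % 2 === 1,
    }))
  )
}

const userIds = Array.from({ length: 20 }, (_, i) => `user-${i + 1}`)
const plans = planPosts(userIds)

console.log(plans.length) // prints 60
console.log(plans.filter((p) => p.published).length) // prints 20
```

With 3 types, only the type at index 1 (STORY) receives a publication date, so one Post in three is published.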

The Script

In its seed function, the script walks through the seeders one after another and, for each of them, calls its clean method and then its run method. Because the seeders run sequentially in list order, the Users exist before the Posts that reference them are created.

import _ from 'lodash'
import * as configuration from '../configuration'
import seeders from '../seeders'
import Seeder from '../seeders/Seeder'

if (configuration.app.isProduction) {
  throw new Error('You should not seed the database in the production environment!')
}

/**
 * Seed the Database by walking through each seeder in order
 * and executing its `clean` and `run` methods
 */
async function seed(): Promise<void> {
  const reducer = async (deferred: Promise<void>, seeder: Seeder) => {
    // Wait for the previous seeder to finish before starting this one
    await deferred

    console.info(`Start seeding for ${seeder.constructor.name} …`)

    await seeder.clean()
    await seeder.run()

    console.info(`… done`)
  }

  await _.reduce(seeders, reducer, Promise.resolve())
}

// Report failures instead of leaving an unhandled rejection
seed().catch((error) => {
  console.error(error)
  process.exit(1)
})

A condition prevents this script from being run in the production environment.
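
The sequential chaining in the script can also be written as a plain loop with `await`. The following is an alternative sketch, not the script used in this project: `SeederLike`, `seedSequentially` and `recordingSeeder` are hypothetical names, and the seeders here only record the order of the calls instead of touching a database.

```typescript
// Minimal shape shared by every seeder in this article.
interface SeederLike {
  clean(): Promise<void>
  run(): Promise<void>
}

// Runs the seeders one after another, in list order.
async function seedSequentially(seeders: SeederLike[]): Promise<void> {
  for (const seeder of seeders) {
    await seeder.clean()
    await seeder.run()
  }
}

// Hypothetical in-memory seeder that only records what was called.
function recordingSeeder(name: string, calls: string[]): SeederLike {
  return {
    clean: async () => {
      calls.push(`${name}.clean`)
    },
    run: async () => {
      calls.push(`${name}.run`)
    },
  }
}

const calls: string[] = []
seedSequentially([recordingSeeder('User', calls), recordingSeeder('Post', calls)])
  .then(() => console.log(calls.join(' → ')))
// prints: User.clean → User.run → Post.clean → Post.run
```

Both forms give the same guarantee: a seeder starts only after the previous one has finished.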

Run

With a seed script defined in the package.json’s scripts, you can now run the following command to start the seeding procedure:

yarn seed

That’s it!

You can re-run this command and obtain the same dataset, apart from the IDs generated by Firestore.

Let’s Seed! 👨‍💻

And test, of course!
