Writing Automated Tests on a Legacy Node.js Back-End

Adrien Joly
shodo.io
Jul 22, 2022

This article was originally published on InfoQ at this date

Key Takeaways

  • Making I/O calls from business logic makes it difficult to write tests and slows down test execution
  • The disadvantage of relying on mocks to test coupled code is that tests may fail even when the business logic did not change
  • Reducing coupling with dependency inversion makes code easier to understand and maintain, and enables more robust and efficient tests
  • To prevent regressions while refactoring, write approval tests for your APIs
  • After refactoring, we can write (fast) unit tests to specify business logic/requirements and make sure that our implementation complies with these specs

Since its initial release in 2009, Node.js has gained widespread adoption as a back-end framework by a growing range of companies. A few reasons for its success: the use of the JavaScript language (a.k.a. The Language of the Web), a rich ecosystem of open-source modules and tools, and the ease of prototyping APIs efficiently.

Unfortunately, that simplicity is a double-edged sword. As a simple Node.js API grows and gets more complex, developers who lack experience with software design and best practices may quickly feel overwhelmed with software entropy, accidental complexity or technical debt.

Moreover, the flexibility of the JavaScript language can easily be abused, quickly turning working prototypes into unmaintainable monsters running in production. Best practices traditionally used with OOP languages like Java and C# (e.g. SOLID principles) can easily be overlooked when starting up a project in Node.js, for better or worse.

I’ve felt the pain of software entropy when helping my clients (mostly start-up companies) improve their Node.js codebase, and also on open-source projects I authored. For instance, I’ve faced a growing number of challenges while maintaining a Node.js application I started to write ten years ago: openwhyd.org. I often discover similar challenges in my clients’ Node.js codebases: adding features breaks seemingly unrelated features, bugs become difficult to detect and fix, automated tests are challenging to write, slow to run and fail for random reasons…

Let’s explore why some Node.js codebases are harder to test than others, and several techniques to write simple, robust and fast tests for business logic: Dependency Inversion (i.e. the “D” of SOLID), approval tests and, spoiler alert, no mocks!

Testing Business Logic

As a practical example, let’s introduce a feature of Openwhyd that is not covered by automated tests yet: “Hot Tracks”.

This feature is a ranking board of music tracks that were posted, liked and played most often by Openwhyd users, over the last 7 days.

It consists of three use cases:

  • display a ranked list of tracks;
  • update the ranking whenever a track is posted, reposted, liked and/or played;
  • display the popularity trend (i.e. increasing, decreasing or stable) of each track by tracking the evolution of their ranking.

In order to prevent regressions on the happy path of these three use cases, let’s formulate the following test cases as Behavior-Driven Development (BDD) scenarios:

Given a list of tracks posted by a different number of users
When a visitor consults the "hot tracks" page
Then tracks are displayed in descending order of popularity

Given two tracks with the same score
When a user reposts one of these tracks
Then that track gets to the top of the "hot tracks" ranking

Given two tracks posted 2 weeks ago, with a slightly different score
And the track that has the lowest score is reposted 1 week later
When the scores are snapshotted
Then the ranking of the reposted track is shown as "increasing" in the "hot tracks" page

Let’s imagine how we would like to turn the first scenario into an ideal automated test:

describe("Hot Tracks", () => {
it("displays the tracks in descending order of popularity", async () => {
const regularTrack = { popularityScore: 1 };
const popularTrack = { popularityScore: 2 };
storeTracks([ regularTrack, popularTrack ]);
expect(await getHotTracks()).toMatchObject([ popularTrack, regularTrack ]);
});
});

At this stage, this test won’t pass because getHotTracks() requires a database connection that our test does not provide, and storeTracks() is not implemented.

Making that test pass is going to be our goal, from now on. To better understand what makes Hot Tracks hard to test that way, let’s study the current implementation.

Why This Test Can’t Pass (for now)

Currently, Openwhyd’s Hot Tracks consists of several functions exported from the models/tracks.js file:

  • getHotTracks() is called by the Hot Tracks API controller to fetch the list of ranked tracks before rendering it;
  • updateByEid() is called whenever a track is updated or deleted by a user, in order to update its popularity score;
  • snapshotTrackScores() is called every Sunday, in order to compute the trend of each track displayed during the following week.

Let’s see what the getHotTracks() function does:

const mongodb = require('./mongodb.js');

/* fetch top hot tracks, without processing */
const fetchRankedTracks = (params) =>
  mongodb.tracks
    .find({}, { sort: [['score', 'desc']], ...params })
    .toArray();

exports.getHotTracks = async function (params = {}) {
  const tracks = await fetchRankedTracks(params);
  const pidList = snip.objArrayToValueArray(tracks, 'pId');
  const posts = await fetchPostsByPid(pidList);
  return tracks.map((track) => {
    const post = posts.find(({ eId }) => eId === track.eId);
    return computeTrend(post ? mergePostData(track, post) : track);
  });
};

It’s complicated to write unit tests for this function because its business logic (e.g. the computation of each track’s trend) is intertwined with data queries that are sent to a global MongoDB connection (mongodb.js).

This means that, with the current implementation, the only ways to test Openwhyd’s Hot Tracks logic as-is are:

  • by sending API requests to a running Openwhyd server connected to a MongoDB server, hence testing this system as a black box;
  • by calling these functions directly, after initializing the MongoDB database that they depend on.

Both solutions require starting up and populating a MongoDB database server. This would make our tests complicated to implement and slow to run.

Takeaway: Coupling business logic with I/O (e.g. database queries) makes it hard to write tests, slows down their execution, and makes these tests fragile.

The Problem with Mocks

One way to avoid relying on a MongoDB database to run our tests is to fake that database, using what Jest refers to as “mocks” (a.k.a. “stubs”, as defined by Martin Fowler in Mocks Aren’t Stubs).

Injecting a mock consists of asking the test runner to hot-swap a dependency used by the system under test (e.g. the database client used by our server) with a fake version, so that automated tests can override the behavior of that dependency.

In our case, the fetchRankedTracks() function calls mongodb.tracks.find(), imported from the mongodb module. So, instead of letting the data query hit an actual MongoDB database, our automated test could set up a fake in-memory database and redirect data queries to it:

jest.mock("mongodb.js", {
tracks: {
find: (queryObj, { sort }) => ({
toArray: () => ([
{ name: 'track1', score: 1 },
{ name: 'track2', score: 2 },
]),
}),
},
});

This works.

But what if the feature under test calls the same function several times, with different queries?

async function compareGenres(genre1, genre2) {
  const tracks1 = await mongodb.tracks.find({ genre: genre1 }).toArray();
  const tracks2 = await mongodb.tracks.find({ genre: genre2 }).toArray();
  // [...]
}

In that case, mocks and the tests that initialize them quickly grow bigger and more complex, and therefore harder to maintain:

jest.mock("mongodb", {
tracks: {
find: (queryObj, params) => ({
toArray: () => {
if (queryObj === 'rock') {
return [
{ name: 'track1', score: 1 },
{ name: 'track2', score: 2 },
];
} else if (queryObj === 'hip hop') {
return [
{ name: 'track3', score: 1 },
{ name: 'track4', score: 2 },
];
}
},
}),
},
});

More importantly, doing so implies that automated tests depend on implementation details that are unrelated to our business logic.

Two reasons:

  • mocks would be tied to the implementation of our data model, i.e. we would have to rewrite them whenever we decide to refactor it (e.g. to rename a property);
  • mocks would be tied to the interface of the replaced dependency, i.e. we would have to rewrite them whenever we upgrade mongodb to a new version with breaking API changes, or if we decide to migrate database queries over to a different ORM.

This means that we will sometimes have to update our automated tests even when the business logic did not change!

In our case, if we decide to mock the mongodb dependency in our tests, writing and updating tests will require more work. Because of that, developers may be discouraged from upgrading dependencies, from changing the data model, or worse: from writing tests in the first place!

Surely, we’d rather save some time to do more important things, like implementing new features!

Takeaway: When relying on mocks to test tightly coupled code, automated tests may fail even though the business logic did not change. In the long run, mocking database queries makes tests less stable and less readable.

Dependency Inversion

Based on the previous examples, mocking database queries is unlikely to be a viable, long-term way to test business logic.

As an alternative, could we abstract the dependency between our business logic and our data source: mongodb?

Yes. We can decouple the feature and its underlying data fetching logic, by letting the caller of the feature inject a way for the business logic to fetch the data it needs.

In practice, instead of importing mongodb from our model, we pass that model as a parameter so that callers can specify any implementation of that data source, at runtime.

Here’s how doing this would transform the getHotTracks() function, with types expressed in TypeScript:

exports.getHotTracks = async function (
  fetchRankedTracks: () => Promise<Track[]>,
  fetchCorrespondingPosts: (tracks: Track[]) => Promise<Post[]>
) {
  const tracks = await fetchRankedTracks();
  const posts = await fetchCorrespondingPosts(tracks);
  // [...]
};

That way:

  • different implementations of fetchRankedTracks() and fetchCorrespondingPosts() can be injected when calling getHotTracks(), depending on the execution environment of our application: a mongodb-based implementation would be used in production, while custom in-memory implementations would be instantiated for each automated test (see the sketch below);
  • we can test the logic of our model without having to start a database server, nor ask our test runner to inject a mock;
  • automated tests won’t need to be updated when the API of the database client changes.
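For instance, here is a minimal sketch of both kinds of wiring, assuming the mongodb.js module and a fetchPostsByPid() helper similar to the ones shown earlier (names and signatures are illustrative):

// production wiring: inject mongodb-based implementations
const mongodb = require('./mongodb.js');

async function getHotTracksForProduction(params = {}) {
  const fetchRankedTracks = () =>
    mongodb.tracks.find({}, { sort: [['score', 'desc']], ...params }).toArray();
  const fetchCorrespondingPosts = (tracks) =>
    fetchPostsByPid(tracks.map(({ pId }) => pId));
  return getHotTracks(fetchRankedTracks, fetchCorrespondingPosts);
}

// test wiring: inject in-memory implementations, no database required
it('lists hot tracks without a database', async () => {
  const hotTracks = await getHotTracks(
    () => [{ name: 'a track', score: 1 }], // in-memory tracks
    () => [] // no corresponding posts
  );
  expect(hotTracks).toHaveLength(1);
});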

Takeaway: Dependency Inversion helps loosen the coupling between business logic and data persistence layers. We can refactor tightly coupled code in order to make it easier to understand and maintain, and to write robust and fast unit tests for it.

Better Safe Than Sorry

In the previous section, we saw how Dependency Inversion helps loosen the coupling between business logic and data persistence layers.

In order to prevent bugs from sneaking in while we refactor our current implementation, we should make sure that the refactoring does not have any effect on the feature’s behavior.

To detect changes of behavior on tightly coupled code that is not sufficiently covered by automated tests, we can write approval tests. Approval tests collect traces of execution, and check that these traces remain identical when executing them again after implementation changes. They are provisional, until it becomes possible to write better tests (e.g. unit tests) for our business logic.

In our case:

  • in terms of inputs (or triggers): the “Hot Tracks” feature is triggered by Openwhyd’s API, when HTTP requests are received on the /hot and /api/post endpoints;
  • in terms of outputs (or traces): these HTTP endpoints provide a response, and may insert and/or update objects in the tracks database collection.

So we should be able to detect functional regressions by making API requests and watching for changes in the resulting responses and/or the state of the tracks database collection.

// Note: before running these tests, make sure that MongoDB and Openwhyd's server are running.
describe("Hot Tracks (approval tests - to be replaced later by unit tests)", () => {
  beforeEach(async () => {
    await mongodb.clearDatabase();
  });

  it("renders ranked tracks", async () => {
    await mongodb.tracks.insertMany([
      { name: "a regular track", score: 1 },
      { name: "a popular track", score: 2 },
    ]);
    const serverURL = await startOpenwhydServer();
    const html = await httpClient.get(`${serverURL}/hot`);
    expect(html).toMatchSnapshot();
    // Note: the request above does not make any change in the "tracks" collection => no need to snapshot the state of that collection.
  });

  it("updates the score of a track when it's reposted", async () => {
    const users = [
      { id: 0, name: "user 0", pwd: "123" },
      { id: 1, name: "user 1", pwd: "456" },
    ];
    await mongodb.users.insertMany(users);
    const serverURL = await startOpenwhydServer();
    const userSession = [
      await httpClient.post(`${serverURL}/api/login`, users[0]),
      await httpClient.post(`${serverURL}/api/login`, users[1]),
    ];
    const posts = [
      // user 0 posts track A
      await httpClient.post(
        `${serverURL}/api/post`,
        { action: "insert", eId: "track_A" },
        { cookies: userSession[0].cookies }
      ),
      // user 0 posts track B
      await httpClient.post(
        `${serverURL}/api/post`,
        { action: "insert", eId: "track_B" },
        { cookies: userSession[0].cookies }
      ),
    ];
    // user 1 reposts track A
    await httpClient.post(
      `${serverURL}/api/post`,
      { action: "insert", pId: posts[0].pId },
      { cookies: userSession[1].cookies }
    );
    const ranking = await httpClient.get(`${serverURL}/hot?format=json`);
    expect(ranking).toMatchSnapshot();
    // Note: the requests above update the "tracks" collection => we also snapshot the state of that collection.
    const tracksCollection = await mongodb.tracks.find({}).toArray();
    expect(tracksCollection).toMatchSnapshot();
  });
});

Note that these tests can be run against Openwhyd’s API as-is, because they only manipulate external interfaces. So, these approval tests can also be qualified as grey-box tests or end-to-end API tests.

The first time we run these tests, our test runner will generate snapshot files containing the data passed to toMatchSnapshot(), for each test assertion. Before committing these files into our versioning system (e.g. git), we must check that the data is correct and sufficient to be used as a reference. Hence the name: "approval tests".
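For reference, here is what one of the generated snapshot files could look like; an illustrative excerpt only, as the actual content depends on the HTML rendered by the server:

// __snapshots__/hot-tracks.test.js.snap (illustrative excerpt)
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`Hot Tracks (approval tests - to be replaced later by unit tests) renders ranked tracks 1`] = `
"<html>
  ...
  <div class="track">a popular track</div>
  <div class="track">a regular track</div>
  ...
</html>"
`;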

Note: It’s important to read the implementation of the tested functions, to discover parameters and characteristics that must be covered by these tests. For example, the getHotTracks() function takes a limit and a skip parameter that are used for pagination, and it merges additional data after fetching from the post collection. Make sure to increase the coverage of approval tests accordingly, to detect regressions on all critical parts of the logic.
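For instance, we could add an approval test that covers pagination; a sketch, assuming that limit and skip are exposed as query-string parameters of the /hot endpoint:

it("renders the second page of ranked tracks", async () => {
  await mongodb.tracks.insertMany([
    { name: "a regular track", score: 1 },
    { name: "a popular track", score: 2 },
  ]);
  const serverURL = await startOpenwhydServer();
  // assumption: limit and skip map to query-string parameters
  const html = await httpClient.get(`${serverURL}/hot?limit=1&skip=1`);
  expect(html).toMatchSnapshot();
});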

Problem: Same Logic, Different Traces

After committing the snapshots and re-running the approval tests, you may realize that they already fail!

Jest shows us that object identifiers and dates are different every time we run them…

To solve this problem, we replace dynamic values with placeholders before passing results to Jest’s toMatchSnapshot() function:

const { _id } = await httpClient.post(
  `${serverURL}/api/post`,
  { action: "insert", pId: posts[0].pId },
  { cookies: userSession[1].cookies }
);
const cleanJSON = (body) => body.replaceAll(_id, '__posted_track_id__');
const ranking = await httpClient.get(`${serverURL}/hot?format=json`);
expect(cleanJSON(ranking)).toMatchSnapshot();
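Since dates also vary between runs, we can extend cleanJSON() accordingly. A possible sketch, assuming that dates are serialized in ISO 8601 format in the response body:

const cleanJSON = (body) =>
  body
    .replaceAll(_id, '__posted_track_id__')
    // replace ISO 8601 dates (e.g. "2022-07-22T10:00:00.000Z") with a placeholder
    .replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z/g, '__date__');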

Now that we have kept a reference of what outputs are expected for these use cases, we can safely refactor our code and run these tests again to make sure that the outputs remain identical.

Refactoring for Unit Tests

Now that we have approval tests to warn us if the behavior of our “hot tracks” feature changes, we can safely refactor the implementation of that feature.

To reduce the cognitive load of the refactoring process we’re about to begin, let’s start by:

  1. removing any dead and/or commented code;
  2. using await on asynchronous function calls, instead of passing callbacks or calling .then() on promises (this will greatly simplify the process of writing tests and moving code chunks around; see the sketch after this list);
  3. appending the FromDb suffix to the names of legacy functions that rely on the database, to clearly differentiate them from the new functions we're about to introduce (i.e. rename the getHotTracks() function to getHotTracksFromDb(), and fetchRankedTracks() to fetchRankedTracksFromDb()).
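For example, a callback-based call can be wrapped in a Promise so that it can be awaited, as we will do later with the track model's fetch() function. In this sketch, processTracks() and handler are hypothetical placeholders:

// before: callback-based call, harder to test and to move around
exports.fetch(params, (tracks) => {
  handler(processTracks(tracks));
});

// after: the same call wrapped in a Promise, awaited from an async function
const tracks = await new Promise((resolve) => exports.fetch(params, resolve));
handler(processTracks(tracks));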

It’s tempting to start renaming and moving chunks of code by following our instinct. The risk of doing so is to end up with code that is difficult to test…

Let’s proceed the other way around: write a test that clearly checks one aspect of our feature’s behavior, and then refactor our code so that test can pass. The Test-Driven Development process (TDD) will help us come up with a new design that makes that feature easy to test.

The tests we are going to write are unit tests. So they will be extremely fast to run, and will not require starting a database, nor Openwhyd’s API server. To achieve that, we’re going to extract the business logic so it can be tested separately from the underlying infrastructure.

Also, we are not going to use snapshots this time. Instead, let’s formulate human-readable expectations on how the feature should behave, similar to the earlier application of BDD.

Let’s start with a very easy one: if there is just one track posted on Openwhyd, it should be listed on the first position of hot tracks.

describe('hot tracks feature', () => {
  it('should list one track in first position, if just that track was posted', () => {
    const postedTrack = { name: 'a good track' };
    expect(getHotTracks()).toMatchObject([ postedTrack ]);
  });
});

So far, this test is valid but it does not pass because getHotTracks() is not defined. Let's provide the simplest implementation, just to make the test pass.

function getHotTracks() {
  return [{ name: 'a good track' }];
}

Now that the test passes, the third step of the TDD methodology states that we should clean up and/or refactor our code, but so far there’s not much to do! So let’s start a second TDD iteration by writing a second test.
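Here is that second test, derived from our second expectation: given two posted tracks with different scores, the track with the higher score should be listed first.

it('should list the track with higher score in first position, given two posted tracks with different scores', () => {
  const regularTrack = { name: 'a good track', score: 1 };
  const popularTrack = { name: 'a very good track', score: 2 };
  const postedTracks = [ regularTrack, popularTrack ];
  expect(getHotTracks(postedTracks)).toMatchObject([ popularTrack, regularTrack ]);
});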

This test does not pass, because getHotTracks() returns a value that was hard-coded for the first test to pass. In order to make that function work for both test cases, let's provide input data as a parameter.

function getHotTracks(postedTracks) {
  // sort tracks by descending score
  return postedTracks.sort((a, b) => b.score - a.score);
}

describe('hot tracks feature', () => {
  it('should list one track in first position, if just that track was posted', () => {
    const postedTrack = { name: 'a good track' };
    const postedTracks = [ postedTrack ];
    expect(getHotTracks(postedTracks)).toMatchObject([ postedTrack ]);
  });

  it('should list the track with higher score in first position, given two posted tracks with different scores', () => {
    const regularTrack = { name: 'a good track', score: 1 };
    const popularTrack = { name: 'a very good track', score: 2 };
    const postedTracks = [ regularTrack, popularTrack ];
    expect(getHotTracks(postedTracks)).toMatchObject([ popularTrack, regularTrack ]);
  });
});

Now that our two unit tests pass against a very basic implementation, let’s try to get getHotTracks() closer to its real implementation (referred to as getHotTracksFromDb()), the one currently used in production.

In order to keep these tests pure (i.e. tests that don’t produce any side effects, and therefore don’t run any I/O operation), the getHotTracks() function that they call must not depend on a database client.

To achieve that, let’s apply Dependency Inversion: replace the postedTracks parameter (type: array of tracks) of getHotTracks() with a getTracksByDescendingScore() function that gives access to these tracks. This allows getHotTracks() to call that function whenever it needs the data. We are therefore giving more control to getHotTracks(), while transferring the responsibility of how the data is actually fetched to the caller.

function getHotTracks(getTracksByDescendingScore) {
  return getTracksByDescendingScore();
}

describe('hot tracks feature', () => {
  it('should list one track in first position, if just that track was posted', () => {
    const postedTrack = { name: 'a good track' };
    const getTracksByDescendingScore = () => [ postedTrack ];
    expect(getHotTracks(getTracksByDescendingScore)).toMatchObject([ postedTrack ]);
  });

  it('should list the track with higher score in first position, given two posted tracks with different scores', () => {
    const regularTrack = { name: 'a good track', score: 1 };
    const popularTrack = { name: 'a very good track', score: 2 };
    const getTracksByDescendingScore = () => [ popularTrack, regularTrack ];
    expect(getHotTracks(getTracksByDescendingScore)).toMatchObject([ popularTrack, regularTrack ]);
  });
});

Now that we got our pure implementation of getHotTracks() a bit closer to the real one, let's call it from the real one!

/* fetch top hot tracks, and include complete post data (from the "post" collection), score, and rank increment */
exports.getHotTracksFromDb = async function (params = {}, handler) {
  const getTracksByDescendingScore = () => exports.fetchRankedTracks(params);
  const tracks = await getHotTracks(getTracksByDescendingScore);
  const pidList = snip.objArrayToValueArray(tracks, 'pId');
  const posts = await fetchPostsByPid(pidList);
  // complete track items with additional metadata (from posts)
  return tracks.map((track) => {
    const post = posts.find(({ eId }) => eId === track.eId);
    return computeTrend(post ? mergePostData(track, post) : track);
  });
};

Our unit tests and approval tests still work, proving that we haven’t broken anything!

Now that the “hot tracks” model calls our pure “hot tracks” feature logic, we can progressively move logic from the former to the latter, while writing unit tests.

Our next step will be to move the logic that completes track items data with additional metadata from posts, from getHotTracksFromDb() to getHotTracks().

We observe from the production logic that:

  • similarly to tracks, posts are fetched from the database by calling the fetchPostsByPid() function, so we will have to apply dependency inversion again on that function;
  • data between the track and post collections is joined by two fields: eId and pId.

Before moving that logic, and based on these observations, let’s define the intended behavior of getHotTracks() as a new unit test.

it('should return tracks with post metadata', async () => {
  const posts = [
    {
      _id: '61e19a3f078b4c9934e72ce4',
      eId: '1',
      pl: { name: 'soundtrack of my life', id: 0 }, // metadata from the post, to be included in the list of hot tracks
    },
    {
      _id: '61e19a3f078b4c9934e72ce5',
      eId: '2',
      text: 'my favorite track ever!', // metadata from the post, to be included in the list of hot tracks
    },
  ];
  const getTracksByDescendingScore = () => [
    { eId: posts[0].eId, pId: posts[0]._id },
    { eId: posts[1].eId, pId: posts[1]._id },
  ];
  const fetchPostsByPid = (pidList) => posts.filter(({ _id }) => pidList.includes(_id));
  const hotTracks = await getHotTracks(getTracksByDescendingScore, fetchPostsByPid);
  expect(hotTracks[0].pl).toMatchObject(posts[0].pl);
  expect(hotTracks[1].text).toBe(posts[1].text);
});

To make that test pass, we move the call to fetchPostsByPid() and its subsequent logic from getHotTracksFromDb() to getHotTracks().

// file: app/features/hot-tracks.js
exports.getHotTracks = async function (getTracksByDescendingScore, fetchPostsByPid) {
  const tracks = await getTracksByDescendingScore();
  const pidList = snip.objArrayToValueArray(tracks, 'pId');
  const posts = await fetchPostsByPid(pidList);
  // complete track items with additional metadata (from posts)
  return tracks.map((track) => {
    const post = posts.find(({ eId }) => eId === track.eId);
    return computeTrend(post ? mergePostData(track, post) : track);
  });
};

// file: app/models/track.js
exports.getHotTracksFromDb = async function (params = {}, handler) {
  const getTracksByDescendingScore = () =>
    new Promise((resolve) => {
      exports.fetch(params, resolve);
    });
  return feature.getHotTracks(getTracksByDescendingScore, fetchPostsByPid);
};

At this point, we have moved all the data manipulation logic to getHotTracks(), and getHotTracksFromDb() just contains the necessary plumbing to feed it with actual data from the database.

To make our tests pass, there is just one last thing we need to do: pass the fetchPostsByPid() function as a parameter to getHotTracks(). For our two initial tests, fetchPostsByPid() can return an empty array.

it('should list one track in first position, if just that track was posted', async () => {
  const postedTrack = { name: 'a good track' };
  const getTracksByDescendingScore = () => [ postedTrack ];
  const fetchPostsByPid = () => [];
  expect(await getHotTracks(getTracksByDescendingScore, fetchPostsByPid))
    .toMatchObject([ postedTrack ]);
});

it('should list the track with higher score in first position, given two posted tracks with different scores', async () => {
  const regularTrack = { name: 'a good track', score: 1 };
  const popularTrack = { name: 'a very good track', score: 2 };
  const getTracksByDescendingScore = () => [ popularTrack, regularTrack ];
  const fetchPostsByPid = () => [];
  expect(await getHotTracks(getTracksByDescendingScore, fetchPostsByPid))
    .toMatchObject([ popularTrack, regularTrack ]);
});

it('should return tracks with post metadata', async () => {
  const posts = [
    {
      _id: '61e19a3f078b4c9934e72ce4',
      eId: '1',
      pl: { name: 'soundtrack of my life', id: 0 }, // metadata from the post, to be included in the list of hot tracks
    },
    {
      _id: '61e19a3f078b4c9934e72ce5',
      eId: '2',
      text: 'my favorite track ever!', // metadata from the post, to be included in the list of hot tracks
    },
  ];
  const getTracksByDescendingScore = () => [
    { eId: posts[0].eId, pId: posts[0]._id },
    { eId: posts[1].eId, pId: posts[1]._id },
  ];
  const fetchPostsByPid = (pidList) => posts.filter(({ _id }) => pidList.includes(_id));
  const hotTracks = await getHotTracks(getTracksByDescendingScore, fetchPostsByPid);
  expect(hotTracks[0].pl).toMatchObject(posts[0].pl);
  expect(hotTracks[1].text).toBe(posts[1].text);
});

Now that we have successfully extracted the business logic from getHotTracksFromDb() to getHotTracks(), and covered that pure logic with unit tests, we can safely delete the approval test that we had written earlier to prevent regressions on that function: "renders ranked tracks".

We can follow the exact same process on the two remaining use cases:

  1. write unit tests based on their BDD scenarios (see the sketch after this list),
  2. refactor the underlying functions so those tests pass,
  3. delete the corresponding approval tests.
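For instance, the second BDD scenario could translate into a unit test like this one; a sketch only, assuming that we extract a hypothetical pure updateTrackScore() function from updateByEid(), and that a repost increases a track's score:

it('should rank a track first after it was reposted, given two tracks with the same score', async () => {
  const trackA = { eId: 'track_A', score: 1 };
  const trackB = { eId: 'track_B', score: 1 };
  // hypothetical pure function, to be extracted from updateByEid()
  const repostedTrackA = updateTrackScore(trackA, { action: 'repost' });
  // the fake data source returns tracks sorted by descending score
  const getTracksByDescendingScore = () =>
    [ repostedTrackA, trackB ].sort((a, b) => b.score - a.score);
  const fetchPostsByPid = () => [];
  expect(await getHotTracks(getTracksByDescendingScore, fetchPostsByPid))
    .toMatchObject([ { eId: 'track_A' }, { eId: 'track_B' } ]);
});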

Conclusion

We have improved the testability of our codebase and our approach to testing, by:

  • exploring an example of production code that is complicated to test because business logic is tightly coupled with database queries;
  • discussing the downsides of relying on a database (real or mocked) to write automated tests for that logic;
  • authoring approval tests as a way to detect any functional regression that could happen while we are refactoring this logic;
  • refactoring the logic progressively, in TDD, using the dependency inversion principle (a.k.a. the “D” of “SOLID”);
  • and eliminating approval tests, in favor of the pure and human-readable unit tests we wrote during that process.

Adopting these patterns and principles (e.g. SOLID), which are widely accepted and applied in Object-Oriented Programming (OOP) languages, helps us author better tests and makes our codebases sustainably easier to maintain, while preserving the ergonomics of JavaScript and TypeScript environments.

I would like to thank my colleague Julien Topçu (Tech Coach at SHODO) for reviewing and refining the concepts and examples presented in this article, in the hope that they will improve your approach to testing.


👨‍💻 Software crafter @SHODO, legacy code / tech debt doctor (http://ajo.ovh/pro) 🥁 Drummer of “Harissa”, VR lover, music digger