Data Layer best practices

Imagine you have a web app that needs to read data from and write data to an external database.

For the sake of this example, let’s say you’re writing an Express app on Node.js and choose MongoDB as your database.

You start by defining a class called MongoData with some methods that your Express routes can call. Perhaps you implement these methods using Monk, a lightweight promise-based API on top of the Node.js MongoDB driver.

You start your app and, after a bit of debugging, your endpoints are accepting data and writing it to Mongo. In this post, I will discuss why this is a poor start to building a well-tested, change-resilient web application.

The first foreseeable problem is what happens if you decide to switch databases. While MongoDB works great with Node.js, you may eventually decide that a more structured, relational database like MySQL is a better play.

What will you do in that situation? Probably define a MySQLData class and then update your routes to depend on that. The issue here is that your routes shouldn’t need to change when you change database implementation details. That’s not their responsibility. Their responsibility falls more along the lines of request validation, business logic, and response creation. A good principle in engineering is that changing code in one layer of abstraction should not require changes in a different layer.

Another issue is testing. In this multi-layered application, you probably want several types of tests: integration tests for the data layer classes, unit tests for the validation checks and business logic in your routes, and end-to-end tests that exercise the system as a whole.

Defining specific data layer classes that do not implement a common interface complicates the integration tests. Let’s say you have a MongoData.test.js file that tests the MongoData class. If you add a MySQLData class, your play might be to add a MySQLData.test.js file and copy a lot of the code from MongoData.test.js. That works, but what if you later add a PostgreSQLData class? Are you going to copy the code again?

The faulty approach also prevents you from unit testing the routes in complete isolation. If a route depends on the MongoData class, then the results of your route tests depend on another layer of the system (and on a running database). Ideally, we would mock the data layer while testing routes.

At this point, the solution is probably fairly clear: Define an IDataLayer interface and make all data layer classes (MongoData, PostgreSQLData, MySQLData, RedisData, etc.) implement it. Make the routes depend on IDataLayer so that any data layer class can be injected at runtime.
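Here is a minimal sketch of that shape in TypeScript, chosen over plain JavaScript because it has a native interface construct. Everything in it is hypothetical: the User record, the getUser/saveUser methods, and createUserRoutes, which stands in for an Express router by returning framework-agnostic handlers instead of calling Express directly. The stub at the end plays the role of whichever class (MongoData, MySQLData, ...) you inject.

```typescript
// Hypothetical record type used throughout this sketch.
interface User {
  id: string;
  name: string;
}

// The contract every data layer class (MongoData, MySQLData, ...) implements.
// Methods return Promises because real database drivers are asynchronous.
interface IDataLayer {
  getUser(id: string): Promise<User | null>;
  saveUser(user: User): Promise<void>;
}

// Framework-agnostic stand-ins for an HTTP request and response.
interface RouteRequest {
  params: Record<string, string>;
}
interface RouteResponse {
  status: number;
  body: unknown;
}

// The routes receive their data layer as an argument (dependency injection)
// and know nothing about which database sits behind it.
function createUserRoutes(data: IDataLayer) {
  return {
    async getUser(req: RouteRequest): Promise<RouteResponse> {
      const id = req.params.id;
      // Request validation belongs to the route layer.
      if (!id) {
        return { status: 400, body: { error: "missing id" } };
      }
      const user = await data.getUser(id);
      return user
        ? { status: 200, body: user }
        : { status: 404, body: { error: "not found" } };
    },
  };
}

// Any object satisfying IDataLayer can be injected; here, a trivial stub.
const stub: IDataLayer = {
  async getUser(id) {
    return id === "1" ? { id: "1", name: "Ada" } : null;
  },
  async saveUser() {},
};
const routes = createUserRoutes(stub);
```

In an Express app, each handler would be wrapped in a route callback that passes req.params in and writes the returned status and body to res. Switching from MongoData to MySQLData then changes only the line that constructs the data layer, not the routes.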

Instead of testing each data layer class independently, which requires copying a ton of code, write tests against the interface. To do this, create a function that takes an IDataLayer object and runs the tests against it. Testing each data layer implementation then takes fewer than 10 lines: instantiate the data layer class (which probably requires passing a database connection to the constructor) and pass the instance into the test function. Super easy!
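A sketch of such a shared test function, in TypeScript with the same hypothetical User record and getUser/saveUser methods. The names runDataLayerTests and check are illustrative, and the InMemoryData class exists only so the sketch runs without a database. In a real per-implementation test file, the body would shrink to constructing MongoData (or MySQLData, and so on) and awaiting runDataLayerTests on it.

```typescript
// Hypothetical record type and interface, as in the rest of this post.
interface User {
  id: string;
  name: string;
}
interface IDataLayer {
  getUser(id: string): Promise<User | null>;
  saveUser(user: User): Promise<void>;
}

// Minimal assertion helper so the sketch does not depend on a test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// One suite, written once against the interface, reused for every implementation.
async function runDataLayerTests(layer: IDataLayer): Promise<void> {
  await layer.saveUser({ id: "u1", name: "Ada" });
  const found = await layer.getUser("u1");
  check(found !== null && found.name === "Ada", "saved user should be readable");

  const missing = await layer.getUser("does-not-exist");
  check(missing === null, "unknown id should return null");
}

// Stand-in implementation so the sketch runs without an external database.
class InMemoryData implements IDataLayer {
  private users = new Map<string, User>();
  async getUser(id: string) {
    return this.users.get(id) ?? null;
  }
  async saveUser(user: User) {
    this.users.set(user.id, { ...user });
  }
}

// Hypothetical per-implementation test file, e.g. MongoData.test:
//   const layer = new MongoData(connection);
//   await runDataLayerTests(layer);
```

Because the suite only touches IDataLayer methods, a new PostgreSQLData class gets the full suite for the cost of those two lines.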

One issue with integration testing is that you often want to configure the state of the database before or after running a test. No problem! Create a new interface, ITestableDataLayer, which extends IDataLayer and adds some convenience methods (e.g. clearAllData(), setData(data)). Make the data layer classes implement ITestableDataLayer, which means implementing these convenience methods in each class. Then update the interface test function in IDataLayer.test.js to take an ITestableDataLayer object rather than an IDataLayer object. If you are using a testing framework like Mocha, you can then call the convenience methods in the before and after hooks (or beforeEach and afterEach, for per-test isolation) to set state before tests and clean up afterwards.
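A sketch of the testable extension, with test cases that reset database state around each run. The setData signature (a list of hypothetical User records that replaces everything currently stored) is an assumption, as are the test case names. The loop in runTestableSuite stands in for Mocha's beforeEach and afterEach hooks so the example carries no framework dependency.

```typescript
// Hypothetical record type and production interface.
interface User {
  id: string;
  name: string;
}
interface IDataLayer {
  getUser(id: string): Promise<User | null>;
  saveUser(user: User): Promise<void>;
}

// Test-only extension: convenience methods for arranging database state.
interface ITestableDataLayer extends IDataLayer {
  clearAllData(): Promise<void>;
  // Assumed semantics: replaces all stored users with the given list.
  setData(users: User[]): Promise<void>;
}

type TestCase = { name: string; run: (layer: ITestableDataLayer) => Promise<void> };

const cases: TestCase[] = [
  {
    name: "getUser returns a seeded user",
    run: async (layer) => {
      const user = await layer.getUser("u1");
      if (user?.name !== "Ada") throw new Error("expected seeded user u1");
    },
  },
  {
    name: "saveUser overwrites an existing user",
    run: async (layer) => {
      await layer.saveUser({ id: "u1", name: "Grace" });
      const user = await layer.getUser("u1");
      if (user?.name !== "Grace") throw new Error("expected overwritten name");
    },
  },
];

// Every case starts from the same known state and is cleaned up afterwards,
// even when it fails; a failing case throws out of the loop after cleanup.
async function runTestableSuite(layer: ITestableDataLayer): Promise<string[]> {
  const passed: string[] = [];
  for (const tc of cases) {
    await layer.setData([{ id: "u1", name: "Ada" }]); // arrange (like beforeEach)
    try {
      await tc.run(layer);
      passed.push(tc.name);
    } finally {
      await layer.clearAllData(); // clean up (like afterEach)
    }
  }
  return passed;
}
```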

Do not give the routes an ITestableDataLayer object. They should only get access to the IDataLayer methods.
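A short sketch of how the type system can enforce this, reusing the same hypothetical interfaces. The route factory's parameter is typed as IDataLayer, so even when the object passed in also implements ITestableDataLayer, the test-only methods are not visible inside the route code. The renameUser route is illustrative, not part of the original design.

```typescript
// Hypothetical record type and interfaces, as in the rest of this post.
interface User {
  id: string;
  name: string;
}
interface IDataLayer {
  getUser(id: string): Promise<User | null>;
  saveUser(user: User): Promise<void>;
}
interface ITestableDataLayer extends IDataLayer {
  clearAllData(): Promise<void>;
  setData(users: User[]): Promise<void>;
}

// The route layer only sees IDataLayer, so only getUser/saveUser are callable.
function createUserRoutes(data: IDataLayer) {
  return {
    async renameUser(id: string, name: string): Promise<boolean> {
      const user = await data.getUser(id);
      if (!user) return false;
      await data.saveUser({ ...user, name });
      // data.clearAllData(); // would not compile: not a member of IDataLayer
      return true;
    },
  };
}
```

An ITestableDataLayer instance can still be passed to createUserRoutes, since it structurally satisfies IDataLayer. TypeScript types are erased at runtime, so this is a compile-time guarantee: a cast could still reach clearAllData.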

Finally, let’s consider unit testing the routes. We said we wanted to mock the data layer, so create a MockData class that implements ITestableDataLayer. Your MockData class can store its state in memory, which can make it more reliable than the other data layer classes, which call external databases. You can even create MockData.test.js and run the interface tests against your MockData class to ensure it behaves like the real implementations! When you run unit tests against your routes, give them an instance of MockData, and you can rest assured that you are testing your validation and logic in isolation.
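A sketch of an in-memory MockData and a route unit test that uses it, in TypeScript with the same hypothetical User shape. The createUser route, its status codes (400 for invalid input, 409 for a duplicate id, 201 on success) and testCreateUser are illustrative assumptions. Because MockData implements ITestableDataLayer, the same interface suite you run against the real implementations can also be run against it.

```typescript
// Hypothetical record type and interfaces, as in the rest of this post.
interface User {
  id: string;
  name: string;
}
interface IDataLayer {
  getUser(id: string): Promise<User | null>;
  saveUser(user: User): Promise<void>;
}
interface ITestableDataLayer extends IDataLayer {
  clearAllData(): Promise<void>;
  setData(users: User[]): Promise<void>;
}

// In-memory mock: no network, no external database, deterministic.
class MockData implements ITestableDataLayer {
  private users = new Map<string, User>();

  async getUser(id: string): Promise<User | null> {
    const user = this.users.get(id);
    // Return a copy so callers cannot mutate stored state by accident.
    return user ? { ...user } : null;
  }
  async saveUser(user: User): Promise<void> {
    this.users.set(user.id, { ...user });
  }
  async clearAllData(): Promise<void> {
    this.users.clear();
  }
  async setData(users: User[]): Promise<void> {
    this.users = new Map(users.map((u): [string, User] => [u.id, { ...u }]));
  }
}

// Route logic under test: validate input, then delegate to the data layer.
function createUserRoutes(data: IDataLayer) {
  return {
    async createUser(body: { id?: string; name?: string }): Promise<{ status: number }> {
      const { id, name } = body;
      if (!id || !name) return { status: 400 };
      if (await data.getUser(id)) return { status: 409 };
      await data.saveUser({ id, name });
      return { status: 201 };
    },
  };
}

// A unit test of the route, with MockData injected in place of a real database.
async function testCreateUser(): Promise<void> {
  const mock = new MockData();
  await mock.setData([{ id: "u1", name: "Ada" }]);
  const routes = createUserRoutes(mock);

  const statuses = [
    await routes.createUser({ name: "NoId" }), // validation failure
    await routes.createUser({ id: "u1", name: "Dup" }), // duplicate id
    await routes.createUser({ id: "u2", name: "Grace" }), // success
  ].map((r) => r.status);

  if (statuses.join(",") !== "400,409,201") {
    throw new Error(`unexpected statuses: ${statuses}`);
  }
  if ((await mock.getUser("u2"))?.name !== "Grace") throw new Error("u2 should be saved");
}
```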

Conclusion: For your data layer, write an interface (you should do this for every layer of abstraction). Write tests against the interface that you run against every implementation. Write a mock implementation that can be passed into dependent layers (routes).