Apr 7, 2016

Baseline Testing

Test-Driven Development (TDD) has been a core part of keeping software free of bugs for many years. But it has its downsides: your tests can become tightly coupled to your source code, meaning that a change in one forces changes in the other. This makes applications rigid and hard to adapt. A testing strategy I call Baseline Testing solves the coupling issues with TDD, and I think it could be our future testing strategy for all kinds of software.

Background

Writing automated tests for software has not been easy. I have been frustrated at not finding a testing strategy that is maintainable and doesn’t take ages to write. That changed recently, when I made some contributions to the TypeScript compiler and discovered its testing strategy. I immediately found it suitable for my web application development. It is both easy to write and easy to maintain. In this article, I will cover the core concepts of that testing strategy and how it applies to many kinds of software.

Test Driven Development

I would like to cover some basics of automated testing first, to show what kind of problems software developers face today. Developers write tests to avoid the repetitive task of manual testing. Some software is so large that automated tests are a necessity, because manual testing would take too long. Developers also write automated tests to prevent old bugs from reappearing, which is called regression testing. A test-driven strategy also recommends writing your tests first and your source code second, so that you don’t fit your tests to your source code. This prevents faulty code from passing your tests.

If you haven’t seen any test code before, here is an example of TDD. We have a backend service that we want to test, and we check the HTTP body of its response to see whether it responds correctly to our request. The test is written in JavaScript using the Mocha testing framework, where each call to the it function registers a new test:

Static Result Verification

One of the disadvantages of automated testing today is that you need to write the result verification statically; that is, you write the expected result in code to verify your test. As you can see in our previous example, we have the following code:

In the above example, you can see that I have statically written the expected result for my test. In this case, the code snippet { id: 3, name: 'John Andersson' } is the static result verification. Let’s say that our application changes a bit: we now need to add an age property to our users. That means we need to change many tests to allow this new feature:

In the above example, I added the property age: 17 as a new static result verification, and I needed to add it to both of my tests. Otherwise, I could not be sure that both test cases return an age property. In our example, we show just two test cases, but in reality there could be hundreds or thousands of test cases that need to be changed.

Let’s say that we don’t just want to verify the HTTP body anymore, but also want to begin verifying the HTTP headers. Now we have the tedious task of rewriting nearly all of our tests with the correct HTTP headers.

And again, we are just showing two examples, but in reality there could be hundreds or thousands of test cases. This is a recognisable problem pattern for TDD: we want to change a small part of the application, but in doing so we need to change a larger part of our test code.

Static result verification has a huge disadvantage. If you want to make a change in your code or implement a new feature that affects your test results, you need to manually change the static result verification. Static result verification also increases coupling with our source code. It’s like anchoring your test code to your source code. This has a huge negative impact on your productivity, and it makes your code base more rigid and less adaptable to change.

I have seen many developers with the same problem as described above: their test code sort of “sinks” the “floating ship” of the application code. Testing applications should be easy and maintainable.

As one of TypeScript’s maintainers, Ryan Cavanaugh, says:

Writing a test for a bug, should not take more time than fixing the bug itself.

Baseline Testing

We have identified that TDD couples your test code to your source code. You might face the tedious task of rewriting a large portion of your tests whenever you make a small change to your source code. There is a solution to this problem, and I call it Baseline Testing.

In Baseline Testing, our tests record and store all results produced by our source code and match them against a reference baseline. The reference baseline is the previously accepted test result. If we have made some changes to our source code, a new result is recorded and stored, called the current baseline. And if we accept this new result, it becomes our new reference baseline, hence the name Baseline Testing. The big benefit is that you don’t need to manually write the expected result for every test. So when you make a small change, you don’t need to rewrite a large portion of your tests.
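The record, compare and accept cycle can be sketched in a few lines. This is a simplified in-memory model, not the TypeScript compiler’s implementation: the names run, accept and references are hypothetical, and a real framework would store its baselines on disk.

```javascript
// Accepted results (reference baselines), keyed by test name.
const references = new Map();

// Run a test case: record the current result and compare it with the reference.
function run(testName, produce) {
  const current = produce();
  if (!references.has(testName)) return { status: 'new', current };
  const status = references.get(testName) === current ? 'pass' : 'changed';
  return { status, current };
}

// Accepting a result promotes the current baseline to reference baseline.
function accept(testName, current) {
  references.set(testName, current);
}

// Hypothetical source code under test; it returns its output as a string.
let getUser = () => JSON.stringify({ id: 1, name: 'Example User' });

const first = run('getUser', getUser); // no reference yet
accept('getUser', first.current); // we inspected the output and it looks right
const second = run('getUser', getUser); // same output as the reference

// The source code changes, so the recorded output changes too.
getUser = () => JSON.stringify({ id: 1, name: 'Example User', age: 17 });
const third = run('getUser', getUser);

console.log(first.status, second.status, third.status); // new pass changed
```

Note that no expected value is written in the test itself; the only manual step is deciding whether to call accept.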

Here is an example project. It is very common to have this kind of folder structure in Baseline Testing:
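The layout can be summarised by the three paths used later in this article. The helper below is a hypothetical sketch that derives them from a test name; only the directory names come from the article.

```javascript
const path = require('path');

// Layout assumed from the paths used in this article:
//   tests/cases/getUser.js                  the test case definition
//   tests/baselines/current/getUser.http    output of the latest run
//   tests/baselines/reference/getUser.http  last accepted output
function baselinePaths(root, testName) {
  return {
    testCase: path.join(root, 'tests', 'cases', `${testName}.js`),
    current: path.join(root, 'tests', 'baselines', 'current', `${testName}.http`),
    reference: path.join(root, 'tests', 'baselines', 'reference', `${testName}.http`),
  };
}

console.log(baselinePaths('.', 'getUser'));
```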

You can see where we store our test cases and our current and reference baselines. In Baseline Testing, you also need to write some kind of testing framework that lets you define your test cases and how the baseline files should be emitted. Under /tests/cases/ we have the following example file:
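Because the framework is custom, the shape of a test case file is up to you. One possible shape, with hypothetical field names, is a plain object that describes the request to make:

```javascript
// tests/cases/getUser.js (hypothetical test case format)
const getUserCase = {
  name: 'getUser', // also used as the baseline file name
  request: { method: 'GET', url: 'http://localhost:3000/user/1' },
};

module.exports = getUserCase;
```

The file contains no expected result; it only says what to request.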

We defined our test case and how to test it: by issuing an HTTP GET request to /user/1. We store this file under /tests/cases/getUser.js. Now we run our tests. Our test framework will request GET http://localhost:3000/user/1, record the HTTP body and HTTP headers, and store the content as our current baseline in /tests/baselines/current/getUser.http:
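A sketch of how a framework might emit that file. The .http layout used here (header lines, a blank line, then the pretty-printed JSON body) is an assumption, as are the function names and the sample response. The demo writes into a temporary directory rather than a real project.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Serialize a response into the text stored in a .http baseline file.
function emitBaseline(response) {
  // Sort header names so the output does not depend on header order.
  const headerLines = Object.keys(response.headers)
    .sort()
    .map((name) => `${name}: ${response.headers[name]}`);
  // Pretty-print the JSON body so each property sits on its own line.
  const body = JSON.stringify(JSON.parse(response.body), null, 2);
  return [...headerLines, '', body].join('\n') + '\n';
}

// Write the serialized response as the current baseline for a test.
function recordCurrent(root, testName, response) {
  const file = path.join(root, 'tests', 'baselines', 'current', `${testName}.http`);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, emitBaseline(response));
  return file;
}

// Hypothetical response, standing in for GET http://localhost:3000/user/1.
const response = {
  headers: { 'content-type': 'application/json' },
  body: '{"id":1,"name":"Example User"}',
};

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'baseline-'));
const file = recordCurrent(root, 'getUser', response);
console.log(fs.readFileSync(file, 'utf8'));
```

Putting one property per line is a deliberate choice: it makes later comparisons line-oriented, so a single changed property shows up as a single changed line.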

Now we inspect our current baseline to see if it is good. If it is, we accept it and it becomes our new reference baseline. We then store the content in our new reference file in /tests/baselines/reference/getUser.http.
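Accepting a baseline can be as simple as copying the file. A minimal sketch, assuming the folder layout above; acceptBaseline is a hypothetical name and the demo runs in a temporary directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Promote the current baseline of a test to be its reference baseline.
function acceptBaseline(root, testName) {
  const current = path.join(root, 'tests', 'baselines', 'current', `${testName}.http`);
  const reference = path.join(root, 'tests', 'baselines', 'reference', `${testName}.http`);
  fs.mkdirSync(path.dirname(reference), { recursive: true });
  fs.copyFileSync(current, reference);
  return reference;
}

// Demo: create a current baseline in a temporary project, then accept it.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'baseline-'));
const currentFile = path.join(root, 'tests', 'baselines', 'current', 'getUser.http');
fs.mkdirSync(path.dirname(currentFile), { recursive: true });
fs.writeFileSync(currentFile, 'content-type: application/json\n\n{\n  "id": 1\n}\n');

const referenceFile = acceptBaseline(root, 'getUser');
console.log(fs.readFileSync(referenceFile, 'utf8'));
```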

Let’s say that we accidentally added a gender property to our user object:

Our testing framework would automatically diff the current baseline against the reference baseline and recognize this line as changed:
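A minimal line-based diff, sketched with a longest-common-subsequence table. The baseline contents, the gender value and the name diffLines are assumptions for illustration; the gender line is placed mid-object so that exactly one line differs.

```javascript
// Line-based diff: unchanged lines get "  ", removed lines "- ", added lines "+ ".
function diffLines(referenceText, currentText) {
  const a = referenceText.split('\n');
  const b = currentText.split('\n');
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk both texts, following the table to keep as many common lines as possible.
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push('  ' + a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push('- ' + a[i++]);
    } else {
      out.push('+ ' + b[j++]);
    }
  }
  while (i < a.length) out.push('- ' + a[i++]);
  while (j < b.length) out.push('+ ' + b[j++]);
  return out;
}

// Hypothetical baselines for getUser.http.
const reference = [
  'content-type: application/json', '', '{', '  "id": 1,', '  "name": "Example User"', '}',
].join('\n');
const current = [
  'content-type: application/json', '', '{', '  "id": 1,', '  "gender": "female",',
  '  "name": "Example User"', '}',
].join('\n');

const changed = diffLines(reference, current).filter((line) => !line.startsWith('  '));
console.log(changed);
```

A test passes when the list of changed lines is empty. In practice you could also hand the two files to an existing diff tool instead of writing your own.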

If this property had been added intentionally, we would accept the change and store it as our new reference baseline. But we know it wasn’t intentional, so we now need to change our source code so that it does not produce a gender property in our user object.

We just showed the general concept of Baseline Testing: we defined a test case and let our source code produce the result by issuing an HTTP request. We stored the result of that request as a new reference baseline. Then we accidentally broke our application by introducing a bug, and we easily diffed the reference and current baselines to see what went wrong.

Baseline Testing vs. TDD

Baseline Testing does the opposite of what TDD recommends: you write your source code first, and the result of your source code becomes your result verification. Baseline Testing fits the test to your source code, whereas TDD recommends fitting your source code to your test. Baseline Testing is not derived from TDD; it is its own strategy. The benefit is that you don’t need any manually written static result verification code in your tests. You will have less coupling between your tests and your source code, which makes it much easier to change your application. You spend less time writing tests and more time on your source code.

Besides, TDD is mostly suitable for unit testing, where it is easy to specify the expected result. Sometimes, though, it is not easy to write the expected result. An example is large data structures, which can be tedious to write:

Baseline Testing fits your project if you have tests whose expected results are very hard or cumbersome to write. Keep in mind that Baseline Testing is coupled to a custom testing framework that you need to build. In our example with the HTTP baseline files, for instance, we might need to write a parser and an emitter for those files. And if you want your baseline files to look different, you would need to build your framework differently. If you don’t have a lot of test cases, it might not be worth using Baseline Testing as a testing strategy. If you have many test cases whose result interfaces look rather similar, then Baseline Testing is probably a good fit for you.
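To give a feel for the size of that framework work, here is a sketch of a parser for one possible baseline format. The format (header lines, a blank line, then a JSON body) and the name parseBaseline are assumptions; your own files may look different.

```javascript
// Parse the text of a .http baseline file back into headers and a body.
function parseBaseline(text) {
  // The first blank line separates the headers from the body.
  const separator = text.indexOf('\n\n');
  const headerText = separator === -1 ? text : text.slice(0, separator);
  const bodyText = separator === -1 ? '' : text.slice(separator + 2);

  const headers = {};
  for (const line of headerText.split('\n')) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip empty or malformed lines
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }

  const body = bodyText.trim() === '' ? null : JSON.parse(bodyText);
  return { headers, body };
}

const parsed = parseBaseline('content-type: application/json\n\n{\n  "id": 1\n}\n');
console.log(parsed);
```

A parser like this lets the framework compare baselines structurally, for example ignoring a header that changes on every run, instead of comparing raw text.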

Applications and web servers are very complex software, and they often involve hundreds, if not thousands, of test cases that you need to write and maintain. This kind of software also has a large testing surface (the testing surface being the size of the test output). In my opinion, TDD is just not a good fit for this kind of software, because knowing large test outputs up front is very hard. Large test outputs are also more susceptible to change than small ones. And as mentioned before, changes take time to implement, which is one of the main pain points of TDD.

When you look at common TDD examples online, they often show a function that returns an output for a given input. This is the kind of software TDD was meant for: simple functions that return simple, small data structures. A real-world example would be libraries. Libraries have a lot of functions that can be called and tested. Their testing surface is a lot smaller than that of an HTTP endpoint, so recording baselines doesn’t make as much sense; at least, you are not going to get a large benefit over TDD. So TDD is a very good fit for testing libraries. Also, according to some studies, writing tests up front yields fewer errors.

Baseline Testing is the future

Baseline Testing could be our next-generation testing strategy and a more productive alternative to TDD. We spend too much time writing tests when, in fact, the test verifications could be produced directly by our source code. Baseline Testing is a bold but very effective strategy. It is bold because it questions a historically sound testing strategy: TDD. It is effective because, with just the press of a button, you have created new test verifications. What took hours or days to refactor with TDD now takes only seconds with Baseline Testing.