It has frequently been asserted, with good reason, that rewriting code from scratch is a bad idea (e.g. the “single worst strategic mistake”). There are a number of reasons for this. One important one is that existing code often embodies a great deal of hard-won handling of subtle scenarios; throwing away all of that knowledge, only to make the same mistakes over again, can be unwise.
In keeping with this, the teams I have worked on for the last couple of years have been reshaping a codebase of considerable size (for our initial approach see here). For the most part we have worked on the principle of refactoring from the bottom upwards: starting at the lowest levels of the codebase, extracting new classes or coercing existing ones to enforce single responsibility, loose coupling, testability, and low complexity. However, in one area of the codebase it quickly became apparent that this approach would not prove fruitful.
In this case we determined that the existing approach was flawed and that we did need to rewrite it. The question then became how to deliver the rewrite incrementally while preserving all of the previous behaviour.
Why we needed to rewrite
When I say the existing code was flawed, I do not mean that its outward behaviour was necessarily wrong (though there were next to no tests to validate it), but that the coding approach was problematic.
This area of the codebase generated complex SQL deployment scripts, which mostly meant writing a lot of hard-coded SQL with only occasional interpolation. The existing code relied mostly on writing the SQL via the Microsoft TransactSQL library. However, our scripts were designed to be run against sqlcmd and contained sqlcmd-specific syntax not supported by the TransactSQL library, so the output generated via that library was concatenated with large sections of hard-coded or interpolated strings and calls to TextWriter.WriteLine().
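To illustrate the mismatch: sqlcmd scripting directives such as `:setvar` and `$(…)` variable substitution are pre-processed by sqlcmd and are not valid Transact-SQL on their own, so a library that only understands Transact-SQL cannot emit a script like the following (the table and variable names are made up for illustration):

```sql
:setvar SchemaName "dbo"                  -- sqlcmd directive, not T-SQL
CREATE TABLE [$(SchemaName)].[Orders] (   -- $(SchemaName) is substituted by sqlcmd
    [Id] INT NOT NULL
);
GO
```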
To make matters worse, we were generating seven different types of script, and six of them shared the same entry point, relying on multiple, often obscure, switches in the lower levels of the code to produce the correct output. There were no tests.
The upshot of all of this was a couple of very large and convoluted entry-point classes and a lot of very verbose lower-level classes producing seemingly arbitrary snippets of SQL, which were used for a variety of different purposes. The cyclomatic complexity for the whole area was very high, as was the number of parameters and Boolean flags being passed around.
Choosing a better technology
Rather than trying to code out every line, we decided it would be better to use a templating library to generate these scripts. After a little searching around, we opted for StringTemplate (https://github.com/antlr/stringtemplate4). These were the principal reasons for this choice:
- It strictly enforces “model-view separation”. You cannot execute logic in the template, only display variable values, iterate over them, and reference sub-values of the models passed in. You can also include sub-templates.
- The syntax is fairly straightforward and the templates are generally easy to read. It handles indentation well.
- The library is reasonably lightweight.
- The library was designed with the generation of multiple types of output in mind, not just HTML; SQL is explicitly mentioned in the documentation.
- The library is used by other large organisations.
- Terence Parr, the author, is well respected in this area and is also the author of ANTLR (on which StringTemplate is built).
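To give a flavour of the syntax, here is a small illustrative template in StringTemplate's notation — the template name and the `tables` model with its `schema` and `name` properties are hypothetical, not taken from our real code:

```
dropTables(tables) ::= <<
<tables:{t |
DROP TABLE [<t.schema>].[<t.name>];
}>GO
>>
```

The `<tables:{t | … }>` construct applies the anonymous sub-template to each element of `tables`, and `<t.name>` reads a property of the model; there is deliberately no way to embed arbitrary logic in the template itself.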
There are many benefits to the model-view separation, as it enforces very clear separation of concerns in the code. As a result, it would be easy to switch to another template library should the need arise.
The main downside of using a language-agnostic library, rather than one which “understands” the language being emitted, is that you have to be careful with escaping. Our scenario is simple enough that this is not much of an issue, and in any case we are not aware of any such library for Transact-SQL with sqlcmd.
Trying it out, TDD-style
We started by trying out the new approach on the script type that was most self-contained and had its own entry-point. Our approach was incremental and involved many pull requests, and it was only on merging the last of these that the new code became active.
We began by doing some light refactoring until we were in a position to write integration tests for this entry point. This included extracting the timestamp-generation logic behind an interface so that generated timestamps did not interfere with our tests. We could hardly cover all combinations of inputs at this level, but we tested all the main scenarios, capturing the output scripts generated for these inputs and asserting thereafter that the output remained identical. Such tests can be a slight irritant to maintain but are invaluable in ensuring that absolutely no behaviour change has occurred.
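Our code was C#, but the shape of these golden-master tests can be sketched in Python — `FixedClock`, `generate_script`, and the expected script below are all hypothetical stand-ins for the real entry point and its extracted timestamp interface:

```python
from datetime import datetime

class FixedClock:
    """Test double standing in for the real clock behind the interface."""
    def now(self):
        return datetime(2020, 1, 1)

def generate_script(clock, database):
    # Hypothetical stand-in for the real script-generation entry point.
    return (f"-- generated {clock.now():%Y-%m-%d}\n"
            f':setvar DatabaseName "{database}"\n'
            "GO\n")

def test_deploy_script_unchanged():
    # Golden-master assertion: output must match the captured script exactly,
    # so any behaviour change in the generator fails the test.
    expected = ('-- generated 2020-01-01\n'
                ':setvar DatabaseName "Sales"\n'
                'GO\n')
    assert generate_script(FixedClock(), "Sales") == expected
```

Injecting the clock is what makes the assertion exact: with the real system time, every run would produce a different script and a byte-for-byte comparison would be impossible.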
We then looked at these scripts (with an eye also on the old code) to determine each script section that could be generated. For each section we followed a strict test-driven development pattern. We created unit tests similar to the overall integration tests, wrote templates and calling code until those tests passed, then went back over the old code used to generate these sections to tease out any scenarios which had been overlooked, adding more tests and logic where necessary.
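In the same illustrative Python terms, the per-section cycle produced small renderers and unit tests like these — the section, its model, and the empty-input edge case are invented for the sketch; in our real code the renderer was a template plus its calling code:

```python
def render_drop_section(tables):
    # Hypothetical section renderer for a list of table names.
    if not tables:
        # Edge case of the kind teased out of the old code: no tables, no section.
        return ""
    lines = [f"DROP TABLE [{t}];" for t in tables]
    return "\n".join(lines) + "\nGO\n"

def test_drop_section_lists_each_table():
    assert render_drop_section(["Orders", "Customers"]) == (
        "DROP TABLE [Orders];\nDROP TABLE [Customers];\nGO\n")

def test_drop_section_empty_when_no_tables():
    assert render_drop_section([]) == ""
```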
Once we had covered all the script sections, we were able to switch the integration tests to run against our new code instead of the old, then switch the calling code to use the new code, and finally delete all the old code which was now unused. ReSharper solution-wide analysis was very helpful in identifying now-dead methods in the complex lower-level libraries.
Seeing it through to the end
All of this was merged and released without a hitch, and so we set about the more complex task of the other scripts. We followed the same approach as before: we refactored until we could write some integration tests against the entry-point class, then worked on each script type in turn, writing integration tests, building up script sections, and switching over to the new code. Where appropriate we would extract or reuse existing templates.
With each script type covered, more inputs and associated lower-level code could be pared away from the original codebase. As our understanding of the codebase grew, we discovered that a number of the scenarios we had initially written integration tests for were in fact impossible and could be removed. We also discovered a number of inconsistencies in the existing code which we were able to iron out.
Finally, we were able to switch over to using the new code for the last script type and kill off the entire area of problematic and unreadable code.
The upshot of all of this was a resounding success:
- We replaced some code which was very difficult to understand and maintain with some far more flexible and readable code
- We discovered and fixed a number of small bugs
- We separated out a number of unrelated scripts and have since been able to make individual changes to them
- Our unit test coverage for the product jumped about 10%
- As a by-product of this change, the whole area of code is no longer T-SQL-specific, and adapting it to support other flavours of SQL just involves writing another set of templates
- Our understanding of the codebase hugely increased
- We had been creating all the new classes in the right location according to our “ideal project structure”, whereas the older classes had been scattered across a variety of projects, so our architecture markedly improved as a result
- We incurred next to no bugs during the entire process