Proof-Driven Development (or — the business value of Clean Code)

Arnon Axelrod
Oct 8, 2018 · 10 min read

In the Algorithms course I took at college (back in the late ‘90s…) we’ve learned to prove the correctness of certain algorithms. It seems though that in the industry, as opposed to academia, proving the correctness of an algorithm or an application is very rarely heard of. Instead, in the industry we’re using testing to ensure that the application works correctly. However, testing, by its nature, is based on specific examples and therefore can never prove that the application works correctly in all cases, rather it can only prove that it doesn’t if a test fails.

So Why No One Do That?

The most common answer to this question is that real software is much more complex than any of the algorithms that we study in college. In a sense this is right, but it seems like because software became so complex over time, that many of us treat software as similar to something like biology where the behavior of things cannot be fully and precisely predicated, and forget that these are still computers and that they are deterministic! It’s true that inputs and their timings, including those that our system receives from other systems, and including those we use as the seed for random generators, may not be deterministic, but these are things that we can account for and handle properly if we’re aware of them. Unlike biology, for a given software and a given set (or sequence) of inputs, we’ll always get the same results.

But in fact I find that as programmers we are using proving techniques a lot! The problem, as I see it, is that we mostly do it unconsciously and therefore not thoroughly enough. Here’s a simple example:

You can easily and intuitively prove that for every value of x between int.MinValue and 0 (inclusive) the method prints the value of x+1, and for every value of x between 1 and int.MaxValue it prints the value of x-1. We don’t even have to run it in order to be sure that these claims are correct! (I won’t go into very formal proofs here, as it’s not the main point as I’ll explain later).

Now, of course that this is very simple method…

Clean Code vs. Spaghetti Code

Real world systems are many orders of magnitude more complex than the above example, so it’s indeed not feasible to prove (or even “see” intuitively) that they’re correct in this way. But if the application is built in a modular fashion, with clean architecture and clean code, and is made mainly from small classes and methods glued together, then it’s fairly easy to prove the correctness of those individual classes and methods. Given that you proved the correctness of these individual classes and methods, then you can usually prove claims on broader scopes of classes and scenarios too.

For example, suppose that we have an application that manages the grades of students in a class. This application is composed of a web client, web API, database and a business logic component. Whenever the teacher adds or updates a grade of a student, the business logic component immediately calculates his GPA and updates it in the database. When the student enters his personal web page, he can see his GPA. Now, if we can prove that when the teacher adds a grade to a student it is stored in the database and the business logic component is invoked; and that the business logic calculates the GPA correctly and updates it in the database; and that the web API correctly retrieves and returns the GPA; and that the web client correctly sends the request to the API and correctly displays the returned GPA in the right place; then we can conclude that the entire workflow is also correct.

This is of course the “happy path”, but we can do just the same for “negative scenarios”, as long as we determines how the system should behave in those negative scenarios.

However, if the same application was built as one monolith of spaghetti code (as opposed to the described modular components; not necessarily in the sense of scalability and micro-services as often used these days), then it would probably be very hard to prove that the described scenario behaves correctly. Testing can tell you whether it’s working correctly in the tested scenarios, but as mentioned before, cannot tell that it works correctly in all cases.

The Business Value

When the code is clean and modular, because you can easily prove the correctness of individual classes and methods, then the chances for bugs hiding in that code is significantly lower than the chances for bugs hiding in a spaghetti code. And the more bugs you have the more time it takes to manage them. Besides of the time it takes the developers to fix them, we need to prioritize them, reproduce, elaborate and discuss them with the developer, and finally verify the fix and close the bug. Let alone the negative business impact that bugs in production can have…

But if you also take the developer’s time into account too, then the problem is much more severe: not only the amount of bugs goes up, but also the time it takes analyze and fix them. While in a clean code base you can pretty easily reason about the possible cause of the bug, in a spaghetti code, your only reasonable choice is debugging. And debugging spaghetti code typically takes orders of magnitude more time than reasoning about clean code! Moreover, those long debugging sessions can be very confusing and misleading, and can make you fix the wrong thing and even to introduce another bug instead.

Therefore, I see long debugging sessions mainly as a symptom of a bad coding practices, and I encourage teams that want to improve to measure the time they spend debugging. Of course that they shouldn’t try to minimize the debugging time directly, but this should be the consequence of improving the design, their coding and refactoring practices. Tools like Codealike can help you measure the time you spend on debugging, and you can correlate it with other metrics of clean code like those that SonarQube provides to get a clear picture of how clean code affects your team’s productivity.

Should I Prove Everything Formally?

Probably not. Most people can reason about claims pretty accurately without using writing down a formal proof. The amount of time you’ll need in order to formally prove everything that you want is probably much more than what you can justify. Especially because the code will probably need to evolve, meaning that you’ll have to maintain these proofs too. It helps though if you do formalize for yourself what you’re trying to prove, and even if you don’t write it down in a formal way, pay attention that you cover all possible cases. This is one of the reasons that TDD works: it forces you to think of what you’re trying to prove and of all of the possible cases in advance. Anyway, a peer review, either of the proof or of the code itself, is essential to ensuring that your proof or the code has no flaws.

So Does it Mean That We Don’t Need Testing?

Certainly not! First, most of the code I encounter in the industry is pretty far from being clean-code, unfortunately. But even if it was, we need testing, both manual and automated for several reasons:

  1. Exploratory testing — while we may be able to prove that the code behaves according to the spec, or prove specific traits about the code, we can rarely prove that the spec itself addresses correctly all possible cases in the real world. For that, we need manual exploratory testing and sometimes also customer testing or collect and analyze production metrics.
  2. Things that are difficult to reason about — the best example is performance. While it’s reasonably easy to measure performance, it’s extremely hard to calculate and reason about the time an operation can take, especially in a non-real-time OS (which is what the vast majority of us use). There are other cases where we can easily define constraints that we want our software to adhere to, though it’s difficult to prove that the code doesn’t violate them. Thorough testing can give us enough confidence without proving in these cases.
  3. Simple mistakes — when we write code and also when we read it and try to prove its correctness, we may make mistakes. While it’s less probable that a simple mistake won’t be caught in a peer review, it may still be the case, mainly due to conformation bias. For example, in the above example, replacing ‘<’ with ‘>’, or ‘+’ and ‘-’ are small details that we may miss. Tests can usually detect these pretty easily though. Theoretically, verifying each scenario once is sufficient (assuming we cover all code paths), and there’s no justification to write test automation for that reason. But because we also expect our code to evolve, then running these tests automatically on every change will give us confidence that it’s still correct. The next reason expands on this.
  4. Change — because applications and their code base evolve over time, then even if we keep our code base clean, when we develop a new feature it can be pretty hard to determine that we haven’t broken any assumption that we used to have in an a functionality that we’ve developed long ago. Let me put it this way: when we first developed the old feature we proved that it’s correct. Now, when we develop the new feature we can also prove that the new feature is correct. But the new feature or an implementation detail of it may have violated an assumption that was correct when we proved that the first feature works correctly. The point is that while it’s reasonable to prove the correctness of the new features we developer, it’s not realistic to go back and validate all of our previous proves on every change! While automated testing is not bullet-proof as formal proofs (no pun intended…), automated testing that validate the acceptance criteria of every user story we’ve ever developed help us ensure that are previous assumptions still hold. If the code is clean, then when we violate an old assumption and an old acceptance test fails, then we can easily reason about the cause of the failure and quickly fix it. In my opinion, this is where the value of test automation really shines.

The Importance of Clean-Code and Refactoring Skills

Hopefully by now I convinces you that because clean code make it easier to reason about your code, then it’s less prone to bugs and considerably shortens the time spent in debugging sessions. But this begs the questions: how do I keep my code clean over time? And maybe also — what is considered clean-code anyway? So for the definition of Clean Code, I suggest that you read Uncle Bob’s book with that name or just search the web. And for the first question, the answer is — refactoring (which you can also read about in Martin Fowler’s book). But while these books are great, the trick is to practice it and get better and better in these techniques, and also share this knowledge and skills among the team member through pair- or mob- programming. If you’re interested, I also provide formal training, consulting and accompaniment for teams in these areas (you can contact me through LinkedIn).

The Importance of Poka Yoke

Clean Code encompasses many good practices and advice, but there’s one very important advise that a bit less talked about: Poka Yoke. This Japanese words roughly mean “preventing mistakes”. In code, things like declaring variables as readonly (in C#) or final (in Java), the use of static and strong typing (which dynamic languages like Python and JavaScript lack), correct use of private and public, etc., are all things that you can do without, but if you do use them correctly, then beside of preventing possible mistakes, it also helps you prove the correctness of your code, because it eliminates cases that you would have need to account for otherwise. As mentioned above, this can reduce your debugging efforts and save you time and money.

Side note: Testing as Scientific Experiments

While often I teach people to treat their tests like scientific experiments. What I mean by that is that it should be designed in a way that the test should fail if the claim that we want to verify is incorrect (i.e, the test can prove that the tested feature doesn’t work correctly, if indeed it doesn’t).

There’s one important difference to note though. In any scientific domain, except of math and computer science, experiments are used to “test” theories, and are similar to testing in the sense that they can never proof that a theory is correct, rather they can only prove that it’s not. However, the whole concept of science is about using mathematics to model real-world observations, that we try to get a better understanding of. The mathematical models (theories) aim to explain the real-world observations in a deterministic way, and the experiments try to either corroborate or refute the theory. In software, however, we create a mathematical model (which is the computer program) according to a specification that we define ourselves. Because computers are deterministic (unlike most phenomena in the real world), then as long as we logically prove that the code adheres to the specification (like I learned to do in college), then there’s no need to create experiments to corroborate or refute that. Theoretically speaking, we don’t need to test it, because we determine how the system should behave, and not some physical laws that we don’t fully understand do. Note that often the specification attempts to model the real world (usually a business domain), and if it fails to do that correctly, then the software may be deemed unusable. For this reason, testing is still important, but this kind of testing shouldn’t treat the specification as the “truth” (as we often do), but rather to examine whether the spec adheres to reality, which is what exploratory testing is about.

More From Medium

More from Arnon Axelrod

More from Arnon Axelrod

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade