TDD, Algorithms & Emergence – Part 3

Published in

CodeX

9 min readSep 18, 2023

A couple of weeks back we were doing a coding dojo and the challenge chosen by the group was the Word Wrap kata. This has fond memories for me as I have crossed paths with it multiple times in the past. The first time was many years ago when I was designing UI functionality on a payment terminal. This was before I became a TDD convert and I remember it being a tricky challenge back then — one of those situations where you have one last bug before you are finished and in fixing that you introduce another bug somewhere else; and this pattern repeating itself ever more infuriatingly. My next encounter with it was on an Uncle Bob Advanced TDD course. In fact, he writes about it with regard to the transformation priority premise which I discussed previously. It involves another complex algorithm and this tends to be one of the types of problems that leads to one of the bugbears doing TDD — that of the impasse.

You can read more about this in my previous articles (or even better in Uncle Bob’s writing) but just to sum it up — one of the key concepts of TDD is to write the code incrementally. Incremental code means you get regular feedback and ensure that the cognitive load from test to test is small enough that you needn’t have to involve loggers, or debuggers or require copious numbers of retries and code changes before you get a test to pass. In fact, rarely should your hypothesis of how a test works be wrong. When this works, TDD feels elegant and you feel in control. Occasionally though the next test results in an impasse where the only way to get this test to work is to reverse up and significantly rewrite the code you have got thus far. This breaks the rule of incremental development.

Uncle Bob also discusses the need to prevent TDD novices from writing code that mirrors the tests. By this, he means persistently writing specific code that one-for-one matches the tests. After all one of the great expressive powers of software development is to handle a great variety of data and circumstances with the minimum amount of code. The basic building blocks of development help us to genericise our code to handle a variety of cases — techniques such as variables, loops, expressions, functions and classes. Developers use these to maximise reuse and minimise complexity. When doing TDD, every time you add a new test the code needs to handle a wider scope. The skill is to genericise at every step. This is what the refactor step in red-green-refactor is about. But Uncle Bob goes further and talks about the code changes in the green phase as a transformation. These also represent genericising. He outlines a set of transformations you are likely to encounter. For example, a constant evolves to a scalar; or even more intriguingly, an if statement evolves into a while. This is not as crazy as it first sounds as a loop is controlled by a condition and so the if statement is merely a loop of iteration one.

More specifically he orders these transformations in order of complexity and suggests that as a rule, if you stick to the simpler transformations as long as you can, you are more likely to avoid any impasses. I’d now like to postulate as to why this is. It is well understood that code can be reduced to three basic building blocks: sequences of instructions, selection (e.g. on a condition like an if statement), and iteration. Uncle Bob has also demonstrated that a selection is a specialisation of an iteration. So the key pattern when evolving code is divergence through conditions as they lead to loops. With this in mind, think of the incremental code development you do in TDD like the natural growth you would see in a tree. It starts as a single stalk. Then at some point, it branches (i.e. selection). These branches will then grow and may branch themselves; and so on. Now Uncle Bob indicates that you should avoid complex transformations as long as possible. I think this is because a complex transformation represents an abstraction which hides detail, and among that detail may be a new branch of code emerging which could mature into a loop. Prematurely abstracting code leads to these emergent branches being overlooked by the developer, therefore missing a more simple and generic implementation. This then leads to an impasse.

This fits with Uncle Bob’s idea of prioritised transformations but it has one more implication. Whilst unravelling a complex algorithm where you are interested in branches you might have to temporarily break the red-green-refactor loop and drop the refactor step for a while. Why? because the refactor step also introduces abstractions and may lead to you missing an emergent branch.

I’d like to solve the word wrap kata and demonstrate this in action.

Let’s write code for test WhenWordIsLessThanWidth_ThenNoFormatting:

public static string Format(string unwrappedText, int width)
{
   return unwrappedText;
}

This is the kind of code that feels completely pointless to someone who hasn’t done TDD before. But it has fleshed out the interface whilst avoiding distraction with further detail. An important start.

Next, let’s solve for test WhenInvalidWidth_ThenReportsError:

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   return unwrappedText;
}

Remember each test should incrementally update the code so that it looks like a slightly more detailed version of the previous code, almost like a picture coming into focus. In this case, an error case is one of the simplest to deal with as it is a trivial branch with no dependencies.

Next, let’s solve for test WhenWordIsGreaterThanWidth_ThenFormatIntoTwoLines

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   string wrappedText = "";
   int processedCharacters = 0;
        
   if (width < unwrappedText.Length)
   {
      wrappedText += unwrappedText.Substring(0, width);
      processedCharacters += width;
  	  wrappedText += "\n";
   }
        
   wrappedText += unwrappedText.Substring(processedCharacters);
        
   return wrappedText;
}

So a bit of a leap this time although it is still incremental from the last test — you can still see the essence of the previous code. The simplest thing might have been to use an if else combination but we have gone for just an if statement and used incremental string formation. Why not do the simplest thing as TDD recommends? The incremental string formation should be obvious — we are trying to develop incrementally. With regards to the if-else, this doesn’t transform whereas an if statement is something that can transform into a loop.

Next, let’s solve for test WhenWordIsGreaterThanTwiceTheWidth_ThenFormatIntoThreeLines

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   string wrappedText = "";
   int processedCharacters = 0;
        
   while ((processedCharacters + width) < unformattedText.Length)
   {
      wrappedText += unwrappedText.Substring(processedCharacters, width);
      processedCharacters += width;
      wrappedText += "\n";
   }
        
   wrappedText += unwrappedText.Substring(processedCharacters);
        
   return wrappedText;
}

And our first loop has fallen out! That was an easy change. The introduction of processedCharacters in the while loop was actually there in the if statement of the previous code change but because it was zero for that case we didn’t see it.

Next, let’s solve for test WhenTwoWordsGreaterThanWidth_ThenFormatIntoTwoLines:

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
    
   string wrappedText = "";
   int processedCharacters = 0;
        
   while ((processedCharacters + width) < unwrappedText.Length)
   {
      int nextWordBreak = unwrappedText.IndexOf(" ", processedCharacters);
    
      if (nextWordBreak != -1)
      {
         wrappedText += unwrappedText.Substring(processedCharacters,
                                                nextWordBreak);
         processedCharacters = nextWordBreak + 1;
      }
        
      if (nextWordBreak == -1)
      {
         wrappedText += unwrappedText.Substring(processedCharacters, width);
         processedCharacters += width;
      }
       
      wrappedText += "\n";
   }
    
   wrappedText += unwrappedText.Substring(processedCharacters);       
    
   return wrappedText;
}

The code has branched. I have opted for two if statements rather than an if-else again because I think there are more loops to emerge.

Next, let’s solve for test WhenFirstWordIsGreaterThanWidthFollowedByAdditionalWordThatIsNot_ ThenFormatIntoThreeLines

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   string wrappedText = "";
   int processedCharacters = 0;
   while ((processedCharacters + width) < unwrappedText.Length)
   {
      int lineStart = processedCharacters;
      int nextWordBreak = unwrappedText.IndexOf(" ", processedCharacters);
      if ((nextWordBreak != -1) && 
          ((nextWordBreak - processedCharacters) <= width))
      {
          wrappedText += unwrappedText.Substring(processedCharacters, 
                                         (nextWordBreak - processedCharacters));
          processedCharacters += (nextWordBreak - lineStart) + 1;
      }
      if (lineStart == processedCharacters)
      {
          wrappedText += unwrappedText.Substring(processedCharacters, width);
          processedCharacters += width;
      }
      wrappedText += "\n";
   }
   wrappedText += unwrappedText.Substring(processedCharacters);
   return wrappedText;
}

We’ve added more specificity to the if statement. It feels like we are close to finishing. Just multiple words on a line to go…

WhenTwoWordsLessThanWidthButWithThirdIsGreater_ThenFormatIntoTwoLines

The instinct would be to change the if statement to a while and checkmate, we are done! But that would be wrong. Here we demonstrate the issue of having prematurely abstracted and missed an emergent branch. More specifically with the following line:

processedCharacters += (nextWordBreak - lineStart) + 1;

This actually does two things. It indicates the processing of the word, and it indicates the processing of the whitespace. Because we have combined those two operations into one line, it disguises the next branch. For the whitespace behaviour is about to deviate from the word behaviour. Let’s split this out so we can see it.

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   string wrappedText = "";
   int processedCharacters = 0;
   while ((processedCharacters + width) < unwrappedText.Length)
   {
      int lineStart = processedCharacters;
      int nextWordBreak = unwrappedText.IndexOf(" ", processedCharacters);
      if ((nextWordBreak != -1) && ((nextWordBreak - lineStart) <= width))
      {
         wrappedText += unwrappedText.Substring(processedCharacters, 
                                          (nextWordBreak - processedCharacters));
         processedCharacters += (nextWordBreak - processedCharacters);
         processedCharacters++;
      }
      if (lineStart == processedCharacters)
      {
         wrappedText += unwrappedText.Substring(processedCharacters, width);
         processedCharacters += width;
      }
      wrappedText += "\n";
   }
   wrappedText += unwrappedText.Substring(processedCharacters);
   return wrappedText;
}

So now we have separated the whitespace-related operation, we can incrementally update the code with a new branch.

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   string wrappedText = "";
   int processedCharacters = 0;
   while ((processedCharacters + width) < unwrappedText.Length)
   {
       int lineStart = processedCharacters;
       int nextWordBreak = unwrappedText.IndexOf(" ", processedCharacters);
       while ((nextWordBreak != -1) && ((nextWordBreak - lineStart) <= width))
       {
          wrappedText += unwrappedText.Substring(processedCharacters, 
                                           (nextWordBreak - processedCharacters));
          processedCharacters += (nextWordBreak - processedCharacters);
          nextWordBreak = unwrappedText.IndexOf(" ", (processedCharacters + 1));
          if ((nextWordBreak != -1) && ((nextWordBreak - lineStart) <= width))
          {
             wrappedText += unwrappedText.Substring(processedCharacters, 1);
          }          processedCharacters++;
      }
      if (lineStart == processedCharacters)
      {
         wrappedText += unwrappedText.Substring(processedCharacters, width);
         processedCharacters += width;
      }
      wrappedText += "\n";
   }
   wrappedText += unwrappedText.Substring(processedCharacters);
   return wrappedText;
}

And we have a loop! A slightly ugly loop — we repeat the nextWordBreak assessment and we have one operation that is not called as we exit the loop — that is the addition of the whitespace to the end of the wrapped line. But loops are not always nice and clean — that is why languages have while loops and do while loops.

Now that the loops have been teased out, we can do the refactoring we have been deferring so that the intent of the code is clearer. We have two loops. The inner loop deals with a line of wrapped text, and the outer loop deals with the entire text except the last line.

public static string Format(string unwrappedText, int width)
{
   if (width < 1)
   {
      throw new ArgumentException("Invalid Width: " + width.ToString());
   }
   string wrappedText = "";
   int processedCharacters = 0;
   while (!IsLastLine(processedCharacters, width, unwrappedText.Length))
   {
      if (!AdvanceWordsInLine(unwrappedText, ref wrappedText, 
                              ref processedCharacters, width))
      {
         AdvanceLineWidthChunkOfWord(unwrappedText, ref wrappedText, 
                                     ref processedCharacters, width);
      }
      wrappedText += "\n";
   }
   wrappedText += unwrappedText.Substring(processedCharacters);
   return wrappedText;
}

public static bool AdvanceWordsInLine(string unwrappedText,
                                      ref string wrappedText,
                                      ref int processedCharacters,
                                      int width)
{
   int lineStart = processedCharacters;
   int nextWordBreak = unwrappedText.IndexOf(" ", processedCharacters);
   while ((nextWordBreak != -1) && ((nextWordBreak - lineStart) <= width))
   {
      wrappedText += unwrappedText.Substring(processedCharacters, 
                                          (nextWordBreak - processedCharacters));      
      processedCharacters += (nextWordBreak - processedCharacters);
      nextWordBreak = unwrappedText.IndexOf(" ", (processedCharacters + 1));
      if ((nextWordBreak != -1) && ((nextWordBreak - lineStart) <= width))
      {
         wrappedText += unwrappedText.Substring(processedCharacters, 1);
      }
      processedCharacters++;
   }
   return (processedCharacters > lineStart);
}

public static void AdvanceLineWidthChunkOfWord(string unwrappedText,
                                               ref string wrappedText,
                                               ref int processedCharacters,
                                               int width)
{
   wrappedText += unwrappedText.Substring(processedCharacters, width);
   processedCharacters += width;
}

public static bool IsLastLine(int processedCharacters, 
                              int lineWidth, 
                              int textLength)
{
   return ((processedCharacters + lineWidth) >= textLength);
}

So in summary, if you are using TDD to implement a complex algorithm then follow Uncle Bob’s transformation priorities. For example in this case we saw that you should always keep expressions as simple as possible as the next code branch may occur within those components. In addition, you may also want to suspend refactoring until loops emerge. For example, pulling a function out might abstract detail away which means you don’t spot an emergent branch.

I’d like to try this out on further complex algorithmic challenges to see if it prevents impasses. Let me know if it helps you too.

Originally published at https://www.linkedin.com.

TDD, Algorithms & Emergence – Part 3

Written by Gary Blair