A Tale of Testing: The Basics of Test-Driven Development with a Focus on Data Structures
Put your tests in front…
Learning about Test-Driven Development (TDD) is a great place to start if you want to improve your software development skills.
In this article, we’ll cover the basics of TDD, focusing on data structures to make it a bit more fun 😁. Specifically, we will implement the basic functionalities of a Trie the way that is usually done when applying the TDD principles.
By the end of this article, you’ll better understand how TDD works and how it can help you write better code.
Let’s start!
What is Test-Driven Development?
Let’s start by defining what Test-Driven Development is:
Test-Driven Development (TDD) is a software development process that aims to ensure the quality of the software by running a set of test cases more frequently. This is the opposite of creating the software and then the tests which the software will be tested against.
The philosophy behind this approach is to first implement a set of tests for every new feature being added to the software. These tests are expected to fail as the new feature has not been implemented yet. Once it is implemented, all previous tests and the new ones should pass.
By doing this, you “ensure” that the new feature is correct and all previous ones were not affected by adding this new one. If you do this continuously, every new addition is guaranteed to increase the value of the software.
Another important thing that the TDD approach enforces is that, to write proper tests, we need to have a solid understanding of every new feature being implemented. We cannot afford to know only some minor details of what we have to implement and then go and improvise along the way.
If we do it like this, we might as well compromise the quality of our tests because we can be biased by all the hacks we had to do to make our implementation work. Never write tests in a way that accepts whatever flaw the implementation might have.
Methodology
Let’s summarize the steps you follow on TDD:
1. Add a new test:
- This test should pass if, and only if, all the specifications defined in the feature are met.
- Focus on the requirements before coding.
2. Run all tests:
- The new test should fail.
- Because it fails, it shows that new code is needed for the new feature.
- This will also validate that the testing mechanism is working correctly and that the new test is not flawed in a way that will always pass.
3. Implement the feature:
- The code you write should be the simplest one that ensures the new test passes.
4. Run all tests:
- All tests should pass.
- If they don’t, the code should be revisited, and changes will be made until the tests pass.
5. Refactor as needed:
- Go back and improve the code if needed.
- Rely on the tests to ensure everything works fine after the refactoring.
Without further ado, let’s see an example of a new “feature” to add using this methodology.
The New Feature
Let’s assume we are working on a project consisting of a collection of data structures. We are tasked to add a Trie with the basic functionalities. How should we proceed?
First, we should start by defining what these basic functionalities are. Remember, if we use TDD, we should have a solid understanding of most of the implementation details of the new feature we being added. Pay special attention to the interface that will be exposed to the client code, since this will help you when writing the tests.
In our case, we will have to create a Trie that supports the following operations:
- Insert a new string: A method that receives a string as a parameter and does the corresponding logical operations to insert this new string. It will not have any return values.
- Search for a string: A method that receives a string as a parameter. It will return a boolean value denoting whether the string was present.
By the looks of it, it seems that we can start by thinking our Trie like a class looking similar to this:
# data_structures/trie.py
class Trie:
def __init__(self):
pass
def insert(self, word: str) -> None:
pass
def search(self, word: str) -> bool:
pass
That is, a constructor to initialize some values when we create a new instance of the Trie and the corresponding insert and search methods explained above.
What will this allow us? Well, for starters, we can start writing our tests right away.
Let’s see how!
Adding the New Test
Let’s begin with the first step in TDD. Since we know the interface the Trie class will expose to the client code, we can write tests for these functionalities. An example of how the tests would look using Python is the following:
# tests/test_trie.py
from unittest import TestCase
from data_structures.trie import Trie
class TestTrie(TestCase):
def setUp(self):
self.trie = Trie()
def test_search_after_insert(self):
self.trie.insert("abc")
self.assertTrue(self.trie.search("abc"))
self.assertFalse(self.trie.search("ab"))
This simple test will try inserting a string in the Trie, in this case, the word abc
. Then, it will try to look for abc
and ab
. The expected result is that the first word will be found, and the second will not.
Running All Tests
The second step is to run all tests. The new test should fail because we have not implemented the corresponding methods in the Trie class. Let’s try!
Indeed. When we run our tests, the test_search_after_insert
method fails, as intended. So far, so good!
Implementing the Feature
Now it’s time to make the tests pass by implementing the corresponding feature we were tasked with.
Let’s begin with creating an auxiliary class that will help us implement the methods of our Trie class. Because the Trie is a tree-like data structure, it makes sense to have another class representing a node in this tree.
For that reason, we are going to create the TrieNode class. This class will contain properties and methods such as:
- A symbol property representing the character stored in that node.
- A final property, stating whether the node represents the end of a string inserted.
- An edges property to store adjacent nodes.
- An add_edge method to add a new adjacent node.
- A get_node method to get an adjacent node.
I will leave the details of the implementation for the reader to unravel. You can always go and find documentation online about the Trie. Let’s focus on the TDD part for now.
A possible implementation of this class could be the following:
# data_structures/trie.py
class TrieNode:
def __init__(self, symbol):
self.symbol = symbol
self.final = False
self.edges = {}
def add_edge(self, symbol):
self.edges[symbol] = TrieNode(symbol)
return self.edges[symbol]
def get_node(self, symbol):
return self.edges.get(symbol, None)
With this TrieNode class implemented, we can get started with completing the methods we need in the Trie class:
# data_structures/trie.py
class Trie:
def __init__(self):
self.root = TrieNode("^")
def insert(self, word: str) -> None:
cur_node = self.root
for symbol in word:
next_node = cur_node.get_node(symbol)
if next_node is None:
next_node = cur_node.add_edge(symbol)
cur_node = next_node
cur_node.final = True
def search(self, word: str) -> bool:
cur_node = self.root
for symbol in word:
next_node = cur_node.get_node(symbol)
if next_node is None:
return False
cur_node = next_node
return cur_node.final
Let’s explain a little bit about what’s happening here:
The __init__ method will initialize our Trie with a root node containing the symbol ^
, chosen arbitrarily. The only thing we should consider when choosing this symbol is that it is not part of the alphabet used in our strings.
The insert method will traverse a path from the root of the Trie, inserting one character at a time and creating new nodes when needed. When the string is inserted completely, the final property of the last node traversed will be set to True, indicating that it represents an ending of the inserted string.
The search method will try to traverse a path from the root of the Trie as long as it matches the corresponding character in the string. If at some point, it can’t find a node to go to, it returns False. Otherwise, the string can be a prefix of an existing one, so we need to check for the final property in the last node traversed.
What about now? Will our tests pass?
Running All Tests Again
Let’s run every test one more time to validate the correctness of our newly implemented feature. If we run our tests now, we will get the following result:
This means our implementation is working as we expect. Remember, if our tests fail in this step, we should check for bugs in the code we wrote and try to fix them.
The next step would be to refactor the code we wrote if needed. We will skip this for now, but feel free to tell me how you would improve my proposal.
The Next Feature
Of course, this doesn’t end here. Since we are developing software, it is natural to keep working on new features in our projects.
Let’s assume we want to add new functionality to our Trie data structure. We want it to be able to find if some given string is a prefix of another string already inserted. For example, if the string abcd
is already inserted, we want the Trie to return True for the string ab
and False for the string ax
.
The outline of the new method to implement could be something like this:
# data_structures/trie.py
class Trie:
# Previous methods omitted...
def search_prefix(self, word: str) -> bool:
pass
Let’s repeat the same steps we did before. Let’s do it a little faster since we already know the drill.
Adding the New Test
Let’s add the test that will validate that our new functionality is correct:
# data_structures/trie.py
class TestTrie(TestCase):
# Previous tests omitted...
def test_search_prefix(self):
self.trie.insert("abc")
self.assertTrue(self.trie.search_prefix("ab"))
self.assertFalse(self.trie.search_prefix("ax"))
As you can see, this method will add the string abc
and check that ab
is recognized as a prefix and ax
is not.
Running All Tests
Running all tests now will give the following results:
Notice how our test_search_prefix
test fails, but test_search_after_insert
still passes. This is going as intended. So far, so good!
Implementing the Feature
Now it’s time to write the code with the new functionality for our Trie.
This method should try to traverse a path starting from the root of the Trie, matching the corresponding character in the string each time. If it manages to match them all, then the string is a prefix of another string already present.
A possible implementation could be the following:
# data_structures/trie.py
class Trie:
# Previous methods omitted...
def search_prefix(self, word: str) -> bool:
cur_node = self.root
for symbol in word:
next_node = cur_node.get_node(symbol)
if next_node is None:
return False
cur_node = next_node
return True
Will it work? Let’s see.
Running All Tests Again
If we run our tests one more time, we get the following result:
Notice how the new test passes but also our previous test does. This means we have successfully managed to add the new functionality without affecting any previous ones.
Once again, this will be the time for refactoring, but we will skip it. As I said before, let me know of any ways to improve this code in the comments.
Conclusions
In this tutorial, we have learned the basic workflow of the Test-Driven Development approach. We took an example related to data structures and solved it like we would if we were using this software development methodology in real life.
We learned the value of implementing the tests for our new features first, as we get:
- A better understanding of the feature before we start coding.
- Security when implementing new functionality. Since we have a whole test suite that we can rely on.
- Value in the project. Since every new feature is guaranteed to have a minimum set of tests.
Also, the example I brought here was not made out of thin air. It is an example taken from one of the open-source projects I have been involved for a while. Feel free to look at it here and contribute with whatever you like. We will be glad to review and assess you along the way.
What are your thoughts on Test-Driven Development? Let me know. We will start a fruitful discussion for sure.
See you soon!
Originally published at https://albexl.hashnode.dev.