Building a JSON parser with TDD
Recently a friend needed to parse a file with a custom format and asked me how I would implement a parser for it. My mind was immediately flooded with memories from my college classes. I told him I would first write a lexical analyzer that consumes the text file and produces tokens, and then a state machine that uses those tokens as events for its state transitions. Each transition would build a little bit of the parsed result until the machine reached a terminal state.
I’ve been interested in TDD for quite some time but never had a chance to try it out, so this looked like a good opportunity to build something.
In this series I will try to create a JSON parser using TDD, and I will try to share my thought process at every step. Let’s learn together :)
Before we start, some thoughts:
- I’m writing both the production code and the test code in a single file to make it easier to switch between them.
- We will start by parsing only JSON files with no whitespace or newlines. We will deal with those later.
- I know that Foundation already ships an optimized JSON parser, and that Swift 4 added the Codable protocol, but I’m doing this mainly as an exercise.
- For those who are not familiar, I will try to follow the three rules of TDD:
You are not allowed to write any production code unless it is to make a failing unit test pass.
You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
You are not allowed to write any more production code than is sufficient to pass the one failing unit test.
We start by creating a Cocoa Touch Framework project with unit tests and deleting the default tests.
The first test should always be for the most degenerate case. Since we’re not passing optionals anywhere, the most degenerate string we can create is the empty string. For now let’s just say that if the parsing fails, we return nil.
Now we are in the red phase, and we must get back to green by making this test pass. The first error is that JSONParser does not exist, so we create it. Let’s make it a class. The next error is that the parse method does not exist. We can fix this by creating a method parse that takes a string and returns Any?. An empty string should fail since it’s not valid JSON, so we just return nil.
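As a minimal sketch of this first red-to-green cycle, the test and the production code could look like the listing below. The test class name, the test method name, and the unlabeled `string` parameter are my assumptions; the text only says that parse takes a string and returns Any?.

```swift
import XCTest

// Production code: the least that makes the test below compile and pass.
class JSONParser {
    // The unlabeled parameter is an assumption, not something the text specifies.
    func parse(_ string: String) -> Any? {
        // An empty string is not valid JSON, and nothing else is handled yet.
        return nil
    }
}

// Test code, kept in the same file as the parser.
class JSONParserTests: XCTestCase {
    // Hypothetical test name for the most degenerate case.
    func testParseEmptyStringReturnsNil() {
        let parser = JSONParser()
        XCTAssertNil(parser.parse(""))
    }
}
```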
We run the tests and… pass! Great! We’re making progress!
According to the JSON RFC, a JSON value can be an object, an array, a number, a string, or one of the literals true, false, and null. Let’s start our parser with the simplest: a boolean.
First, we will address the true case:
We did a force cast here since we’re receiving Any?. I’m not too bothered about force casting in unit tests, since a cast that doesn’t work crashes the test, and that still shows up as a failure.
We run the tests, and of course… we fail! Welcome to the red zone again.
TDD produces better results by doing things the simplest way, so let’s try the easiest approach to validate if a value is true.
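A sketch of the test for true together with the simplest production code that passes it, assuming the same hypothetical `parse(_:)` signature and test names as before:

```swift
import XCTest

class JSONParser {
    func parse(_ string: String) -> Any? {
        // Simplest thing that could work: compare against the literal.
        if string == "true" {
            return true
        }
        return nil
    }
}

class JSONParserTests: XCTestCase {
    func testParseEmptyStringReturnsNil() {
        let parser = JSONParser()
        XCTAssertNil(parser.parse(""))
    }

    // Hypothetical test name.
    func testParseTrue() {
        let parser = JSONParser()
        // parse returns Any?, so the result is force cast to Bool.
        XCTAssertTrue(parser.parse("true") as! Bool)
    }
}
```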
We run the tests again and… pass! Feels good, right?
This time we have the opportunity to do some refactoring in our test code. The parser assignment is starting to repeat, and it will probably repeat in every single test. Looks like a nice opportunity to move it to the setUp function. Then we have:
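One way the refactored test class might look, with the parser created in XCTestCase’s `setUp()`. The implicitly unwrapped property is my choice for this sketch; the parser is unchanged from the previous step and repeated only so the listing compiles on its own.

```swift
import XCTest

class JSONParser {
    func parse(_ string: String) -> Any? {
        if string == "true" {
            return true
        }
        return nil
    }
}

class JSONParserTests: XCTestCase {
    // Implicitly unwrapped: XCTest calls setUp before every test method.
    var parser: JSONParser!

    override func setUp() {
        super.setUp()
        parser = JSONParser()
    }

    func testParseEmptyStringReturnsNil() {
        XCTAssertNil(parser.parse(""))
    }

    func testParseTrue() {
        XCTAssertTrue(parser.parse("true") as! Bool)
    }
}
```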
Looks better. Let’s move to the green phase again.
Next on the list is handling the false boolean value. We test the same way as the true case:
Tests of course fail. Move to red!
We used an if to check whether the value is true. Let’s try the same approach and “evolve” the if into a switch statement.
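A sketch of the switch version, along with a test for false written the same way as the one for true. The test name is hypothetical.

```swift
import XCTest

class JSONParser {
    func parse(_ string: String) -> Any? {
        // The single if has grown into one case per literal.
        switch string {
        case "true":
            return true
        case "false":
            return false
        default:
            return nil
        }
    }
}

class JSONParserTests: XCTestCase {
    var parser: JSONParser!

    override func setUp() {
        super.setUp()
        parser = JSONParser()
    }

    // Hypothetical test name.
    func testParseFalse() {
        XCTAssertFalse(parser.parse("false") as! Bool)
    }
}
```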
Looks simple but ugly huh? Of course tests pass, so we escaped red once again.
How can we improve this? We need a way to convert “true” to true and “false” to false, and we cannot forget to return nil if the parsing fails. Wait, there is a function that does exactly that! It’s called… Bool.init!
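Here is a sketch of what the parser could look like with the standard library’s failable `Bool` initializer, which takes a string and returns nil for anything other than "true" or "false". The `parse(_:)` signature is still my assumption.

```swift
class JSONParser {
    func parse(_ string: String) -> Any? {
        // Bool(_:) is failable, so the nil case comes for free.
        return Bool(string)
    }
}
```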
Wow! It could not be simpler. But does it work? Let’s test it!
Pass!! It feels good to have some tests to rely on.
Now that we’ve handled boolean values, let’s move to another type. Strings seem complex with all those double quotes and everything. Arrays and objects look even worse. Let’s try numbers then. The JSON definition of a number covers both integers and floating-point values, but let’s start slowly with integer parsing.
Tests fail miserably. They were not prepared.
We have grown smart after the boolean case. Ints are going to be easy. Let’s use Int.init and laugh at the failing test.
We try to parse as a Bool. If it succeeds, return the Bool. Then we try to parse as an Int. If it succeeds return the int. If it does not parse, it doesn’t look like JSON to us, so return nil as a failure.
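The flow just described could be sketched as follows. The value "42" and the test name are placeholders of mine; the article does not say which integer the original test used.

```swift
import XCTest

class JSONParser {
    func parse(_ string: String) -> Any? {
        // First try the boolean literals.
        if let bool = Bool(string) {
            return bool
        }
        // Then try an integer.
        if let int = Int(string) {
            return int
        }
        // Nothing matched, so this is not JSON we understand yet.
        return nil
    }
}

class JSONParserTests: XCTestCase {
    var parser: JSONParser!

    override func setUp() {
        super.setUp()
        parser = JSONParser()
    }

    // Hypothetical test name and value.
    func testParseInt() {
        XCTAssertEqual(parser.parse("42") as! Int, 42)
    }
}
```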
We run the tests and they pass!
Even though it looks good, I don’t like having if lets and guard lets everywhere in my code. Maybe we can simplify it a little bit. The Bool initializer returns an optional Bool. We want to call the Int initializer only if the Bool initializer fails. Is there a function that is called only if an optional is nil? There is! And it’s even an operator: the nil-coalescing operator, ??.
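A sketch of the one-line version using the nil-coalescing operator. The explicit `as Any?` casts are my addition, there to make both operands the same optional type so the choice of `??` overload is unambiguous; the author’s original line may well have been shorter.

```swift
class JSONParser {
    func parse(_ string: String) -> Any? {
        // The right-hand side of ?? is only evaluated when the left side is nil,
        // so Int(string) runs only after Bool(string) has failed.
        return (Bool(string) as Any?) ?? (Int(string) as Any?)
    }
}
```

Because the right-hand operand is an autoclosure, this keeps the same order of attempts as the if let version: Bool first, then Int, then nil.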
Welcome to single line coding. Our parser can handle two types of JSON values, and it’s written in one line! And now it’s backed by a bunch of tests. Which of course pass after our latest changes.
We’re on fire! I had a lot of fun so far with this little parser, and I’m sure I’m going to have lots more until it’s done.
The final code is available on GitHub
Each TDD step was done in a single commit, so you can follow the progress by comparing the commit diff using your favorite git tool.