How I implemented json-to-elm

(5 times!)

Pretext

json-to-elm is a tool that can take many forms of input and generate valid Elm code. It is designed to reduce the amount of time developers spend writing decoders in Elm. We'll discuss the reasoning behind its creation and the 5 different implementation passes that took it to where it is today.

This post is intended to promote some ideas about how to represent complex data inside Elm, in particular how abstract syntax trees can be used to create powerful tooling. It also focuses on the "get it done" mantra that I like to follow.

History

In the old days of Elm, nobody really used decoders that much. Nobody was using Elm in production — and so all Json endpoints were either examples, non-existent, or written just for Elm. They would either decode a single field, or just pull out small bits and pieces.

In production, though, your APIs might not always be so nice. In fact, the majority of decoders I've seen in production have had more fields than the core Json library can handle. In times long past, we would use the infix operator |: from the Json.Decode.Extra package, which meant we would have code that looked like this:

decodeModel : Json.Decode.Decoder Model
decodeModel =
    Json.Decode.succeed Model
        |: ("language" := Json.Decode.string)
        |: ("repos" := Json.Decode.list decodeRepo)
        |: ("templates" := decodeTemplates)

As I discussed in Json decoding is still hard, this has a number of flaws: it's hard to refactor, the error messages are no good, and you can introduce runtime errors just by reordering fields in your model during refactoring. You can also generate all of this automatically. In fact, I did, by putting

type alias Model =
    { language : String
    , repos : List Repo
    , templates : Templates
    }

into http://json2elm.com, and it generated the decoders for me. I wrote this while at NoRedInk, where I had run into a whole bunch of problems with Json decoders while fleshing out a new feature. I like to believe that if a task takes a long time and it can be automated, it should be automated. So, I should automate it!
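
The site generates the encoder too. For the type alias above, the encoder comes out roughly like this (a sketch rather than the tool's literal output; encodeRepo and encodeTemplates are assumed to be generated alongside it):

encodeModel : Model -> Json.Encode.Value
encodeModel record =
    Json.Encode.object
        [ ( "language", Json.Encode.string record.language )
        , ( "repos", Json.Encode.list (List.map encodeRepo record.repos) )
        , ( "templates", encodeTemplates record.templates )
        ]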

First implementation (Python)

My first goal was to take a JSON blob as input, and generate the type alias, the decoder, and the encoder for that JSON blob. I wrote a working implementation of this in Python in about 20 minutes. You can find it here, but let’s take a look at the interesting parts.

The important thing to realise about JSON decoders is that internally, all they do is check the type of a JS object. For example, the implementation of Json.Decode.int is essentially the following:

var decodeInt = function(possibleNumber) {
    if (typeof possibleNumber === "number" && !isFloat(possibleNumber)) {
        return Ok(possibleNumber);
    }
    return Err("Not an int!");
};

We just grab the type of the object, check that it's a number, then check that it isn't a float. This is a pretty easy function to port to Python. Here's the equivalent code:

def decode_int(possible_number):
    if isinstance(possible_number, int):
        return Ok(possible_number)
    return Err("Not an int!")

Since int and float are different types in Python, we can just check if the given object is an instance of int. This is the first step on our quest to generate valid decoders: if we have something that we know is of type Int, then the decoder is simply Json.Decode.int. And since we pass our program a JSON blob of key:value pairs, we can also figure out which field we want to pull out. This means that the core algorithm looks something like this:

def make_field_decoder(key, value):
    decoder = find_decoder(value)
    return f'"{key}" := {decoder}'

Nothing too crazy. For some things, like lists, find_decoder needs to be recursive: first to figure out that something is a list, then to find out the type of the list's elements. You can check that out here, where I keep track of the current depth to prevent accidentally blowing the stack.

One downside of this approach is that I need to perform the same lookup for each key and value every time I want to turn them into type aliases or encoders. What I decided to do, in fact, was to turn the input into a type alias first, then parse the type alias and produce decoders/encoders from there.

I also added some extra features, like parsing given files for type aliases and generating decoders from them, and turning union types into decoders. At that point, I was pretty happy with my Python implementation. It did everything I wanted and more — but part of making good tooling is helping other people use it! So, on to the rewrite.

Second implementation (Javascript)

The first implementation was great for me. The second implementation was a line-for-line rewrite in Javascript. The majority of Elm users come from Javascript backgrounds, so if your tool is installable from npm, that's a big advantage in getting people to use it!

The core algorithm remained unchanged. The idea was still the same — use typeof to figure out what a bit of JSON was, turn that into a type alias, then create decoders/encoders. And then it came to me — this is possible in Elm, too! So, why not rewrite it in Elm? It would have even bigger reach — and it would really help people understand how a decoder was generated.

Third implementation (Elm)

My goal with the third implementation of json-to-elm was to have a visual text input that would take some JSON and output the generated Elm code. My initial pass involved taking the Javascript for figuring out the type of an object and hooking that up as a native module inside Elm. If the type of something didn't pass our tests, it would be Unknown. If it was an object of no particular type, it would be Something. I could've used some pure Elm here — but manually writing the decoders would've slowed me down a bit, and besides — this wasn't for production.

Later on, I would parse out Something in order to figure out which JSON blobs should become new type aliases. If you give json-to-elm a blob that looks like the following, it’ll give you two type aliases:

{ "name" : "noah"
, "person" : {"age":5}
}
-- generates the following:
type alias Something =
{ name : String
, person : SomethingPerson
}
type alias SomethingPerson =
{ age : Int
}

Recursion is pretty handy, huh?

I showed off this working version during a NoRedInk weekly demo meeting, and everyone was pretty excited about it! The following Friday I met up with Richard Feldman in Prague (or maybe it was Bratislava?), and we spent a good part of the day improving both elm-css and json-to-elm, making it look a little prettier than the barebones version I had put together. Richard also set up a URL so that other people could access it online, and we were good to go.

Refactoring

In the first pass, the recursive type-alias generation wouldn't work. I had written the parser pretty badly. A lot of my code involved taking a single pass over the input, collecting everything into strings that would then be passed around here and there. This made the code very scary to mess around with: you had to keep an entire stack of functions and states in your head in order to understand how generation worked. That's everything FP stands against, and that's everything I stand against! But that's where I ended up. It was no good.

The first refactoring I made was to introduce a type alias for the representation itself. This type alias would hold the name of the alias being generated (e.g. Model). It would have a collection of fields and their types (e.g. age, Int). And it would have a base, which would be used to figure out the name for a nested alias (e.g. ModelNested). I also defined each field as a tuple of two strings — just the same as each pair in the dictionary.
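
In Elm terms, the representation was roughly this shape (a sketch; the field names here are illustrative rather than the exact ones from the repo):

type alias TypeAlias =
    { name : String
    , fields : List ( String, String )
    , base : String
    }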

The next thing was to switch around the algorithm. Instead of doing a simple first pass, we would try to keep the context for each field we collected. We would first collect all the fields and their types. We'd then filter for type aliases, running the whole process on each child type alias. Finally, we'd put them all together into the type alias that would be returned. The code looked something like this:
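
(What follows is a sketch of that shape rather than the original code; isObject, fieldsOf, and capitalize stand in for helpers from the real thing.)

createTypeAliases : String -> List ( String, String ) -> List TypeAlias
createTypeAliases name fields =
    let
        -- any field that is itself an object becomes a child alias,
        -- named after its parent (e.g. Model -> ModelPerson)
        children =
            fields
                |> List.filter (\( _, fieldType ) -> isObject fieldType)
                |> List.concatMap
                    (\( fieldName, fieldType ) ->
                        createTypeAliases (name ++ capitalize fieldName) (fieldsOf fieldType)
                    )
    in
        { name = name, fields = fields, base = name } :: children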

This worked to some extent. The main problem was that during this transitional refactoring I ended up re-processing type aliases that I had already processed, meaning that quite often a nested type alias would produce way too much code!

Luckily, I got to pair with Hardy Jones, and we spent a long time just reverse engineering what I'd done. Hardy had never seen the code before, so watching him try to understand what it was meant to do helped me understand what on earth I'd tried to do. We put our cleanups into a commit; they were mostly either removing debugging helpers, splitting functions out, or using a single type alias everywhere to represent some data.

The thing that really hit me was “let’s not just represent everything as tuples”. Just giving something a name, even if it’s unnecessary, really helped make the code a bit more understandable. This is a pretty obvious change, but when I was thinking about it some more, I realised that actually, an AST might make more sense. And so, it was time for another rewrite!

Fifth implementation (Elm + current)

One of the first changes I made was to use a union type to represent all the possible JSON values, instead of just getting them back as strings. This let me really think about the values I was representing at the type level rather than as strings. I'd also be able to represent some of the recursive values in a more logical way, using two new constructors — ResolvedType, to represent aliases that had already been parsed, and ComplexType, to represent a type that hadn't been parsed yet.
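
As a trimmed-down sketch of the idea, with ResolvedType and ComplexType as described and the rest of the constructors illustrative:

type FieldType
    = StringType
    | IntType
    | FloatType
    | BoolType
    | ListType FieldType
    | MaybeType FieldType
    | ComplexType -- an object that hasn't been parsed yet
    | ResolvedType String -- an alias that has already been parsed, referenced by name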

With this change in place, it was now really easy to separate out my logic. I realised that while parsing my JSON, I would need to keep access to the current value being parsed. I would also need to keep better track of the fields themselves — including their type, their name, and their base. This information would allow me to represent as much as possible inside my parser, prior to turning it into an encoder, decoder, type alias, or English.
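
Again as a sketch, each parsed field might carry something like this, keeping the raw JSON value around so it can be re-inspected later:

type alias Field =
    { name : String
    , base : String
    , fieldType : FieldType
    , value : Json.Decode.Value
    }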

The next thing I thought about was generating the Javascript required to parse the JSON at runtime. This would effectively be the first case of an Elm compiler written in Elm, as it allowed you to take in an Elm decoder, and verify it against some JSON input. You can check out the commit here, but I dropped support for it when the Native module syntax changed. Still a neat little sidenote!

Adding support for decoder/encoder input

During this time, we also created elm-decode-pipeline as a more idiomatic alternative to Json.Decode.Extra's infix operators. This gave me the idea that I could use the AST I'd written for json-to-elm to convert our old decoders to the new pipeline decoders. A short while later, I committed the small number of changes needed to support this. The general gist is to figure out the type of input by parsing the text for various clues, then act on it accordingly. This allowed me to take in union types, decoders, type aliases and JSON as input and do a different thing with each one.

Once a decoder was discovered, it would be parsed and converted into a type alias, which would then be used to build up the AST. Once we had the AST, we could generate the decoders just as we were before. This meant I could read in old-style decoders, but generate new-style decoders.
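
For the decoder from the start of the post, the pipeline-style output comes out roughly like this (decodeRepo and decodeTemplates as before):

decodeModel : Json.Decode.Decoder Model
decodeModel =
    Json.Decode.Pipeline.decode Model
        |> Json.Decode.Pipeline.required "language" Json.Decode.string
        |> Json.Decode.Pipeline.required "repos" (Json.Decode.list decodeRepo)
        |> Json.Decode.Pipeline.required "templates" decodeTemplates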

From there onwards, adding any new feature was trivial. I added a couple of user-facing options, such as which kind of decoder the user wanted: pre-0.18 original-style decoders or 0.18+ decoders, which also allows for upgrading old code. The latest feature I added allowed for turning a type alias or decoder into English.
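
The difference between the two styles is small for any single field, which is what makes the upgrade mechanical:

-- original style
("language" := Json.Decode.string)

-- 0.18+ style
Json.Decode.field "language" Json.Decode.string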

All this is due to the AST representation, which allowed me to write a simple function taking a field type and returning a sentence about it.
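
Building on the FieldType sketch from earlier, such a function might look something like this (again illustrative, not the exact wording the tool produces):

describeFieldType : FieldType -> String
describeFieldType fieldType =
    case fieldType of
        StringType ->
            "a string"

        IntType ->
            "an int"

        FloatType ->
            "a float"

        BoolType ->
            "a bool"

        ListType inner ->
            "a list of " ++ describeFieldType inner

        MaybeType inner ->
            "maybe " ++ describeFieldType inner

        ResolvedType aliasName ->
            "a " ++ aliasName

        ComplexType ->
            "an object"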

Conclusion

This project did a couple of things really well. I started with a reference implementation created in 20 minutes, and was not afraid to throw that prototype away in order to make something useful for other people. A lot of my projects end up buried away because I make them the way I like — but actually, when making tooling, you have to make things the way the user likes.

Representing data with a type alias or union type can help you think about the data you are working with, and reduce the cognitive load on a new developer. Keeping an internal representation separate from an external representation allows you to create more external representations trivially, and even swap between them via the internal representation.

json-to-elm is a project designed to help Elm developers reduce the amount of code they have to write. Now, maybe it can also help them think about the code they have to write in a new way.

Post text

json-to-elm can be found on Github here. It was a lot of fun to write, probably one of my favourite Elm projects.
