Elm & Guarantees

In this post, I’ll show you a bug I found in an Elm program recently. This bug is interesting because it shows us both how the guarantees Elm gives us really help in debugging, and where those guarantees end. I’ll compare a bit with Redux/React along the way.

A user of elm-mdl reported that using elm-mdl buttons caused his application state to occasionally revert itself. He supplied this video:

Glitch (Video courtesy of Eelco Hoekema).

The expected behaviour was that whenever the user clicks one of the flat buttons at the bottom, numbers are put in the highlighted row, and the highlight advances. But as you can see, it only works reliably for the first row. Subsequent clicks sometimes cause the highlight to flicker briefly but stay where it is.

We saw this only when using elm-mdl buttons; using standard buttons instead made the glitch go away. What gives?

Had this been a JavaScript program and elm-mdl a JavaScript library, I’d have had an extended debugging session in front of me. Who knows what silly bug someplace might have trashed the state of something else? But this is Elm & the Elm Architecture; there is only one place state updates can possibly happen: in the update function of the client application.

So we look at that. In the app, the sub-component “Form.QuestionSet” uses elm-mdl buttons. The actions of those buttons live someplace deep inside the “QuestionSetAction …” action in the code below (simplified for clarity):
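
The gist itself is not reproduced in this text, so what follows is only a minimal sketch of the shape described, not the reporter’s actual code. The sub-component is inlined under hypothetical names (QuestionModel, QuestionAction, questionSetUpdate, tagger), effects are left out, and only “QuestionSetAction” and “state.questionModel” are taken from the original.

```elm
-- Hypothetical stand-in for the Form.QuestionSet sub-component,
-- inlined so the sketch is self-contained.
type alias QuestionModel =
    { highlightedRow : Int }


type QuestionAction
    = Answer
    | RippleFinished


questionSetUpdate : QuestionAction -> QuestionModel -> QuestionModel
questionSetUpdate action model =
    case action of
        Answer ->
            { model | highlightedRow = model.highlightedRow + 1 }

        RippleFinished ->
            model


-- The client application.
type alias State =
    { questionModel : QuestionModel }


type Action
    = QuestionSetAction QuestionModel QuestionAction


update : Action -> State -> State
update action state =
    case action of
        QuestionSetAction model subAction ->
            { state | questionModel = questionSetUpdate subAction model }


-- How the view would wrap the sub-component's actions (rendering omitted).
tagger : State -> QuestionAction -> Action
tagger state =
    QuestionSetAction state.questionModel
```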

Spot the danger?

It looks innocent enough: “QuestionSet” is a subcomponent, which has its actions wrapped in “QuestionSetAction”. To dispatch one of those, we call “QuestionSet.update”. All standard Elm Architecture so far.

But wait! Notice that “QuestionSetAction” carries the very model that is passed as the argument to “QuestionSet.update”. So we will be updating not the current model, but whatever the model was when the action was constructed!

When was it constructed, then?

In the view function. So if the first action after rendering that view is “QuestionSetAction state.questionModel”, the captured “state.questionModel” will be current, and the app will behave correctly. If the first action is something else, we will revert the model when the “QuestionSetAction …” happens.

So why are there extra actions with elm-mdl buttons, but not with standard elm-html buttons? Because elm-mdl buttons have animation and so dispatch actions asynchronously, in this case with delays. The ripple component issues a delayed stop-the-animation action some 200ms after the click. That action is wrapped in “QuestionSetAction state.questionModel”, and when it is dispatched after 200ms, the captured “state.questionModel” replaces the current model in the update function.
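
To make the timeline concrete, here is a small pure trace of one click followed by the delayed ripple action. It is a sketch under assumptions: the sub-component and its names (QuestionModel, QuestionAction, questionSetUpdate) are hypothetical stand-ins, effects and timing are left out, and I assume both actions were wrapped while the pre-click model was current.

```elm
-- Hypothetical stand-in for the Form.QuestionSet sub-component.
type alias QuestionModel =
    { highlightedRow : Int }


type QuestionAction
    = Answer
    | RippleFinished


questionSetUpdate : QuestionAction -> QuestionModel -> QuestionModel
questionSetUpdate action model =
    case action of
        Answer ->
            { model | highlightedRow = model.highlightedRow + 1 }

        RippleFinished ->
            model


type alias State =
    { questionModel : QuestionModel }


-- The buggy shape: the action carries a snapshot of the model.
type Action
    = QuestionSetAction QuestionModel QuestionAction


update : Action -> State -> State
update action state =
    case action of
        QuestionSetAction capturedModel subAction ->
            { state | questionModel = questionSetUpdate subAction capturedModel }


initial : State
initial =
    { questionModel = { highlightedRow = 0 } }


-- Highlighted row before the click, after the click,
-- and after the delayed ripple action.
highlightHistory : List Int
highlightHistory =
    let
        -- Both actions were wrapped while `initial` was the current state.
        click =
            QuestionSetAction initial.questionModel Answer

        delayedRippleStop =
            QuestionSetAction initial.questionModel RippleFinished

        afterClick =
            update click initial

        afterRippleStop =
            update delayedRippleStop afterClick
    in
    List.map (\s -> s.questionModel.highlightedRow)
        [ initial, afterClick, afterRippleStop ]
```

Here highlightHistory evaluates to [0, 1, 0]: the highlight advances on the click and falls back when the stale snapshot arrives, which matches the flicker in the video.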

The guarantees

This bug was easy to find because Elm gives us a guarantee: the only way to change application state is via the top-level update function.

You might think you’d get the same guarantee from React/Redux, but in JavaScript there are no guarantees, only good intentions: The language allows mutations and local state. Maybe a React component is uncontrolled or has persistent local state? Maybe some component’s local state is updated asynchronously? Maybe jQuery is confusing React? Maybe my reducer function is not pure and model state is being mutated from, well, anywhere? Maybe some part of my state is uninitialised?

In contrast, it’s flatly not possible to write an Elm Architecture program which mutates any state from any place other than the top-level update function(*)—it’s not possible to commit any of the above React/Redux errors in Elm. Debugging the present bug is reduced to asking the question “How does update get called with a stale model?”, which is straightforward to work out. Pure languages FTW, baby!

Voiding the warranty

If Elm is so great, how come there was a bug in the first place?

Elm Architecture experts would tell you that you are not supposed to capture your model in actions, like the above program does. It’s not “best practice”. Elm as a language cannot enforce this best practice.
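
For comparison, here is a sketch of the shape that follows the best practice: the action carries only the sub-action, and update reads the model from the current state. As before, the sub-component names (QuestionModel, QuestionAction, questionSetUpdate) are hypothetical stand-ins and effects are left out.

```elm
-- Hypothetical stand-in for the Form.QuestionSet sub-component.
type alias QuestionModel =
    { highlightedRow : Int }


type QuestionAction
    = Answer
    | RippleFinished


questionSetUpdate : QuestionAction -> QuestionModel -> QuestionModel
questionSetUpdate action model =
    case action of
        Answer ->
            { model | highlightedRow = model.highlightedRow + 1 }

        RippleFinished ->
            model


type alias State =
    { questionModel : QuestionModel }


-- No model snapshot in the action any more.
type Action
    = QuestionSetAction QuestionAction


update : Action -> State -> State
update action state =
    case action of
        QuestionSetAction subAction ->
            -- Always start from the model as it is right now.
            { state
                | questionModel =
                    questionSetUpdate subAction state.questionModel
            }


initial : State
initial =
    { questionModel = { highlightedRow = 0 } }


-- A click followed by the delayed ripple action.
highlightHistory : List Int
highlightHistory =
    let
        afterClick =
            update (QuestionSetAction Answer) initial

        afterRippleStop =
            update (QuestionSetAction RippleFinished) afterClick
    in
    List.map (\s -> s.questionModel.highlightedRow)
        [ initial, afterClick, afterRippleStop ]
```

With this shape highlightHistory evaluates to [0, 1, 1]: the late ripple action no longer has an old model to put back.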

But notice how the situation for concurrency in Elm is then akin to the situation for state update in Redux: your reducer function is not supposed to mutate anything. That, too, is not “best practice”. JavaScript as a language cannot enforce this best practice.

So that seems to be one limit of Elm’s guarantees: it’s still up to the programmer to get asynchronous computation right.

This limit is not particular to Elm. The Task/Effects mechanisms in Elm are conceptually similar to async in F#, Tasks in C#, Futures in Java, Promises in JavaScript, Tasks in Elixir, etc. These frameworks all solve the problem of expressing asynchronous computations and basic synchronisation conveniently, but none of them provide guarantees at a higher level than that: they’d all allow us to accidentally make the sequencing mistake of the present bug.

I’m not aware of any language that gives guarantees for asynchronous computations strong enough to statically eliminate the present bug. That’ll have to be in the future ;)


(*) There should be small print here, but we’ll leave that for a future post.