Advent of Haskell

Thoughts and lessons learned after using Haskell consistently for 25 days in a row

This December, for the fourth year in a row, a new Advent of Code took place. If you’re not familiar with the event, it’s a twist on the tradition of Advent calendars. As part of a Christmas-themed story, we must help Santa and his Elves, using our coding skills to solve challenges that are made available one each day until the 25th of December.

The challenges are input/output based, meaning participants are free to use any language or approach they so choose.

And you get a pretty ASCII-based calendar to fill out as well. This is my incomplete calendar from 2017.

I had already participated last year using Haskell as my language of choice, but ended up giving up on the sixth day, frustrated by the stateful nature of that day’s challenge.

In retrospect, I guess I was just expecting to be able to deal with all the challenges without having to learn anything new, or make any changes to my very elementary workflow.

This year I was ready to take another go and try to make it all the way to the end. The result has been 25 days straight of non-stop Haskell coding, which also now accounts for most of the actual Haskell I’ve written thus far.

It should be noted that my main focus when solving the challenges was to produce clear, idiomatic, and easy to reason about code, in many cases at the expense of verbosity, taking my best guess at what production code might look like (as opposed to just trying to hack a solution away).

I thought I should reflect on these past 25 days, and try to put into writing some of the main things I’ve learned along the way.

The topics listed are in no particular order.

To put things into context a little bit, I’ve been programming for almost 20 years, and used to code competitively back in High School. I started with functional programming a couple of years back, made my way through Elm and then progressed into Haskell. I did the CIS-194 Spring ’13 course by Brent Yorgey and I’m (very) slowly making my way through Haskell Programming from first principles. I would consider myself an advanced functional programmer, advanced challenge solver and intermediate Haskeller, so the lessons learned should be considered from that point of view.


Development Environment

My Day 1 set-up consisted of a global, outdated, Haskell Platform install on OSX, and my daily nvim set-up with ale for immediate feedback. This provided me with out-of-the-box type-checking on the editor, but due to me also having stack installed, I would get errors when using globally-installed packages.

I used tmux and would keep a ghci prompt running side-by-side with my editor for quick feedback and experimentation.

My changes to this environment were minimal but improved my experience notably.

.vimrc excerpt

I removed all references to stack from the linters, and added hlint. I also added hfmt as a fixer, with fix on save. Finally, I configured some shortcuts for navigating the list of errors/suggestions, as well as displaying the complete messages and not just one line.

I am a fervent supporter for automated formatting tools, so I welcomed hfmt wholeheartedly. I also appreciate the immediate feedback you get when your code jumps into position on file save, a la Elm.

hlint was suggested by a friend, and it has also been a great addition. It feels less like the linting tools I’m used to from JavaScript, and more like having an experienced colleague looking over your shoulder telling you “you know, you could do this instead of what you’re doing there…”.

Types

Use them, plain and simple.

During the first days I barely used the type system. Just a few type synonyms and tuples as anonymous Product types. This works, but not only does not add any additional type safety, it also makes type signatures non-documenting.

After a few days I realized that was not the way to go, and started to embrace types fully. After that, the first thing I would do on every new challenge was start by modeling the problem domain with types. I followed hlint's suggestions to use newtype whenever possible, and continued to use type synonyms only to alias Set and Map definitions.

A bonus of using your own types is that you get to implement your own typeclass instances. In many cases the derived Show, Eq and Ord instances were good enough, but in others I had to provide my own implementations to match the problem’s semantics (Eq, Ord) or to aid in debugging (Show).

I learned that you can provide your own implementation for showList, with a small detour into difference lists and ShowS.

Something I ended up having to decide over and over, was whether to introduce a Maybe-like type (i.e., data Foo a = Bar a | Baz) or just use Maybe. I ended up making the decision based on the expected semantics for that type. For entities that uphold the “may or may not be there” meaning, I avoided introducing a new type, and for entities that provide a different meaning, such as a value that can be finite or infinite, I chose to introduce a new type to capture it.

Integer

Haskell provides arbitrary-precision Integer numbers out of the box, which I like to use as my go-to choice. I am not sure if this is the right choice, but it’s one I decided to stick with.

The problem seems to be that many of the standard/common libs use Int exclusively, leading to a lot of explicit conversion back and forth. My usual approach is still to just follow the GHC errors and add a call to the missing function whenever necessary.

Something I ended up doing is creating a shorter alias: fI = fromIntegral. Another thing that helped avoid altogether this was noticing the final section on the Data.List documentation, which mentions the “generic” operations. This provides alternative versions of many useful list functions but with a Num or Integral typeclass requirement instead of a hardInt.

Parsing

Being an input based challenged, pretty much all days required some degree of structured parsing. In some of the cases I would even consider the parsing itself as part of the challenge (e.g., dealing with expressions with nested parenthesis and such).

Parsing combinators are one of the areas in which Haskell shines, and indeed knowing the basics around them made it so that parsing itself was never an issue for any particular challenge. I used the well-known parsec package, for no special reason.

Again, for context, I am quite comfortable with Applicative and Monadic parsing, and most of the challenges required just the former.

You should try to learn your parsing combinators and functions, as many of the common operations have a handy helper out there. Things like between, endBy1, try, and such.

Now, try deserves its own discussion.

One area I initially could not get to work as expected was alternative parsing, as the early branches were failing having consumed part of the input, which caused the later branches to fail as well.

After a bit of research, I found that the try combinator allows you to backtrack away from a failing parser, thus allowing the alternative branches to proceed as expected.

The caveat here is that backtracking is potentially dangerous, and care should be taken to only use try for processing small amounts of input. What you want to avoid is having to perform an expensive parsing operation only to have to backtrack from it entirely to try the next expensive alternative, only to backtrack from that one, and so on.

A few more small lessons:

Finally, one thing that I’m probably doing wrong: I could not find any function to parse numbers. I ended using number = read <$> many1 digit a lot, but this required upgrading to support negative values, and would’ve required even further changes if we ever had to deal with decimals.

Language Extensions

I picked up a few language extensions along the way.

NamedFieldPuns

As soon as I started using records, this extension came in handy. In general, I would usually be interested in just one value of a record for an update operation, ending up with code like update newval r@Rec{val} = r {val = newval}. For some reason I don’t like using accessor functions. I don’t know why, really.

RecordWildCards

A natural step-up from NamedFieldPuns in my eyes. As soon as I needed multiple values from a record, the previous snippet would end up like update f r@Rec{..} = r {val = f val otherval}.

LambdaCase

This one was a nice find. It does what it says, and lets you turn case expressions into anonymous functions. If you don’t know why this is good, this article showcases it quite well.

Pattern Matching

Abuse it. Especially on lists.

Need to process a list 5 items at a time? No problem, just write process g (a1:a2:a3:a4:a5:as) = g a1 a2 a3 a4 a5 : process g (a2:a3:a4:a5:as), add the process g _ = [] case and you’re done.

You can write down quite specific patterns if it makes unwrapping your data easier. Just remember to always add the required catch-all cases, lest you write a partial function.

Iterate and Fold

Many of the challenges required performing repeated computations. For my non-monadic code, this was mostly done with either iterate or a fold (foldl, scanl, etc.), depending on wether the computations were based just on the previous result, or on the combination of the previous result plus some new piece of data.

This lead to most of my core functions types ending in either a -> a or a -> b -> a, where a is the result type and b the new piece of data type. In both cases, any iteration-invariant arguments would be prefixed, leading to signatures like, for example, k -> l -> a -> b -> a for a fold-friendly computation.

I was able to use this guideline to quickly decide the order of the arguments based on my intended use for the iterative computation.

Maybe

A bit late into the game, but there are four functions in Data.Maybe that have proven to be immensely useful, and if you don’t know them by heart then please just read their signatures until they’re etched in your brain.

They are:

  • maybe :: b -> (a -> b) -> Maybe a -> b
  • fromMaybe :: a -> Maybe a -> a
  • catMaybes :: [Maybe a] -> [a]
  • mapMaybe :: (a -> Maybe b) -> [a] -> [b]

They are all related, so just pick the one that fits your specific situation. Kudos to hlint for constantly pointing these out.

Flip and Infix

Another hlint suggestion, you can replace flip f x with `f` x. This should not be used as a general rule, but I’ve found that it usually makes sense when building up anonymous function arguments for other functions like filter, like filter ((> 0) . (`lookup` someMap)) [keys].

Ordering

Many of the challenges involved multiple ways of sorting or selecting elements from a list based on some numeric criteria. This of course lead to many Ord instances. In most cases, the derived instances were good enough. In others, I had to roll out my own.

In retrospect, there were also opportunities where I could’ve rearranged the type structure so that the derived instances matched my semantics:

It’s a nice party trick, you can use it to amuse your friends.

For simplicity, I opted to implement <=, and for the most part ignored compare and Ordering.

That is, until we had to start combining multiple sort criteria. This is where Ordering pays out. If you look closely, you will notice it has a Monoid instance, which does exactly what we need for this use-case: It combines multiple Ordering values, with the result being the first non EQ value from left to right, or EQ.

Another topic I was able to postpone for a while with the help of paired functions such as maximumBy and minimumBy, is how to sort in reverse order. I’ve found two elegant solutions:

Finally, it should be noted that Data.Ord has a comparing function and that comparing = compare `on`, in case you’re using the latter a lot. Surprisingly, hlint did not teach me that one, I just found it in the docs researching for this article, so you will not find this in my code.

And this does not make on any less helpful, it’s still quite a valuable combinator, so learn it if you have not done so yet.

State

As I mentioned in the introduction, stateful computations had caused me to quit the event the previous year. This time I was ready to approach them with a fresh look.

One of the major mental leaps in my opinion was getting used to looking at a State Foo Bar value as you would a plain Foo -> (Bar, Foo)function value. I have nothing to say here, sadly. Eventually it just clicks.

It’s also worth briefly discussing the return value of stateful computations (i.e., Bar in the example above), as most of the times we are mainly concerned with the state value itself (i.e., Foo in the example above).

I’ve found two general uses for this return value:

  • Functions that take the state value as input to compute a result (i.e., read-only stateful computations).
  • Providing metadata/secondary results from a computation whose main job is to update the state itself.

For this second use-case, the common scenarios I ran into have been:

The next big step was embracing do notation. If you come to Haskell for the >>= operator and point-free style, then this might be hard to do, and indeed you can go ahead and avoid do notation completely. After all, it is just syntax sugar. But it has been my experience that as complexity increases, the imperative-style declarations that do notation enables are easier to understand and reason about. At least for humans.

The final step was monadic iterations. Short of implementing your own combinators (which, as an exercise, I definitely recommend), Control.Monad was a great place to start. Each of those combinators is worth knowing. My next recommendation would be Control.Monad.Loops, which provides pretty much all the remaining iteration combinators you might need.

Finally, as the days went by, I started alternating between State monad and either iterate or a fold. From my experience,State monad wins as far as readability is concerned.

For simple computations that can be reasonably written down as a single function, then of course I would stick to iterate or a fold. But as soon as the complexity increases, I’ll be reaching out for the State monad.

Lenses

By Day 15, my state was deeply nested enough that I had to consider giving Control.Lens a shot. It was the best choice I could’ve made.

Working with records was the least of it. I was delighted to find out there is first-class interoperability with MonadState, meaning we can use the same constructs to reach both into the state of our stateful computations, as well as into the data structure it holds.

Prisms, folds and traversals round up the picture, providing a common language for dealing with Product types, Sum types, Foldable types, Traversable types and stateful computations.

And it all composes and type-checks.

Once more, for context: While I had never used Control.Lens before, I was quite familiar with the concept already, as well as with some of the underlying CT foundations. Additionally, I had previously watched the excellent and thorough introduction by Edward Kmett. It’s about 2 hours long, and I believe it took me a couple of days to finish, as I had to pause repeatedly to properly digest it all.

My recommendation would be to keep it simple at first and focus on some quick wins. Functions like assign and modifying are great examples. Start by using only named functions and avoid their operator aliases until you’re familiar with what they do already.

If you never used the package before, or if you want some examples to base yourself on, the tutorial is a good place to start. I can also recommend the first 30 minutes of this talk by John Wiegley, which contains some simple, down to earth examples. The last 17 minutes have some more advanced examples which, albeit great, will not add much value if you’re just getting started.

Finally, once you can deal at least superficially with the types, start browsing through the documentation. There is just so much stuff there that I think you would need a book to make a serious attempt at explaining it all. I kept most of those links open in different tabs throughout the last 10 days.

Gotchas

To close up the Haskell lessons, there were a few things that I found recurrently annoying.

Number strings can’t start with +

Yeah, read does not like it. It makes sense, I guess, and you can just filter them out.

Decreasing ranges

Again, it seems the default Enum implementations for the base numeric types do not account for this use case, and [10..0] will result in [].

Stale data in State functions

This one took me a while to spot the first time, and yet I ran into it again (albeit this time I knew what I had done wrong).

This snippet looks innocent enough, but it has a major flaw.

At the beginning of each turn, we retrieve all the players from the state. mapM_ will proceed to call playerTurn repeatedly, binding p to a different player each time.

But these values were only retrieved once, at the beginning of the turn.

If playerTurn updates the state in any way, then it is very likely that by the time a future iteration is evaluated, the value in the p argument and the actual value in the current state have diverged, since p will always be bound to a player value retrieved at the beginning of the turn.

Thus, like the code mentions, we must be very careful with p inside playerTurn. My gut feeling is that I’m doing something inherently wrong, but I don’t know what it is. For the time being, my solution has been to do a lookup on the current state, and re-fetch the current value of p, but I’m sure there has to be a better way.

$, (.) or ()

This isn’t really a gotcha but more of an open question. I’m still not sure when I should go for one or the other. I’ve tried to make the code look readable, but I did not seem to find a consistent criteria, especially when there are lots of ways to write the same expression once you start mixing up more than two terms.

Choices. I hate choices.

Non-Haskell

In addition to Haskell, I learned a few non-Haskell related things I did not know before.

Cycle detection

Detecting cycles in a sequence of values is a solved problem, with multiple performing solutions to chose from.

SMT solvers

Many optimization or logical problems can be modeled via SMT and subsequently solved computationally with an appropriate solver. Haskell provides the sbv package, that I’ve yet to explore.

Disjoint-set data structures

Another solved problem, with a backing data structure. There are multiple Haskell packages to try.

Future Work

Finally, there were a few thoughts that I had after the fact, and that I was not able to put into practice. Hopefully by next year they will become lessons learned as well.

Custom modules

I ended up copying over a lot of code, either between all days or between specific days. I guess the idea was to have each day on a single file, but in retrospect there was no real benefit there. I should take the opportunity to already start factoring some code out into some shared modules, both for myself as well as for the next year’s event.

Generic code

I think I did not write a single piece of generic code. I could’ve gone for types with type arguments in many opportunities, and I’ve could’ve used typeclass polymorphism in many places as well, but I chose to stick to the concrete types I was actually intending to use. The benefit here was that I did not have to bother with dealing at the abstraction level, at the expense of reusability.

Related to the previous point, I should keep this in mind when refactoring the common code, as to make the shared modules generic and reusable.

Better types

In many of the challenges I used types that allow invalid states to be represented. This in turn opens the door for a developer mistake to cause such invalid states, and cause either a runtime error or worse: unspecified behaviour.

It does not have to be this way. There are plenty of data structures that allow us to keep certain invalid states unrepresentable. To be considered in the future: Data.List.NonEmpty, Control.Zipper, Numeric.Natural, and Numeric.Positive.

Additionally, some of my uses of Data.List were not ideal, bringing us to the last point.

Performance and strictness

This is one area in which I will need to invest quite some effort. I was able to get all the solutions to run in acceptable time (say, a few minutes in the worst case), but I know I can do better.

One area in which I’m certainly underperforming is by abusing Data.List when I should be considering alternatives such as Data.Array, Data.Vector or Data.Sequence. Which one, and when, I’ve yet to read about.

In addition to this, there’s plenty more to try: strictness, unboxed types, in-place mutation with Control.Monad.ST, etc.


In conclusion, these past 25 days have been a great journey, and I’m very happy that I was able to use Haskell to solve all of the challenges.

Day 3 was a problem I had been unable to solve back in my High School days, and Day 23 required finding the global minimum in a search space of size 1e21. The now-infamous Day 15 challenge involved building the combat and AI engine for a 2D turn-based RPG complete with path-finding, and Day 24 complemented it with the combat engine for a turn-based MUD. So now I can say I have some game development experience in Haskell, and it would technically not be a lie.

I’m sure many of the things I learned are probably wrong, and that there are plenty of better ways out there to do what I did. I look forward to learning them. If you believe any of the things I point out is deliberately harmful for newcomers, please let me know.

I would like to thank Eric Wastl for creating this wonderful event, as well as you for taking the time to read this. If you did not know of the Advent of Code, well, now you do. All the challenges are online, including those from the previous years, and I encourage you all to go out and solve them. Just promise to share what you learn as well. And have a great 2019!