Why building web-apps in Haskell is harder than it ought to be
After writing and maintaining a 200,000+ LOC project in Ruby/Rails, I was beginning to see the shortcomings of a dynamically-typed language in large projects. I started looking for alternatives, and got fascinated with Haskell, with its focus on immutability and managing side-effects.
And so my journey with Haskell began a few months ago.
And, boy, was it an eye-opener. I loved that the compiler was helping me write correct code.
However, it was not all rainbows and unicorns when it came to writing real-world web-apps. There were a number of areas where I got stuck and couldn’t find any acceptable solutions. And these are not the “I don’t understand monads” kind of issues. (I still hate monad transformers, but I’ve accepted them as a necessary evil.)
Here’s the laundry list I have till now:
- Which framework? Which library? Too much choice?
- How to deal with values that are specified by the database, e.g. created-at, updated-at, etc.?
- How to deal with nested records/tuples?
- How to pass filters, order-by, limit/offset, etc. to a lower-level function? How to compose Esqueleto queries?
- How many different names can one come up with for the same thing? The “ambiguous reference” problem.
- How to deal with validations?
- Why are templates so hard (specific to Shakespearean templates)?
- The “we don’t need ORMs, they are bad” mentality
- Lack of testing infrastructure: TODO
Which database library? Problem of too much choice.
As a newcomer to the ecosystem, the lack of a “mostly works, batteries included” framework leads to an overwhelming experience. You are forced to make choices that you have no knowledge about. Each of the many database libraries available makes different design decisions, which, as a newbie, I cannot weigh. Finally, I went with the stack that seemed to have good documentation, emphasised type-safety (my main reason for learning Haskell), and seemed full-featured: Yesod + Persistent + Esqueleto.
The points given below are with reference to Persistent + Esqueleto, but I have a strong feeling that every other DB library in Haskell has similar problems (please correct me if I’m wrong — I’m more than happy to be corrected).
How to deal with values originating from the DB
This was the very first problem that I ran into, and by the looks of it, so have other people. There are values/columns in a table that you expect the database to provide, for example:
- Primary key
- Created-at timestamp
- Updated-at timestamp
These values cannot be modelled as “Maybe” in Haskell, because they cannot be “Nothing.” In fact, here’s how the DB schema would look:
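A sketch of such a table in Persistent’s schema DSL (entity and field names are illustrative); note that none of these columns is nullable, and the database supplies the values via defaults:

```haskell
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
Download
    url       Text
    status    Text
    createdAt UTCTime default=now()  -- NOT NULL in SQL; value assigned by the DB
    updatedAt UTCTime default=now()  -- NOT NULL in SQL; value assigned by the DB
|]
```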
It’s just that the values are unspecified until the record is saved. As soon as the record is saved (or has been fetched from the DB, instead of being created in Haskell), these values have to be present. There is no easy way to model this in Haskell. Persistent has solved only one part of this problem (primary keys), via the clunky (IMO) “Entity” type, where the primary key (id) has been “pulled out” of the “val” record.
Entity id val
With this type being central to Persistent, I, as a programmer, have to constantly pattern-match the Entity constructor. Why can’t this be simpler? Something like:
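One toy sketch of what I mean (everything below is invented for illustration; Persistent offers nothing like it): encode “saved-ness” in the type, so the id field is simply () before a save and the real key afterwards.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

data Stage = New | Saved

-- () until the database assigns an id, Int (the key) afterwards
type family IdField (s :: Stage) where
  IdField 'New   = ()
  IdField 'Saved = Int

data User s = User
  { userId   :: IdField s
  , userName :: String
  }

-- "Saving" a draft just fills in the key the DB handed back
markSaved :: Int -> User 'New -> User 'Saved
markSaved key u = User { userId = key, userName = userName u }

main :: IO ()
main = do
  let draft = User { userId = (), userName = "alice" }
  let saved = markSaved 42 draft
  print (userId saved, userName saved)
```

The same idea could extend to createdAt/updatedAt, removing the need to constantly destructure an Entity wrapper.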
How to deal with DB associations / nested records / nested tuples
Any real-world web-app built on top of an RDBMS will need to model relationships between DB tables. Something as simple as:
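In Persistent’s schema DSL that might look like this (names illustrative):

```haskell
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
User
    name Text
Post
    authorId UserId  -- foreign key: a User has many Posts
    title    Text
Comment
    postId PostId    -- foreign key: a Post has many Comments
    body   Text
|]
```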
Now, when you’re fetching a user-object, you don’t usually fetch only the user row. You fetch the user along with some associated row(s), e.g.:
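What I’d like to be able to write is something along these lines (a purely hypothetical signature; nothing provides it):

```haskell
-- Hypothetical: one call that fetches a user plus the associated posts
fetchUserWithPosts :: UserId -> SqlPersistM (Entity User, [Entity Post])
```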
I don’t think this is possible in any DB library in Haskell. While Esqueleto provides a DSL to perform joins and returns tuples of the following form, it has no solution to return a programmer-friendly nested tuple/record:
[(Entity User, Entity Post, Entity Comment)]
[(Entity Vote, Entity User, Entity Comment)]
I have already raised this question on Esqueleto’s GitHub repo, and thanks to Tom (Opaleye’s author), I have a workable solution, but I would not call it easy.
In my opinion, the ideal solution is to NOT deal with ad-hoc nested tuples — they’re extremely brittle and hard to work with (you have to keep going back to the tuple definition to check the order and nesting-level of elements). A better solution would be for the DB library to be aware of the relationships between tables, and automatically fill-in the nested record that is being expected. For example:
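To make the desired shape concrete, here is a pure sketch with plain tuples standing in for Entity values. The nest function below is the hand-written regrouping that a relationship-aware library could generate for you (all names here are mine, not a library API):

```haskell
import Data.List (groupBy)
import Data.Function (on)

-- Toy stand-ins for the Entity values a join returns
type UserRow    = (Int, String)  -- (id, name)
type PostRow    = (Int, String)  -- (id, title)
type CommentRow = (Int, String)  -- (id, body)

-- Regroup flat join rows (one row per comment, already sorted by
-- user id then post id) into the nested shape you actually want.
nest :: [(UserRow, PostRow, CommentRow)]
     -> [(UserRow, [(PostRow, [CommentRow])])]
nest rows =
  [ (u, nestPosts [ (p, c) | (_, p, c) <- userGrp ])
  | userGrp@((u, _, _) : _) <- groupBy ((==) `on` (\(u', _, _) -> fst u')) rows
  ]
  where
    nestPosts pcs =
      [ (p, map snd postGrp)
      | postGrp@((p, _) : _) <- groupBy ((==) `on` (fst . fst)) pcs
      ]

main :: IO ()
main = print (nest
  [ ((1, "alice"), (10, "post A"), (100, "nice"))
  , ((1, "alice"), (10, "post A"), (101, "+1"))
  , ((1, "alice"), (11, "post B"), (102, "hmm"))
  ])
```

Writing this once for a toy is fine; writing it by hand for every pair of joined tables in a real app is exactly the boilerplate an ORM would absorb.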
How to build a domain-level API
The API provided by Persistent is too low level and lacks a number of features that any serious web-app would take for granted (createdAt/updatedAt timestamps, audit-logging, lifecycle callbacks, in-memory change-tracking, etc.). Therefore, there exists a need to build a domain-level API on top of Persistent + Esqueleto. However, it is not clear how.
Here’s an example I grappled with a few days ago. I needed a low-level, but reusable, function to fetch a list of nested-objects (Downloads along with associated Files along with associated URLs), given certain filters and ordering criteria. How do you write this function?
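In Esqueleto terms, the function I was after had roughly this shape (the signature below is an illustrative sketch, not working code):

```haskell
-- Illustrative sketch: accept caller-supplied WHERE and ORDER BY fragments
fetchDownloads
  :: (SqlExpr (Entity Download) -> SqlExpr (Value Bool))  -- whereClause
  -> (SqlExpr (Entity Download) -> [SqlExpr OrderBy])     -- orderBy
  -> SqlPersistM [(Entity Download, [(Entity File, [Entity URL])])]
fetchDownloads whereClause orderBy =
  undefined  -- join downloads/files/urls, apply the fragments, regroup
```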
This compiled just fine. But I had no clue how exactly to build and pass the “whereClause” and “orderBy”. So, here’s what I tried next:
And I couldn’t get it to work, because I didn’t know how to pass a “no-op” orderBy. So, here was my third try:
And by the third try I was convinced that either I was doing something terribly wrong, or this was indeed way harder than it ought to be. I still don’t know how to write a domain-level API on top of the building blocks provided by Esqueleto and Persistent.
Another related problem was how to “chain” (or compose) two Esqueleto statements. Should I be returning an SqlExpr and letting the higher-level function chain a whereClause and orderBy onto it? How exactly this is supposed to be done wasn’t very clear from the docs.
How many different names can one come up with for the same thing?
With its inability to deal with overloaded field names, Haskell forces you to name the same thing over and over again (or unnecessarily namespace it with modules to avoid name clashes). For example, take a look at the DownloadFilter record I defined above. I can’t put it in the same module as the core Download record because of the ‘downloadStatus’ field.
Another example: Download is the name of the record mapping to the ‘downloads’ DB table. However, in most of my code, I’m dealing with a nested tuple of type (Entity Download, [(Entity File, [Entity URL])]). What do I call this tuple? DownloadObject? DownloadTuple? What if I want a new data-type that deals with only (Entity Download, [Entity File])? What do I call this?
Another example: you have a record field called ‘status’. If you’re using OverloadedRecordFields or lenses, it will result in a ‘status’ function or a Getter/Setter respectively. But what if you also need a variable called ‘status’, where the call-sites are enough for you, the programmer, to disambiguate which ‘status’ is being referred to? You can’t do that. You have to come up with yet another name, something like ‘status_’, ‘statusWanted’, or ‘targetStatus’.
How to deal with validations
Honestly, I’ve spent so much time grappling with DB-related issues that I haven’t spent enough time dealing with validations. However, just a cursory glance leads me to believe that it is not going to be easy. This belief is strengthened by the lack of a domain-level API in any sample code, coupled with the fact that no DB library solves the nested tuples/records problem, and that the Haskell community as a whole is divided about how to deal with errors/exceptions.
(If you don’t solve the problem of representing nested tuples/records, I don’t think you can solve the problem of validating them.)
Why are templates so hard?
Again, this is related to Shakespearean templates alone, because of the very first choice I made (Yesod + Persistent).
The two biggest pain-points that are making templates absolutely unusable from a programmer’s point of view are:
- Lack of editor support, especially because of https://github.com/CodyReichert/shakespeare-mode/issues/7. For a templating language where white-space is significant, an editor that is constantly screwing up your indentation is a no-go.
- Lack of sensible error messages. Instead of telling the programmer where the error in the Hamlet template is, the compiler simply points at the line which splices in the template. This makes debugging large templates effectively impossible.
Why are ORMs a taboo?
Here’s a tongue-in-cheek reference to Greenspun’s Tenth Rule:
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
My own corollary to this is the following:
Any sufficiently complicated RDBMS-backed app contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Active Record (or insert your favourite ORM here).
I’ve genuinely tried learning how to build a mid/large-scale RDBMS-backed app without an ORM, but failed. Most of the literature out there goes on and on about the so-called impedance mismatch, but fails to provide any solution (beyond trivial use-cases) for how to build apps without an off-the-shelf ORM, or without reinventing an ORM on your own.
My experience with ORMs is with Rails + Active Record. And it’s a fine piece of software. It has its limits, and when you hit those limits, it is better to go under the hood and write SQL instead. And that’s the case with any abstraction out there. Doesn’t immutability have its limits? Isn’t it more performant to write certain code using mutable data structures? Doesn’t purity have its limits? Doesn’t Rust’s borrow checker have its limits?
Most of the anti-ORM literature out there argues about the ill-effects of ORMs with references to cases where the abstraction breaks down. And they generalise it to: “ORMs are bad.” But what about the boilerplate stuff that ORMs get right? What is the alternative to all of this in the non-ORM world?
- Fetching a row by a primary key and automatically mapping it to a native object/dictionary/tuple/record without writing any boilerplate
- Fetching a collection of rows, given filter criteria, and automatically mapping each row to an item in a collection of objects/dictionaries/tuples/records without writing any boilerplate
- Saving a native object/dictionary/tuple/record back in the database without writing any boilerplate
- Understanding the one:one, one:many, many:many relationships between your tables and performing a sensible JOIN-query and mapping the result to a nested object/dictionary/tuple/record without writing any boilerplate
- Implementing limit/offset pagination sensibly without writing any boilerplate
- Implementing limit/offset pagination in JOINed queries sensibly without writing any boilerplate
- Updating only those columns of a table that have changed (to reduce the chatter with the database) without writing any boilerplate code
- Maintaining createdAt/updatedAt timestamps for each object/row without writing any boilerplate code
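To be fair, some of the basics on this list are covered even by Persistent’s low-level API; fetching by key and fetching a filtered, paginated collection are one-liners (a sketch, assuming a User entity with a name field):

```haskell
-- Inside some SqlPersistM/SqlPersistT action:
mUser <- get userKey                             -- Maybe User, no mapping boilerplate
users <- selectList
           [UserName ==. "alice"]                -- filter
           [Desc UserId, LimitTo 10, OffsetBy 20]  -- ordering + pagination
```

It is the rest of the list (nested associations, change-tracking, timestamps, callbacks) that has no off-the-shelf answer.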
Can all of this be done without an ORM library? Sure, it can. But then aren’t you implementing the ORM on your own? Should you be?
Lack of testing infrastructure — TODO
What is the correct way of building fixtures for databases where tables are associated? The correct way of wiping the DB between tests? Should one be using QuickCheck for app-level testing? http://stackoverflow.com/questions/38644779/how-to-use-quickcheck-to-test-database-related-functions
I was enjoying writing Haskell as long as I was dealing with JSON and talking to the Telegram API. But as soon as I wanted to deal with the database, write HTML templates, or process forms, I hit a lot of friction.
Either I’m trying to map patterns/solutions from the Rails world that don’t map cleanly onto Haskell (in which case I’d like to learn the correct way of doing this in Haskell), or the kind of stuff I’m struggling with hasn’t really been solved well in Haskell.