My “Haskell In Production” Story
This is the story of the problem that led to this tweet, some challenges I faced along the way, as well as the outcome. More experienced Haskell programmers may shrug because they know what the language is truly capable of, but these are the types of challenges that commercial users of Haskell have to deal with and overcome, especially when betting on the language for the first time.
There intentionally aren’t many technical details in this post. I may write up a few deep dives separately.
Unlocking Business Data
Salesforce is the business system of record for much of my company’s data. That makes it an important system but unfortunately not an easy one for our own SaaS platform to interface with day-to-day. Fortunately, most of the time our backend developers just need to search or read the values of a few Salesforce objects. Bi-directional integrations, those that require reading from and writing to Salesforce, are rare. Even still, untangling the Salesforce APIs, dealing with authorization issues, and handling integration errors can feel like a huge burden when all you want is a little bit of data.
It is also interesting that most of the time we don’t need live data. We need a reasonably up-to-date view, but if this lagged Salesforce by a few minutes that would be fine. This intuition suggested that we could look for a reliable way to replicate data from Salesforce into our own backend. Given a read-only replica of Salesforce data in, say, a PostgreSQL database, we could query and work with the objects in a more natural way. Our backend developers would still be exposed to the Salesforce data model but they could stay focused on the business logic within their own code. They would not have to worry about the technical details of API integrations. This reliable data replication capability would just become a foundational building block in a larger solution.
As simple as it sounds, there are still surprising complexities in building this.
In order to replicate Salesforce data, we need to use two of their APIs: the “async” API (called the bulk API below) and the REST API. The former provides batch-style access to bulk data, useful for initial replication or rebuilding replicas after critical errors. Among other things, the latter provides near real-time notifications of updated and deleted records, useful for keeping replicas up-to-date.
Both APIs use security tokens for authorization and these must be periodically obtained via OAuth2.
Both APIs return JSON representations of the underlying Salesforce objects, but, as we’ll see, there are some subtle differences between the two representations and these have to be reconciled.
Our backend developers may not need live data but they do need confidence that the data is correct. Replication error handling needs to be well thought out.
Finally, systems tend to break at their integration points so we need good monitoring and visibility into status and whatever errors do occur. This integration is an important production system in its own right.
Why Haskell?
Golang is our de facto standard backend language. I made this choice a couple of years ago. We standardized on Go before starting our SaaS development. All of our backend services are written in Go and it is a popular choice in other areas, e.g. for CLI tools and utilities. Go would normally be the automatic choice for a problem like this. As much as I’m not a fan of the actual language, I admit that it is easy to be productive in the Go ecosystem and when thinking about this problem I felt I could see the solution in my mind’s eye. It would be no big deal to write and maybe kind of boring.
But I’ve had an interest in Haskell for a long time, for over a decade. I got interested in Erlang around 2006, discovered functional programming, and read somewhere that, “if you really want to understand functional programming, then you should look at Haskell”. So I did, and I never went back to playing with Erlang. I’ve used Haskell for personal projects but never at work.
I think Haskell is an interesting tool for a variety of problems. The story has improved greatly since 2006 and the language is now a viable, although niche, choice for commercial use. When I had the chance I was sorely tempted to standardize on Haskell over Go, but for a variety of internal reasons I could not responsibly make this choice (this wasn’t just a choice between those two languages — I also considered a couple of others before ultimately going with Go).
I’ve been keen to give Haskell a try for a “real” project, so when this problem showed up it seemed like it might be an opportunity to make the work more interesting.
At first glance, this problem is not an obvious fit for Haskell. It is very IO-oriented: making OAuth2 requests, moving data around from one system to another, implementing retries, and handling various forms of errors. This is the kind of “real world” code that jams people up in the language. But I’ve also heard enough times that Haskell can be a tremendous asset in the real world so I decided to give it a try.
Yes! Haskell Can Do That!
During development I was stretched well out of my Haskell comfort zone, but I never regretted the choice. Not surprisingly, IO- and error-related issues were the biggest challenges.
True to what I’d heard, Haskell did turn out to be an asset rather than a liability:
- HTTP GET requests are naturally idempotent, so we’d like them to be automatically retried. Regardless of HTTP method, we’d like a retry strategy that takes into account network errors, HTTP response errors, and timing (for exponential backoff). This is completely hidden from client code via monadic retry combinators.
- We’d also like client authorization to just work. When we don’t yet have a security token, OAuth2 must be used to obtain one. Like any web request, OAuth2 requests can fail due to transient issues. If we already have a security token but it has expired, then we only know this because we receive a 403 response from another API endpoint (sadly, Salesforce’s implementation of OAuth2 does not return the RFC-recommended expires_in property), so we need to obtain a new one and then retry the original request. This scenario is transparently handled within the automatic retry strategy. The resulting code is concise and easy to reason about.
- Salesforce’s implementation is unique in that the OAuth2 token responses give you the base URL that you must use for subsequent API requests. This can and does change over time as Salesforce migrates your instance around. Consider that a long-running process may start with an initial security token that directs it to instance A, uses that for a while, gets a 403 response, obtains a new token that directs it to use instance B, and then completes the retry there. But that’s a totally different URL in the retry. Again, this is handled within the retry strategy and is hidden from client code.
- When using the Salesforce REST API we may receive a set of incremental object updates in one API response. If one of these objects contains an error that “should not occur”, we’d like to be able to discard the whole set at once and then retry processing the set again later. The Salesforce REST API facilitates this. Similarly, if we fail to write the entire set to our SQL database, we’d like an assurance that no updates from the set are lost during reprocessing. This is a rare occurrence and is difficult to test. By accounting for errors at the type level (running a conduit over a monad transformer stack with EitherT) we have a way to reason about how errors are handled whenever they do occur.
- Speaking of errors, we use Prometheus to collect operational metrics. I was happy to find that Haskell has a Prometheus client library. It worked out of the box the first time so there is no monitoring “penalty” for using Haskell in this project. I also found and used a nice mtl-compatible logging library.
- Since the SQL aspects of this project were fairly minimal I used a correspondingly simple PostgreSQL library. This could not have been easier to use and worked the first time I tried it. Also, even though I have no immediate plans to support other backends, my team has discussed putting this type of data onto a Kafka or Kinesis topic in the future. So I took the effort to abstract away the output layer using existential types. I’d never known how to implement dynamic dispatch in Haskell so this was a great experience even if we never end up really needing it.
- The Salesforce bulk API returns object records serialized as a single JSON array within a single HTTP response (the bulk API supports “primary key chunking” allowing for multiple HTTP responses but this is not supported for any of the object types we need to replicate). We can potentially be exposed to a lot of data here, and ideally we’d like to incrementally parse that array as it streams in. This way we can deal with one record at a time and don’t have to provision a larger-than-necessary host machine or depend on swap. Most languages’ JSON libraries don’t support incremental (i.e. “SAX-style”) JSON parsing and would try to fit the whole decoded array into memory at once. But parsers are a killer use case for Haskell. I found a Haskell library supporting incremental JSON parsing and it turned out to be straightforward to wire this up to the streaming HTTP response. The types felt like they snapped together like LEGO blocks.
- During exploration of the Salesforce APIs, I also found that the bulk API was returning records that had timestamp fields serialized as integers, e.g. “1486397149000”. The normal REST API was returning ISO8601-formatted timestamps for the same records. I wanted the timestamp representations to be uniform before writing them to our database so that higher layer code isn’t exposed to Salesforce API oddities. Figuring that the integers were in POSIX time, seconds since the epoch, I converted them. Wrong! Interpreted as POSIX seconds, that timestamp is “49072-01-19T16:56:40Z”. As it turns out, those integer values are POSIX time in milliseconds, not seconds (note the useless trailing zeros). To make things worse, I needed to scale and convert only selected JSON object properties. Conversion has to be driven by the object schema because you cannot reliably detect which properties are timestamps by value inspection alone. Of course this also needs to happen in the midst of streaming that HTTP response and incremental JSON parsing. Lenses turned out to be a perfect fit here. A single conduit connects the HTTP response to the incremental JSON decoder to the timestamp converter to the output database.
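To make the timestamp problem from the last bullet concrete, here is a minimal sketch of schema-driven conversion from epoch milliseconds to ISO 8601. It is not the production code: the Field type is a simplified stand-in for a real JSON value, the function names are made up for illustration, and it uses a plain Map traversal rather than lenses or a conduit.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
import Data.Time.Format (defaultTimeLocale, formatTime)

-- | Simplified stand-in for a JSON value (an assumption for this sketch;
-- real code would use the JSON library's own value type).
data Field
  = FText String
  | FInt Integer
  deriving (Eq, Show)

-- | One replicated object, keyed by property name.
type Record = Map.Map String Field

-- | Render epoch milliseconds as an ISO 8601 UTC timestamp.
-- Sub-second precision is dropped by the "%S" format.
millisToIso8601 :: Integer -> String
millisToIso8601 ms =
  formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ"
    (posixSecondsToUTCTime (fromInteger ms / 1000))

-- | Convert only the properties that the schema names as timestamps.
-- Other integer properties are left untouched, because a large integer
-- cannot be recognised as a timestamp from its value alone.
normalizeTimestamps :: Set.Set String -> Record -> Record
normalizeTimestamps timestampFields = Map.mapWithKey convert
  where
    convert name (FInt ms)
      | name `Set.member` timestampFields = FText (millisToIso8601 ms)
    convert _ other = other
```

With this sketch, millisToIso8601 1486397149000 yields "2017-02-06T16:05:49Z", and an integer property that the schema does not list as a timestamp passes through unchanged.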
These last two problems in particular made me glad that I’d chosen Haskell for this project. I’m sure they could have been solved in Go, but the elegance and concision of the Haskell-based solution is impressive.
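The retry behaviour described in the first few bullets can also be sketched in a few lines. This is an illustration under stated assumptions, not the code from the project: it hand-rolls the loop in plain IO instead of using retry combinators from a library, and every name in it (Attempt, Token, RetryEnv, withRetry, demo) is hypothetical. The point it tries to show is that the request is rebuilt from the current token on every attempt, so a refreshed token with a different base URL is picked up without the caller noticing.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- | Outcome of a single request attempt.
data Attempt a
  = Success a
  | Transient String  -- network error or retryable HTTP status
  | TokenExpired      -- e.g. the 403 described above

-- | Hypothetical token shape: the OAuth2 response also carries the base URL.
data Token = Token
  { accessToken :: String
  , instanceUrl :: String
  }

-- | Settings and effects the retry loop depends on.
data RetryEnv = RetryEnv
  { maxAttempts :: Int
  , baseDelayUs :: Int
  , sleepUs     :: Int -> IO ()  -- threadDelay in a real program
  , fetchToken  :: IO Token      -- runs the OAuth2 flow
  , tokenRef    :: IORef Token   -- token currently in use
  }

-- | Run a request, refreshing the token on expiry and backing off
-- exponentially on transient failures, up to maxAttempts attempts.
withRetry :: RetryEnv -> (Token -> IO (Attempt a)) -> IO (Either String a)
withRetry env request = go 0
  where
    go n
      | n >= maxAttempts env = pure (Left "retries exhausted")
      | otherwise = do
          token <- readIORef (tokenRef env)
          result <- request token
          case result of
            Success a -> pure (Right a)
            TokenExpired -> do
              fetchToken env >>= writeIORef (tokenRef env)
              go (n + 1)
            Transient _ -> do
              sleepUs env (baseDelayUs env * 2 ^ n)
              go (n + 1)

-- | Simulated run: a transient failure, then an expired token, then success
-- against the new instance URL. Returns the result, the URLs that were
-- contacted, and the delays that would have been slept.
demo :: IO (Either String String, [String], [Int])
demo = do
  ref    <- newIORef (Token "old" "https://a.example.invalid")
  calls  <- newIORef []
  delays <- newIORef []
  let env = RetryEnv
        { maxAttempts = 5
        , baseDelayUs = 100000
        , sleepUs     = \d -> modifyIORef' delays (++ [d])
        , fetchToken  = pure (Token "new" "https://b.example.invalid")
        , tokenRef    = ref
        }
      request token = do
        seen <- readIORef calls
        modifyIORef' calls (++ [instanceUrl token])
        pure $ case (length seen, accessToken token) of
          (0, _)     -> Transient "connection reset"
          (_, "old") -> TokenExpired
          _          -> Success ("fetched from " ++ instanceUrl token)
  result <- withRetry env request
  (,,) result <$> readIORef calls <*> readIORef delays
```

In the simulated run, the first two attempts go to instance A and the third goes to instance B, while the caller only ever sees the final Right value. Passing the sleep and token-fetch actions in as fields is a choice made here to keep the sketch testable without real delays or network calls.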
Outcome
The code has been running in production for about a month now, replicating data from Salesforce to PostgreSQL, and we haven’t had to touch it once. This kind of operational reliability is typical of what we expect and enjoy with our Go-based projects, so I consider that a win.
I did not keep a journal of my time spent on the project, but from my commit history I conservatively estimate that I spent about three person-weeks actually coding (I was the sole developer). That time included some exploration and problem solving in the Salesforce APIs, which of course had nothing to do with Haskell. I have no reliable way of estimating how long the project would have taken if I’d used Go instead. My gut feeling is that it would have taken about half the time but I have no way of validating this.
Why so much longer? As an experienced Go developer I don’t need to spend much time thinking about how to solve problems in that language, and the opposite was clearly true in Haskell. I don’t use Haskell day-to-day and natural solutions aren’t at the forefront of my mind. To be fair, I haven’t looked for an incremental JSON parser in Go, and if one doesn’t exist that would certainly have evened things up a bit.
I’d also like to acknowledge that absolutely key to my success was access to a local community of Haskell enthusiasts. I live in the Research Triangle Park area of NC where the Haskell community is small. However small, this group of people gets together regularly and they were vital in terms of their technical knowledge as well as for staying motivated.
I got through it, and, of course, I learned a great deal. If I had it to do over again, or if I had to solve a similar problem, I’m confident that development would take much less time. This is a Haskell theme that I’ve heard before and I’m looking forward to putting it to the test.
Unfortunately I cannot open-source the code from this project, but I can write about it. I have a few posts in mind, but if there’s something specific you’d like me to unpack, please let me know!