My “Haskell In Production” Story

Unlocking Business Data

Salesforce is the business system of record for much of my company’s data. That makes it an important system, but unfortunately not an easy one for our own SaaS platform to interface with day-to-day. Fortunately, most of the time our backend developers just need to search or read the values of a few Salesforce objects. Bi-directional integrations, those that require reading from and writing to Salesforce, are rare. Even so, untangling the Salesforce APIs, dealing with authorization issues, and handling integration errors can feel like a huge burden when all you want is a little bit of data.

Why Haskell?

Go is our de facto standard backend language, a choice I made a couple of years ago before we started our SaaS development. All of our backend services are written in Go, and it is a popular choice in other areas too, e.g. for CLI tools and utilities. Go would normally be the automatic choice for a problem like this. As much as I’m not a fan of the language itself, I admit that it is easy to be productive in the Go ecosystem, and when thinking about this problem I could see the solution in my mind’s eye. It would be no big deal to write, and maybe kind of boring.

Yes! Haskell Can Do That!

During development I was stretched well out of my Haskell comfort zone, but I never regretted the choice. Not surprisingly, IO- and error-related issues were the biggest challenges.

  • We’d like client authorization to just work. When we don’t yet have a security token, OAuth2 must be used to obtain one. Like any web request, OAuth2 requests can fail due to transient issues. If we already have a security token but it has expired, we only learn this by receiving a 403 response from some other API endpoint (sadly, Salesforce’s implementation of OAuth2 does not return the RFC-recommended expires_in property), so we need to obtain a new token and then retry the original request. This scenario is handled transparently inside the automatic retry strategy (see the first sketch after this list). The resulting code is concise and easy to reason about.
  • Salesforce’s implementation is unique in that the OAuth2 token response gives you the base URL you must use for subsequent API requests. This can and does change over time as Salesforce migrates your instance around. A long-running process may start with an initial security token that directs it to instance A, use that for a while, get a 403 response, obtain a new token that directs it to instance B, and then complete the retry there, against a completely different URL. Again, this is handled within the retry strategy (the same sketch below) and hidden from client code.
  • When using the Salesforce REST API we may receive a set of incremental object updates in one API response. If one of these objects contains an error that “should not occur”, we’d like to discard the whole set at once and retry processing it later; the Salesforce REST API facilitates this. Similarly, if we fail to write the entire set to our SQL database, we’d like an assurance that no updates from the set are lost during reprocessing. This is a rare occurrence and is difficult to test. By accounting for errors at the type level (running a conduit over a monad transformer stack with EitherT), we have a way to reason about how errors are handled whenever they do occur (see the second sketch after this list).
  • Speaking of errors, we use Prometheus to collect operational metrics. I was happy to find that Haskell has a Prometheus client library. It worked out of the box the first time so there is no monitoring “penalty” for using Haskell in this project. I also found and used a nice mtl-compatible logging library.
  • Since the SQL aspects of this project were fairly minimal, I used a correspondingly simple PostgreSQL library. It could not have been easier to use and worked the first time I tried it. Also, even though I have no immediate plans to support other backends, my team has discussed putting this type of data onto a Kafka or Kinesis topic in the future. So I took the opportunity to abstract away the output layer using existential types (see the third sketch after this list). I’d never known how to implement dynamic dispatch in Haskell, so this was a great experience even if we never end up needing it.
  • The Salesforce bulk API returns object records serialized as a single JSON array within a single HTTP response (the bulk API supports “primary key chunking”, allowing for multiple HTTP responses, but this is not available for any of the object types we need to replicate). We can potentially be exposed to a lot of data here, and ideally we’d like to parse that array incrementally as it streams in. That way we can deal with one record at a time and don’t have to provision a larger-than-necessary host machine or depend on swap. Most languages’ JSON libraries don’t support incremental (i.e. “SAX-style”) parsing and would try to fit the whole decoded array into memory at once. But parsers are a killer use case for Haskell. I found a Haskell library supporting incremental JSON parsing, and it turned out to be straightforward to wire it up to the streaming HTTP response (see the fourth sketch after this list). The types snapped together like LEGO blocks.
  • While exploring the Salesforce APIs, I also found that the bulk API was returning records with timestamp fields serialized as integers, e.g. “1486397149000”, while the normal REST API returned ISO8601-formatted timestamps for the same records. I wanted the timestamp representations to be uniform before writing them to our database so that higher-layer code isn’t exposed to Salesforce API oddities. Figuring that the integers were POSIX time in seconds since the epoch, I converted them. Wrong! Interpreted that way, that timestamp is “49072-01-19T16:56:40Z”. As it turns out, those integer values are POSIX time in milliseconds, not seconds (note the useless trailing zeros). To make things worse, I needed to scale and convert only selected JSON object properties. The conversion has to be schema-driven because you cannot reliably detect which properties are timestamps by value inspection alone. Of course, this also needs to happen in the midst of streaming that HTTP response and incremental JSON parsing. Lenses turned out to be a perfect fit here (see the final sketch after this list). A single conduit connects the HTTP response to the incremental JSON decoder to the timestamp converter to the output database.
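To make the first two points more concrete, here is a minimal sketch of the retry-with-re-auth idea. It is not the project’s actual code: the Token type, the relogin action, and the Either Int result shape are all assumptions made for the example.

```haskell
import Data.IORef (IORef, readIORef, writeIORef)
import Data.Text (Text)

-- Hypothetical token type: Salesforce's OAuth2 response supplies both the
-- access token and the base (instance) URL for subsequent requests.
data Token = Token
  { accessToken :: Text
  , instanceUrl :: Text
  }

-- relogin performs the OAuth2 flow; action runs an API request against the
-- given token (and therefore against that token's instance URL), returning
-- either an HTTP status code or a result.
withAuthRetry
  :: IO Token                      -- obtain a fresh token via OAuth2
  -> IORef Token                   -- cached token shared by the client
  -> (Token -> IO (Either Int a))  -- the request to run
  -> IO (Either Int a)
withAuthRetry relogin cache action = do
  tok <- readIORef cache
  result <- action tok
  case result of
    Left 403 -> do                 -- expired session: re-authenticate
      fresh <- relogin
      writeIORef cache fresh
      action fresh                 -- retry, possibly against a new instance URL
    other -> pure other
```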
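The “discard the whole set and retry later” behaviour falls out of running the conduit over an error-carrying monad. Here is a sketch using ExceptT (the modern successor to EitherT); ReplicationError, validateC, and runBatch are names invented for the example.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Data.Conduit (ConduitT, awaitForever, runConduit, yield, (.|))
import Data.Void (Void)

-- An illustrative error type for "should not occur" problems in a batch.
newtype ReplicationError = BadRecord String
  deriving Show

-- Pass records through, but abort the whole pipeline (and therefore the
-- whole set) on the first invalid record.
validateC :: Monad m
          => (a -> Either ReplicationError a)
          -> ConduitT a a (ExceptT ReplicationError m) ()
validateC validate = awaitForever $ \r ->
  case validate r of
    Left err -> lift (throwE err)  -- nothing downstream is committed
    Right ok -> yield ok

-- Running the pipeline yields an Either: a Left means the set was discarded
-- as a unit and can be reprocessed later without losing updates.
runBatch :: Monad m
         => ConduitT () a (ExceptT ReplicationError m) ()
         -> ConduitT a Void (ExceptT ReplicationError m) ()
         -> (a -> Either ReplicationError a)
         -> m (Either ReplicationError ())
runBatch source sink validate =
  runExceptT (runConduit (source .| validateC validate .| sink))
```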
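The output-layer abstraction is essentially dynamic dispatch behind an existential wrapper. A minimal sketch with invented names; StdoutSink stands in for the real PostgreSQL-backed sink (and, perhaps later, a Kafka- or Kinesis-backed one).

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Aeson (Value)

-- Any backend that can persist a batch of records.
class OutputSink s where
  writeBatch :: s -> [Value] -> IO ()

-- Existential wrapper: callers hold an AnySink without knowing (or caring)
-- which backend sits behind it.
data AnySink = forall s. OutputSink s => AnySink s

writeAll :: AnySink -> [Value] -> IO ()
writeAll (AnySink s) = writeBatch s

-- A trivial stand-in backend to show the dispatch in action.
data StdoutSink = StdoutSink

instance OutputSink StdoutSink where
  writeBatch _ = mapM_ print
```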
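The streaming download itself is just a conduit pipeline built on http-conduit. The sketch below shows the shape only: decodeRecords is a placeholder for an incremental JSON array decoder and is passed in as a parameter, since the particular streaming JSON library is not the point here.

```haskell
import Control.Monad.Trans.Resource (ResourceT, runResourceT)
import Data.Aeson (Value)
import Data.ByteString (ByteString)
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Void (Void)
import Network.HTTP.Conduit (Request, http, newManager, responseBody, tlsManagerSettings)

-- Stream the bulk API response body through an incremental decoder and into
-- a per-record sink, so the whole array is never held in memory at once.
streamBulkResults
  :: Request
  -> ConduitT ByteString Value (ResourceT IO) ()  -- incremental JSON decoder
  -> ConduitT Value Void (ResourceT IO) ()        -- per-record sink
  -> IO ()
streamBulkResults req decodeRecords sink = do
  manager <- newManager tlsManagerSettings
  runResourceT $ do
    response <- http req manager
    runConduit $ responseBody response .| decodeRecords .| sink
```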
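Finally, the timestamp fix-up. Below is a sketch of the schema-driven conversion using lens-aeson (assuming the pre-aeson-2.0 API, where key takes a Text); the function name and the per-object list of timestamp fields are assumptions for the example. In the real pipeline this runs on each record as it flows through the conduit.

```haskell
import Control.Lens (over)
import Data.Aeson (Value (Number, String))
import Data.Aeson.Lens (key)
import Data.Int (Int64)
import Data.Scientific (toBoundedInteger)
import qualified Data.Text as T
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
import Data.Time.Format (defaultTimeLocale, formatTime)

-- Rewrite the named properties of one record from millisecond-epoch integers
-- to ISO8601 strings. The list of timestamp fields comes from the object's
-- schema; values are never "sniffed".
normalizeTimestamps :: [T.Text] -> Value -> Value
normalizeTimestamps fields record = foldr fixField record fields
  where
    fixField field = over (key field) toIso8601

    toIso8601 (Number n) =
      case toBoundedInteger n :: Maybe Int64 of
        Just millis ->
          -- milliseconds since the epoch, not seconds
          let utc = posixSecondsToUTCTime (fromIntegral millis / 1000)
          in String (T.pack (formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ" utc))
        Nothing -> Number n
    toIso8601 other = other
```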

Outcome

The code has been running in production for about a month now, replicating data from Salesforce to PostgreSQL, and we haven’t had to touch it once. This kind of operational reliability is typical of what we expect and enjoy with our Go-based projects, so I consider that a win.
