Railway oriented programming, Clojure and exception handling: why and how?

Published in

AppsFlyer Engineering

Sep 1, 2020

AppsFlyer offers many products with broad functionality. For us as software developers, this means writing complicated business flows on a daily basis. Complicated flows can mean basically anything: business logic containing many I/O operations, complex data transformation pipelines, etc. Anything complicated in code is, almost by definition, a source of potential exceptions, errors and bugs.

It is also important that we at AppsFlyer use Clojure, a functional language, as our primary backend language in production.

This raises the question of how to handle exceptions in a functional manner that is both human-readable and functionally elegant. A beautiful programming principle solves this challenge: Railway Oriented Programming (ROP). It is widely used in other functional languages but is not really common in Clojure. In this blog post I would like to introduce the basics of ROP, hoping to increase the popularity of this powerful yet elegant tool. To discover more advanced features of ROP, please check out Scott Wlaschin’s amazing talk about ROP and F#, which inspired this post.

Happy path

Let’s say we need to develop a user sign-up web page, pretty similar to AppsFlyer’s own. The page contains a form that collects the user’s details. In this article, we’ll focus on the backend part of the flow:

The Clojure code reflecting the flow is pretty straightforward:

(defn create-user
  "Gets data from UI, decodes, validates, enriches the data,
  saves it to the DB, sends verification email to the user and returns response"
  [request]
  (-> request
      decode-request
      validate-user-input
      enrich-input
      update-DB
      send-email
      build-response-with-status))
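The following sketch makes the pipeline runnable using hypothetical stub workers. The worker bodies, the map shape of the request and the hard-coded values are assumptions for illustration only. The real workers would call the DB, an email service and so on.

```clojure
;; Hypothetical stub workers: each takes the previous step's output
;; and returns the input the next step expects.
(defn decode-request [request]
  ;; pretend the body is already parsed; just pull out the user fields
  (:body request))

(defn validate-user-input [user]
  ;; happy path only: assume the input is valid and pass it through
  user)

(defn enrich-input [user]
  (assoc user :created-at "2020-09-01"))

(defn update-DB [user]
  ;; pretend the DB assigned an id
  (assoc user :id 42))

(defn send-email [user]
  (assoc user :email-sent? true))

(defn build-response-with-status [user]
  {:status 201 :body user})

(defn create-user
  [request]
  (-> request
      decode-request
      validate-user-input
      enrich-input
      update-DB
      send-email
      build-response-with-status))

(create-user {:body {:name "Ada" :email "ada@example.com"}})
;; => {:status 201, :body {:name "Ada", :email "ada@example.com",
;;                         :created-at "2020-09-01", :id 42, :email-sent? true}}
```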

Thanks to Clojure’s threading macro, the code looks so good. What could possibly go wrong?

Sad path

Basically everything:

So instead of the elegant piped code we started with, we end up with a monster like this:

(defn create-user
  "Gets data from UI, decodes, validates, enriches the data,
  saves it to the DB, sends verification email to the user and returns response"
  [request]
  (try
    (let [data              (decode-request request)
          validation-result (validate-user-input data)]
      (if (:valid? validation-result)
        (let [enriched-data (enrich-input data) ; can't escape this let: bindings used in several places
              DB-response   (update-DB enriched-data)]
          ; type of response and sending email depend on DB response status
          (if (= (:status DB-response) 201)
            ; the response depends on sending email
            (build-response-with-status enriched-data
                                        (send-email data))
            {:success false
             :error   (str "Failed creating user: "
                           (-> (:body DB-response)
                               (json/parse-string true)
                               :error))}))
        validation-result))
    (catch Exception e
      (log/error "Failed creating user, exception: " e))))

The problem

Basically, the code works. Sometimes, if code does its job, it’s good enough, and it’s the developer’s right to leave it that way. But it hurts to see this kind of code in a functional language. The code has three problems:

- It keeps state in lets, because the bindings are needed in several places.
- It uses lots of ifs to check the outcome of each operation and choose between the “success” and “failure” paths.
- It relies on try-catch, with the performance penalty of catching exceptions.

What is more, the code is not readable at all. It’s hard to follow the order of the operations, to tell which “else” branch belongs to which if, and to see the overall logic. Last but most important, if-else, let as a state keeper and try-catch are all very imperative by nature.

Sometimes it’s difficult to stay away from an imperative approach, but we should remember that we have a choice: the functional paradigm offers us tools for error handling and flow control.

A way out

A common pattern in Clojure code is to split the whole flow into two parts:

- small “worker” functions that actually do the job;
- an orchestration function that ties them together and is responsible for flow control and error handling.

The orchestration function calls the “workers” in a required order. The workers are piped with the threading macro again, so each of them should return the input the next one expects. Each “worker” throws an exception if something goes wrong inside of it. The orchestration function has a central “catch” point, where all the exceptions are handled:

(defn create-user
  "Gets data from UI, decodes, validates, enriches the data,
  saves it to the DB, sends verification email to the user and returns response"
  [request]
  (try
      (-> request
          decode-request
          validate-user-input
          enrich-input
          update-DB
          send-email
          build-response-with-status)
      (catch Exception e
          (log/error "Failed creating user"
                     {:error   (.getMessage e)
                      :request (json/generate-string request)}))))

An example of “worker” function:

(defn- update-DB
  "Sends request to DB to create user from the enriched data. Returns input or throws exception"
  [input]
  (let [{:keys [success message]}
        (-> (get-http-response-from-url-with-retries :post
                                                     (:DB-api config)
                                                     input)
            :body
            (json/decode true))]
    (if success
      input ; returning input to pass it to the next function
      (throw (Exception. (str "Failed updating DB: " message))))))

The function is pretty straightforward: it tries to send the user’s details, passed in as input, to the DB. If it succeeds, it returns the input so it can be passed to the next function. Otherwise, it throws an exception describing what went wrong. This trick also skips all the following function calls once an error occurs, because the exception propagates straight to the central catch.
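A small, self-contained sketch shows the short-circuiting behaviour. It assumes hypothetical workers. It uses Clojure’s built-in ex-info instead of a plain Exception, so the failing data travels with the exception. The calls atom exists only to record which workers ran.

```clojure
;; Hypothetical workers for the exception-based approach: each returns
;; its input on success or throws on failure.
(def calls (atom []))  ; records which workers actually ran

(defn validate-user-input [user]
  (swap! calls conj :validate)
  (if (:email user)
    user
    (throw (ex-info "Invalid input: missing email" {:user user}))))

(defn update-DB [user]
  (swap! calls conj :update-DB)
  ;; simulate the DB being unavailable
  (throw (ex-info "Failed updating DB: no connection" {:user user})))

(defn send-email [user]
  (swap! calls conj :send-email)
  user)

(defn create-user [user]
  (try
    (-> user
        validate-user-input
        update-DB
        send-email)
    (catch Exception e
      ;; single central catch point for every worker
      {:error (.getMessage e) :data (ex-data e)})))

(create-user {:email "ada@example.com"})
;; => {:error "Failed updating DB: no connection", :data {:user {:email "ada@example.com"}}}
@calls
;; => [:validate :update-DB]  ; send-email never ran
```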

So everything looks good, understandable, functional, elegant and readable, right? Right, but… there is always a “but”, and in our case it’s performance. Throwing and catching exceptions is expensive on the JVM. If the code throws a lot of exceptions and uses try-catch for flow control, performance becomes an issue.

Basically, the Clojure community has no consensus about try-catch. Some people think it’s essential, since Clojure itself relies on exceptions extensively, while others consider it non-functional. It is a matter of personal choice, but for us performance is important.

What are we left to do then? Where is the way?

A new hope

So how can we keep the code containing a lot of operations and conditions clear and functional on one hand, and performance-optimised on the other? Railway oriented programming is the answer! But what do railways have to do with programming? Let’s see!

Let’s imagine a function that expects one input parameter, e.g., a validator that takes some data and returns true or false, depending on whether the input is valid:

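In the functional programming world, this two-outcome shape is known as the Either monad. In the general form, the success outcome carries a value and the failure outcome carries an error. It has analogues in many programming languages.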

Depending on the result of validation, we usually decide what to do next. We continue our flow if the input is valid, or we log it and quit if the input is invalid (there is nothing we can do with invalid input). So we have one input track and two possible output tracks, continue or quit:

So here are the tracks, the railway tracks! Indeed, we can imagine any of our operations as a dual track function:

It’s easy to see that every function in this example has one input track (it expects one parameter) and two output tracks: the success track (happy path) and the failure track (sad path). A reasonable question here is: what is the point of all these tracks if we cannot connect them into one flow? The answer is simple: we can. How? Let’s see together!

Let’s imagine our happy path with no disruptions, taking just three of our operations to simplify the explanation:

We get the request and validate it. The validation passes, and we pass the request data to the DB update function. Let’s say everything goes smoothly: we manage to update the DB and get a 200 response. Then we just send the email, which also succeeds, and return the response. The track is straight, passing from one “station” to another with no problem.

When it comes to the sad path, we know that every step can fail, meaning that every “station” has a choice of either the success or the failure track:

Again we start with the request, passing it to the validator. Let’s say the request is valid. Then we proceed to the next “station”, the DB update. Now let’s say the DB is not available. What are we to do? We do not want to send the verification email, as the user was not actually created, so there is nothing to verify. The following stages will not be relevant either.

What we actually need to do is choose the failure track: log the error and quit the whole flow, bypassing the following steps. This applies to every “station”: wherever we fail (in the validator, the DB update, etc.), that station has its own failure track. But what we want to do on all of those failure tracks is the same: log the error and quit the whole flow.

how to unite all the failures into one track

So it makes sense to merge all of these failure tracks into one unified failure track:

Thus, we can now connect our “stations” into one two-track line:

Eventually, all of our functions can be connected together in this manner. Note that we start with just one request containing input data, which is represented by one success track. After the last “station”, we have two options. We can return both tracks (the value and the error). Or we can wrap the result with a regular function that returns a single thing, e.g. a map for the HTTP response.

Okay, so now we understand the rails, the stations, how to connect them and how to choose our paths. It looks good in the pictures, but how is it all related to the code? The implementation is as interesting and easy as the theory, so stay tuned!

Implementation

First of all, let’s think about the building blocks: our functions, which actually do the job and have to decide which track to choose. How does a function decide? Short answer: “with an if”. Long answer: a function cannot know whether its operation will succeed or fail before actually performing it, and once it has performed it, it is too late to choose. So the decision moves to the next function, which looks at the result of the previous one to decide whether to perform its own operation. If the previous operation failed, there is no point in performing the current one, so the function takes the failure path. If the previous one succeeded, nothing prevents us from executing the current functionality.

Example: we start with the request, passing it to the validator. There was no previous action, so we are still on the success track, meaning that we should perform the validation. The function validates the input. It then has to return a simple indication of whether it succeeded or failed. That lets the next function (the DB update) decide whether to send the data to the DB.

We can implement this logic in every single function, but we don’t want boilerplate code doing the same for every “worker” function inside our flow. The solution is to create a wrapper function which will do it for every single “worker”:

(defn apply-or-error [f [val err]]
  (if err
    [val err]
    (f val)))

This function takes two parameters:

- f, the next function in our flow;
- a vector [val err], the result of the function that has just run.

val is that function’s output, which becomes the input of f. err indicates failure: when it is non-nil, we take the failure path and f is not executed. It also describes the error that occurred, so we can log it and return it in the final response if needed. If the previous function succeeded, err is nil.

Now it’s obvious what this function does: it runs before each “worker” function and checks whether the previous “worker” has failed. In case of success, it calls the next “worker” function and passes it the previous function’s output, val, as its input. If err is not nil, it returns the [val err] vector unchanged, without calling f.
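The sketch below exercises both branches. It repeats apply-or-error so the snippet runs on its own. add-id is a hypothetical worker that always succeeds.

```clojure
;; apply-or-error as defined above, repeated so the snippet runs on its own
(defn apply-or-error [f [val err]]
  (if err
    [val err]   ; failure track: skip f, pass the pair through unchanged
    (f val)))   ; success track: call f with the previous value

;; Hypothetical two-track worker that always succeeds
(defn add-id [user]
  [(assoc user :id 42) nil])

(apply-or-error add-id [{:name "Ada"} nil])
;; => [{:name "Ada", :id 42} nil]

(apply-or-error add-id [{:name "Ada"} "Failed updating DB: no connection"])
;; => [{:name "Ada"} "Failed updating DB: no connection"]  ; add-id never ran
```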

You might ask why we return val on the sad path. Good question! We do not have to: after a failure, the following functions won’t be executed anyway, so the shape of the success/failure indication no longer matters. But you might still want to log val. Why?

Imagine you get a request to add a new user to the DB. Let’s say the validation has passed, but the DB update has failed. The error from the DB will be something like “Failed updating DB: no connection”. From that alone, it’s hard to tell which request, for which user, has failed. But if you log val, which contains the data you sent to the DB (including at least some user-id), you can find the exact failure for the exact user in your logging system.

It’s also worth mentioning that the vector is used just to keep the example simple. Positional vectors add complexity of their own, since we have to remember which slot is for the value and which is for the error (“place-oriented programming”). Hash maps are more convenient: {:success true :error nil}.
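Here is a sketch of the map-based variant. It assumes result maps of the form {:value v :error e}. The key names, apply-or-error*, ok and fail are all hypothetical; the article itself only hints at {:success true :error nil}.

```clojure
;; Map-based variant of apply-or-error: results are {:value v :error e}
;; instead of positional [val err] vectors. Key names are assumptions.
(defn apply-or-error* [f {:keys [error] :as result}]
  (if error
    result                  ; failure track: pass the result through untouched
    (f (:value result))))   ; success track: feed the value to the next worker

;; Hypothetical constructors for the two tracks
(defn ok   [v]   {:value v :error nil})
(defn fail [v e] {:value v :error e})

;; Hypothetical workers
(defn validate [user]
  (if (:email user)
    (ok user)
    (fail user "missing email")))

(defn save [user]
  (ok (assoc user :id 42)))

(->> (ok {:email "ada@example.com"})
     (apply-or-error* validate)
     (apply-or-error* save))
;; => {:value {:email "ada@example.com", :id 42}, :error nil}

(->> (ok {:name "Ada"})
     (apply-or-error* validate)
     (apply-or-error* save))
;; => {:value {:name "Ada"}, :error "missing email"}
```

Named keys make the failure check self-describing. They also leave room for extra fields, such as an error code, without breaking existing callers.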

Back to business! Now we have a function that decides for us which track to take. How do we use it? We need a decision at every step, so it should be applied to every “worker” function we have. We start from the vector [request nil], where nil indicates that nothing has failed (of course, nothing has happened yet):

(clojure.core/->> [request nil]
                 (apply-or-error decode-request)
                 (apply-or-error validate-user-input)
                 (apply-or-error enrich-input)
                 (apply-or-error update-DB)
                 (apply-or-error send-email)
                 (apply-or-error build-response-with-status))
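The following runnable sketch shows the bypass in action, with the ->> chain shortened to three steps. The workers are hypothetical stubs that return [value error] pairs. The executed atom only records which of them actually ran.

```clojure
(defn apply-or-error [f [val err]]
  (if err [val err] (f val)))

;; Hypothetical two-track workers: each returns [value nil] or [value error]
(def executed (atom []))

(defn decode-request [request]
  (swap! executed conj :decode)
  [(:body request) nil])

(defn validate-user-input [user]
  (swap! executed conj :validate)
  (if (:email user)
    [user nil]
    [user {:error "missing email"}]))

(defn update-DB [user]
  (swap! executed conj :update-DB)
  [(assoc user :id 42) nil])

(defn create-user [request]
  (->> [request nil]
       (apply-or-error decode-request)
       (apply-or-error validate-user-input)
       (apply-or-error update-DB)))

(create-user {:body {:email "ada@example.com"}})
;; => [{:email "ada@example.com", :id 42} nil]
(create-user {:body {:name "Ada"}})
;; => [{:name "Ada"} {:error "missing email"}]  ; update-DB was skipped
```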

It makes little sense to write apply-or-error out for every single function. Here comes the power of Clojure macros:


Don’t be afraid of this terrifying word; it is pretty simple:

(defmacro =>>
  "Threading macro, will execute a given set of functions one by one.
  If one of the functions fails, it will skip the rest of the functions and will return a fail cause."
  [val & funcs]
  (let [steps (for [f funcs] `(apply-or-error ~f))]
    `(->> [~val nil]
           ~@steps)))

The macro takes the initial val and all the worker funcs. It works much like ->>, which it uses internally. The only difference is that our macro wraps every input function in apply-or-error, so we don’t need to apply it manually. In fact, the previous code snippet is exactly what this macro expands to for our “workers”.
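The following self-contained sketch puts the macro to use. It relies on two hypothetical workers, parse-age and check-adult. They are unrelated to the sign-up flow and were chosen only because they are easy to test.

```clojure
(defn apply-or-error [f [val err]]
  (if err [val err] (f val)))

(defmacro =>>
  "Threads [val nil] through funcs, wrapping each one in apply-or-error."
  [val & funcs]
  (let [steps (for [f funcs] `(apply-or-error ~f))]
    `(->> [~val nil]
          ~@steps)))

;; Hypothetical two-track workers
(defn parse-age [s]
  (if (re-matches #"\d+" s)
    [(Long/parseLong s) nil]
    [s (str "not a number: " s)]))

(defn check-adult [age]
  (if (>= age 18)
    [age nil]
    [age "too young"]))

(=>> "42" parse-age check-adult)   ;; => [42 nil]
(=>> "7"  parse-age check-adult)   ;; => [7 "too young"]
(=>> "x"  parse-age check-adult)   ;; => ["x" "not a number: x"]  ; check-adult skipped

;; Shows the generated ->> form, with apply-or-error namespace-qualified
(macroexpand-1 '(=>> "42" parse-age check-adult))
```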

Looks good, huh? Yes, but not 100% yet! We have piped the functions together, and we have a “switchman” to decide which path to take. What are we missing? How does apply-or-error decide which path to take? Right, it looks at the output of the previous function. This means that all the functions must return a uniform output that the switchman understands. This is the last building block: a wrapper that unifies the output of the “workers”.

Indeed, our regular validator function can return basically anything in any form, from a simple boolean to a hash map: {:valid? bool :error string :invalidity-code int}. That does not look like our standardised railway indication [val err]. To unify the output, we can wrap the function’s regular output inside the function itself:

(defn validate-user-input
  "Railway oriented programming (ROP) wrapper. Returns the input for the
  next step together with an error indication, in case of
  success: [input nil]
  failure: [input validation-result]"
  [input]
  (let [validation-result (create-user-input-valid? input)] ; returns `{:valid? bool :error string :invalidity-code int}`
    (if (:valid? validation-result)
      [input nil]
      [input validation-result])))
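Writing this wrapper by hand for every validator gets repetitive. One option is a hypothetical higher-order adapter, validator->rop, that produces the two-track version automatically. The sketch assumes validators return {:valid? bool ...} maps. create-user-input-valid? below is a stand-in, not the real validator, and its error code is illustrative.

```clojure
;; Hypothetical generic adapter: turns any validator that returns
;; {:valid? bool ...} into a two-track ROP step returning [input err].
(defn validator->rop [validator]
  (fn [input]
    (let [result (validator input)]
      (if (:valid? result)
        [input nil]          ; success track: pass the input on unchanged
        [input result]))))   ; failure track: the validation result is the error

;; Hypothetical validator with the output shape described above
(defn create-user-input-valid? [{:keys [email]}]
  (if (and email (re-find #"@" email))
    {:valid? true}
    {:valid? false :error "invalid email" :invalidity-code 400}))

(def validate-user-input (validator->rop create-user-input-valid?))

(validate-user-input {:email "ada@example.com"})
;; => [{:email "ada@example.com"} nil]
(validate-user-input {:email "nope"})
;; => [{:email "nope"} {:valid? false, :error "invalid email", :invalidity-code 400}]
```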

Looks like we are good to go finally! Let’s put it all together:

(defn create-account-handler
  [request]
  (try
    (let [[result error] (=>> request
                              decode-request
                              validate-user-input
                              enrich-input
                              update-DB
                              send-email
                              build-response-with-status)]
      (if error
        (do
          (log/error "Failed creating new user"
                     {:error error :input request})
          error)
        result))
    ;; catch must sit inside try; only unexpected exceptions reach it
    (catch Exception e
      (log/error "Failed creating new user" {:error e :input request}))))
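An end-to-end sketch can be run in a REPL under a few assumptions. The workers are hypothetical stubs, println stands in for a logging library, and update-DB throws to simulate an unexpected third-party exception. The enrich and email steps are omitted for brevity.

```clojure
(defn apply-or-error [f [val err]]
  (if err [val err] (f val)))

(defmacro =>> [val & funcs]
  (let [steps (for [f funcs] `(apply-or-error ~f))]
    `(->> [~val nil] ~@steps)))

;; Hypothetical two-track workers standing in for the real ones
(defn decode-request [request]
  [(:body request) nil])

(defn validate-user-input [user]
  (if (:email user)
    [user nil]
    [user {:error "missing email"}]))

(defn update-DB [user]
  (if (:db-down? user)
    ;; simulates a third-party library that throws instead of returning an error
    (throw (RuntimeException. "connection refused"))
    [(assoc user :id 42) nil]))

(defn build-response-with-status [user]
  [{:status 201 :body user} nil])

(defn create-account-handler [request]
  (try
    (let [[result error] (=>> request
                              decode-request
                              validate-user-input
                              update-DB
                              build-response-with-status)]
      (if error
        (do (println "Failed creating new user" {:error error :input request})
            error)
        result))
    (catch Exception e
      ;; only truly unexpected exceptions end up here
      (println "Failed creating new user" {:error (.getMessage e) :input request})
      {:error (.getMessage e)})))

(create-account-handler {:body {:email "ada@example.com"}})
;; => {:status 201, :body {:email "ada@example.com", :id 42}}
(create-account-handler {:body {:name "Ada"}})
;; prints the failure and returns {:error "missing email"}
(create-account-handler {:body {:email "a@b.c" :db-down? true}})
;; prints the failure and returns {:error "connection refused"}
```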

Just look how beautiful our code is! First of all, it’s readable: the order of operations in the flow is easy to see. Then, it’s clear what we do with the errors returned. It still has a let and an if, but no nested structures, which is much more beautiful and functional!


The only question left is why we still use try-catch if we were aiming so hard to get rid of it. Well, we did eliminate it from everything that depends on us. But there are always unexpected errors, and exceptions from third-party libraries. Those libraries are designed to throw, and you cannot change that. Such exceptions can always happen, and we don’t want to crash everything because of them; we want to log them gracefully and go on. This is why try-catch is still here, though hopefully it will rarely be triggered.

What now?

Now it’s your turn to implement ROP! How do you decide whether ROP fits your situation? There are several signs:

You have a flow with a lot of sequential operations;
You have nested ifs and lets, and you don’t like it;
You have many try-catch blocks, and you care about performance;
You use threading macros, but you are not sure what happens if something fails;
You would like to bypass some of the functions in some scenarios;
You just would like to make your code more functional and try ROP.

How do you start coding it? First of all, you should read before you write. I recommend going through the “Further reading” section before you add ROP to your production code. The first article is pretty basic. The second and the third are by the legendary Scott Wlaschin and explore ROP in detail. If you’d like to optimise your ROP code with monads, see links 4, 5 and 6.

Basically, you can take code examples from here or any other article, or invent something new. You can use Clojure’s built-in macros like ->, ->>, some->, some->> and others.
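As an example of the built-in route, some-> already stops at the first nil. Workers that return nil on failure therefore get the bypass for free. Here is a minimal sketch with hypothetical workers:

```clojure
;; Hypothetical workers that return nil on failure instead of [val err]
(defn validate-user-input [user]
  (when (:email user) user))

(defn update-DB [user]
  (assoc user :id 42))

(defn create-user [user]
  (some-> user
          validate-user-input
          update-DB))

(create-user {:email "ada@example.com"}) ;; => {:email "ada@example.com", :id 42}
(create-user {:name "Ada"})              ;; => nil, update-DB never runs
```

The trade-off is that nil carries no information about why a step failed. Carrying that information is exactly what the [val err] pair adds.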

But I highly recommend taking a look at ready-made ROP libraries (see the list at the end). The less code you write on your own, the fewer tests you need to cover it and the fewer bugs you create. Good luck!

Conclusion

Railway oriented programming is a principle that can be used in a variety of ways. It is also flexible to implement, since function composition is central to Clojure. What is more, the ROP approach to validation and flow control can help us minimize the need for excessive error handling. It also offers a simple way to bypass part of the functions in the flow when needed. ROP is power!