Asynchronous JS/Node Fundamentals
Intro
Over the last few weeks, I have spent many hours reading and writing about Node’s asynchronous, event-based model. During that time I have built up, through my own research, a thorough understanding of the inner workings and practices of asynchronous programming in Node.js.
This summary is meant to serve as a deep dive into the world of asynchronous JavaScript on the Node platform, drawn from my own and others’ meticulous note-taking.
If you want to write asynchronous code using the “callback pattern”, try “promises”, and examine the best of both worlds, then you’re in the right place.
I hope to guide you through all of those subjects, linking each one to the next in order to tell the story of how all these pieces work together.
Enjoy.
Synchronous vs. Asynchronous
To do you justice, let’s start off by defining the terms “synchronous” and “asynchronous”. When programming, you may have heard these expressions used to describe a type of execution. When a program executes “synchronously”, it runs all of its operations in order, waiting for each operation to finish before moving on to the next.
“Asynchronous”, however, means that the program doesn’t need to wait for a particular operation to finish before moving on to the next task. The program starts the asynchronous operation and then moves on.
Here’s some pseudo code so you can understand the big picture.
Above we made a couple of greeter methods, greeter_sync and greeter_async. Their behavior should be to greet the world when they’re called, then roughly 5 seconds later say goodbye.
greeter_sync is implemented just as we said. It begins by printing “Hello World”, waits around 5 seconds, then prints “Goodbye World”.
greeter_async, on the other hand, seems to be implemented in reverse.
The “Goodbye World” statement is at the top of the program, while the “Hello World” statement is at the bottom.
You could almost expect these functions to behave as direct opposites of each other. However, take notice of the special syntax we added. We’ve used both an async keyword and a then keyword.
The semantics of both in this example are as follows:
- async tells the program to run the sleep method in the background. When it’s done, it will run the code to the right of the then keyword.
- The then keyword tells the program to print “Goodbye World” when it’s done sleeping.
If this isn’t making much sense then think of it this way.
The async keyword tells the program not to wait for the sleep method to finish, so the program carries on and prints “Hello World”.
The then keyword says what to do when the sleep method is finished, which is to print “Goodbye World”.
As you may have guessed, greeter_sync and greeter_async will both print “Hello World” first and “Goodbye World” roughly 5 seconds later.
Okay, we now have a foundation for how asynchronous code looks and works. Let’s take this abstract model of async code and make it more concrete by showing some examples in JavaScript.
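As a first concrete step, here is a rough JavaScript analogue of the two greeters. It is a sketch under assumptions rather than a faithful translation: the names `greeterSync` and `greeterAsync`, the `delayMs` and `print` parameters, and the short 50 ms delay are all made up for this illustration, and a busy-wait loop stands in for a blocking sleep, which JavaScript does not have.

```javascript
// Blocking version: nothing else can run while the loop spins.
function greeterSync(delayMs, print) {
  print('Hello World');
  const end = Date.now() + delayMs;
  while (Date.now() < end) {
    // busy-wait; stands in for a blocking sleep()
  }
  print('Goodbye World');
}

// Non-blocking version: schedule the goodbye first, then carry on.
function greeterAsync(delayMs, print) {
  setTimeout(function () {
    print('Goodbye World'); // runs later, once the timer fires
  }, delayMs);
  print('Hello World'); // runs right away
}

greeterSync(50, console.log);
greeterAsync(50, console.log);
```

Both calls greet first and say goodbye second, even though the statements in `greeterAsync` are written in the opposite order.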
Async Operations in Node
While the browser’s async operations would be XHR/AJAX or DOM events, Node’s async operations are mostly IO. IO being:
- File system: reading and writing files
- Network: HTTP requests
- Etc.
Node lets us perform these kinds of operations through its own bindings to the system, along with some special sauce in between.
Node then wraps these bindings with JavaScript functions, and those functions become the base that everything else is built on.
Here’s an example of what one of those function signatures could look like.
Wait a minute… Something isn’t right here.
You’ll notice we asked the file system to read a file, and to assign the value back to the file variable. That doesn’t make sense. Well at least not in Node.
If all IO is async, then we were supposed to tell readFile what to do when it was done, just like in our async example above.
Oh wait, I know what we forgot: we didn’t give it a function.
Let’s fix that really quick… And there we go! Take a look.
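A minimal runnable sketch of that fix, assuming Node’s built-in `fs` module. The temporary file and its name are made up so the snippet can run anywhere.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample file so the sketch has something to read.
const filePath = path.join(os.tmpdir(), 'async-fundamentals-example.txt');
fs.writeFileSync(filePath, 'some file contents');

// The last argument is the callback; Node calls it once the read has finished.
fs.readFile(filePath, function () {
  console.log('done');
});

// This line runs first, because readFile does not block.
console.log('reading...');
```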
You see, this time we passed in a function as the last argument of the readFile function. This function is commonly referred to as a callback, and callbacks are used a lot in Node. Under the hood, that function gets called when the file system operation is done. When it is called, our program will log “done”.
Understanding callbacks is very important in Node. The funny thing, though, is that callbacks aren’t really a new concept in JavaScript. Even Array’s map function takes a callback, so what’s the main difference between Node’s callbacks and other callbacks?
Well, let’s first point out that in this context there are two kinds of callbacks: composing functions (iterators) and continuations.
Array’s map function takes in an iterator callback, which gets called for every item in a collection, whereas our readFile callback is supposed to be called only once, making it a continuation.
The term continuation comes from the idea that we start a process, wait, and when it’s done it calls a continuation function to continue the process. This idea is formalized in a pattern named Continuation Passing Style (CPS). This is the style that Node uses by default, and it is the foundation you’ll build on when working with asynchronous operations.
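To make the distinction between the two kinds of callbacks concrete, here is a small sketch. `setTimeout` stands in for an IO call such as readFile (an assumption, to keep things short), and the counter names are invented for illustration.

```javascript
// Iterator callback: called synchronously, once per item.
let mapCalls = 0;
const doubled = [1, 2, 3].map(function (n) {
  mapCalls += 1;
  return n * 2;
});
console.log(doubled, mapCalls); // [ 2, 4, 6 ] 3

// Continuation: called once, later, to continue the work.
let continuationCalls = 0;
setTimeout(function () {
  continuationCalls += 1;
  console.log('continuation calls:', continuationCalls); // 1
}, 0);

// Still 0 here: the continuation has not been invoked yet.
console.log('continuation calls so far:', continuationCalls);
```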
Continuation Passing Style doesn’t have a standard in place for exactly how to use it, though, aside from the need to pass a continuation. For that reason Node has made its own standard on top of CPS.
The standard is as follows:
- Any function that performs an asynchronous action can be referred to as an Actor function.
All Actor functions should take a callback function or continuation as their last argument.
- The callback or continuation function must always accept an error value as its first argument.
If an error occurred in an Actor function, then the user’s callback will be called with that value. Everything after the error argument can be defined by the author of the Actor function.
Make sense? Good.
How about we show an example of what our code would look like following the standard.
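A minimal sketch of the standard in action, assuming Node’s built-in `fs.readFile` as the Actor function. The temporary file is a made-up fixture so the snippet runs anywhere.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample file so the sketch has something to read.
const filePath = path.join(os.tmpdir(), 'async-fundamentals-standard.txt');
fs.writeFileSync(filePath, 'hello');

// fs.readFile is the Actor: the callback is its last argument.
fs.readFile(filePath, 'utf8', function (err, data) {
  // The error value always comes first.
  if (err) {
    console.error(err);
    return;
  }
  // Everything after the error is the result.
  console.log(data);
});
```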
As you can see, the Actor function takes in a callback as the last argument. The callback function then takes in an error value and a result value.
You can even see how we’re handling the errors. We start by checking whether we were passed an error, and if so we log it. That’s a fairly simplistic example of error handling; in most cases you’ll be passing errors to other callbacks. Let’s see an example of how that would be done.
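One way this could look is sketched here. Only the names getFile and readFile come from the text; the shape of the wrapper object (`path` and `contents`) and the temporary fixture file are assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// getFile wraps readFile and follows the same error-first convention.
function getFile(filePath, callback) {
  fs.readFile(filePath, 'utf8', function (err, data) {
    if (err) {
      // Manual propagation: hand the error to our caller's callback.
      return callback(err);
    }
    // Wrap the file contents with our own data (assumed object shape).
    callback(null, { path: filePath, contents: data });
  });
}

// Hypothetical sample file so the sketch has something to read.
const samplePath = path.join(os.tmpdir(), 'async-fundamentals-getfile.txt');
fs.writeFileSync(samplePath, 'wrapped contents');

getFile(samplePath, function (err, file) {
  if (err) {
    return console.error('could not read file:', err.message);
  }
  console.log(file.contents); // wrapped contents
});
```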
In this example we wrap the readFile function with getFile.
getFile lets us wrap the data from readFile with our own data.
More importantly, notice how we handle errors in getFile.
First we call readFile. If that returns an error, we pass it along to getFile’s callback. If it didn’t return an error, we wrap the returned data in an object, then call getFile’s callback with it.
This style of error handling can be considered manual error propagation.
Manual error propagation is necessary with our callback code.
This is because Node’s callback pattern doesn’t allow for callbacks to throw errors. Yup that’s right, no error throwing with asynchronous code.
Want an example?
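Here is a sketch of the kind of code in question. The path is assumed not to exist, and the `uncaughtException` handler is there only so the snippet can report what happened instead of crashing; the variable names are invented for illustration.

```javascript
const fs = require('fs');

// Only here so the sketch can observe the failure. Without this handler
// the process would die with an uncaught exception.
let escaped = null;
process.once('uncaughtException', function (err) {
  escaped = err;
  console.log('error escaped the try-catch:', err.message);
});

let caught = false;
try {
  // '/no/such/file.txt' is a hypothetical path assumed not to exist.
  fs.readFile('/no/such/file.txt', function (err, data) {
    if (err) throw err; // thrown later, after the try block has already exited
  });
} catch (e) {
  caught = true; // never reached
}
```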
That doesn’t work.
This is due to the way Node calls our callbacks.
The easiest way to explain it is that the callback isn’t executed in the same context as the try-catch block: by the time Node calls it, the code inside the try block has long since finished. So if you throw an error inside a callback, you’ll cause an uncaught exception and crash the whole process.
Fortunately, Actor functions can be designed to handle this, but it’s not common for them to be written that way. That means you still can’t throw errors, since you don’t know whether they’re going to be handled correctly. Sadness…
This is where we lead into some of the caveats of CPS.
- Actors don’t return values.
They take in callbacks/continuations and pass the values to them.
This is an important difference when comparing CPS code to our typical functions that usually return values.
- Continuations shouldn’t throw errors.
I don’t know about other languages using CPS, but in the case of Node error-handling is not implemented with the try-catch on a per Actor function level. So to be safe you don’t throw.
These are considered “hard guidelines”, or just rules…
But with these rules arise interesting problems with the Callback Pattern.
The first being how do you do async operations in sequence.
Sequential Operations
Okay, in this example you’ll see me define a sequential, asynchronous operation. By sequential I mean that this example will have multiple async operations that rely on each other to finish before they can begin.
Say we have an application that has Users, Posts, and Likes for Posts.
In our application we’re going to make a function that retrieves the total number of likes for all of a given user’s posts. To do so we have several helper functions to request the data. One will find the user by username, the second will find posts by a given user ID, and the last will get the Likes for each post. The function we’re defining will then use them to calculate the total.
For the sake of the example we don’t define the helper functions themselves, such as findPostsByUserId and findCommentsByPostId.
We should assume those to be in scope and asynchronous operations.
To the point though, we nest the callback functions.
That’s the result of needing to do multiple async things in sequential order. Take note of how we handle errors all the way down, and that these callbacks would continue to push to the right if we needed to do more operations. This is often referred to as Callback Hell or the Pyramid of Doom.
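For reference, here is a runnable sketch of the nested shape described above. The in-memory data, the helper findUserByUsername, and the helper findLikesByPosts (which takes the whole list of posts, to keep the sketch strictly sequential) are hypothetical stand-ins; only findPostsByUserId is named in the text.

```javascript
// Hypothetical in-memory data; a real app would query a database or API.
const users = [{ id: 1, username: 'ada' }];
const posts = [{ id: 10, userId: 1 }, { id: 11, userId: 1 }];
const likes = [{ postId: 10 }, { postId: 10 }, { postId: 11 }];

// Hypothetical async helpers that follow the error-first convention.
function findUserByUsername(username, callback) {
  setImmediate(function () {
    const user = users.find(function (u) { return u.username === username; });
    if (!user) return callback(new Error('user not found: ' + username));
    callback(null, user);
  });
}

function findPostsByUserId(userId, callback) {
  setImmediate(function () {
    callback(null, posts.filter(function (p) { return p.userId === userId; }));
  });
}

function findLikesByPosts(userPosts, callback) {
  setImmediate(function () {
    const ids = userPosts.map(function (p) { return p.id; });
    callback(null, likes.filter(function (l) { return ids.indexOf(l.postId) !== -1; }));
  });
}

// Each step needs the previous step's result, so the callbacks nest.
function getTotalLikes(username, callback) {
  findUserByUsername(username, function (err, user) {
    if (err) return callback(err);
    findPostsByUserId(user.id, function (err, userPosts) {
      if (err) return callback(err);
      findLikesByPosts(userPosts, function (err, postLikes) {
        if (err) return callback(err);
        callback(null, postLikes.length);
      });
    });
  });
}

getTotalLikes('ada', function (err, total) {
  if (err) return console.error(err.message);
  console.log('total likes:', total); // total likes: 3
});
```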
Not so pretty… And beyond looks it’s not DRY.
Now I know some of you are wanting a solution to this problem, but for those of you who aren’t convinced that the pattern shown above has issues, I have another example.
Parallel Operations
The Parallel example pairs well with the Sequential example, mostly because it shows how it’s tricky to do many asynchronous things at once.
A more direct way of explaining that would be to say that we have three asynchronous operations that need to happen, and then when they’re all finished a callback will be called. Here’s an example.
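A sketch of what such a function could look like follows. The three fetchers (getTweets, getPosts, getRepos) are hypothetical stand-ins built on `setTimeout`, and reporting only the first error is one possible answer to the error-handling question, not the only one.

```javascript
// Hypothetical fetchers; each stands in for a real network request.
function getTweets(username, callback) {
  setTimeout(function () { callback(null, [username + ' tweet']); }, 30);
}
function getPosts(username, callback) {
  setTimeout(function () { callback(null, [username + ' post']); }, 10);
}
function getRepos(username, callback) {
  setTimeout(function () { callback(null, [username + ' repo']); }, 20);
}

function getTweetsPostsRepos(username, callback) {
  const results = {};
  let finished = 0;
  let failed = false;

  // Returns a callback that stores one result and bumps the counter.
  function collect(key) {
    return function (err, value) {
      if (failed) return; // an earlier error already ended things
      if (err) {
        failed = true;
        return callback(err); // report the first error only
      }
      results[key] = value;
      finished += 1;
      if (finished === 3) callback(null, results); // called exactly once
    };
  }

  // All three start right away; none waits for the others.
  getTweets(username, collect('tweets'));
  getPosts(username, collect('posts'));
  getRepos(username, collect('repos'));
}

getTweetsPostsRepos('ada', function (err, data) {
  if (err) return console.error(err.message);
  console.log(data);
});
```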
Above you can see we have the method getTweetsPostsRepos, which is responsible for fetching the tweets, posts, and repos of a given username.
In our new function we make the three separate calls at the same time, so that they execute in parallel. But how do we synchronize when all three are done? We shouldn’t call getTweetsPostsRepos’s callback more than once, since then it wouldn’t act like a continuation, so what do we do?
The answer I gave was to create a counter, and a function that adds to the counter. When the counter shows that every operation has finished, the original callback that was given to getTweetsPostsRepos is called.
The problem with this example is that it bolts a very ad hoc counter onto this parallel operation. We also have to settle the semantics of error handling and result passing ourselves: whether to collect errors or not, and how to store and return results.
Tiny problems in the grand scheme of things, but problems nonetheless.
At this point I hope I’ve shown that callbacks alone cannot completely solve one’s problems, and that better abstractions are needed to handle these problems and caveats. I can now show you two alternatives that center themselves around solving them.
Async.js - Better Callbacks
The first alternative I’ll show you is a library that goes by the name Async.js.
Now some of you may be familiar with Async.js, and some not, but either way this is not a sponsorship of the library. I’m only using Async.js as a prime example of a library that tries to solve the problems above with callbacks.
Sequential Operations
Let’s take our original sequential example and rewrite it with Async.js.
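The sketch below shows the shape of such a rewrite. To keep it runnable without installing anything, it uses a small hand-rolled `waterfall` helper that mimics the calling convention of `async.waterfall(tasks, callback)`; with the library installed you would call `async.waterfall` in its place. The in-memory data and the helpers findUserByUsername and findLikesByPosts are hypothetical.

```javascript
// Hand-rolled stand-in for async.waterfall(tasks, callback); an assumption
// made so the sketch runs without the library.
function waterfall(tasks, done) {
  let index = 0;
  function next(err) {
    if (err) return done(err); // stop at the first error
    if (index === tasks.length) return done.apply(null, arguments);
    const task = tasks[index];
    index += 1;
    const results = Array.prototype.slice.call(arguments, 1);
    task.apply(null, results.concat(next)); // pass results on to the next task
  }
  next(null);
}

// Hypothetical in-memory data and error-first helpers.
const users = [{ id: 1, username: 'ada' }];
const posts = [{ id: 10, userId: 1 }, { id: 11, userId: 1 }];
const likes = [{ postId: 10 }, { postId: 10 }, { postId: 11 }];

function findUserByUsername(username, callback) {
  setImmediate(function () {
    const user = users.find(function (u) { return u.username === username; });
    if (!user) return callback(new Error('user not found: ' + username));
    callback(null, user);
  });
}
function findPostsByUserId(userId, callback) {
  setImmediate(function () {
    callback(null, posts.filter(function (p) { return p.userId === userId; }));
  });
}
function findLikesByPosts(userPosts, callback) {
  setImmediate(function () {
    const ids = userPosts.map(function (p) { return p.id; });
    callback(null, likes.filter(function (l) { return ids.indexOf(l.postId) !== -1; }));
  });
}

// The pyramid becomes a flat list of steps.
function getTotalLikes(username, callback) {
  waterfall([
    function (next) { findUserByUsername(username, next); },
    function (user, next) { findPostsByUserId(user.id, next); },
    function (userPosts, next) { findLikesByPosts(userPosts, next); },
    function (postLikes, next) { next(null, postLikes.length); }
  ], callback);
}

getTotalLikes('ada', function (err, total) {
  if (err) return console.error(err.message);
  console.log('total likes:', total); // total likes: 3
});
```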
Shockingly different than before right?
With Async.js and its `waterfall` method we have successfully collapsed our pyramid of doom. Instead of nesting, we opt for an array of functions that are called in order; as each one executes successfully, it passes its result values on to the next function. Keep in mind that this is still all done with callbacks, so don’t throw errors or rely on return values of the callbacks.
Parallel Operations
Now let’s look at the parallel operations rewritten with Async.js.
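Again, a sketch of the shape rather than the real thing: the `parallel` helper below is a hand-rolled stand-in that mimics `async.parallel(tasks, callback)` with an array of tasks, so the snippet runs without the library installed. The three fetchers are hypothetical.

```javascript
// Hand-rolled stand-in for async.parallel(tasks, callback); an assumption
// made so the sketch runs without the library.
function parallel(tasks, done) {
  const results = new Array(tasks.length);
  let remaining = tasks.length;
  let failed = false;
  if (remaining === 0) return done(null, results);
  tasks.forEach(function (task, i) {
    task(function (err, value) {
      if (failed) return;
      if (err) {
        failed = true;
        return done(err); // first error goes straight to the main callback
      }
      results[i] = value; // results keep the order of the tasks
      remaining -= 1;
      if (remaining === 0) done(null, results);
    });
  });
}

// Hypothetical fetchers; each stands in for a real network request.
function getTweets(username, callback) {
  setTimeout(function () { callback(null, [username + ' tweet']); }, 30);
}
function getPosts(username, callback) {
  setTimeout(function () { callback(null, [username + ' post']); }, 10);
}
function getRepos(username, callback) {
  setTimeout(function () { callback(null, [username + ' repo']); }, 20);
}

// No counter in sight: the helper owns the bookkeeping.
function getTweetsPostsRepos(username, callback) {
  parallel([
    function (done) { getTweets(username, done); },
    function (done) { getPosts(username, done); },
    function (done) { getRepos(username, done); }
  ], callback);
}

getTweetsPostsRepos('ada', function (err, results) {
  if (err) return console.error(err.message);
  console.log(results); // [ [ 'ada tweet' ], [ 'ada post' ], [ 'ada repo' ] ]
});
```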
Gee Willickers! This one has also been nicely simplified.
We now just use Async’s `parallel` function and it handles the rest.
It even has the same semantics I spoke about above, where if any one of those functions errors it will immediately call the main callback. If they all succeed, then the last callback receives an array of results.
At this point you may feel, “Cool. This solves my problems, what could be better than this? Or at least, do we need more?” Good questions. Honestly, this is where many developers start to branch off in opinion.
There will be arguments against Async.js, disputes about how the code is organized, and claims that callbacks and callback-wrapping libraries don’t solve all problems well. So let me take the opportunity to introduce a different abstraction that will help us solve these problems: Promises.
Promises
What are Promises?
Promises are constructs that act as proxies for the return values of a potentially unfinished computation. If that seems too vague, let me explain it differently. Promises are values that asynchronous functions can return; a Promise encapsulates the value that the asynchronous function will eventually produce.
Promises are Robust
With this base understanding of how promises work we can already see their potential. But what are some of the key features that promises bring to the table?
- Promises are Values
- Promises can throw errors
- Promises are always Asynchronous
If this isn’t making much sense, let me use a simple example to explain the way you’ll use Promises compared to callbacks.
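A minimal sketch, assuming Node’s built-in `fs.promises.readFile`, which returns a promise instead of taking a callback. The temporary file is a made-up fixture so the snippet runs anywhere.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample file so the sketch has something to read.
const filePath = path.join(os.tmpdir(), 'async-fundamentals-promise.txt');
fs.writeFileSync(filePath, 'hello from a promise');

// No callback this time: readFile hands back a promise.
const promise = fs.promises.readFile(filePath, 'utf8');

promise.then(
  function (data) {
    console.log(data); // success: the file contents
  },
  function (err) {
    console.error(err); // failure: the reason the read failed
  }
);
```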
Above we use one of the first examples with callbacks, but re-written with promises. First we call the readFile method with our path, but this time we omit the callback. readFile in this example returns a promise, and with that promise we begin handling two scenarios. Success and Failure.
Promises have three important states:
- Pending
- Fulfilled
- Rejected
A promise is initialized in the Pending state; it can then be resolved or rejected by its author. Once resolved or rejected, the same promise cannot be resolved or rejected again. What typically happens is that an asynchronous function returns a pending Promise, and with that Promise you start programming around the success/fulfilled or failure/rejected states.
This is what is shown in the above example.
We first invoke an asynchronous function that returns a promise,
then we tell the promise what to do if it’s fulfilled or rejected.
The way we communicate with the promise is through its `then` method. This method takes in two arguments, a success function and a failure function. One of these functions will be used when the asynchronous operation finally finishes and fulfills or rejects the promise.
If the promise is fulfilled, it calls the first function with the result of the asynchronous function.
If the promise is rejected, it calls the second function and passes in the error value it was rejected with.
Again you can see the code portrays just that.
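The state rules described above, plus the fact that handlers always run asynchronously, can be seen in a small sketch using the standard `Promise` constructor. The `order` array and the values passed around are invented purely to record what happens.

```javascript
const order = [];

const promise = new Promise(function (resolve, reject) {
  resolve('first');          // pending -> fulfilled
  resolve('second');         // ignored: the promise is already settled
  reject(new Error('late')); // ignored as well
});

promise.then(
  function (value) { order.push('fulfilled with ' + value); },
  function (err) { order.push('rejected with ' + err.message); }
);

// Handlers never run synchronously, even on an already settled promise.
order.push('end of synchronous code');

setTimeout(function () {
  console.log(order);
  // [ 'end of synchronous code', 'fulfilled with first' ]
}, 0);
```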
What’s so special?
So you might be wondering, “So are Promises just used as a way to aggregate our callbacks?”. The answer to that question is: they do that, and more. Let me explain.
Remember the caveats of callbacks that I mentioned a while ago?
How the Actor functions can’t return values and you can’t throw errors, remember that? Well you don’t have that problem with Promises.
Promises are values that you can return, and they have more advanced error handling built in. How about we explore the benefits of Promises’ error handling?
Error Handling
With callbacks we couldn’t throw errors because we couldn’t catch them.
Promises, on the other hand, implement their own try-catch block around the functions passed to the `then` method. Here’s an example.
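A small sketch of that behavior with standard promises. The starting value, the error message, and the strings returned by the handlers are invented for illustration.

```javascript
const result = Promise.resolve('some file contents')
  .then(function (data) {
    // Throwing here is safe: the promise catches it and rejects the next promise.
    throw new Error('could not process: ' + data);
  })
  .then(
    function () {
      return 'success handler ran'; // skipped, because the previous step threw
    },
    function (err) {
      return 'failure handler got: ' + err.message;
    }
  );

result.then(function (message) {
  console.log(message); // failure handler got: could not process: some file contents
});
```

Note that the failure handler sits on the `then` call after the one that throws. A failure handler passed to the same `then` call as the throwing success handler would not see that error.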
Pretty interesting, huh? Above we show how you can add error handling to your Promise code. The `then` method of a Promise takes in a success handler function and a failure handler function. If the Promise fails, it will call only the failure handler, passing in the error if one was given.
- States of a promise
- A promise can only resolve once
- Promises are always async
- Promise error bubbling
- Promises in sequence
- Promises in parallel