Building web applications for the cloud
Asynchronous event-driven web applications
The cloud means different things to different people, but we can all agree that it isn’t like your own data center. You don’t control the hardware or the virtualization layer, and you mostly don’t control the operating system, the database, the caching infrastructure, or even the language runtime that is going to be used. Many of the assumptions you make when writing an application for your own data center need to be discarded when running in the cloud. You can get rid of many of the complexities, but you’ll get new challenges in exchange.
One of those challenges is how to actually build the application, because how you build it has a huge impact on how you’ll have to manage it. Your choice of programming language, web framework, and persistence model will affect how easy it is to scale and operate your application on a cloud platform.
Python and Java have a wide selection of very good web frameworks, suiting almost anyone. Ruby has mostly one popular framework, Ruby on Rails. In most of these frameworks, each web request is processed on its own thread or even in its own process.
This request processing model makes development easy: unless you introduce variables that are shared between multiple threads, you can develop as if yours were the only thread currently executing and don’t need to worry about anything else.
When you look at the request behavior of today’s multi-tiered web applications, or even an application built from microservices, you’ll find that your web application is mostly waiting: for your distributed cache, your user profile service, your database lookups, or whatever else your application needs to do. Not much of the actual processing is performed on the web application thread.
As a result, you’re going to see many threads that are waiting instead of executing, and you’ll need many threads to utilize the processor cores in your server. And that is as long as things go right. If something goes slightly wrong, such as your credit card backend suddenly taking two minutes to respond instead of several milliseconds, you’re very quickly going to use up all your available threads. Your web application will look totally hung from a user’s perspective, while your processor utilization drops to zero. That is not the best scenario, and it gets worse as you scale your application, because the potential for hang situations and their effects grows with increasing load.
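The thread exhaustion described above can be reproduced in a few lines. The following is only a minimal sketch, not a real web server: the pool size, the half-second delay standing in for the slow credit card backend, and the names handle_request and run_blocking_demo are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4          # hypothetical number of request threads
SLOW_BACKEND_S = 0.5   # stand-in for the backend that suddenly became slow
FAST_BACKEND_S = 0.01  # stand-in for a healthy backend


def handle_request(kind: str) -> str:
    """Blocking handler: the thread does nothing while it waits for the backend."""
    time.sleep(SLOW_BACKEND_S if kind == "payment" else FAST_BACKEND_S)
    return kind


def run_blocking_demo() -> float:
    """Return how long a fast request takes once slow requests fill the pool."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        # These requests occupy every thread in the pool.
        slow = [pool.submit(handle_request, "payment") for _ in range(POOL_SIZE)]
        start = time.perf_counter()
        # This request only needs the fast backend, yet it has to queue
        # until one of the blocked threads becomes free again.
        pool.submit(handle_request, "profile").result()
        elapsed = time.perf_counter() - start
        for future in slow:
            future.result()
    return elapsed
```

Calling run_blocking_demo() should return a value close to the slow backend delay rather than the fast one, even though the request itself never touches the slow backend.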
As developers we tend to use the language or framework we’re most familiar with or that feels the most natural for us at the time, but this mindset is incredibly limiting. We should rather be interested in which tools are most appropriate for a given job and get familiar with more technologies to broaden our skills and solve diverse problems.
What if you could use the idle time on those threads to actually serve requests? You could send asynchronous requests to your backend and, while the backend is working on one request, already process the next one. There are specialized frameworks such as Vert.x, EventMachine, and Twisted that can be used to do this. Unless you want to build your application directly on one of those, though, you’re largely on your own: there is no support in the popular web frameworks, and most modules and packages don’t support asynchronous event-driven behavior and will block your execution thread.
Node.js, on the other hand, was architected as an asynchronous event-driven framework from the beginning. It runs your application code on a single thread per process, but processes multiple requests on that thread, and you can run as many processes as you need. If a backend experiences a slowdown, you’ll get your callbacks later, with no additional processing overhead and no hang situations. Only user requests that require that specific slow backend will experience a slowdown.
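The same single-threaded, event-driven model can be sketched with Python’s asyncio, used here purely as a stand-in for the Node.js event loop. The delays and the names handle_request, serve_all, and run_async_demo are assumptions for illustration, and asyncio.sleep replaces a real non-blocking backend call.

```python
import asyncio
import time

SLOW_BACKEND_S = 0.5   # stand-in for the slow backend
FAST_BACKEND_S = 0.01  # stand-in for a healthy backend


async def handle_request(kind: str) -> tuple[str, float]:
    """Non-blocking handler: awaiting hands the thread back to the event loop."""
    start = time.perf_counter()
    await asyncio.sleep(SLOW_BACKEND_S if kind == "payment" else FAST_BACKEND_S)
    return kind, time.perf_counter() - start


async def serve_all() -> list[tuple[str, float]]:
    """Serve slow and fast requests concurrently on one thread."""
    kinds = ["payment"] * 4 + ["profile"] * 20
    return await asyncio.gather(*(handle_request(kind) for kind in kinds))


def run_async_demo() -> dict[str, float]:
    """Return the worst observed latency per request kind."""
    worst: dict[str, float] = {}
    for kind, elapsed in asyncio.run(serve_all()):
        worst[kind] = max(worst.get(kind, 0.0), elapsed)
    return worst
```

In the result of run_async_demo(), the "profile" requests should finish in roughly the fast backend time while only the "payment" requests take the slow backend time, which is the isolation the paragraph above describes.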
What makes asynchronous event-driven web applications even nicer to operate is that cloud platforms can scale out your cluster automatically when they notice high CPU utilization on your application.
Naturally, this only works when your processor utilization is actually high and reflects your actual load and the health of your application. If all your threads are hanging because a backend is responding slowly and you used blocking calls, this sort of auto-scaling will not work, because CPU utilization stays low and never triggers the scale-out.
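To see why a CPU-based rule misses this situation, the sketch below measures the CPU utilization of a process whose threads are all blocked. The 70% threshold and the names blocked_call, cpu_utilization_while_blocked, and should_scale_out are hypothetical; real cloud platforms define their own scaling rules and metrics.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SCALE_OUT_CPU_THRESHOLD = 0.70  # hypothetical rule: add instances above 70% CPU


def blocked_call() -> None:
    """Stand-in for a blocking call to a slow backend."""
    time.sleep(0.3)


def cpu_utilization_while_blocked(threads: int = 8) -> float:
    """Return CPU time divided by wall-clock time while all threads are waiting."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(threads):
            pool.submit(blocked_call)
    # Leaving the with block waits until every thread has finished.
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def should_scale_out(cpu_utilization: float) -> bool:
    """Apply the hypothetical CPU-based scaling rule."""
    return cpu_utilization > SCALE_OUT_CPU_THRESHOLD
```

Every thread is busy from the application’s point of view, yet the measured utilization should stay far below the threshold, so should_scale_out returns False exactly when the application is hanging.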