This post was written by Rob Tweed who is the director of M/Gateway Developments Ltd, a consultancy and software development company in the UK that has focused on web and NoSQL database technologies, particularly in the healthcare industry, since the mid-90s. Cycling, photography and listening to and recording music are what keep him sane away from the keyboard! This post first appeared on Rob’s blog.
Imagine all the benefits of Node.js (one language and technology for both front-end and back-end development, plus outstanding performance) BUT without the concerns of concurrency and heavy CPU processing, AND with high-level database abstractions. With some interesting parallels to Amazon Web Services’ Lambda, that’s what the QEWD.js framework is designed to deliver.
I’ve worked with Node.js since its early days, in 2011, and for many more years with conventional server-side languages, so I’m well aware of how the Node.js philosophy differs, and of the gap between what I’d like to do and how Node.js expects me to do it. I’ve also recently worked with Java developers who have made (or tried to make) the transition to Node.js, which has been an interesting and revealing exercise.
Whereas other, more conventional server-side languages such as Java and Python provide optional syntax for asynchronous logic where it makes sense and is more efficient (e.g. to access multiple remote services in parallel), the norm in those languages is to write synchronous logic, even when accessing databases or files. The multi-threaded nature of these languages’ technical architectures means that the developer doesn’t have to be concerned about one request’s blocking code holding up everyone else’s. So when developers with a background in languages such as Java or Python move to the single-process environment of Node.js, its unavoidable, mandatory asynchronous logic comes as quite a culture shock.
As a result, numerous articles have been written recommending Node.js only for certain kinds of application. One such article by Tomislav Capan is pretty typical, suggesting: “Where Node.js really shines is in building fast, scalable network applications, as it’s capable of handling a huge number of simultaneous connections with high throughput, which equates to high scalability.” Like many others, he concludes:
- You definitely don’t want to use Node.js for CPU-intensive operations; in fact, using it for heavy computation will annul nearly all of its advantages
- The [WebSocket-based] chat application is really the sweet-spot example for Node.js: it’s a lightweight, high traffic, data-intensive (but low processing/computation) application that runs across distributed devices
- If you’re receiving a high volume of concurrent data, your database can become a bottleneck. He recommends queuing the data through some kind of caching or message queuing (MQ) infrastructure (e.g. RabbitMQ, ZeroMQ) to be digested by a separate database batch-write process, or by computation-intensive backend services written on a platform better suited to such tasks
- Don’t use Node.js for server-side web applications with relational databases (use Rails instead)
- Don’t use Node.js for computationally heavy server-side applications
All well and good, but I would like to be able to have my cake and eat it too:
- I’d like to avoid a mash-up of multiple languages plus a separate message queue such as RabbitMQ. The less complexity and the fewer moving parts, the better for maintainability and stability.
- In my experience it’s almost impossible to avoid some amount of CPU-intensive processing on the server side of most web applications, so I’d like to be able to handle such processing without fear of grinding a Node.js application to a halt for everyone.
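Why CPU-heavy work is such a concern in plain Node.js can be seen in a few lines. In this minimal sketch (plain Node.js, nothing QEWD-specific; `busyWait` and `measureTimerDelay` are hypothetical helper names), a synchronous loop stands in for real CPU-intensive processing, and a zero-delay timer stands in for another user’s request waiting on the same event loop:

```javascript
// Sketch: in a single Node.js process, synchronous CPU-bound work delays
// every other callback waiting on the event loop.

function busyWait(ms) {
  // Spin the CPU for about `ms` milliseconds without ever yielding.
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) iterations++;
  return iterations;
}

function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    // Stands in for another user's request: it asks to run "as soon as possible"...
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    // ...but cannot run until this synchronous work has finished.
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((delayMs) => {
  console.log(`A 0 ms timer actually waited about ${delayMs} ms`);
});
```

Because the whole process shares one event loop, the timer cannot fire until the loop finishes, so every other user waits for as long as the computation takes.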
I’m sure I’m not alone in having this wish-list. So a question I had from my earliest days of using Node.js was: wasn’t it possible to have my cake and eat it, getting all the advantages of Node.js while avoiding all the downsides?
What sets AWS Lambda apart from the normal Node.js environment is that your functions are executed in an isolated runtime container where they don’t compete with any other users’ requests, so concurrency isn’t actually an issue. Nevertheless, if you look at the published example functions, they all use the usual asynchronous logic.
That doesn’t make sense to me. It’s fair enough to use asynchronous logic where it makes sense or is more efficient, such as when you’re making multiple simultaneous requests to remote S3 or EC2 services. However, many Lambda functions make only a few accesses to remote resources; if those could be done truly synchronously, they wouldn’t affect performance or cost, but would simplify the logic considerably. Put it this way: no Java, Python or .NET developer that I know of would go out of their way to use asynchronous logic if they didn’t have to, so why should a Node.js developer?
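To illustrate the difference in shape, here is a minimal sketch that uses two local JSON files as stand-ins for remote resources (the file names, data and function names are all hypothetical). The asynchronous version nests one callback per dependent access; the synchronous version reads top to bottom, which is only acceptable in an environment like Lambda’s, where nothing else shares the process.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical "remote resources", stood in for by two local JSON files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cake-'));
const userFile = path.join(dir, 'user.json');
const prefsFile = path.join(dir, 'prefs.json');
fs.writeFileSync(userFile, JSON.stringify({ name: 'Ada' }));
fs.writeFileSync(prefsFile, JSON.stringify({ theme: 'dark' }));

// Asynchronous style: each dependent access nests inside the previous
// callback, and every step needs its own error check.
function loadProfileAsync(callback) {
  fs.readFile(userFile, 'utf8', (err, userJson) => {
    if (err) return callback(err);
    fs.readFile(prefsFile, 'utf8', (err2, prefsJson) => {
      if (err2) return callback(err2);
      callback(null, { ...JSON.parse(userJson), ...JSON.parse(prefsJson) });
    });
  });
}

// Synchronous style: the same logic reads top to bottom and errors surface
// as ordinary exceptions. It blocks the process while reading, so it is only
// appropriate when no other user shares this process.
function loadProfileSync() {
  const user = JSON.parse(fs.readFileSync(userFile, 'utf8'));
  const prefs = JSON.parse(fs.readFileSync(prefsFile, 'utf8'));
  return { ...user, ...prefs };
}

console.log(loadProfileSync()); // { name: 'Ada', theme: 'dark' }
loadProfileAsync((err, profile) => console.log(err || profile));
```

Modern `async`/`await` syntax also flattens the nesting, though every function on the call path then has to be `async` and its result awaited.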
Of course, one reason Node.js Lambda developers continue to use asynchronous logic is that they believe there’s no alternative: pretty much all the standard interfaces for databases and remote HTTP-based services are asynchronous. Until things like Lambda came along, there was no point in having synchronous APIs for Node.js. Hopefully that can and will change. For example, the tcp-netx module, which provides synchronous as well as asynchronous APIs for basic TCP access, ought to provide the underpinnings for a new breed of synchronous APIs for Node.js environments such as Lambda, where concurrency isn’t an issue. Indeed, such an interface is already available for MongoDB.
Not everyone, of course, will want to move their applications to Amazon’s “serverless” Lambda service. Prevailing wisdom would suggest that it’s not possible for them to “have their cake and eat it too”, but actually that’s not entirely true. Take a look at a Node.js project known as QEWD.js and you’ll see a way to achieve something similar to Lambda’s isolated execution containers, but running on your own servers.
QEWD.js is a server-side platform for REST and browser-based applications, built on top of a module called ewd-qoper8, which implements a Node.js-based message queue. Incoming messages to ewd-qoper8 are queued and dispatched to pre-forked Node.js child processes for processing. The key feature, however, is that each child process handles only a single message at a time, so the handler function for that message does not need to be concerned about concurrency: as with Lambda, the handler function is executed in an isolated runtime environment. After handling the message and returning the response to the master ewd-qoper8 process, the child process does not shut down, but immediately makes itself available to handle the next message in the queue. So there are no per-message child process start-up or tear-down costs.
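A heavily simplified sketch of the pattern just described, assuming nothing about ewd-qoper8’s real API: `SimpleQueue`, `handle`, `dispatch`, `shutdown` and the worker module are hypothetical names. The master forks a fixed pool of child processes once, queues incoming messages, and sends a message only to an idle child, so each child works on exactly one message at a time and goes straight back into the pool afterwards. To keep the sketch in one file, the worker source is written to a temporary file before forking.

```javascript
// Hypothetical sketch of the queue/pre-forked-child pattern, NOT ewd-qoper8's API.
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker module. Its handler is deliberately synchronous and
// CPU-bound (it sums 1..n); a child only ever has one message to work on.
const workerSource = `
process.on('message', (msg) => {
  let total = 0;
  for (let i = 1; i <= msg.n; i++) total += i;
  process.send({ id: msg.id, pid: process.pid, total });
});
`;

class SimpleQueue {
  constructor(poolSize) {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'queue-'));
    const workerFile = path.join(dir, 'worker.js');
    fs.writeFileSync(workerFile, workerSource);

    this.queue = [];          // messages waiting for an idle child
    this.pending = new Map(); // message id -> promise resolver
    this.idle = [];           // children not currently handling a message
    this.workers = [];
    this.nextId = 0;

    // Pre-fork the pool once; children are reused rather than torn down.
    for (let i = 0; i < poolSize; i++) {
      const child = fork(workerFile);
      child.on('message', (response) => {
        const resolve = this.pending.get(response.id);
        this.pending.delete(response.id);
        resolve(response);
        this.idle.push(child); // available again for the next queued message
        this.dispatch();
      });
      this.workers.push(child);
      this.idle.push(child);
    }
  }

  // Master side: queue the message and return a promise for its response.
  handle(payload) {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, resolve);
      this.queue.push({ ...payload, id });
      this.dispatch();
    });
  }

  // Hand queued messages to idle children, one message per child.
  dispatch() {
    while (this.queue.length > 0 && this.idle.length > 0) {
      this.idle.shift().send(this.queue.shift());
    }
  }

  // Close the IPC channels so the children (and this process) can exit.
  shutdown() {
    for (const child of this.workers) child.disconnect();
  }
}

const q = new SimpleQueue(2);
Promise.all([10, 100, 1000, 10000].map((n) => q.handle({ n }))).then((results) => {
  for (const r of results) console.log(`message ${r.id} -> ${r.total} (child ${r.pid})`);
  q.shutdown();
});
```

With four messages and a pool of two, the last two messages wait in the queue until a child reports back, which is the one-message-per-child guarantee in miniature.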
QEWD.js builds on top of ewd-qoper8, integrating its master process as Express middleware to provide a complete back-end development environment for web applications and REST/Web Services. A pretty good analogy is that QEWD.js is a Node.js-based equivalent of Apache and Tomcat. QEWD’s fully asynchronous, non-blocking master process, incorporating Express, socket.io and the ewd-qoper8 message queue, is in many ways a perfect Node.js networked application: it’s really lightweight, doing little more than ingesting incoming HTTP and WebSocket messages, putting them on a queue and dispatching them to an available child process. It’s therefore capable of handling large amounts of activity.
All the “userland” processing happens in the isolated environment of a separate child process. QEWD allows you to configure as many child processes as you wish to meet the demands of your service and to make optimal use of your available CPU cores. If a back-end message handler function uses synchronous logic and blocks its child process, it affects nobody else. If it uses a lot of CPU, it doesn’t directly affect any other concurrent user any more than it would in, say, a Java or .NET environment. Meanwhile, the master process continues to ingest, queue and dispatch incoming messages unabated.
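The claim that a busy child leaves the master alone can be checked with a small experiment. This is again a sketch under assumptions rather than QEWD code: `runWhileChildIsBusy` and the child module are hypothetical, the child simply spins the CPU for a fixed time, and a `setInterval` counter stands in for the master’s real job of ingesting and queuing messages. The counter keeps advancing while the child is busy; had the same loop run inline in the master, the counter would have stayed frozen until the loop finished.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical child module: blocks its own event loop with a CPU-bound
// loop for the requested time, then reports back.
const childSource = `
process.on('message', ({ busyMs }) => {
  const end = Date.now() + busyMs;
  while (Date.now() < end) {}   // synchronous work: blocks this child only
  process.send({ done: true });
});
`;

function runWhileChildIsBusy(busyMs, tickMs) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'busy-'));
  const file = path.join(dir, 'child.js');
  fs.writeFileSync(file, childSource);
  const child = fork(file);

  return new Promise((resolve) => {
    let ticks = 0;
    // Stands in for the master's real work: ingesting and queuing requests.
    const timer = setInterval(() => { ticks++; }, tickMs);
    child.on('message', () => {
      clearInterval(timer);
      child.disconnect(); // let the child exit
      resolve(ticks);
    });
    child.send({ busyMs });
  });
}

runWhileChildIsBusy(500, 20).then((ticks) => {
  console.log(`master ran its timer ${ticks} times while the child was busy`);
});
```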
Therefore with QEWD, I feel I have my ideal environment:
- I just have one technology — Node.js — for the entire back-end.
- As a developer, I don’t have to worry about concurrency. That’s all handled for me by the QEWD/ewd-qoper8 master process, which, as far as I’m concerned, is just a “black box” handling the external-facing HTTP and WebSocket interface. My code is executed in an isolated Node.js runtime container that has its entire process to itself, so I don’t need to worry about blocking I/O or CPU-intensive processing.
- I can, and still do, use asynchronous APIs, but only where it makes sense and is more efficient to do so. Most of the time, though, I can access resources such as databases synchronously, which makes my logic simpler, more intuitive and therefore more maintainable.
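A sketch of that balance in plain Node.js (not QEWD’s handler API), with hypothetical names throughout (`handleMessage`, `callRemoteService`, the `config.json` file) and a timer standing in for a real remote call. The local read is synchronous, while two independent remote calls are made in parallel with `Promise.all`, so the wait is roughly that of the slowest call rather than the sum of both.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical local resource: read synchronously, which is simple and safe
// when the handler has its process to itself.
const configFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'cfg-')), 'config.json');
fs.writeFileSync(configFile, JSON.stringify({ services: ['records', 'labs'] }));

// Hypothetical remote service call, simulated with a timer.
function callRemoteService(name, latencyMs) {
  return new Promise((resolve) => setTimeout(() => resolve(`${name}: ok`), latencyMs));
}

async function handleMessage() {
  const config = JSON.parse(fs.readFileSync(configFile, 'utf8')); // synchronous
  const started = Date.now();
  // Asynchronous only where it pays off: independent calls run in parallel.
  const results = await Promise.all(
    config.services.map((name) => callRemoteService(name, 200))
  );
  return { results, elapsedMs: Date.now() - started };
}

handleMessage().then(({ results, elapsedMs }) => {
  console.log(results, `took ~${elapsedMs} ms rather than ~400 ms`);
});
```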
Yes, I like to think you can now have your Node.js cake and eat it too.
A special thank you to technical editor Simeon Vincent for reviewing this post.