Profiling slow Node.js apps using v8-profiler

Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient.

I’ve been writing Node.js code and services since the early days, which for me started around 6 years ago. I have a long background in C/C++, but writing HTTP services in C/C++ at that time was a rare thing to do: there was no sense of modules (C++20, where are you?), no easy deployment, no HTTP support in the standard library, etc…

After a few years of using PHP I started looking for a language that would be easy to write and deploy, but also fast.
Then I came upon Node.js. On the Node.js homepage were ~10 lines of JavaScript code making up an HTTP server. (Updated code can be found these days at https://nodejs.org/en/about.)
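From memory, it was roughly the classic hello-world server (a sketch of the idea, not the exact snippet from the homepage):

const http = require('http')

// reply to every request with a plain-text greeting
http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' })
  res.end('Hello World\n')
}).listen(3000, () => console.log('listening on :3000'))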
Now this looked like something. And it said right there: “lightweight and efficient”. As I was also writing JavaScript in frontend apps (HTML and later QML), this seemed easy enough.

I liked the fact that I could write almost anything in Node.js without third-party libraries, as the core modules had everything I needed (and at that time there weren’t that many third-party modules), and JavaScript is arguably much simpler than C++. After being with Node.js for years, using it for everything from simple scripts to complex distributed systems, I started to get more involved in optimization and profiling. And whenever somebody contacted me with a Node.js issue, this is what I heard frequently:

“Why is Node.js so slow? We would be much better off with language xy.”

Node.js

Node.js architecture ( https://twitter.com/rauschma/status/496213393146404864)

Node.js is an interesting combination of technologies. 
It uses the libuv library to do the majority of the work and then, via the V8 engine, gives you nice non-blocking asynchronous interfaces that you can easily call from JavaScript code.

It’s good to know that all JavaScript code is executed on one thread, which is responsible for processing all of the event loop logic; you mostly see this in the form of callbacks. So if you write complex and slow JavaScript code in one of your callbacks, you will slow down the processing of all of your callbacks.
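To make this concrete, here is a minimal, synthetic sketch (not from the service in question) of a slow callback starving the event loop:

// a timer that should fire every 100 ms
setInterval(() => console.log('tick'), 100)

// a "slow callback": busy-loop for ~2 seconds on the single JS thread
setTimeout(() => {
  const end = Date.now() + 2000
  while (Date.now() < end) {} // no ticks are printed while this spins
}, 500)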

But we also know that Node.js does use threads internally; we just do not handle them directly from our JavaScript code. They come from libuv, where a worker thread pool is used to manage operations which do not have a non-blocking asynchronous API by default. I would recommend reading more about this in the Node.js docs, especially the overview of blocking vs. non-blocking.

You can control the size of the worker pool with the UV_THREADPOOL_SIZE environment variable, which has a value of 4 by default, but we will leave the details about this for another article.
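As a quick hypothetical demo, you can watch the pool size in action by saturating it with crypto.pbkdf2 calls, which run on the libuv worker pool:

const crypto = require('crypto')

const start = Date.now()
// with the default pool size of 4, these 8 hashes finish in two
// batches; run with UV_THREADPOOL_SIZE=8 and they finish in one
for (let i = 0; i < 8; i++) {
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', () => {
    console.log(`hash ${i} done after ${Date.now() - start}ms`)
  })
}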

A problematic service

The one thing that is common to the projects I have worked on is how fast they were developed. Node.js made it very easy for people to just focus on the task at hand.

You need an HTTP service? Here you go with express. You need a logger? Here you go with winston. You need to make some requests? Here you go with request. All of them are popular, have many features you might need, and everybody is using them. Just one line in package.json and you are an npm install away from using them in production. You can go even higher and slap on a framework that uses all of them under the hood, and suddenly everyone on the team is happy about how fast you developed the new service and that you migrated away from Java.

Now that is the essence of what I had to investigate and resolve, and what I had fun with.

The service in question did a lot of HLS m3u8 manipulation, so it was downloading a lot of m3u8 files in real time. Everything seemed fine, but when traffic started increasing it became obvious that it needed more and more hardware. If you have some kind of auto-scaling, you can nicely see what a difference optimized code can make.

Their team took a quick look and concluded that the code they wrote was fine and that, well… the issue was obviously in Node.js itself.

Go to the rescue!

Errr, no. We still have to find the culprit, no matter whether a 10-line Go server is faster.

V8

The V8 engine used by Chrome and Node.js is a highly optimized JavaScript engine.

The engine is being improved constantly, and if you like a little technical reading, I would highly recommend their official blog.

I’m already testing Node v10, which uses a newer V8 engine and the new HTTP/2 module, and it shows nice improvements!

Profiling

One way that I like is using v8-profiler to generate sunburst charts, since it works on multiple platforms.

I have documented this process briefly and provided some helper scripts that you can find on my GitHub, where you can also find alternative instructions on profiling with Linux perf events and generating flamegraphs.

The service used winston with console and file outputs, together with request, in code that was in essence something like:

// winston: log as JSON to both the console and a file
const winston = require('winston')
const logger = new winston.Logger({
  transports: [
    new winston.transports.Console({
      json: true
    }),
    new winston.transports.File({
      json: true,
      filename: 'app.log'
    })
  ]
})

// request: download a file and log how long the request took
const fs = require('fs')
const request = require('request')
let start = Date.now()
request.get("http://link")
  .pipe(fs.createWriteStream("outfile"))
  .on('finish', () => {
    logger.info(`request done in ${Date.now() - start}ms`)
  })

Now, to profile the app, we use the v8-profiler module. You can just add something like this at the end:

const fs = require("fs")
const profiler = require("v8-profiler")
const id = Date.now() + ".profile"
// start profiling
profiler.startProfiling(id)
// stop profiling in n seconds, write the profile to disk and exit
setTimeout(() => {
  const profile = JSON.stringify(profiler.stopProfiling(id))
  fs.writeFile(`${__dirname}/${id}`, profile, () => {
    console.log("profiling done, written to:", id)
    process.exit() // if you want
  })
}, 15000)
The tests in this article were done using the latest Node.js LTS at the time of writing (8.11.3).

After you run the program and profiling has finished, you will find the profile file in the directory the program was run from, and from that you can create the sunburst chart.

Put profile.html, which you can also get from my GitHub, in the same location as your profile file and open it in a browser, e.g. file:///tmp/profile.html?1528534695870.profile (you might need to give your browser permission to read local files, or you can just serve the files using any HTTP server).

v8-profiler sunburst (winston console & file + request)

So our sunburst looks like there is a lot going on.

Since v8-profiler profiles V8 code, and file writing is done by libuv, v8-profiler does not actually show all of the work done by the whole program. Here that is exactly what we want, because we expect the issue to be in the user code.

Investigating the graph, I could see a lot of winston and request module usage, which was expected, but on closer inspection it was obvious that some internal code might be doing too much.

Request

A lot of the time spent doing a request goes into creating the request object itself. We can also spot things like multipart code using the uuid module, which was not needed for our GET requests. Keep-alive is also not enabled, so new connections are spawned for every request, which is very expensive when doing a lot of requests. The URL parsing logic (a Node.js internal module) is not the fastest either, which you can read more about at https://stackoverflow.com/questions/20770003/performance-about-url-parse-in-nodejs.
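For what it’s worth, keep-alive could have been enabled without leaving request; a sketch, assuming its forever and agentOptions options (the "http://link" URL is a placeholder as above):

const fs = require('fs')
const request = require('request')

// reuse TCP connections instead of opening a new one per request
const get = request.defaults({
  forever: true, // use a keep-alive agent
  agentOptions: { maxSockets: 100 } // cap concurrent sockets per host
})

get('http://link').pipe(fs.createWriteStream('outfile'))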

Winston

When looking at the winston usage, it is obvious that a lot of time is spent logging to the file. In my experience, one thing you should not be doing is sending logs to standard output/error and also using another transport like a file.

Logging in Node.js should always be done via standard output/error, and the logs should then be processed and routed by the rest of your stack (which can easily redirect them to a file as well). This avoids bogging down your single-threaded Node.js application with extra logging work, and when done correctly it is the fastest way to log in Node.js (https://nodejs.org/api/process.html#process_a_note_on_process_i_o). File logging from Node.js is async, but it then competes with your other I/O operations that use the same worker pool. Once your stack is set up this way, you can also easily swap services, even ones written in different languages, since every app can log to standard output/error and you do not have to reinvent the wheel for each new log transport by pulling in yet another third-party library of questionable quality. Your stack, perhaps running on something like AWS and/or using Docker, can then ship the logs using different transports, all without disrupting your app.
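Applied to the configuration above, that means keeping only the console transport and letting the environment capture standard output (a minimal sketch, assuming the winston 2.x API used earlier):

const winston = require('winston')
// log to stdout only; docker, systemd or a log shipper then routes
// the stream to files or elsewhere, outside of this process
const logger = new winston.Logger({
  transports: [
    new winston.transports.Console({ json: true })
  ]
})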

So let’s see what we get when removing the winston file output and enabling keep-alive on requests.

v8-profiler sunburst (winston console + request keepalive)

We can see a massive improvement, and in production this was in the range of 25–35%. But we still see a massive amount of request work. On the winston side, things are looking really good. However, we can now replace it with pino, since we do not need any extra features, and replace request with the native HTTP module, which is almost a drop-in replacement:

const fs = require('fs')
const http = require('http')
const logger = require('pino')()

let start = Date.now()
// pass an Agent created with { keepAlive: true } to reuse connections
http.get("http://link", (res) => {
  res.pipe(fs.createWriteStream("outfile"))
    .on('finish', () => logger.info(`request done in ${Date.now() - start}ms`))
})
v8-profiler sunburst (pino + http keepalive)

This again shows a big difference in the profiler. The actual code showed a consistent improvement of around 5% when replacing winston with pino, and another 35% when replacing request with the native HTTP/S module.

Wrapping up

Now, we did get a lot of improvement over the early version. But we also need to make sure we haven’t lost any features. If features are missing, we have to develop them and then make sure we haven’t slowed the app back down just the same.

One thing we can’t conclude is that swapping out one module for another will bring you significant benefits, just because you found one scenario, like this one, where it does.
The majority of projects won’t see the difference, as most of the performance cost comes from network and disk I/O. If you add microservices into the mix, the biggest issues come from the decisions you make for the architecture as a whole. The point being: you should spend the time to get to know your frameworks, given your requirements.

When it comes to specific requirements, there are great libraries like pino and needle for those in need of a logger and a request library. There is also the micro web framework which, in my opinion, is great for a microservice environment. But unless the service is heavily used, as mentioned before, you probably won’t spot a difference, and you might be missing features that you will then need to implement yourself. Other times you need to pick the right tool for the job, even if that means using some other language instead.

I have had a lot of experience with optimizing and profiling code, and the one thing we can be sure of is that it is a great way to get to know your program and improve your code. If you ever get to the point where you think Node.js is slow, now you know how to find where the bottleneck is.