Optimising Node.js for Speed: Part 1
I’ve been working on a little toy project since 2009, in very occasional fits and bursts. I used it as a way to learn JavaScript and to learn about ray tracing. One thing I enjoy is maximising the performance of a particular algorithm in a given language.
This post is part 1 of 3 describing the process I used to improve the performance of this particular ray tracing algorithm in JavaScript. My goal was to build a real-time ray tracer, but also to learn about the JavaScript compiler, namely the V8 JavaScript engine.
Here is the ray tracer for this week; it will get much faster over the next two weeks.
Ray Tracing
Ray tracing is a method for rendering scenes that can produce photo-realistic images. It is very computationally expensive and is generally not used in real time. The video below is a great high-level introduction to ray tracing; it compares ray tracing against rasterisation, the most common method of rendering and generally what WebGL, OpenGL and DirectX use.
The video mentions that it takes 16 hours to render one frame. So how do you make a real-time ray tracer? You make it much simpler, like the one below.
My JavaScript ray tracer has the following features:
- Objects: spheres and discs can be added to the scene. Each object can have a different colour. The image above has 3 spheres and 1 disc (the ground plane).
- Lights: there can be 1 or more lights.
- Transparency: objects can be transparent, like glass. For example, you can see through the bottom-left sphere.
- Reflections: each object can have a surface that reflects light. You can see reflections on the checkered ground, as well as on each sphere. On the silver sphere you can even spot reflections multiple bounces deep, of both the ground and the red sphere.
The Actual Ray Tracer
You can view the baseline (zeroth version) here; it’s pretty slow. Interestingly, in the current version of Firefox it’s twice as fast as in the current version of Chrome.
The source code is available here on GitHub. To get the code as it was for this post, run the following:
git clone https://github.com/psiphi75/rtjsrt
cd rtjsrt
git reset --hard 1007576ac3246987fb539de5daa3850837f3d32f
The Performance Improvement Process
The steps you should take to improve performance are:
- Ensure your code is bug-free
- Test using a benchmark
- Identify a bottleneck
- Update the code
- Repeat steps 1 to 4 until you are satisfied
I will use the Chrome browser for visualising the images and Node.js for creating repeatable benchmarks. The focus will be on improving speed, less so on reducing memory use or code size.
1. Start with bug-free code
Donald Knuth said it quite well:
premature optimization is the root of all evil.
Performance optimisation should be one of the last steps of the development process. The most important thing is to have code that is well structured and free of bugs. If you begin performance optimisation with bugs in your code, it’s likely those bugs will only get worse. Complete tests must be run after every change to your code to ensure you have not introduced any new bugs.
I made sure my ray-tracer was completely finished before I began optimising it. There were two methods of testing:
- A basic test suite to ensure the core functions work (a sketch of this kind of test follows below).
- Rendering in the browser and inspecting the result visually.
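As an illustration, even a couple of assertions on the core maths go a long way. This is a minimal sketch using Node’s built-in assert module; the import path is an assumption and the real test suite in the repo is more complete:

var assert = require('assert');
var dot = require('./src/vector').dot; // hypothetical import path

assert.strictEqual(dot([1, 0, 0], [0, 1, 0]), 0); // orthogonal vectors
assert.strictEqual(dot([1, 2, 3], [4, 5, 6]), 32); // 4 + 10 + 18
console.log('All tests passed');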
2. Benchmarking
A short digression. I used to work at a large corporate in Switzerland, where I was responsible for a web-based application used by many people within the company. There was one server in Zürich and 2,000 users around the world. The users were non-technical, and one particularly vocal user from the UK always complained about performance problems, especially in the reporting component. So we hired specialists to improve the speed of the reporting system. When that was complete the complaints kept coming in, still from the same user. Eventually, during a heated phone conference, we discovered that what he meant by ‘performance’ was different from what I understood by ‘performance’. He meant the system was unstable and had other issues, while I was thinking about the speed of the system. He was right. This was a big lesson for me: you need to understand what’s important to the end-user, and you need measures in place that the end-user understands, to be able to improve performance.
To be able to improve performance you need to know what performance means. In my case it’s Frames Per Second (FPS). It is the measure of how many frames can be rendered in one second. FPS is a standard measure people understand well. It is also easily quantifiable.
The core benchmark was written in Node.js and would loop the render function, rt.render(), continuously for 60 seconds. During the benchmark it’s important your computer does not do anything else; in fact, you’re best doing this without any other applications open.
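Conceptually the loop is something like the following. This is a minimal sketch; the import path and constructor are assumptions, and the real run-server.js differs:

var RayTracer = require('./src/RayTracer'); // hypothetical import path
var rt = new RayTracer();

var DURATION_MS = 60 * 1000;
var frames = 0;
var start = Date.now();
while (Date.now() - start < DURATION_MS) {
    rt.render(); // render one full frame
    frames += 1;
}
console.log('frames: ' + frames);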
You can run the code using:
node run-server.js
Once completed it printed:
frames: 101
So in 60 seconds it rendered 101 frames, an average of 1.68 FPS. I considered this the baseline benchmark. Any increase on 101 frames in 60 seconds can be considered an improvement.
3. Identify the Bottleneck
A bottleneck is a piece (or several pieces) of code that significantly and negatively impacts the speed of the application. Correctly identifying bottlenecks is the key to optimisation.
All code has bottlenecks; otherwise operations would complete in zero time. There may be many bottlenecks in your application, and some will be easier to identify and improve than others. There are numerous ways to identify bottlenecks, including timers in your code, profilers and 3rd-party applications.
Timers within your code
The most basic way to identify how long a function takes to execute is a simple timer. A good method to use is performance.now(). This is native in the browser and available on npm for Node.js. Using performance.now() is better than using new Date().getTime() because it is far more accurate.
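As a sketch, timing a single render might look like this. I’m assuming the performance-now package from npm provides the timer in Node.js; in the browser, performance.now() is available globally:

var now = require('performance-now'); // npm timer; in the browser use performance.now()

var t0 = now();
rt.render(); // the code being measured
var t1 = now();
console.log('render() took ' + (t1 - t0).toFixed(3) + ' ms');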
The issue with timers is that they are not easy to add to every function. If you only add them to one or a few functions, you don’t get the whole picture.
Profilers
Profilers are, in my opinion, the most convenient way of gathering performance information. Chrome comes with a profiler and so does Node.js.
So let’s run Node.js with the profiler. I used v8-profiler from npm; you change your code to include the profiler, which you can turn on and off at different times. You start it using profiler.startProfiling() and finish it using profiler.stopProfiling(), then you export the profile to a JSON file. This gist shows it in action; it will create a file called profile.cpuprofile.
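In outline, the instrumentation looks like this. It is a sketch based on the v8-profiler package’s documented API; the gist has the exact code I used, and runBenchmark() here is a hypothetical stand-in for the 60-second loop:

var fs = require('fs');
var profiler = require('v8-profiler');

profiler.startProfiling('render'); // begin sampling
runBenchmark(); // hypothetical stand-in for the benchmark loop
var profile = profiler.stopProfiling('render');

// Serialise the recording so Chrome Developer Tools can load it.
profile.export(function (err, result) {
    fs.writeFileSync('profile.cpuprofile', result);
    profile.delete(); // release the profile's memory
});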
Profiling may significantly impact the performance of your application, so avoid using it continuously in production. The way the profiler works is that every millisecond it interrupts the code and checks what it is doing. This means it’s not an exact measurement of your code, because it isn’t taking continuous measurements, but it’s good enough.
The profile.cpuprofile file does not contain a lot of human-readable information. However, you can use the Chrome Developer Tools (press Ctrl-Shift-I in Chrome) to view the profile recording. Go to the ‘Profiles’ menu and click ‘Load’. Once loaded, the profile will appear on the right, see screenshot.
By default the ‘Chart’ view should appear, see below. It’s called a flame chart, though I would call it an upside-down flame chart. The colours are apparently ‘random’. This gives you a good overview of which functions are being used. Here I can see peaks for every frame rendered.
You can zoom in on the timeline to get a closer look at what is happening during the selected period.
Although the chart looks pretty, in the case of the ray tracer it does not give me much useful information, because everything is fairly uniform. Instead, I find the ‘Heavy (Bottom Up)’ view the most useful. You can see that vector.dot uses 23.17% of ‘Self Time’ as well as of ‘Total Time’. ‘Self Time’ shows how much time is spent executing code only within the particular function. ‘Total Time’ shows how much time is spent in the function plus all functions called from it.
When you drill down into vector.dot you can see that around 80% of the time is spent in Sphere.intersect. This makes sense because there are 3 spheres and 1 disc, and many vector.dot calls in the sphere-intersect function. Looking at the vector.dot function (below), you can see it is a very simple function, and I didn’t see how to improve its performance straight away. Maybe it can’t be optimised further!?
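For reference, a 3-component dot product in this style is just three multiplies and two adds, roughly the following (a sketch; vector.js in the repo has the actual code):

function dot(v, w) {
    // Three multiplies, two adds: very little to optimise in isolation.
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2];
}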
Enter IRHydra²
IRHydra² is a simple web-based application that takes trace information from Node.js and shows you, in detail, the path the V8 compiler took on its optimisation journey. This tool requires some knowledge of how V8 works, and I have to admit I didn’t understand all the information it was providing.
First you will need to run Node.js with various tracing options enabled. Let’s run the following:
node --trace-hydrogen \
--trace-phase=Z \
--trace-deopt \
--code-comments \
--hydrogen-track-positions \
--redirect-code-traces \
--redirect-code-traces-to=code.asm \
run-server.js
This will output two files, code.asm and hydrogen-12345-1.cfg (where ‘12345’ is some random number). We can load the latter into IRHydra² (click the open-file icon in the very top left of the page). It can display your code as IR (Intermediate Representation), which is a small step away from assembly, as the execution graph, or as your actual source.
However, IRHydra² did not give me any useful details about the code compilation. This implies that, in theory, the code execution was already as efficient as it gets. More on IRHydra² in a later post.
4. Update the code
Below are some of the steps I took to optimise vector before I found the successful optimisation. Remember the baseline benchmark started at 1.68 FPS (here is the baseline sample):
- Tried using new Float32Array() instead of [] for storing the vector variables. I got 0.133 FPS (sic), a huge regression. This surprised me a lot; I was expecting that a typed array would be much quicker than an untyped array. So I checked jsperf.com and the vector operations test, which validated my findings. Firefox was similar, but not as extreme.
- How about just new Array() instead of []? Even worse, 0.067 FPS.
- I rolled back the changes and reduced the number of occurrences of vector.dot. I got an 11% improvement; we are at 1.87 FPS.
- Upgrading Node.js from 4.2 to 6.9.5 gave a very minor improvement. We are at 1.88 FPS.
- I changed the vector class to SIMD operations. SIMD (say “sim-dee”) stands for Single Instruction Multiple Data; it means you can add 4 numbers in one operation. It’s a very low-level operation and should be very quick, however I got only 0.58 FPS. Currently Firefox Nightly supports optimised SIMD; for Chrome and Node.js it’s not yet optimised (read here). Check out this example, which shows a 2 to 3 times improvement with SIMD in Firefox Nightly. I rolled back the SIMD changes (see the SIMD.js sketch after this list).
- ‘Enhanced’ some of the vector maths. 1.78 FPS. Rollback.
- Used const and let instead of var. These ES6 declarations should, in theory, give the compiler a better optimisation strategy, right!? I got 1.43 FPS. Rolled back. More on ES6 in my next post.
- Did some pre-calculations. 1.83 FPS. Rollback.
- Tried a different render strategy. 1.77 FPS. Rollback. Although later I optimised this and got better performance out of it.
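For context, this is roughly what the SIMD.js API looked like at the time (a sketch from memory; the API has since been removed from the JavaScript specification): four lane-wise additions in a single operation.

var a = SIMD.Float32x4(1, 2, 3, 4);
var b = SIMD.Float32x4(5, 6, 7, 8);
var c = SIMD.Float32x4.add(a, b); // Float32x4(6, 8, 10, 12)
console.log(SIMD.Float32x4.extractLane(c, 0)); // 6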
So far I had spent a significant amount of time mainly trying to improve the speed of the vector functions. However, I then tried a new pattern for the vector functions, using the following classical prototypical model:
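(A sketch of the pattern; the method bodies here are illustrative and the repo contains the actual implementation.)

function Vector(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
}

// Methods live on the prototype, so every Vector shares a single copy
// and V8 can specialise call sites via hidden classes and inline caches.
Vector.prototype.dot = function (w) {
    return this.x * w.x + this.y * w.y + this.z * w.z;
};

Vector.prototype.add = function (w) {
    return new Vector(this.x + w.x, this.y + w.y, this.z + w.z);
};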
This almost doubled the performance: I got 3.53 FPS. It was a great improvement, and it surprised me. Here is the working sample at 3.53 FPS on my laptop.
Conclusion … for now
JavaScript optimisation is more difficult than I was expecting. I had a lot of assumptions about the compiler and how it works. You really need to understand how the compiler works internally to make good progress with optimisations.
Next week I will post more about my journey with optimising the ray tracer.