Making of Newton Protocol

I recently participated in the PC 4k intro category in the Revision 2019 demoscene competition with my entry “Newton Protocol” and placed 1st. I was responsible for the intro’s coding and graphics, while dixan composed the music for the intro. The basic rule of the competition is that you must create an executable or a website that is only 4096 bytes in size. This means that everything must be generated with math and algorithms; you cannot otherwise fit images, video or audio files into such a small memory space. In this article I go over the rendering pipeline of my intro, Newton Protocol. You can view the end result below or click here to see how it looked live at Revision, or visit pouet to comment on and download the final entry. For competing entries and other stuff revision click here.

Ray marching distance fields are a very common technique in 4k intros as these enable the definition of complex shapes with very few lines of code. The downside, however, is the performance of the code. To render your scene you must find intersection point with rays and your scene, first to figure out what you see i.e. ray from camera and then subsequent rays to lights from the object to compute lighting. In ray marching you don’t find these intersections in single step, but rather you take several small steps along the ray and have to evaluate all objects at each point. With ray tracing on the other hand, you find the exact intersection by testing each object only once, but you are very limited in what shapes you can make, since you must have a formula for each to calculate intersection with a ray.

In this intro I wanted to simulate very accurate lighting. As this requires bouncing millions of rays around the scene, ray tracing seemed like a good approach to achieve this effect. I limited myself to only a single shape — a sphere — because ray-sphere intersection is relatively simple to calculate. Even the walls of the intro are actually just very large spheres. This also made the physics simulation simple; there was only collisions between spheres to consider.

To illustrate of the amount of code that can fit into 4096 bytes, below is the full source code of the final intro. All the parts except for the HTML at the bottom is encoded as a PNG image to compress this into a smaller space. Without this compression, the code would otherwise be nearly 8900 bytes in size. The synth is stripped-down version of SoundBox. For packing the code into this minimized format, I used Google’s Closure Compiler and Shader Minifier. Finally nearly everything is compressed into PNG with JsExe. The source code of my previous 4k intro, Core Critical, can be referenced for the full compilation pipeline as it is identical to the one used here.

The music and synth were implemented purely in Javascript. The WebGL portion is split into two parts (highlighted in green) and setups the rendering pipeline. The physics and ray tracer elements are GLSL shaders. While the rest of the code is encoded into a PNG image, the HTML part is added to the bottom of resulting image file as is. Browser ignores the image data and only runs the HTML code which in turn decodes the PNG back to javascript, and executes it.

Rendering pipeline

The figure below illustrates the rendering pipeline. There are two main parts in the pipeline. The first part of the pipeline is the physics simulator. The scene of the intro has 50 spheres bouncing around inside a box. The box itself is made out of six spheres, some smaller than the others to get more curved walls. The two vertical the lights in the corners are a sphere each, for a total of 58 spheres. The second part is the ray tracer that renders the scene. The graph below shows rendering one frame at the time t. The physics simulation takes previous frame (t-1) and simulates the current state. The ray tracer takes the positions now and in previous frame (for velocity channel) to render out the scene. Then post processing combines previous 5 frames and current frame to reduce aliasing and noise to produce the final output.

Rendering frame at time t.

The physics part is fairly simple, you can find many tutorials online to create a primitive simulation for spheres. Position, radius, velocity and mass are stored into two 1 x 58 resolution textures. I’m taking advantage of Webgl 2 feature to render out to several render targets, so both textures are written out simultaneously. This is also used by ray tracer to produce 3 texture outputs. Webgl doesn’t offer any access to NVidia RTX or DirectX Raytracing (DXR) ray tracing APIs, so everything is done from scratch.

Ray tracer

Ray tracing itself is a fairly primitive technique. You shoot a ray into the scene, it bounces 4 times and if it hits a light the color from the bounces is accumulated, and if not, then the resulting color is black. There is no room in 4096 bytes (which includes music, synth, physics and rendering) to create fancy ray tracing acceleration structures. Thus we use brute force method i.e. testing all 57 (front wall is excluded) spheres for every ray without any optimizations to exclude some spheres. This means that only 2–6 rays or samples per pixel can be shot while maintaining 60 frames per second at 1080p. This is not nearly enough to produce smooth lighting.

1 sample per pixel.
6 samples per pixel.

So, how can this be fixed? The ray tracing algorithm was investigated first, but it was already nearly as simple as it could possibly be. I did manage to get slightly improved performance by removing cases where the ray starts inside a sphere, as these cases would only be valid for transparency effects and the intro scene only contained opaque objects. After this, I combined every if condition into a single statement at the end to prevent unnecessary branching: despite doing “extra” calculations this approach was still faster than multiple if statements. Another thing that could have been done was improving the sampling pattern: rather than shooting rays at random, we can instead distribute them in a more uniform pattern around the scene. Unfortunately, this didn’t work out well and caused wavy artifacts with every algorithm that was tried. This approach, however, had good results for producing a still image. I ended up falling back to utilizing a completely random distribution.

Since nearby pixels should have very similar lighting, why not utilize them when computing a single pixel’s lighting? We don’t want to blur the textures — just the lighting — so we need to render them in separate channels. We also don’t want to blur objects, so we need to render out object IDs in order to know which pixels can safely blurred together. Since we have reflective objects and require sharp reflections, it’s not enough to just have the object ID of the first object we hit. I used a special case for clear reflective materials to also include first and second objects ID visible in reflections to object ID channel. In this way, blurring could smooth out lighting in objects seen in reflections while respecting object boundaries.

Texture channel, we don’t want to blur this.
Here the red channel has the 1st object ID, with green having the 2nd, and blue, the 3rd. In practice these were all encoded into single float with the integer part being object IDs and the decimal part representing the roughness: 332211.RR.

Since I have objects with varying roughness (some spheres are blurry, some are completely diffuse and others are clear) I stored the roughness in order to control the radius of the blurring. Because the scene doesn’t have any fine details, I used a large 50 x 50 kernel with inverse square weighting for blurring. It doesn’t take account world space, which we could do to get more accurate results, since it blurs bigger area on angled surfaces in some directions. This blurring already produces a somewhat smooth image, but it still has very visible artifacts, especially in motion.

Lighting channel with blurring and still-visible artifacts. Blurred dots in the back wall, as caused by a small bug I missed with 2nd bounce object IDs when rays escape the scene, can be seen in this image. This doesn’t show up much in the final image, because the sharp reflections come from the texture channel. The lights also became blurred, but I liked the effect and left it in. The object IDs could be altered, based on the material, to prevent this if so desired.

When objects in a scene and the camera capturing the scene move slowly, lighting should remain consistent between frames as well. As such, we can blur not just in the screen’s XY dimensions; we can blur in the time dimension as well. If we assume the lighting doesn’t change much in a 100 ms time period, we can average it over 6 frames. Now, the objects and camera still move some distance during that time frame, so simply calculating the average out of 6 frame would produce a very blurry image. But since we know where all the objects and the camera were in the previous frame, we can calculate velocity vectors in screen space. This is called temporal reprojection. If I have a pixel at time t, I can take the velocity of that pixel and calculate where it was in t-1, then calculate where the pixel at t-1 was at t-2, and so on and so forth going back 5 frames. Unlike screen space blurring, I used the same weight for each frame, i.e. simply averaging the color over the frames, for temporal “blurring”.

The pixel velocity channel that tells where the pixel was in last frame based on object and camera movement.
To avoid blurring objects together, we again use the object ID channel. In this case we only consider the first object hit, unlike in the blurring in the screen’s XY space. This provides anti-aliasing within object boundaries, i.e. in reflections.

Of course, the pixel might not have been visible in the previous frame; perhaps it was hidden behind another object or outside the camera’s view. In these cases we cannot use past information. This check is done separately for each frame, so we can get anywhere between 1 to 6 samples or frames per pixel and use what we can. The picture below shows that this problem isn’t that big for slow-moving objects.

When objects move to reveal new parts of the scene, we don’t have 6 frames of information to average out for those parts. This picture highlights the areas which have 6 frames (in white) as well as those that do not (in progressively darker shades). The outlines are caused by randomizing sample locations per pixel per frame and taking object ID from first sample.
Blurred lighting was averaged over six frames. The artifacts are nearly invisible and the result is temporally stable, because only 1 frame out of 6 changes each frame where lighting is concerned.

Combining all this together, we get the final image. The lighting is blurred over nearby pixels while keeping textures and reflections sharp. Then all that is averaged over six frames to produce an even smoother and temporally stable image.

Final image.

Some ghosting artifacts are still visible because I averaged out multiple samples per pixel despite only taking the object ID and velocity channel from the first hit. You could try fixing this by rejecting samples if they aren’t same as the first, or at least if the first hit isn’t the same in order to gain anti-aliasing in reflections. In practice, trailing is nearly invisible so I didn’t bother fixing it. The object boundaries are also aliased, as velocity and object ID channels cannot be anti-aliased. I did consider rendering everything in 2160p then downscaling to 1080p, but my NVidia GTX 980ti couldn’t handle those resolutions while maintaining 60fps, so I decided against it.

Overall I’m very happy how the entry turned out. I managed to squeeze in everything I intended and despite minor bugs, the end result is very solid. Fixing the bugs and improving anti-aliasing leaves room for improve in future. Extra features such as transparency, motion blur, different shapes and object transformations, to name a few, are also something to experiment with in the future.