What I’ve learned building a browser app in WebGL

Wolfram Hempel
May 21, 2018 · 9 min read


Over the last few months I’ve been working on a soon-to-be-released app called Arcentry that lets users create isometric diagrams of backend and cloud architectures.

Apart from some conventional controls written in VueJS, Arcentry is first and foremost a large, interactive canvas on which users can place, move, edit and connect objects, lines, areas, labels, icons and all the other bits and bobs that make up modern architectures.

For this canvas we used WebGL, a browser-based graphics API derived from the OpenGL ES standard, and boy, were we in for a ride.

Why WebGL?

Modern browsers make building webapps with CSS and HTML extremely easy, a fact that I’ve come to appreciate even more since working on Arcentry. Want this element to be red? Have rounded corners? Change color when hovered? Animate that color change? Attach a click handler? No problem at all! And you don’t have a care in the world about how and when your HTML will be rendered to the screen.

With WebGL you have to take care of all these things and more yourself. So why would anyone in their right mind use it to build a complex webapp? For three reasons:

  • There are some things HTML just can’t do. Real 3D graphics is one of them. Granted, CSS gives you “3D” transforms and some mad geniuses have stretched it as far as building fully functioning first-person shooters, but it is still a very humble take compared to full polygon models and raytracing.
  • WebGL is extremely flexible. Need depth of field effects, color blending or the same scenario rendered multiple times in different resolutions and angles? WebGL can.
  • WebGL is fast. Really fast. Graphics cards in 2018 have become absolute wonders of engineering, and WebGL offloads expensive object computations and rendering to them whilst also freeing up the CPU to run your application logic.

What I learned

I discovered the basics of 3D development working at Crytek (maker of Crysis and the original Far Cry) and also dabbled a bit with WebGL for a Chrome experiment, but my expertise is first and foremost in backend technologies and HTML apps. As a result, this blog post documents my learnings from a webapp developer’s perspective: if you’ve spent your life building triple-A FPS games, chances are you’ll find this information rather basic and self-evident, but if you’re a web guy thinking about going 3D, it might offer the right perspective.

Let’s start at the beginning:

1. You’ll have to choose a framework Russian election style

The first choice when building any app is usually that of language and framework. For webapps you’re faced with an ecosystem that can be described as anything from saturated to fragmented to convoluted: Angular, React, Vue and a myriad of smaller libraries can be combined with your choice of CSS compiler, transpiler, microlibraries and a host of other things.

For WebGL, things are easier: you use ThreeJS. Yes, there are alternatives such as Babylon and more purpose-built high-level libraries such as A-Frame (WebVR), PlayCanvas (games) or Pixi (2D graphics), but Three stands out as by far the most widely adopted, general-purpose 3D tool at hand.

three.js website

If you’ve ever used 3D modelling software you’ll be immediately familiar with Three’s concepts. Just as you build up a DOM tree in the browser out of various HTML elements, in Three you build up a “scene graph” out of various 3D objects. These can be concrete and visible, such as meshes (3D objects composed of a geometry and a material), lights and maps, or abstract, such as groups, cameras or curves.
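To make that concrete, here’s a minimal scene graph in code, roughly as it would have looked with the ThreeJS API of the day (all values are arbitrary):

```javascript
// A scene graph with one mesh, a light and a camera, rendered once.
const scene = new THREE.Scene();

const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
camera.position.set(10, 10, 10);
camera.lookAt(new THREE.Vector3(0, 0, 0));

// A mesh is a geometry plus a material.
const box = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshLambertMaterial({ color: 0xff0000 })
);
scene.add(box);
scene.add(new THREE.DirectionalLight(0xffffff, 1));

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);
renderer.render(scene, camera);
```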

2. How to build your models on a budget

There are ThreeJS exporters for many popular 3D modelling tools such as Maya or Cinema 4D, but the one we went with was Blender. Is it particularly intuitive or easy to use? Absolutely not! I had to watch a three-minute tutorial just to find out how to close a panel.

But it’s well documented, comes with a host of community plugins and, most importantly, it is free! In a category where most established alternatives (Autodesk Maya, 3D Studio Max, Cinema 4D) cost upwards of $2000/year, that’s nothing to scoff at.

Three comes with an easy-to-use exporter for Blender that spits out JSON files listing the various vertices, edges and faces of a geometry.

model definition for the above AWS snowball model
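Loading such a file back into the scene took only a few lines with the JSONLoader that shipped with ThreeJS at the time (the file path here is made up):

```javascript
// Load a Blender-exported JSON geometry and add it to the scene.
const loader = new THREE.JSONLoader();
loader.load('models/snowball.json', (geometry, materials) => {
  scene.add(new THREE.Mesh(geometry, materials));
});
```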

3. Interactions are all up to you

WebGL renders your scene to a flat pixel canvas. As far as the browser is concerned, when you click that canvas, all you’ve clicked is an image. To relate your 2D cursor position to a 3D object within your virtual space you need a “ray”: an abstract line that extends from your camera infinitely into the space beyond and intersects objects along the way.

But even armed with such a ray and your raycasters all set up, you begin to realise just how much the browser normally gives you for free:

  • Your ray intersects all objects along its path. It’s up to you to figure out which one is on top, which one is clickable and so on: the sort of thing that z-index and pointer-events provide in the DOM (see the sketch after this list).
  • Your events exist independently of any object hierarchies. Event bubbling, stopping propagation and routing your event to the right handler are all up to you.
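The raycasting itself takes only a few lines in ThreeJS; deciding what to do with the intersections is where the real work starts. A minimal sketch:

```javascript
// Translate a click on the canvas into the 3D objects underneath the cursor.
const raycaster = new THREE.Raycaster();
const mouse = new THREE.Vector2();

renderer.domElement.addEventListener('click', (event) => {
  // Convert pixel coordinates to normalized device coordinates (-1 to +1).
  const rect = renderer.domElement.getBoundingClientRect();
  mouse.x = ((event.clientX - rect.left) / rect.width) * 2 - 1;
  mouse.y = -((event.clientY - rect.top) / rect.height) * 2 + 1;

  raycaster.setFromCamera(mouse, camera);

  // Returns ALL intersections along the ray, sorted by distance.
  // Deciding which one counts as "clicked" is up to you.
  const hits = raycaster.intersectObjects(scene.children, true);
  if (hits.length > 0) {
    console.log('closest object:', hits[0].object.name);
  }
});
```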

4. Crisp dynamic textures are really hard

ThreeJS allows you to project the pixel content of an HTML5 canvas element onto the face of a 3D object as a texture, an approach that Arcentry makes extensive use of for things like lines, arrows, areas, icons and so on.
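The basic mechanics of this are pleasantly simple; a minimal sketch (canvas size and drawing commands are arbitrary):

```javascript
// Draw onto a regular 2D canvas...
const canvas = document.createElement('canvas');
canvas.width = 1024;
canvas.height = 1024;
const ctx = canvas.getContext('2d');
ctx.strokeStyle = '#e8443a';
ctx.lineWidth = 8;
ctx.strokeRect(100, 100, 400, 300);

// ...and project it onto a plane as a texture.
const texture = new THREE.CanvasTexture(canvas);
const plane = new THREE.Mesh(
  new THREE.PlaneGeometry(10, 10),
  new THREE.MeshBasicMaterial({ map: texture, transparent: true })
);
scene.add(plane);

// Whenever you draw on the canvas again, flag the texture for re-upload.
texture.needsUpdate = true;
```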

But there’s a catch. Your canvas is a 2D surface composed of pixels. These need to be projected onto a surface in 3D space — a process called “texture-mapping”. But how many pixels should you map to one “texel” of a texture? That depends on two factors:

  • The size of the textured surface in the final rendered image. If it is far away from the camera you can get away with lower resolutions.
  • How many pixels your users’ hardware can support. Creating and mapping high-res dynamic textures is expensive.

Arcentry supports a huge drawing plane of 1000x1000 cells. At 128px per cell this would have meant a canvas with more than 16 billion pixels as the basis for our 2D shape texture. Clearly out of the question. (Tests showed that anything greater than 4096x4096 pixels instantly crashed the browser.)
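That ceiling varies per GPU, and it can at least be queried up front instead of being discovered by crash:

```javascript
// Ask the GPU for its hard texture limit before allocating huge canvases.
const gl = renderer.getContext();
console.log('max texture size:', gl.getParameter(gl.MAX_TEXTURE_SIZE));

// ThreeJS exposes the same information via the renderer:
console.log(renderer.capabilities.maxTextureSize);
```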

We solved it by creating the smallest possible plane: a square, rotated by 60 degrees, that reaches just up to the corners of the visible area.

This works beautifully, but is surprisingly tricky. Why? The square basically sticks to the camera, which means that every time the user pans or zooms the view, the square has to move and resize accordingly.

It also means that our canvas has to keep a resolution-independent log of all drawing steps to replicate them upon zooming, but the result seems to justify the additional complexity.
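A sketch of what such a command log might look like; the structure and names here are hypothetical, not Arcentry’s actual implementation:

```javascript
// Record drawing commands in world units, not pixels.
const drawLog = [];

function recordRect(x, y, width, height, color) {
  drawLog.push({ type: 'rect', x, y, width, height, color });
}

// Replay the whole log at whatever resolution the current zoom level demands.
function redraw(ctx, pixelsPerUnit) {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  for (const cmd of drawLog) {
    if (cmd.type === 'rect') {
      ctx.fillStyle = cmd.color;
      ctx.fillRect(
        cmd.x * pixelsPerUnit,
        cmd.y * pixelsPerUnit,
        cmd.width * pixelsPerUnit,
        cmd.height * pixelsPerUnit
      );
    }
  }
}
```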

5. Rendering text is even harder

Problems with crisp rendering might not be apparent for large, monochrome areas, but for text they are. By default, text projected onto a surface looks blurry and comes with greyish edges. Getting it right required three steps (condensed into a code sketch after this list):

  • Using anti-aliasing: this is the technique of rendering a larger image than required and scaling it down (supersampling). Multiple pixels are interpolated in the process, resulting in smoother edges.
  • Increasing anisotropy: simply put, anisotropic filtering is the extent to which rendered surfaces are affected by the angle they are rendered at. It allows the renderer to decrease pixel density as a surface becomes steeper relative to the camera (there’s a good chance that any computer science professor will put a fat, red F under this definition). Long story short: for an isometric app like Arcentry you want to crank anisotropy up to the max.
  • Using custom blending: last but not least, your canvas texture will have transparent or semi-transparent pixels (e.g. everything that’s not text). Blending determines how these are mixed into the underlying image. ThreeJS’s default alpha blending can be a bit rough and leave grey borders around the edges of transparent areas, but setting the material to custom blending with a simple, solid blend source factor of 1 works wonders.
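All three fixes condense into a few lines. A sketch, assuming the canvas texture from the previous section; note that the article only pins down the blend source factor, so the destination factor below is my assumption:

```javascript
// 1. Anti-aliasing is requested when the renderer is created.
const renderer = new THREE.WebGLRenderer({ antialias: true });

// 2. Crank anisotropic filtering up to whatever the GPU supports.
texture.anisotropy = renderer.capabilities.getMaxAnisotropy();

// 3. Swap the default alpha blending for custom blending with a solid
//    blend source factor of 1 (THREE.OneFactor). The destination factor
//    below is an assumption, not taken from the article.
const material = new THREE.MeshBasicMaterial({
  map: texture,
  transparent: true,
  blending: THREE.CustomBlending,
  blendSrc: THREE.OneFactor,
  blendDst: THREE.OneMinusSrcAlphaFactor
});
```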

6. You decide what to render — and when

Chrome showing repainting rectangles for Arcentry’s HTML controls

Have a look at the GIF above. It shows Chrome’s repainting areas for HTML controls. As you can see, it only repaints what’s absolutely necessary.

Do we need to achieve the same in a WebGL app? Not necessarily. Rendering might be an expensive operation, but GPUs have gotten so fast that simple, low-polygon, low-effect scenes like Arcentry’s aren’t much of a challenge anymore. As a result it’s mostly fine to render them on every animation frame, usually at around 60 frames per second.

But that’s not the whole story. Yes, GPUs are fast, but copying information to them is not. Whenever an object is moved, a color is changed or the camera is zoomed, a bit of information is passed from the CPU/Memory to the GPU.

To keep performance high you want to do this as rarely and as efficiently as possible. Copying large pixel buffers in particular, such as the image data from a dynamic texture canvas, costs you dearly.

For Arcentry we tackled this in a number of ways:

  • We only upload texture data that has actually changed.
  • We break texture planes down into multiple layers: one for lines and areas, one for pixel objects such as labels, icons or images, and a frequently updated plane for interactions such as hover markers or selection rectangles.
  • We detach the actual rendering from the underlying objects, allowing the renderer to perform numerous dirty/changed checks and to skip a frame if nothing needs updating.
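The dirty/changed check at the heart of that last point boils down to a simple pattern; a minimal sketch:

```javascript
// Only re-render when something has actually changed.
let isDirty = true;

// Called whenever an object moves, a color changes or the camera zooms.
function markDirty() {
  isDirty = true;
}

function loop() {
  requestAnimationFrame(loop);
  if (!isDirty) return; // nothing changed - skip the expensive render
  isDirty = false;
  renderer.render(scene, camera);
}
loop();
```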

7. You decide what scale it is

WebGL uses a three-axis Cartesian coordinate system; the x, y and z axes extend from the origin at (0, 0, 0) toward positive and negative infinity. Where is this origin? Doesn’t matter. How much is one unit on an axis? That’s up to you.

There’s a philosophical beauty to being confronted with navigating an empty, infinite space, but for real-world reasons you need to decide on an arbitrary definition of a unit and stick with it as the baseline for all other measures: texture resolution (in pixels per unit), model scale (Blender uses its own, separate coordinate system) and so on.
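In practice this boils down to a handful of constants that every other measure derives from. A hypothetical sketch (the names are made up; the 128px-per-unit figure echoes the cell size mentioned earlier):

```javascript
// Arbitrary but fixed: one grid cell equals one world unit.
const CELL_SIZE = 1;
// Texture resolution baseline: how many canvas pixels make up one unit.
const PIXELS_PER_UNIT = 128;
// Scale factor applied to geometry imported from Blender's coordinate system.
const BLENDER_IMPORT_SCALE = 1;

// Example: convert a grid position to world coordinates.
function cellToWorld(col, row) {
  return new THREE.Vector3(col * CELL_SIZE, 0, row * CELL_SIZE);
}
```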

Conclusion

Phew, that sounds like a lot of headaches, and it is. Would we choose WebGL again? Absolutely. Not only are the results truly worth it; there’s also a learning angle that makes every bit of progress feel deeply satisfying. Once you discover the right combination of arcane concepts like anti-aliasing, anisotropy and custom blending, and your visuals finally look crisp and appealing, you feel like you’ve worked a bit of black magic, way more so than just turning on some CSS drop-shadow and letting the browser do the hard work.

Thanks for sticking with me for so long. If you’d like to try out Arcentry you can sign up for early beta access at https://arcentry.com, follow it on Twitter (https://twitter.com/arcentry) or follow me to learn more (https://twitter.com/wolframhempel).
