Image Processing with WebGL


This is the December 9 article for Eureka Advent Calendar 2021.

Native applications have set a high standard for the image editing and processing features users have come to know and love, ranging from simple cropping to photo filters and decoration.

With the exception of a few standout services, the standard for image editing on the web is unfortunately still pretty poor. Even in some of the most popular applications and social networks, tools are either very limited or non-existent. Users are expected to revert to native counterparts to process and crop images before uploading to a web service.

As a keen photographer and web engineer, I have made it my challenge to create a portable image cropping and filtering library which is able to achieve native-like performance entirely on the client.

It has been a long and interesting journey over the course of a year, one that taught me a lot about everything from 2D canvas rendering all the way down to low-level use of WebGL and shaders.

If any of this sounds interesting to you, read on and I’ll show you everything I wish I knew a year ago!

ImageData interface

Throughout this article, we will be dealing with images via the ImageData interface. It’s perhaps the simplest and most understandable way of representing an image, whilst also being very easy to manipulate.

I’ll skip the exact details of how to get an image as ImageData, but the simplest way involves creating an <img> element to load the image, drawing it to a canvas, and calling the getImageData() method on the canvas context.
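
As a rough sketch (the function name and URL here are just placeholders):

    // A minimal sketch of loading an image as ImageData.
    function loadImageData(url) {
      return new Promise((resolve, reject) => {
        const img = new Image();
        img.crossOrigin = 'anonymous'; // needed if the image comes from another origin
        img.onload = () => {
          const canvas = document.createElement('canvas');
          canvas.width = img.naturalWidth;
          canvas.height = img.naturalHeight;
          const ctx = canvas.getContext('2d');
          ctx.drawImage(img, 0, 0);
          resolve(ctx.getImageData(0, 0, canvas.width, canvas.height));
        };
        img.onerror = reject;
        img.src = url;
      });
    }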

The ImageData interface stores the image’s width, height and a Uint8ClampedArray of pixel data in a repeating RGBA sequence where each channel is represented as a number between 0 and 255. If that sounds familiar, that’s because it is: you’re probably used to writing colours as hexadecimal, where #ffffff means an RGB value of 255, 255, 255.
For all of the examples given in this article, I will be ignoring the alpha channel as I’m assuming our images are fully opaque. Also note that the pixel data in the ImageData interface is ordered top-left to bottom-right, matching the coordinate system of the browser. This will become a key piece of information later on.

Image processing with HTML Canvas

Before delving into WebGL rendering, I think it’s worth briefly covering 2D canvas. The 2D canvas API is well established and widely supported. I’d also used it a number of times before, so this was a good starting point for my research.

To process an image based on a set of input parameters (e.g. brightness, contrast, saturation etc.), we need to take the red, green and blue channels of each pixel and perform a transformation which applies the given input parameters. The simplest example of a transform increases the brightness of our input image:
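
Something along these lines, as a simplified sketch (here brightness is assumed to be an offset between -255 and 255, and the function name is my own):

    // Simplified sketch: add a brightness offset to every pixel.
    function adjustBrightness(imageData, brightness) {
      const { data, width, height } = imageData;
      const output = new Uint8ClampedArray(data.length);
      for (let i = 0; i < data.length; i += 4) {
        output[i] = data[i] + brightness;         // red
        output[i + 1] = data[i + 1] + brightness; // green
        output[i + 2] = data[i + 2] + brightness; // blue
        output[i + 3] = data[i + 3];              // alpha is left untouched
      }
      return new ImageData(output, width, height);
    }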

The above code loops over each four-channel pixel and adds the brightness value to each channel before returning a new ImageData instance.

Using this pattern, a number of different filters can be created. An exposure filter multiplies each of the channels by an exposure factor, and the simplest contrast filter is a combination of the addition and multiplication examples above.
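
As a rough sketch of those two per-channel operations (one common formulation, treating each channel value as a number between 0 and 255):

    // Exposure: multiply each channel by 2 to the power of the exposure value.
    const applyExposure = (c, exposure) => c * Math.pow(2, exposure);

    // Contrast: a multiplication around the mid-point followed by an addition.
    const applyContrast = (c, contrast) => (c - 128) * contrast + 128;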

But after chaining multiple transformations together, the code becomes heavy and definitely won’t run at a smooth frame rate in the browser. One way we could optimise this would be to represent each transformation as a colour matrix. If you’ve ever written filters in SVG, you may be familiar with the <feColorMatrix> element, which changes colours based on a transformation matrix. The neat thing about matrix transformations is that they can be mathematically combined into a single matrix, so our loop would only have to run once.
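
As a sketch of the idea, ignoring alpha and treating each pixel as a column vector [r, g, b, 1] so that additive terms like brightness can live in the fourth column:

    // 4x4 colour matrices (row-major) acting on [r, g, b, 1] column vectors.
    const brightnessMatrix = (b) => [
      1, 0, 0, b,
      0, 1, 0, b,
      0, 0, 1, b,
      0, 0, 0, 1,
    ];

    const contrastMatrix = (c) => [
      c, 0, 0, 128 * (1 - c),
      0, c, 0, 128 * (1 - c),
      0, 0, c, 128 * (1 - c),
      0, 0, 0, 1,
    ];

    // Multiplying two matrices combines them into a single transformation.
    function multiplyColorMatrices(a, b) {
      const out = new Array(16).fill(0);
      for (let row = 0; row < 4; row++) {
        for (let col = 0; col < 4; col++) {
          for (let k = 0; k < 4; k++) {
            out[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
          }
        }
      }
      return out;
    }

    // Applies brightness first, then contrast, in a single pass over the pixels.
    const combined = multiplyColorMatrices(contrastMatrix(1.2), brightnessMatrix(20));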

The problem arises when we want to apply more complex filters that can’t be represented as simple transformations. My goal for this project is to replicate all of the filters available in the Google Photos app, where adjusting the colour temperature, tint, highlights and shadows requires calculating dot products and converting between colour spaces.
Performing all of these transformations across even a small image quickly blows our 10ms frame time budget.

This leads us on to…

Image processing with WebGL

In order to perform complex transformations whilst maintaining good performance, we’re going to hand off the work to the GPU.

Unlike a device’s CPU which is very good at performing a multitude of tasks, its GPU is much more specialised to perform highly parallelised floating point calculations. With WebGL we can break out of the single-threaded Javascript world and hand off our processing to the hundreds of shader cores inside the GPU.

WebGL is an almost 1:1 wrapper for the OpenGL ES 2.0 API, which contains all of the power, and all of the quirks and misdirection, that come from an API originally designed in the early 1990s. WebGL is quite unlike any of the standard object-oriented browser APIs you’re familiar with.

Note: WebGL 2, which is based on OpenGL ES 3.0, adds a number of additional texture formats along with a few browser-specific additions. Everything that follows is compatible with both WebGL and WebGL 2.

In the following section, I will be using terminology that is commonly used in 3D rendering and WebGL. Below are some of these terms, which are in some cases over-simplified, but accurate enough for the context of image processing.

Vertex

GPUs draw shapes from a list of points, or vertices, which are grouped into triangles. Our list of vertices can be represented in either 2D or 3D space. Any surface drawn in WebGL must be made of triangles, regardless of its shape. This will be important later when we come to draw a square on the canvas; we will actually be drawing two triangles.

Texture

A texture is essentially an image file that has been uploaded to the GPU. An uploaded texture can then be read or sampled by a shader when drawing pixels on the screen. There are many more complexities and settings that textures can utilise such as mipmapping, but we will skip over these as they are not important to our work today.

Shader

Shaders are small pieces of code that tell the GPU how to draw each pixel. The two types of shader available in WebGL are vertex shaders and fragment shaders. These are written using GLSL (the OpenGL Shading Language), which you may find very similar to the C language.

A vertex shader takes each of our vertices which are represented in either 2D (x, y) or 3D (x, y, z) space and maps them to our 2D canvas.
These are important in 3D applications where your vertex shader is used to ‘project’ the position of each vertex onto your 2D monitor through use of a projection matrix, creating depth and perspective. This is why games still look 3D when viewed on a 2D monitor.
It’s easiest to imagine that a vertex shader takes a shape and decides which pixels on the screen should be filled. As our image processing library is only dealing with 2D data, ours will be incredibly simple.

A fragment shader (sometimes known as a pixel shader) takes each pixel of our shape and decides what colour it should be. A simple fragment shader would return the same colour for each pixel, colouring the entire surface a solid colour. With this knowledge, you may be able to deduce that our fragment shader will have to sample pixels from a texture to draw our image.

Program

A program in WebGL is a combination of both a vertex shader and a fragment shader that when enabled, tells the GPU how to draw vertices.
This program exposes a series of locations (think memory locations) to which we can pass an ArrayBuffer of vertices, or any other parameter that our compiled shaders will use.

Note: Scene rendering is not strictly limited to a single program; in some cases you may wish to run multiple programs sequentially to build up a scene.

With our simple knowledge of vertices, textures, shaders, and programs we are ready to start using WebGL!

Drawing a Triangle

For graphics programming, drawing a triangle is the equivalent of “Hello World!”.

It requires some shader programming and a fair bit of control code. There are quite a few parts to it, but we’ll tackle it step by step.

Vertex Shader

In order to draw a triangle on the screen, we must first create a vertex shader to position our vertices in screen space. Screen space in WebGL is represented as floating point coordinates from -1.0 to 1.0.
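
A minimal vertex shader for this, written here as a Javascript template string so we can hand it to WebGL from our control code later (the Codepen sample may structure this differently):

    const VERTEX_SHADER_SOURCE = `
      attribute vec2 position;

      void main() {
        gl_Position = vec4(position, 0, 1.0);
      }
    `;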

attribute vec2 position; Defines an attribute on our shader/program. Think of it like a parameter that will receive the 2D coordinate of each of our triangle’s vertices.
The main() routine runs once per vertex and sets gl_Position to our x and y coordinate (feel free to ignore the 0, 1.0 at the end as they are not used for simple 2D drawing).

Fragment Shader
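
A minimal fragment shader for this, again as a template string:

    const FRAGMENT_SHADER_SOURCE = `
      precision highp float;

      void main() {
        gl_FragColor = vec4(1.0, 0.0, 1.0, 1.0);
      }
    `;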

The main() routine of our fragment shader sets gl_FragColor to this lovely shade of magenta for each pixel of the triangle. The colour is represented as a vec4 as it contains all four RGBA channels.

You’ll also notice the precision highp float; definition at the top of the file. This tells our shader to run floating point calculations with high precision.

WebGL Control Code
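
A simplified sketch of the control code, assuming a <canvas id="canvas"> element on the page and reusing the VERTEX_SHADER_SOURCE and FRAGMENT_SHADER_SOURCE strings from above (the vertex coordinates are illustrative):

    const canvas = document.getElementById('canvas');
    const gl = canvas.getContext('webgl');

    // Compile a shader from source and check for errors.
    function compileShader(type, source) {
      const shader = gl.createShader(type);
      gl.shaderSource(shader, source);
      gl.compileShader(shader);
      if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        throw new Error(gl.getShaderInfoLog(shader));
      }
      return shader;
    }

    // Attach both shaders to a program, link it and make it the active program.
    const program = gl.createProgram();
    gl.attachShader(program, compileShader(gl.VERTEX_SHADER, VERTEX_SHADER_SOURCE));
    gl.attachShader(program, compileShader(gl.FRAGMENT_SHADER, FRAGMENT_SHADER_SOURCE));
    gl.linkProgram(program);
    gl.useProgram(program);

    // The triangle's vertices in the format [x1, y1, x2, y2, x3, y3].
    const VERTICES = new Float32Array([
      -0.5, -0.5,
       0.5, -0.5,
       0.0,  0.5,
    ]);

    // Upload the vertices to the GPU.
    const buffer = gl.createBuffer();
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    gl.bufferData(gl.ARRAY_BUFFER, VERTICES, gl.STATIC_DRAW);

    // Point the position attribute at our buffer, reading 2 floats per vertex.
    const positionLocation = gl.getAttribLocation(program, 'position');
    gl.enableVertexAttribArray(positionLocation);
    gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);

    // Clear the canvas and draw 3 vertices as a triangle, starting at index 0.
    gl.clearColor(0.0, 0.0, 0.0, 1.0);
    gl.clear(gl.COLOR_BUFFER_BIT);
    gl.drawArrays(gl.TRIANGLES, 0, 3);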

There’s quite a lot going on here, so let’s break it down:

After creating a WebGL context, we compile our two shaders and attach them to a program. We then tell our WebGL context to use this as our active program.

We create a new Float32Array containing the 3 vertices of our triangle, following the format [x1, y1, x2, y2, x3, y3]. This array is then passed to the GPU via bindBuffer() and bufferData(). There are a number of parameters here, but I will skip over the details for brevity.

Next we call getAttribLocation() on our active program to find the location of our position attribute we defined in our vertex shader. When our shaders are compiled and linked to a program, our attributes are assigned a location from which we can access them.
With our newly fetched location, we can enable our buffer as vertex data using enableVertexAttribArray() and tell WebGL to interpret it as 2D vectors using vertexAttribPointer().

Finally, our WebGL context has received all of the data it needs to draw the scene, so we clear the canvas by clearing the COLOR_BUFFER_BIT and call drawArrays(), specifying that 3 vertices should be drawn as triangles, starting from index 0.

The Result

The result of 57 lines of this confusing and abstract code really is just a pink triangle.

Drawing a Square

Drawing a triangle in WebGL is great, but you may remember that we’re trying to build a photo editing application.
I don’t know about you, but I’ve not personally seen any triangular photos before!

To draw a square, we will need to draw two triangles. To do so, we will add another triangle to our VERTICES array and tell WebGL to draw 6 vertices instead of 3!

The square above will be made up of a set of vertices along these lines 😊
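
(The exact coordinates in the Codepen sample may differ; here I’m using a quad that fills the whole canvas.)

    // Two triangles that together cover a square (x1, y1, x2, y2, ...).
    const VERTICES = new Float32Array([
      // first triangle
      -1.0, -1.0,
       1.0, -1.0,
      -1.0,  1.0,
      // second triangle
      -1.0,  1.0,
       1.0, -1.0,
       1.0,  1.0,
    ]);

    // ...and draw 6 vertices instead of 3.
    gl.drawArrays(gl.TRIANGLES, 0, 6);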

Mapping Colour

Before we try to map a texture to our square, let’s first make some modifications to our shaders.

To each of our shaders we will add a varying vec2 texCoords. Think of this like a program variable that our vertex shader will write to and our fragment shader will then read from.

Vertex Shader

If you remember from earlier, WebGL’s screen coordinates are represented as a floating point number from -1.0 to 1.0. To make our lives easier, we first convert this range to 0.0 to 1.0 before setting it as the value of our texCoords variable.
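
A sketch of the updated vertex shader:

    const VERTEX_SHADER_SOURCE = `
      attribute vec2 position;
      varying vec2 texCoords;

      void main() {
        // Convert clip space (-1.0 to 1.0) into texture space (0.0 to 1.0).
        texCoords = (position + 1.0) / 2.0;

        gl_Position = vec4(position, 0, 1.0);
      }
    `;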

Fragment Shader

Instead of returning a solid magenta colour, our fragment shader now uses the texCoords x component as our red channel, the y component as our green channel, and sets the blue channel to 1.0.
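
In shader form, that looks something like this:

    const FRAGMENT_SHADER_SOURCE = `
      precision highp float;
      varying vec2 texCoords;

      void main() {
        // Red follows x, green follows y, blue is fixed at 1.0.
        gl_FragColor = vec4(texCoords.x, texCoords.y, 1.0, 1.0);
      }
    `;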

The Result

With this, we have a lovely 2D gradient and have demonstrated shader variables.

Try reading the fragment shader in detail and imagine how the strength of the red component increases as the x value increases. Likewise, the green component increases as the y value increases.

Using Textures

Now we’ve successfully created a square in our scene, we’re ready to show an image.

Loading ImageData as a Texture

With some more confusing WebGL magic, we’re able to pass an image from Javascript as ImageData to our WebGL context as a texture.
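
A sketch of the upload, assuming gl is our WebGL context and imageData is the ImageData instance from earlier:

    // Create a 2D texture in slot TEXTURE0.
    const texture = gl.createTexture();
    gl.activeTexture(gl.TEXTURE0);
    gl.bindTexture(gl.TEXTURE_2D, texture);

    // Upload the pixel data in RGBA format.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, imageData);

    // Wrapping and interpolation settings. CLAMP_TO_EDGE and LINEAR are safe defaults
    // (and avoid WebGL 1 restrictions on non-power-of-two textures).
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);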

In short, we define a 2D texture in slot TEXTURE0 and set the texture data to our image, using the RGBA colour format.

Following this, we can set a number of parameters such as wrapping mode, and interpolation mode (MIN_FILTER and MAG_FILTER). Wrapping mode isn’t important for our use case, but you may wish to experiment with the different interpolation modes.

Fragment Shader
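
A sketch of the texture-sampling fragment shader (the uniform name textureSampler is my own choice; the Codepen sample may name it differently):

    const FRAGMENT_SHADER_SOURCE = `
      precision highp float;
      varying vec2 texCoords;
      uniform sampler2D textureSampler;

      void main() {
        // Sample the texture at the current coordinate.
        gl_FragColor = texture2D(textureSampler, texCoords);
      }
    `;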

We create a sampler2D which will be used to sample colour data from the texture we previously set as the active texture in slot TEXTURE0.

As you remember, our fragment shader’s main() routine runs once for each pixel of the output canvas. So for each pixel, we use the texture2D function to sample the texture at our current coordinate and set this as our gl_FragColor.

The Result

Err, that’s not quite right 🙃

The result is a square with our beautiful texture rendered upside down! You may recall that earlier in the article I mentioned that browser coordinates run from top-left to bottom-right. For everyone’s sanity, browser APIs in Javascript represent image data in the same order.

Because WebGL coordinates are bottom-left to top-right, our image data is sampled with the y-axis inverted!

This is fortunately a simple fix, by inverting texCoords.y in our vertex shader. Open the above Codepen source and try it for yourself!
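
The change is just one extra line in the vertex shader’s main() routine, something like:

      void main() {
        texCoords = (position + 1.0) / 2.0;
        // Flip the y axis so the image is sampled the right way up.
        texCoords.y = 1.0 - texCoords.y;

        gl_Position = vec4(position, 0, 1.0);
      }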

Finally, Applying a Filter

Now we’ve rendered our image in our WebGL canvas (correct side up!), we are ready to transform the pixel data.

All of our transformations will happen within the fragment shader.

Fragment Shader
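
A sketch of what this shader could look like, with the adjustment amounts hard-coded for illustration (the exact formulas in the Codepen sample may differ; these are common, simple formulations):

    const FRAGMENT_SHADER_SOURCE = `
      precision highp float;
      varying vec2 texCoords;
      uniform sampler2D textureSampler;

      vec3 adjustBrightness(vec3 color, float brightness) {
        return color + brightness;
      }

      vec3 adjustContrast(vec3 color, float contrast) {
        return (color - 0.5) * contrast + 0.5;
      }

      vec3 adjustSaturation(vec3 color, float saturation) {
        // Luminance weights tuned for human perception.
        vec3 grayscale = vec3(dot(color, vec3(0.299, 0.587, 0.114)));
        return mix(grayscale, color, saturation);
      }

      void main() {
        vec4 color = texture2D(textureSampler, texCoords);

        vec3 rgb = color.rgb;
        rgb = adjustBrightness(rgb, 0.1);
        rgb = adjustContrast(rgb, 1.2);
        rgb = adjustSaturation(rgb, 1.3);

        gl_FragColor = vec4(rgb, color.a);
      }
    `;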

We’ve added a number of transformation functions, adjustBrightness(), adjustContrast() and adjustSaturation(), to our shader and call them within our main() routine.

After sampling the texture, we can sequentially adjust the RGB components of our colour using the transformation functions before setting gl_FragColor.

You may have noticed how the adjustSaturation() function uses the mathematical dot() (dot product) function and WebGL mix() function that aren’t available in Javascript. These can absolutely be replicated in Javascript, but the GPU can handle these with ease.

If you wanted to take this example a step further, you could expose each of the filter parameters as a uniform and set them from our Javascript code. This is exactly what you’d do if you wanted to add UI controls for each of the values.
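
As a sketch, assuming the fragment shader declares uniform float brightness; in place of the hard-coded value:

    // Look up the uniform's location on the linked program and set its value.
    const brightnessLocation = gl.getUniformLocation(program, 'brightness');
    gl.uniform1f(brightnessLocation, 0.1);

    // Re-draw whenever a UI control changes the value.
    gl.drawArrays(gl.TRIANGLES, 0, 6);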

The Result

Feel free to open the Codepen sample and play with the values. Maybe even try writing your own transform filter!

Note: Whilst simplified alternatives often exist, the mathematics behind many photographic filters is based on years of technical research and is even tuned for human visual perception. To fully understand how these filters work, you may end up reading technical white papers or graduate theses. Fortunately for someone like me, who’s pretty bad at maths, I discovered Brad Larson’s GPUImage library, which includes many standard filters in shader format. This vastly helped with my understanding of how they work.

It’s not immediately clear in this example, but the runtime performance of WebGL rendering is far better than looping in Javascript. We can chain many heavy transformations together and still render within our target frame time.

Caveats

As with most things, there are a number of caveats to processing images using WebGL.

WebGL Rendering Blocks the Main Thread

Even though the processing and rendering of our image is handled by the GPU, our Javascript main thread will wait for all drawing to finish before continuing. This means that if we write very inefficient or complicated shaders, we can still slow down the UI and miss our 60fps target.

GPU Hardware / Driver Bugs Are Real

Web browsers do a good job of hiding bugs and performance issues in most GPU drivers. Sometimes specific draw modes are unreasonably slow on specific GPUs, and our browsers are filled with hacks and workarounds to hide this. WebGL allows you to run your code directly on the GPU, where it’s susceptible to all of the quirks you would otherwise have been protected against. However, our image processing application shouldn’t run into many of these issues.

Conclusion and Links

Thanks so much for reaching the end of this very long and complicated article! I hope you found it interesting and maybe even inspiring enough that you want to investigate the world of WebGL and image processing even more.

To recap, here’s what we covered:

  • How to process image data using regular 2D canvas and how being bound to a single CPU thread severely limits performance.
  • The basics of vertices, shaders, textures and programs.
  • The different coordinate systems used by the browser and WebGL.
  • How to set up a WebGL 3D canvas for 2D image rendering by drawing two triangles and applying a texture.
  • How to transform pixel data within a fragment shader in order to create image filters.

What I’m Working On

All of this effort is being put into the creation of a library I’m calling Iris.

My aim is to create a dependency-free, lightweight image editing solution that can be used across any framework.

As I mentioned previously, I am aiming for feature parity with the editing tools of Google Photos. I still have a lot to learn, but it’s an enjoyable process.

The code for Iris is on GitHub, and I will be releasing the packages on NPM very soon!

If you’re interested in getting involved with the project, please reach out on GitHub.

A Note About three.js

You will notice that I didn’t mention any third-party libraries throughout this article. Part of my goal was to explore the inner workings of WebGL and remove the need for third-party dependencies.

If you are working on a full 3D experience however, I would strongly recommend the use of a graphics library such as three.js as low-level graphics programming really doesn’t scale well to large applications.
