Image Processing with WebGL
This is the December 9 article for Eureka Advent Calendar 2021.
Native applications have set a high standard for image editing and processing features that users have come to know and love, ranging from simple cropping to photo filters and decoration.
With the exception of a few standout services, the standard for image editing on the web is unfortunately still pretty poor. Even in some of the most popular applications and social networks, tools are either very limited or non-existent; users are expected to fall back to native apps to process and crop images before uploading them to a web service.
As a keen photographer and web engineer, I have made it my challenge to create a portable image cropping and filtering library which is able to achieve native-like performance entirely on the client.
This has been a long and interesting journey over the course of a year, one that taught me a lot about everything from 2D canvas rendering all the way down to low-level use of WebGL and shaders.
If any of this sounds interesting to you, read on and I’ll show you everything I wish I knew a year ago!
Throughout this article, we will be dealing with images via the ImageData interface. It’s perhaps the simplest and most understandable way of representing an image, whilst also being very easy to manipulate.
I’ll skip the exact details of how to get an image as ImageData, but the simplest way involves creating an <img> tag to fetch the image, drawing it to a canvas, and calling the getImageData() method on the canvas context.
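As a browser-only sketch of those steps (the file name here is a placeholder, and error handling is omitted):

```javascript
// Load an image and read its pixels back as ImageData.
const image = new Image();
image.crossOrigin = "anonymous"; // needed to read pixels from other origins
image.src = "photo.jpg"; // placeholder URL

image.onload = () => {
  // Draw the image to an offscreen canvas at its natural size.
  const canvas = document.createElement("canvas");
  canvas.width = image.naturalWidth;
  canvas.height = image.naturalHeight;

  const context = canvas.getContext("2d");
  context.drawImage(image, 0, 0);

  // imageData.data is a Uint8ClampedArray of RGBA values,
  // ordered top-left to bottom-right.
  const imageData = context.getImageData(0, 0, canvas.width, canvas.height);
};
```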
The ImageData interface stores the image’s width, height and a Uint8ClampedArray of pixel data in a repeating RGBA sequence, where each channel is represented as a number between 0 and 255. If that sounds familiar, that’s because it is: you’re probably used to writing colours as hexadecimal, where #ffffff means an RGB value of 255, 255, 255.
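That hex-to-channel mapping is easy to verify in JavaScript (hexToRgb is a helper name of my own, not from the article):

```javascript
// Each pair of hex digits after the "#" is one 0-255 channel value.
function hexToRgb(hex) {
  return [
    parseInt(hex.slice(1, 3), 16), // red
    parseInt(hex.slice(3, 5), 16), // green
    parseInt(hex.slice(5, 7), 16), // blue
  ];
}
```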
For all of the examples given in this article, I will be ignoring the alpha channel, as I’m assuming our images are fully opaque. Also note that the pixel data in the ImageData interface is ordered top-left to bottom-right, matching the coordinate system of the browser. This will become a key piece of information later on.
Image processing with HTML Canvas
Before delving into WebGL rendering, I think it’s worth briefly covering 2D canvas. The 2D canvas APIs are well established and widely supported. I’d also used it a number of times before so this was a good starting point for my research.
To process an image based on a set of input parameters (e.g. brightness, contrast, saturation etc.), we need to take the red, green and blue channels of each pixel and perform a transformation which applies the given input parameters. The simplest example of a transform increases the brightness of our input image:
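A sketch of that brightness transform on raw ImageData pixels (the function name and parameter are my own, not the article’s original code):

```javascript
// Increase brightness by adding a fixed amount to each RGB channel.
// `pixels` is the Uint8ClampedArray from an ImageData object;
// Uint8ClampedArray automatically clamps values to the 0-255 range.
function adjustBrightness(pixels, amount) {
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] += amount;     // red
    pixels[i + 1] += amount; // green
    pixels[i + 2] += amount; // blue
    // pixels[i + 3] is alpha; we leave it untouched.
  }
  return pixels;
}
```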
Using this pattern, a number of different filters can be created. An exposure filter multiplies each of the channels by an exponential factor, and a simple contrast filter combines the addition and multiplication techniques shown above.
But after chaining multiple transformations together, the code becomes heavy and definitely won’t run at a smooth frame rate in the browser. One way in which we could optimise this would be to represent each transformation as a matrix. If you’ve ever written filters in SVG, you may be familiar with the <feColorMatrix> element, which changes colours based on a transformation matrix. The neat thing about these matrices is that they can be mathematically combined (multiplied) into a single transformation, so our per-pixel loop only has to run once.
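Here’s a minimal illustration of that combining trick using 3x3 RGB matrices (real colour matrices, like feColorMatrix’s, are 4x5 so they can also add constant offsets; the helper names and example transforms here are mine):

```javascript
// Multiply two 3x3 colour matrices (row-major) into one combined matrix.
function combine(a, b) {
  const out = new Array(9).fill(0);
  for (let r = 0; r < 3; r++)
    for (let c = 0; c < 3; c++)
      for (let k = 0; k < 3; k++)
        out[r * 3 + c] += a[r * 3 + k] * b[k * 3 + c];
  return out;
}

// Apply a 3x3 matrix to a single [r, g, b] pixel.
function apply(m, [r, g, b]) {
  return [
    m[0] * r + m[1] * g + m[2] * b,
    m[3] * r + m[4] * g + m[5] * b,
    m[6] * r + m[7] * g + m[8] * b,
  ];
}

// Two example transforms: double every channel, then swap red and blue.
const double = [2, 0, 0, 0, 2, 0, 0, 0, 2];
const swapRB = [0, 0, 1, 0, 1, 0, 1, 0, 0];

// Because (A x B) v === A (B v), one combined matrix means one pass
// over the pixels instead of one pass per transformation.
const combined = combine(swapRB, double);
```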
The problem arises when we want to apply more complex filters that can’t be represented as simple transformations. My goal for this project is to replicate all of the filters available in the Google Photos app, where adjusting the colour temperature, tint, highlights and shadows requires calculating dot products and converting between colour spaces.
Performing all of these transformations across even a small image quickly blows our 10ms frame time budget.
This leads us on to…
Image processing with WebGL
In order to perform complex transformations whilst maintaining good performance, we’re going to hand off the work to the GPU.
WebGL is an almost 1:1 wrapper for the OpenGL ES 2.0 API, which contains all of the power, and all of the quirks, of an API whose design dates back to the early 1990s. WebGL is quite unlike any of the standard object-oriented browser APIs you’re familiar with.
Note: WebGL 2, which is based on OpenGL ES 3.0, adds a number of additional texture formats along with a few other features. Everything that follows is compatible with both WebGL 1 and WebGL 2.
In the following section, I will be using terminology that is common in 3D rendering and WebGL. Below are some of these terms; the definitions are in some cases over-simplified, but accurate enough for the context of image processing.
GPUs draw shapes from a list of points, or vertices, which form triangles. Our list of vertices can be represented in either 2D or 3D space. Any surface that is drawn in WebGL must be made of triangles, regardless of its shape. This will be important later when we come to draw a square on the canvas; we will actually be drawing two triangles.
A texture is essentially an image file that has been uploaded to the GPU. An uploaded texture can then be read or sampled by a shader when drawing pixels on the screen. There are many more complexities and settings that textures can utilise such as mipmapping, but we will skip over these as they are not important to our work today.
Shaders are small pieces of code that tell the GPU how to draw each pixel. The two types of shader available in WebGL are vertex shaders and fragment shaders. These are written in GLSL (the OpenGL Shading Language), which you may find very similar to the C language.
A vertex shader takes each of our vertices which are represented in either 2D (x, y) or 3D (x, y, z) space and maps them to our 2D canvas.
These are important in 3D applications where your vertex shader is used to ‘project’ the position of each vertex onto your 2D monitor through use of a projection matrix, creating depth and perspective. This is why games still look 3D when viewed on a 2D monitor.
It’s easiest to imagine that a vertex shader takes a shape and decides which pixels on the screen should be filled. As our image processing library is only dealing with 2D data, ours will be incredibly simple.
A fragment shader (sometimes known as a pixel shader) takes each pixel of our shape and decides what colour it should be. A simple fragment shader would return the same colour for each pixel, colouring the entire surface a solid colour. With this knowledge, you may be able to deduce that our fragment shader will have to sample pixels from a texture to draw our image.
A program in WebGL is a combination of both a vertex shader and a fragment shader that when enabled, tells the GPU how to draw vertices.
This program exposes a series of locations (think memory locations) to which we can pass an ArrayBuffer of vertices, or any other parameters that our compiled shaders will use.
Note: Scene rendering is not strictly limited to a single program; in some cases you may wish to run multiple programs sequentially to build up a scene.
With our simple knowledge of vertices, textures, shaders, and programs we are ready to start using WebGL!
Drawing a Triangle
For graphics programming, drawing a triangle is the equivalent of “Hello World!”.
It requires some shader programming and a fair bit of control code. There’s quite a few parts to it, but we’ll tackle it step by step.
In order to draw a triangle on the screen, we must first create a vertex shader to position our vertices in screen space. Screen space in WebGL is represented as a floating point number from -1.0 to 1.0 on both axes.

attribute vec2 position; defines an attribute on our shader/program. Think of it like a parameter that will receive the 2D coordinate of each of our triangle’s vertices.

The main() routine runs once per vertex and sets gl_Position to our x and y coordinates (feel free to ignore the 0.0 and 1.0 at the end, as they are not used for simple 2D drawing).
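Putting those pieces together, a minimal vertex shader along these lines might look like this (my reconstruction, not the article’s exact source):

```glsl
attribute vec2 position;

void main() {
  // x and y come from our vertex buffer; z and w are unused in 2D.
  gl_Position = vec4(position, 0.0, 1.0);
}
```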
The main() routine of our fragment shader sets gl_FragColor to a lovely shade of magenta for each pixel of the triangle. The colour is represented as a vec4 as it contains all four RGBA channels.

You’ll also notice the precision highp float; definition at the top of the file. This tells our shader to run floating point calculations with high precision.
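A fragment shader matching that description might look like this (again, a reconstruction):

```glsl
precision highp float;

void main() {
  // Solid magenta (R, G, B, A), each channel as a 0.0-1.0 float.
  gl_FragColor = vec4(1.0, 0.0, 1.0, 1.0);
}
```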
WebGL Control Code
There’s quite a lot going on here, so let’s break it down:
After creating a WebGL context, we compile our two shaders and attach them to a program. We then tell our WebGL context to use this as our active program.
We create a new Float32Array containing the 3 vertices of our triangle, following the format [x1, y1, x2, y2, x3, y3]. This array is then passed to the GPU via bindBuffer() and bufferData(). There are a number of parameters here, but I will skip over the details for brevity.
Next we call getAttribLocation() on our active program to find the location of the position attribute we defined in our vertex shader. When our shaders are compiled and linked to a program, each attribute is assigned a location from which we can access it.
With our newly fetched location, we expose our buffer as vertex data using enableVertexAttribArray() and tell WebGL to interpret it as 2D vectors using vertexAttribPointer().

Finally, our WebGL context has received all of the data it needs to draw the scene. So we clear the canvas’s COLOR_BUFFER_BIT and call drawArrays(), specifying that 3 vertices should be drawn as triangles, from a starting point of 0.
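The full control code, sketched from the steps above (browser-only; it assumes a <canvas id="canvas"> element exists and omits error handling):

```javascript
const gl = document.getElementById("canvas").getContext("webgl");

const VERTEX_SHADER = `
attribute vec2 position;
void main() {
  gl_Position = vec4(position, 0.0, 1.0);
}`;

const FRAGMENT_SHADER = `
precision highp float;
void main() {
  gl_FragColor = vec4(1.0, 0.0, 1.0, 1.0); // magenta
}`;

function compileShader(type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}

// 1. Compile both shaders, link them into a program and activate it.
const program = gl.createProgram();
gl.attachShader(program, compileShader(gl.VERTEX_SHADER, VERTEX_SHADER));
gl.attachShader(program, compileShader(gl.FRAGMENT_SHADER, FRAGMENT_SHADER));
gl.linkProgram(program);
gl.useProgram(program);

// 2. Upload the triangle's vertices: [x1, y1, x2, y2, x3, y3].
const VERTICES = new Float32Array([0.0, 0.5, -0.5, -0.5, 0.5, -0.5]);
gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, VERTICES, gl.STATIC_DRAW);

// 3. Point the `position` attribute at our buffer, read as 2D floats.
const positionLocation = gl.getAttribLocation(program, "position");
gl.enableVertexAttribArray(positionLocation);
gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);

// 4. Clear the canvas and draw 3 vertices as triangles, from vertex 0.
gl.clearColor(0, 0, 0, 1);
gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawArrays(gl.TRIANGLES, 0, 3);
```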
Drawing a Square
Drawing a triangle in WebGL is great, but you may remember that we’re trying to build a photo editing application.
I don’t know about you, but I’ve not personally seen any triangular photos before!
To draw a square, we will need to draw two triangles. To do so, we will add another triangle to our VERTICES array and tell WebGL to draw 6 vertices instead of 3.
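A full-canvas square’s vertex list under that approach might look like this (the exact coordinates are my assumption for a quad covering the whole of clip space):

```javascript
// Two triangles that together cover the full -1.0 to 1.0 square.
// Each pair of numbers is one (x, y) vertex.
const VERTICES = new Float32Array([
  -1, -1,   1, -1,   1,  1, // first triangle
  -1, -1,   1,  1,  -1,  1, // second triangle
]);
```

The draw call then becomes drawArrays() with a count of 6 rather than 3.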
Before we try to map a texture to our square, let’s first make some modifications to our shaders.
To each of our shaders we will add varying vec2 texCoords;. Think of this like a program variable that our vertex shader will write to and our fragment shader will then read from.
If you remember from earlier, WebGL’s screen coordinates are represented as a floating point number from -1.0 to 1.0. To make our lives easier, we first convert this range to 0.0 to 1.0 before setting it as the value of our texCoords varying.

Instead of returning a solid magenta colour, our fragment shader now uses the x component of texCoords as our red channel and the y component as our green channel, leaving the blue channel as a constant.

Try reading the fragment shader in detail and imagine how the strength of the red component increases as the x value increases. Likewise, the green component increases as the y value increases.
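Under those assumptions, the modified shaders might look like this (reconstructed; the constant blue value of 1.0 is my choice for illustration):

```glsl
// Vertex shader
attribute vec2 position;
varying vec2 texCoords;

void main() {
  // Convert clip space (-1.0 to 1.0) into texture space (0.0 to 1.0).
  texCoords = (position + 1.0) / 2.0;
  gl_Position = vec4(position, 0.0, 1.0);
}

// Fragment shader
precision highp float;
varying vec2 texCoords;

void main() {
  // Red follows x, green follows y, blue stays constant.
  gl_FragColor = vec4(texCoords.x, texCoords.y, 1.0, 1.0);
}
```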
Now we’ve successfully created a square in our scene, we’re ready to show an image.
ImageData as a Texture
The next step is to pass our ImageData to our WebGL context as a texture.

In short, we define a 2D texture in slot TEXTURE0 and set the texture data to our image, using the colour mode RGBA.
Following this, we can set a number of parameters such as the wrapping mode and the interpolation mode (MIN_FILTER and MAG_FILTER). Wrapping mode isn’t important for our use case, but you may wish to experiment with the different interpolation modes.
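The upload might look like this (browser-only sketch; gl is our WebGL context and imageData is the ImageData we want to draw):

```javascript
const texture = gl.createTexture();
gl.activeTexture(gl.TEXTURE0); // texture slot 0
gl.bindTexture(gl.TEXTURE_2D, texture);

// Upload the pixel data in RGBA format, 8 bits per channel.
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, imageData);

// Clamp at the edges (WebGL 1 requires this for non-power-of-two
// images) and pick interpolation modes for scaling down and up.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
```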
We create a sampler2D uniform which will be used to sample colour data from the texture we previously set as the active texture in slot 0.

As you remember, our fragment shader’s main() routine runs once for each pixel of the output canvas. So for each pixel, we use the texture2D() function to sample the texture at our current coordinate and set the result as our gl_FragColor.
Because WebGL coordinates run bottom-left to top-right, our image data is sampled with the y-axis inverted! Fortunately, this is a simple fix: invert texCoords.y in our vertex shader. Open the above Codepen source and try it for yourself!
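The flipped vertex shader might look like this (a reconstruction):

```glsl
attribute vec2 position;
varying vec2 texCoords;

void main() {
  vec2 coords = (position + 1.0) / 2.0;
  // Flip the y-axis so the top-left of the image data maps to the
  // top-left of the canvas.
  texCoords = vec2(coords.x, 1.0 - coords.y);
  gl_Position = vec4(position, 0.0, 1.0);
}
```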
Finally, Applying a Filter
Now we’ve rendered our image in our WebGL canvas (correct side up!), we are ready to transform the pixel data.
All of our transformations will happen within the fragment shader.
We’ve added a number of transformation functions such as adjustSaturation() to our shader and call them within our main() routine. After sampling the texture, we can sequentially adjust the RGB components of our colour using the transformation functions before setting gl_FragColor.
You may have noticed how the adjustSaturation() function uses the mathematical dot() (dot product) function and WebGL’s built-in vector operations.
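An adjustSaturation() in this style, following the luminance-weighted approach popularised by GPUImage, could look like this (the exact weights and signature are my assumption, not the article’s code):

```glsl
precision highp float;

// Approximate perceptual luminance weights for R, G and B.
const vec3 LUMINANCE_WEIGHTS = vec3(0.2125, 0.7154, 0.0721);

vec3 adjustSaturation(vec3 color, float saturation) {
  // The dot product collapses the colour to a single grey luminance value.
  float luminance = dot(color, LUMINANCE_WEIGHTS);
  // Blend between greyscale and the original colour:
  // 0.0 = fully desaturated, 1.0 = unchanged, >1.0 = over-saturated.
  return mix(vec3(luminance), color, saturation);
}
```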
If you wanted to take this example a step further, you could store each of the filter parameters as a uniform, so they can be updated from JavaScript without recompiling the shader.
Feel free to open the Codepen sample and play with the values. Maybe even try writing your own transform filter!
Note: Whilst there are often simplified alternatives, the mathematics behind many photographic filters is based on years of technical research, often tuned for human visual perception. To fully understand how these filters work, you may end up reading technical white papers or graduate theses. Fortunately for me, who’s pretty bad at maths, I discovered Brad Larson’s GPUImage library, which includes many standard filters in shader format. This vastly helped my understanding of how they work.
As with most things, there are a number of caveats to processing images using WebGL.
WebGL Rendering Blocks the Main Thread
Although the heavy lifting happens on the GPU, all WebGL commands are issued from JavaScript on the main thread, so a long-running draw can still cause dropped frames.
GPU Hardware / Driver Bugs Are Real
Web browsers do a good job of hiding bugs and performance issues in most GPU drivers. Sometimes specific draw modes are unreasonably slow on specific GPUs, and our browsers are filled with hacks and workarounds to hide this. WebGL allows you to run your code directly on the GPU, where it’s susceptible to all of the quirks you would otherwise have been protected against. That said, our image processing application shouldn’t run into many of these issues.
Conclusion and Links
Thanks so much for reaching the end of this very long and complicated article! I hope you found it interesting and maybe even inspiring enough that you want to investigate the world of WebGL and image processing even more. To recap, we covered:
- How to process image data using regular 2D canvas and how being bound to a single CPU thread severely limits performance.
- The basics of vertices, shaders, textures and programs.
- The different coordinate systems used by the browser and WebGL.
- How to set up a WebGL 3D canvas for 2D image rendering by drawing two triangles and applying a texture.
- How to transform pixel data within a fragment shader in order to create image filters.
What I’m Working On
All of this effort is being put into the creation of a library I’m calling Iris.
My aim is to create a dependency-free, lightweight image editing solution that can be used across any framework.
As I mentioned previously, I am aiming for feature parity with the editing tools of Google Photos. I still have a lot to learn, but it’s an enjoyable process.
The code for Iris is on GitHub, and I will be releasing the packages on NPM very soon!
If you’re interested in getting involved with the project, please reach out on GitHub.
A Note About three.js
You will notice that I didn’t mention any third-party libraries throughout this article. Part of my goal was to explore the inner workings of WebGL and remove the need for third-party dependencies.
If you are working on a full 3D experience however, I would strongly recommend the use of a graphics library such as three.js as low-level graphics programming really doesn’t scale well to large applications.