Detecting content boundaries on an HTML5 canvas
The HTML5 canvas is a wonderful invention. It delivers the graphical abilities of a 386 PC to modern computers. Recently I ran into a problem: how do you detect the actual size of the contents of a canvas?
Let’s say we have a canvas, and we need to render a text onto it. Sure, it’s a piece of cake:
<canvas id="canvas" style="border: 1px solid black;"></canvas>
JavaScript:
const canvas = document.getElementById('canvas');
const ctx = canvas.getContext('2d');
ctx.fillStyle = 'red';
ctx.font = '150px Arial';
ctx.fillText('Hello, world!', 10, 120);
Oh, no! The text is longer than the canvas.
Of course it’s an easy fix. Just add some more width and height to the canvas…
<canvas id="canvas" width="800" height="200" style="border: 1px solid black;"></canvas>
Are we happy? We might be… if the size of the canvas didn’t matter. But as we know, size does!
The actual problem I was facing
I am currently working on an Electron application where a canvas with a fixed size (1920x1080) is constantly on the screen. Its content is being streamed to a remote server as video. The app combines a web camera image with various overlays, so the streamer can display images, videos and text.
When text is being displayed, I can’t just render it onto the main canvas. The canvas is being refreshed with 30 fps, and if I rendered anything on it, it would immediately disappear as the next refresh occurs. Re-rendering the same static text in every refresh cycle would be really stupid. So I render the text on a separate canvas, then copy its contents onto the main one in every single frame.
I also need to center the text on the screen. I can only do so if I know its exact width and height. Therefore I had to come up with a method to measure the bounds of the actual graphics, which are smaller than the canvas itself. Here is an image to illustrate:
This would be a no-brainer in a proper programming language, but JavaScript still isn’t one. Plain JavaScript loops are simply slow at chewing through huge amounts of data, like millions of pixels. Fortunately, the built-in canvas and array methods are lightning fast. (Of course! They were written in a proper programming language.) So let’s see if we can make this snappy!
Getting pixel information
In the first step, we’re going to take a snapshot of our canvas, and put it into an array.
const imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
Now we have an object holding a rather huge array that contains all of our pixels. The structure of imgData is something like this:
{
width: 1000,
height: 200,
colorSpace: 'srgb',
data: [0, 0, 0, 0, 0, 0, ...]
}
The data array is a sequence of bytes (that’s numbers from 0 to 255 for you, kittens). Each group of 4 numbers holds one pixel’s R, G, B and alpha values. So data[0] is the R value of the pixel at 0;0, data[1] is the G value of the same pixel, and so on. As you probably already figured, the array is 4 times longer than the number of pixels we have.
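Put as a formula: the alpha byte of the pixel at column x, row y of a canvas that is width pixels wide lives at index (y * width + x) * 4 + 3. A tiny sketch to make that concrete (the helper name is mine, not part of the canvas API):

```javascript
// Returns the alpha byte of the pixel at (x, y) in a raw RGBA byte array.
// `data` stands in for imgData.data; `alphaAt` is an illustrative name.
function alphaAt(data, width, x, y) {
  return data[(y * width + x) * 4 + 3];
}

// A 2x1 "canvas": one fully transparent pixel, one opaque red pixel.
const data = [0, 0, 0, 0, 255, 0, 0, 255];
alphaAt(data, 2, 0, 0); // 0 — the transparent pixel
alphaAt(data, 2, 1, 0); // 255 — the opaque one
```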
We are looking for the X positions of the leftmost and the rightmost pixels, and the Y positions of the topmost and bottommost ones. These four values will tell us the corners of the bounding box.
But before we begin shoveling pixels, let’s just throw away 75% of the data. After all, we don’t care about the R, G and B values. If a pixel isn’t empty, then its alpha value is higher than 0. So we only need the alpha values, which is every 4th element of the array.
// Keep every 4th byte of the image data: the alpha channel
const pixels = Array.from(imgData.data).filter((_, index) => index % 4 === 3);
Now we have an array of 200,000 bytes (as our canvas is 1000x200 pixels).
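As a side note: if the Array.from copy ever shows up in profiling, the same every-4th-byte extraction can be done with a plain strided loop over the typed array, skipping the intermediate copy entirely. A sketch, with an illustrative function name:

```javascript
// Walks the RGBA byte array in steps of 4, starting at offset 3 (the alpha
// byte), and collects the alphas without materializing a full copy first.
function extractAlphas(data) {
  const alphas = new Uint8ClampedArray(data.length / 4);
  for (let i = 3, j = 0; i < data.length; i += 4, j++) {
    alphas[j] = data[i];
  }
  return alphas;
}
```

Uint8ClampedArray is the same type that getImageData hands back, so the rest of the code works on it unchanged.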
Finding top and bottom boundaries
The screen is a two-dimensional grid, where we can refer to a pixel with X and Y coordinates. But we have a one-dimensional array! Your first thought might be to rearrange it into a two-dimensional array. It was my first one too, but I realized I don’t have to: it’s an extra step, and it slows down the process. The whole thing can be done much faster in a single loop.
First of all, let’s find the top and bottom edges of the image.
let top = null;
let bottom = null;

for (let y = 0; y < pixels.length; y += canvas.width) {
  const row = pixels.slice(y, y + canvas.width);
  if (row.some(pixel => pixel > 0)) {
    if (top === null)
      top = y == 0 ? 0 : y / canvas.width;
    bottom = y / canvas.width;
  }
}

console.log(`${top} ; ${bottom}`);
What is happening here?
First we define two variables, top and bottom. These will contain the upper and lower boundaries. I am printing them to the console in the last line.
Then we start a for loop. It basically iterates y down the canvas, row by row: it starts from 0 and steps by the canvas width until it runs out of rows. The value of y will always point to the first pixel of the current row in the pixels array.
Confused? Just imagine our pixels array as if someone picked up each row of pixels and placed them in a line, one after another. Pretty much as if someone removed the tiles from your bathroom wall and lined them up on the floor. We are looking at each row along the line. We know how wide the canvas is, or in other words, how many tiles were in a row on the wall, so we know how long a section we must look at each time.
In each cycle, we check if the row of pixels we’re looking at contains something that isn’t 0.
If it does, and we haven’t found one before, then this is the topmost row. We set top to the current row index, y / canvas.width; the top === null condition keeps us from overwriting it at later rows. (The ternary just short-circuits the first row to 0, which the division would return anyway.)
We’re also looking for the bottommost row. If the current row contains pixels, then it is the bottommost row found so far. Unlike top, we update bottom in every cycle where we find something, so bottom is last updated at the actual bottommost row.
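The scan above condenses nicely into a small DOM-free function, which also makes it easy to try on a hand-made alpha array outside the browser (the function name and parameters are mine):

```javascript
// Finds the first and last non-empty rows in a flat alpha array.
// `pixels` holds one alpha byte per pixel; `width` is the canvas width.
function verticalBounds(pixels, width) {
  let top = null;
  let bottom = null;
  for (let y = 0; y < pixels.length; y += width) {
    const row = pixels.slice(y, y + width);
    if (row.some(pixel => pixel > 0)) {
      if (top === null) top = y / width; // first non-empty row wins
      bottom = y / width;                // keeps updating until the last one
    }
  }
  return { top, bottom };
}

// A 4x4 "canvas" with content only in rows 1 and 2:
const sample = [
  0, 0, 0, 0,
  0, 9, 0, 0,
  0, 0, 9, 0,
  0, 0, 0, 0,
];
verticalBounds(sample, 4); // { top: 1, bottom: 2 }
```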
Let’s see what we got! Adding this snippet to the end, we get this…
ctx.beginPath();
ctx.strokeStyle = 'green';
ctx.rect(0, top, canvas.width, bottom - top);
ctx.stroke();
Finding left and right boundaries
In our previous loop, we were already looking at every row of pixels. This can also help us find the leftmost and rightmost pixels in each row. Let’s modify our code a little.
let top = null;
let bottom = null;
let left = canvas.width;
let right = 0;

for (let y = 0; y < pixels.length; y += canvas.width) {
  const row = pixels.slice(y, y + canvas.width);
  if (row.some(pixel => pixel > 0)) {
    if (top === null)
      top = y == 0 ? 0 : y / canvas.width;
    bottom = y / canvas.width;

    let leftmost = null;
    let rightmost = null;
    for (let x = 0; x < row.length; x++) {
      if (row[x] > 0) {
        if (leftmost === null)
          leftmost = x;
        rightmost = x;
      }
    }
    if (leftmost < left) left = leftmost;
    if (rightmost > right) right = rightmost;
  }
}
We’re doing almost the same as when we found the top and bottom borders.
First, we have two new variables: left and right. They will store the values we find. Their starting values are the theoretical extremes: left starts at canvas.width, one past the rightmost possible column, and right starts at 0, the leftmost column.
Inside the block that runs when the loop finds a row with pixels, we now have a new inner loop.
We define two local variables, leftmost and rightmost. These will catch the positions of — you guessed it — the leftmost and rightmost pixels in each cycle, that is, in each pixel row.
The new loop does the same thing to each row that the parent loop did vertically. It iterates across the row of pixels. When it finds the first one, it saves its position into leftmost. The position of the last one it finds ends up in rightmost.
After the loop, we compare the values we found with left and right. If the leftmost value we found is closer to the left edge than the current value of left, we pass the new value to it. If rightmost is closer to the right edge than the current value of right, we pass it.
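Again, the combined loop fits into a small standalone function for experimenting outside the browser (names are illustrative, mirroring the variables in the text):

```javascript
// Computes the bounding box of non-transparent pixels in a flat alpha array.
function boundingBox(pixels, width) {
  let top = null, bottom = null;
  let left = width, right = 0;
  for (let y = 0; y < pixels.length; y += width) {
    const row = pixels.slice(y, y + width);
    if (row.some(pixel => pixel > 0)) {
      if (top === null) top = y / width;
      bottom = y / width;

      let leftmost = null, rightmost = null;
      for (let x = 0; x < row.length; x++) {
        if (row[x] > 0) {
          if (leftmost === null) leftmost = x;
          rightmost = x;
        }
      }
      if (leftmost < left) left = leftmost;
      if (rightmost > right) right = rightmost;
    }
  }
  return { left, top, right, bottom };
}

// Same 4x4 sample as before: pixels at (1,1) and (2,2).
boundingBox([0,0,0,0, 0,9,0,0, 0,0,9,0, 0,0,0,0], 4);
// { left: 1, top: 1, right: 2, bottom: 2 }
```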
At the end, we have our bounding box!
ctx.beginPath();
ctx.strokeStyle = 'green';
ctx.lineWidth = 1;
ctx.rect(left, top, right - left, bottom - top);
ctx.stroke();
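And to close the loop with my original problem: once the box is known, centering the content on the 1920x1080 main canvas is simple arithmetic. A sketch — the function name and the drawImage call are illustrative, not lifted from my actual app:

```javascript
// Where to draw the text canvas so its *content* (the bounding box, not the
// whole canvas) lands centered on the target. `box` is { left, top, right, bottom }.
function centeredPosition(box, targetWidth, targetHeight) {
  const contentWidth = box.right - box.left;
  const contentHeight = box.bottom - box.top;
  return {
    x: Math.round((targetWidth - contentWidth) / 2) - box.left,
    y: Math.round((targetHeight - contentHeight) / 2) - box.top,
  };
}

// In the render loop (browser only):
// const { x, y } = centeredPosition({ left, top, right, bottom }, 1920, 1080);
// mainCtx.drawImage(textCanvas, x, y);
```

Subtracting box.left and box.top shifts the draw position so the empty margins of the text canvas don’t push the visible content off-center.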
The calculation runs in around 55–60 milliseconds in Chrome. Here is the complete source code. Feel free to build on it!
<html>
<body>
<canvas id="canvas" width="1000" height="200" style="border: 1px solid black;"></canvas>
<script>
window.onload = () => {
  const canvas = document.getElementById('canvas');
  const ctx = canvas.getContext('2d');
  ctx.fillStyle = 'red';
  ctx.font = '130px Arial';
  ctx.fillText('Hello, world!', 10, 120);

  const startTime = new Date();

  const imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);

  // Keep every 4th byte of the image data: the alpha channel
  const pixels = Array.from(imgData.data).filter((_, index) => index % 4 === 3);

  // Find the boundaries
  let top = null;
  let bottom = null;
  let left = canvas.width;
  let right = 0;

  for (let y = 0; y < pixels.length; y += canvas.width) {
    const row = pixels.slice(y, y + canvas.width);
    if (row.some(pixel => pixel > 0)) {
      // If we have no top yet, then this is it
      if (top === null)
        top = y == 0 ? 0 : y / canvas.width;
      // This is the bottommost row we found so far
      bottom = y / canvas.width;

      // Find leftmost and rightmost pixels
      let leftmost = null;
      let rightmost = null;
      for (let x = 0; x < row.length; x++) {
        if (row[x] > 0) {
          if (leftmost === null)
            leftmost = x;
          rightmost = x;
        }
      }
      if (leftmost < left) left = leftmost;
      if (rightmost > right) right = rightmost;
    }
  }

  const endTime = new Date();
  console.log(`Finished in ${endTime.getTime() - startTime.getTime()} milliseconds.`);
  console.log(`${left} ; ${top} - ${right} ; ${bottom}`);

  ctx.beginPath();
  ctx.strokeStyle = 'green';
  ctx.lineWidth = 1;
  ctx.rect(left, top, right - left, bottom - top);
  ctx.stroke();
};
</script>
</body>
</html>