Computer Vision for Busy Developers
This article is part of a series introducing developers to Computer Vision. Check out other articles in this series.
Getting Started with Images
Before we dive deeper into computer vision, let’s first go over some basics of images and simple image processing. So, what’s a digital image? To simplify, an image is an array of numerical values called pixels. Images are generally organized in two dimensions, with the array starting at the top-left where the index is 0 and flowing from left-to-right and top-to-bottom. In a color image, each pixel represents a color made up of three distinct numerical values. These values represent the intensity of Red, Green and Blue light, and the combination of the three gives each pixel its color. Often, we want to focus on a specific color channel, which is all of the values of a single intensity in isolation. In any individual pixel, an intensity of 0 indicates no light in that channel for that pixel and an intensity of 255 indicates the maximum light in that channel for that pixel (assuming each channel is represented by one byte, resulting in three bytes per pixel).
SIDE NOTE: In academic papers, digital images are often represented by a mathematical function. The function takes in an X and a Y coordinate and returns a single brightness value. Color images are represented by a combination of three similar functions for each of the color channels.
While color images are usually represented by three channels, grayscale (black and white) images are represented by only one channel. In the case of a grayscale image, for any pixel, an intensity of 0 indicates black while an intensity of 255 indicates white (assuming each pixel is represented by one byte). We can imagine that working with one channel compared to three has advantages in memory usage and processing time. In the next article, we go over in more detail the importance of using grayscale in CV. For similar reasons, working with 4-channel images, such as those which include transparency (or other information), is generally not useful in CV unless it’s relevant to a specific use case.
SIDE NOTE: As mentioned, we’re going to be using the RGB “color space” to discuss pixels. Some systems use other color spaces, such as YCbCr or HSL, to represent pixels numerically.
The sample image of the toucan we’ve been using is 1,365 pixels wide by 2,048 pixels tall, for a total of 2,795,520 pixels (1,365 × 2,048). Each pixel is represented by three bytes, one each for the Red, Green and Blue channels (3 bytes, or 24 bits, per pixel). The data structure of the image could look something like the following:
[[162, 189, 218],
[165, 192, 221],
[170, 199, 229],
[165, 194, 224],
[169, 198, 230],
[165, 194, 226],
[170, 199, 233],
[63, 75, 9],
[67, 79, 13]]
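As a sketch, here is how such an array of pixels might be represented and indexed in Python. The values below are illustrative toy data, not taken from the actual toucan image:

```python
# A toy 2x2 RGB image as a nested list: rows of [R, G, B] pixels.
# (These values are made up for illustration.)
image = [
    [[162, 189, 218], [165, 192, 221]],
    [[170, 199, 229], [63, 75, 9]],
]

# Index from the top-left: row first, then column.
r, g, b = image[1][1]
print(r, g, b)  # -> 63 75 9
```

Each innermost list is one pixel; the three numbers are the Red, Green and Blue intensities for that pixel.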
SIDE NOTE: In these articles, we are discussing pixels being represented by numerical values stored in bytes. Many systems use bytes to store pixel data, and the maximum value of 255 makes them convenient to discuss. Bytes are not the only way to store pixel data, however. Unity, for instance, can represent image data in various formats, including formats that use floating point numbers to store pixel data.
Now, let’s say we wanted to print and display this image within a photo frame. We can decide on what color photo frame to use based on the image itself. One approach would be to pick a frame based on the average color of the entire image. In order to determine what the average color of the entire image is, we would do just that: calculate the average of all of the colors in the pixel array that represents the image. Add up all of the values of each channel and divide them by the total number of pixels. It is important not to mix color channels — for example, do not sum the red channel with the green channel.
It turns out that the average color of the entire image is [R:100, G:112, B:63]. If we were to pick a frame based on this color, we would end up with the following image.
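The per-channel averaging described above can be sketched like this. The pixel values here are a small made-up sample, not the full toucan image, so the result differs from the average above:

```python
# Average each channel independently across all pixels (toy data).
# Never mix channels: red values are only summed with red values, etc.
pixels = [
    [162, 189, 218],
    [165, 192, 221],
    [63, 75, 9],
    [67, 79, 13],
]

n = len(pixels)
avg_r = sum(p[0] for p in pixels) // n
avg_g = sum(p[1] for p in pixels) // n
avg_b = sum(p[2] for p in pixels) // n
print([avg_r, avg_g, avg_b])  # -> [114, 133, 115]
```

Running the same loop over all 2,795,520 pixels of the real image is what produces the frame color above.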
Let’s be a little more creative with our image processing. Instead of processing the image to get one specific color result, let’s process the image and replace the color of each pixel. This type of image processing is referred to as a “point operator” or “point process”. If we were to take each pixel of the image and multiply each of its numerical values by 1.5, without going over the maximum value of 255 (the maximum value for a byte), the image would be lightened. If we were to divide each pixel value by 1.5, the image would be darkened. Altering each pixel value by using a function is often referred to as an image filter — a concept familiar to anyone who has used Photoshop in the past! Doing some simple multiplication on an image can have some interesting and useful results.
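These two point operations can be sketched as small per-pixel functions (the function names are mine, not from any particular library):

```python
def lighten(pixel, factor=1.5):
    # Multiply each channel, clamping at 255 (the max for a byte).
    return [min(255, int(v * factor)) for v in pixel]

def darken(pixel, factor=1.5):
    # Divide each channel; the result can never exceed the original.
    return [int(v / factor) for v in pixel]

print(lighten([162, 189, 218]))  # -> [243, 255, 255]
print(darken([162, 189, 218]))   # -> [108, 126, 145]
```

Applying one of these functions to every pixel in the array produces the lightened or darkened image.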
To lighten or darken an image, we multiply or divide the original pixel values by a constant. Another interesting way to replace the pixels of the image is to subtract each original pixel value from a constant. If we were to start with a white value of 255 and then subtract the original pixel value, we would see a “negative” version of the image. This one might be a little less useful, but I think it really drives home the point that an image is just a big list of numbers, and some simple math can have some really fun results.
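The “negative” point operation is even simpler to sketch:

```python
def invert(pixel):
    # Subtract each channel from 255 to produce the "negative".
    return [255 - v for v in pixel]

print(invert([162, 189, 218]))  # -> [93, 66, 37]
```

Note that inverting twice returns the original pixel, which is a nice sanity check that no information was lost.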
Converting Color to Grayscale
Earlier, we determined the average color of the entire image. We’ve also modified individual pixels by a constant amount. What do you think will happen if we average each individual pixel? Let’s go ahead and average the three intensity values of each pixel and replace all three values with the average.
Here’s a hint: we already know that each color channel represents the intensity of red, green or blue light. We also discussed that grayscale images have only one channel. Let’s check out the results in the image below.
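The per-pixel averaging can be sketched as:

```python
def to_gray_average(pixel):
    # Replace all three channels with their average.
    r, g, b = pixel
    avg = (r + g + b) // 3
    return [avg, avg, avg]  # or just avg, for a single-channel image

print(to_gray_average([162, 189, 218]))  # -> [189, 189, 189]
```

When all three channels hold the same value, the pixel renders as a shade of gray, so the image can equally well be stored with a single channel per pixel.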
The process of converting RGB images into grayscale is an incredibly important step in computer vision. Not only does it reduce the amount of data that we are working with, but it also focuses on the intensity of the light in an image rather than the color of the light. Most of the more advanced computer vision algorithms discussed in this series will be starting with a grayscale image.
SIDE NOTE: Because of the way that our eyes perceive colors, systems converting color to grayscale for human consumption often use a “luminosity” approach instead of directly averaging the values. The driver here is that human eyes are more sensitive to green, so the luminosity approach emphasizes green when converting to grayscale.
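A luminosity conversion is a weighted average rather than a plain one. The weights below are the commonly used ITU-R BT.601 coefficients; other weightings exist, so treat this as one representative sketch:

```python
def to_gray_luminosity(pixel):
    # Weighted average that emphasizes green (ITU-R BT.601 weights,
    # one of several weightings in common use).
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(to_gray_luminosity([162, 189, 218]))  # -> 184
```

With these weights, a pure green pixel converts to a noticeably brighter gray than a pure blue one, matching how our eyes perceive them.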
Digital images are just a bunch of numbers, usually organized in an array. The data in these arrays has a very specific definition, and when you manipulate the individual values within the array, it results in a very specific image output. Developers know how to manipulate numbers; therefore, developers already know how to manipulate images!
Sources and More Info
- Three Algorithms for Converting Color to Grayscale — Blog Post by John Cook
- MATLAB as a Tool in Nuclear Medicine Image Processing — Academic Paper by Maria Lyra, et al.
- RGB Color Model — Wikipedia
- Unity Texture Formats — Unity Documentation