Introduction to OpenCV with Python

Elvis Ferreira
7 min read · Apr 9, 2019


In this tutorial I am going to show solutions to some beginner-level computer vision problems using the OpenCV library for Python. The problems here were proposed by my professor for an engineering Digital Image Processing class.

The OpenCV library is the most famous open source computer vision library (http://www.opencv.org/), available for many programming languages. With its immense number of features we can perform digital image modifications such as geometric transformations, filtering, camera calibration, feature extraction, object detection, etc. The installation and setup used won’t be shown here.


Pixel manipulation

For the first problem, as a “hello world” type of exercise, we will access an image and manipulate its pixels.

First, import the library.
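
A minimal sketch of this step (numpy is imported here as well, since we will use it later for matrix manipulation):

```python
import cv2          # the OpenCV library
import numpy as np  # used later to split and rearrange the image matrix
```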

To load an image, we use the function imread, passing as parameters the name of the file to be read and how we want it to be read: 0 - grayscale, 1 - color, -1 - unchanged. For this problem we will read only in grayscale and in color. To show an image, similarly, we call the function imshow, passing as parameters the name of the window to be shown and its content, usually the image object we created by calling imread.
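
A sketch of these two calls (the file name lenna.png and the window names are assumptions):

```python
# Read the same file in grayscale (0) and in color (1)
img_gray = cv2.imread("lenna.png", 0)
img_color = cv2.imread("lenna.png", 1)

# Show each image in its own named window
cv2.imshow("grayscale", img_gray)
cv2.imshow("color", img_color)
cv2.waitKey(0)  # explained further below
```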

So, let’s read an image in black and white, access its pixels, change some and show the result. Images are loaded as a matrix (each element is one pixel): if grayscale, the matrix has two dimensions, with each element representing an intensity of gray; if colored, it is a set of three matrices (a three-dimensional array), one matrix for each color channel.

Important note: in OpenCV the coordinates of the matrices are different from the usual (x, y) convention. The first index runs vertically (rows) and the second horizontally (columns), and the (0, 0) origin is placed at the upper left corner, so coordinates grow to the right and downward. To make it easier to understand, when looping over the coordinates using variables called X and Y we will only switch to the OpenCV notation when accessing a position. There is an example further on in this tutorial.

After reading, we can do some easy matrix accesses and change pixel values as we want. After setting some pixels to black, in a way that forms a rectangle, we show the result.
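
A sketch of this step, assuming an arbitrary 100x100 region for the rectangle:

```python
img = cv2.imread("lenna.png", 0)  # grayscale

# Paint a square region black (intensity 0); the limits are arbitrary
for x in range(100, 200):      # rows (vertical)
    for y in range(100, 200):  # columns (horizontal)
        img[x, y] = 0          # access is [row, column]

cv2.imshow("black rectangle", img)
cv2.waitKey(0)
```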

Original picture used for most part of the tutorials (Lenna)

A very useful command is needed here so we can perform multiple operations on images in the same code. waitKey pauses the program until the user presses a key (so we can calmly look at the result being shown). The parameter 0 means it waits for any key to be pressed before continuing. After this we read the Lenna image again, this time in color, and make a red rectangle just like the black one we did before.

Pixel access is done differently for colored images: the array we assign to a pixel represents the color channels in the BGR order (yes, it’s flipped for whatever reason). As output we have a maximum-brightness red rectangle, created at the same spot.
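
A sketch of the colored version, with the same assumed region (note the BGR order in the assignment):

```python
img = cv2.imread("lenna.png", 1)  # color

for x in range(100, 200):
    for y in range(100, 200):
        img[x, y] = [0, 0, 255]   # BGR: zero blue, zero green, maximum red

cv2.imshow("red rectangle", img)
cv2.waitKey(0)
```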

Finally, to destroy the windows created for showing the images, we call the function destroyAllWindows, so nothing stays on screen after the program stops running.

Now, let’s make a rectangle with the negative of the black and white image. The position of the rectangle is given by the user, and the negative effect is created by subtracting each pixel from the maximum possible value (255 for 8-bit images here).
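
A sketch of this, reading the rectangle limits from the user (the prompt wording and variable names are my own):

```python
img = cv2.imread("lenna.png", 0)

# Rectangle limits typed by the user: row range, then column range
x1 = int(input("first row: "))
x2 = int(input("last row: "))
y1 = int(input("first column: "))
y2 = int(input("last column: "))

for x in range(x1, x2):
    for y in range(y1, y2):
        img[x, y] = 255 - img[x, y]  # negative: subtract from the 8-bit maximum

cv2.imshow("negative rectangle", img)
cv2.waitKey(0)
```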

Running it with the coordinates 100, 200, 30 and 150 we obtain:

Exploring pixel manipulation even further, let’s reorder the four quadrants of the image, placing them in inverted positions. A quadrant represents 1/4 of the image, as if the picture were cut in half vertically and horizontally.

For this challenge we are going to use the numpy library to simplify the matrix manipulation. The numpy split functions cut a matrix at a given position along a given axis. After splitting the image we just concatenate the parts back in flipped order.
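
A sketch of the quadrant swap (the diagonal ordering is one possible choice, and np.vsplit/np.hsplit assume the image has even height and width, which holds for the 512x512 Lenna image):

```python
img = cv2.imread("lenna.png", 0)

# Cut the image in half vertically, then each half horizontally
top, bottom = np.vsplit(img, 2)
top_left, top_right = np.hsplit(top, 2)
bottom_left, bottom_right = np.hsplit(bottom, 2)

# Concatenate the quadrants back in flipped (diagonally swapped) order
result = np.vstack((np.hstack((bottom_right, bottom_left)),
                    np.hstack((top_right, top_left))))

cv2.imshow("swapped quadrants", result)
cv2.waitKey(0)
```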

Filling regions

In computer vision, a very common task is counting the objects in a detected scene. To perceive an object it is necessary to detect the aggregation of pixels belonging to it. To make this easier, we are going to use a binary picture (only the gray levels 0 and 255), where 0 is the black background and 255 is an object pixel.

Binary gray-scaled picture simulating objects

Here we assume that each aggregation of white pixels is a single object. So, a possible approach is labeling. Usually a labeling algorithm takes a binary image as input and returns a gray image with multiple levels. For this purpose we are going to use a built-in region-filling algorithm called flood fill. This algorithm essentially searches, starting from a seed pixel, for neighbors with the same color. If we give a color as a parameter, the function makes all the pixels found by the search take that color.

The floodFill function from OpenCV asks for the image, a mask (None when not used), the seed pixel and the color to paint the seed and its similar neighbors. With that in hand, we only need to go pixel by pixel checking its color: if it is white we flood fill it with some gray level, and then increment the gray level so the next object gets a different gray tone. As this process goes on we can count how many times floodFill was applied and assume that is the number of objects in the scene.
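
A sketch of the labeling loop (the file name objects.png is an assumption; nelem is the counter variable mentioned below):

```python
img = cv2.imread("objects.png", 0)  # the binary picture
height, width = img.shape

nelem = 0  # number of objects found so far, also used as the gray label
for x in range(height):
    for y in range(width):
        if img[x, y] == 255:  # still-unlabeled object pixel
            nelem += 1
            # floodFill takes the seed as (column, row)
            cv2.floodFill(img, None, (y, x), nelem)

print("objects found:", nelem)
cv2.imshow("labeled", img)
cv2.waitKey(0)
```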

As a result we can’t see some objects, because the gray level used for the label was the same variable used to count the objects, making the first ones a very dark gray.

Floodfill algorithm applied to a binary picture

Since we are using gray levels to count the number of objects, we are limited to 255 objects in a scene with 8-bit processing. You can think of different ways of solving this problem; I’d suggest creating another variable that keeps track of when the number of elements reaches the limit. When that happens this variable would be incremented and nelem zeroed to start over. By the end of the labeling we would have objects sharing the same gray color, but this variable together with nelem could do the math of how many more objects were detected after the limit.
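
A sketch of that suggestion on top of the previous loop (the wraps variable and the exact bookkeeping are my own; labels here cycle through 1 to 254, keeping 0 for the background and 255 for unlabeled pixels):

```python
nelem = 0  # gray label for the current object
wraps = 0  # how many times nelem has wrapped around

for x in range(height):
    for y in range(width):
        if img[x, y] == 255:
            nelem += 1
            if nelem == 255:  # 255 must stay reserved for unlabeled pixels
                wraps += 1
                nelem = 1     # start reusing gray labels
            cv2.floodFill(img, None, (y, x), nelem)

print("objects found:", wraps * 254 + nelem)
```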

Now, looking at the original binary picture again, we can see that there are two different types of objects: with a hole and without one. How could we count them separately?

First of all, we don’t know whether the objects cut by the border have a hole or not, so we ignore them (somehow). With all the objects in the scene labeled, one way to solve this is to flood fill the background of the image with white, 255 (if the scene had 255 or more objects we would have to edit the code to protect the maximum white from being used as a label). Now we have the objects in gray, the background white and the holes black (the old background value), since flood filling the background with white never reaches the inside of the holes.

With that in hand the problem is solved: since the holes are black, we just search for black pixels through the whole picture, and for each black pixel found we increment the number of holes and then flood fill it so it is not counted again. In this example we fill it with the same color as the background, so the holes blend into it.
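
Putting the whole hole-counting idea together (the file name is again an assumption, and seeding the background fill at the corner assumes pixel (0, 0) is background):

```python
img = cv2.imread("objects.png", 0)
height, width = img.shape

# Step 1: label every object with a gray level, as before
nelem = 0
for x in range(height):
    for y in range(width):
        if img[x, y] == 255:
            nelem += 1
            cv2.floodFill(img, None, (y, x), nelem)

# Step 2: flood fill the background with white; the holes stay black
cv2.floodFill(img, None, (0, 0), 255)

# Step 3: every remaining black pixel belongs to a hole; count it and
# fill it with the background color so it is not counted again
nholes = 0
for x in range(height):
    for y in range(width):
        if img[x, y] == 0:
            nholes += 1
            cv2.floodFill(img, None, (y, x), 255)

print("holes found:", nholes)
cv2.imshow("holes counted", img)
cv2.waitKey(0)
```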

Output for counting holes

That’s it! In this first part I tried to show some basics for understanding how a digital image is composed and how we can access its pixels and do some analysis. I hope it was clear enough :)

Thank you for reading. To read more, go to:

2. Introduction to OpenCV with Python part II

3. Lighting exposure improvement using Fourier Transform with OpenCV

4. Let’s play with image borders on OpenCV

5. Color quantization with Kmeans in OpenCV

All the codes can be found at my public repository at GitHub.
