First Steps to the OpenCV-Python

Güldeniz Bektaş
Published in Geek Culture
10 min read, Jul 1, 2021

You can visit my GitHub account to find some entry-level projects. I share them along with their sources, so you can check them out and find more projects to learn from 🌈

In my last article, I briefly mentioned computer vision. The whole idea behind computer vision is what computers can tell from a digital video or an image. It is a field that aims to automate tasks that human vision can do. Computer vision operations require methods such as processing, analyzing, and extracting information from images. We cannot feed a model with raw image files directly. Computers only understand numbers, so in order to train a model we must convert the pictures to matrices or tensors. We can also modify the images to make later operations easier.

🤔 What is OpenCV library?

OpenCV-Python is a library of Python bindings designed to solve computer vision problems.

OpenCV supports a wide variety of programming languages like Python, C++, Java, etc. It can process images and videos to identify objects, faces, or even the handwriting of a human.

In this article I’ll try to give you beginner-friendly information about OpenCV’s image preprocessing functions. We will cut, transform, rotate, and change the colors of pictures, and more. Let’s dive in 🚀

1. Import OpenCV

import cv2 as cv

2. Reading an image

Before we can do anything with an image, we first need to read it.

img = cv.imread("bojack-horseman.png") # image I choose

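cv.imread supports the common image formats, including JPEG and PNG as well as others such as BMP and TIFF. I have only tried it with JPEG and PNG myself, so check the OpenCV documentation if you need another format. Note that cv.imread does not raise an error when the file cannot be read; it returns None instead.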

📌 One important thing: every time we display an image, we should add these two lines to the end of the file:

cv.waitKey(0)
cv.destroyAllWindows()

cv.waitKey(0) keeps the window open indefinitely until a key is pressed. You can close the image window by pressing any key on your keyboard, like ‘q’ or the escape key. If you pass a positive number to waitKey instead, it waits at most that many milliseconds for a keypress and then returns, so the script continues on its own.

cv.destroyAllWindows() simply destroys all the windows we created.

We read our image, now we should display it like this:

cv.imshow("Image", img)

Final version of the code:

import cv2 as cv
img = cv.imread("bojack-horseman.png")
cv.imshow("Image", img) # first argument is the window's name
cv.waitKey(0)
cv.destroyAllWindows()

Output:

⭐️ If you want to read an image in grayscale mode, pass 0 (the value of cv.IMREAD_GRAYSCALE) as the second argument.

img = cv.imread("bojack-horseman.png", 0)

3. Draw shapes & write text in an image

Create a black image with NumPy:

import numpy as np
blank = np.zeros((500, 500, 3), dtype = "uint8")
# display the blank image first
cv.imshow("Blank", blank)

Let’s paint the image green:

blank[:] = 0, 255, 0 # green
cv.imshow("Green", blank)
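The three numbers are in blue, green, red order, because OpenCV stores color channels as BGR rather than RGB. The NumPy-only sketch below shows that ordering and how slicing paints just a part of the image; the slice coordinates are arbitrary values picked for illustration.

```python
import numpy as np

blank = np.zeros((500, 500, 3), dtype="uint8")
blank[:] = 0, 255, 0  # B=0, G=255, R=0 -> green

blue, green, red = blank[0, 0]  # one pixel holds (B, G, R)
print(blue, green, red)         # 0 255 0

# Slicing paints only a region: rows 200-299, columns 300-399 in red.
blank[200:300, 300:400] = 0, 0, 255
print(blank[250, 350])          # [  0   0 255]
```

Note that the first index is the row (y) and the second is the column (x), which is the reverse of the (x, y) points used by the drawing functions.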

Output:

Draw a rectangle:

cv.rectangle(blank, (0,0), (250,250), (0, 250, 0), thickness = cv.FILLED) # draw on blank, the image we display
cv.imshow("Rectangle", blank)

Output:

The first argument is the image we want to draw the rectangle on, the second is the starting point (top-left corner), the third is the end point (bottom-right corner), and the fourth is the color, passed as a tuple in BGR order. The fifth argument is the thickness of the rectangle’s line. Here we pass cv.FILLED, which fills the shape; you can pass -1 for the same purpose.

Draw a circle:

cv.circle(blank, (250, 250), 40, (0, 0, 255), thickness = 3)

The first argument is the image we want to draw on, the second is the circle’s center coordinates, the third is its radius, the fourth is its color, and the fifth is its thickness. We don’t want to fill the shape this time.

Draw a line:

cv.line(blank, (0, 0), (250, 250), (255, 250, 255), thickness = 3)

The first argument is the image we want to draw on, the second is the line’s starting coordinates, the third is its ending coordinates, the fourth is its color, and the fifth is its thickness.

Write text on an image:

cv.putText(blank, "Geronimo", (0, 255), cv.FONT_HERSHEY_TRIPLEX, 1.0, (0, 255, 0), 2, cv.LINE_AA)

The first argument is the image again, the second is the text to write, and the third is the position of the text’s bottom-left corner. The fourth argument is the font face (you can choose any of the cv.FONT_HERSHEY_* fonts here), the fifth is the font scale, the sixth is the color, and the seventh is the thickness. For a better look, cv.LINE_AA is recommended by OpenCV as the line type.

Doctor Who Fan ♥️

4. Geometric Transformations of Images

OpenCV has two transformation functions: cv.warpAffine and cv.warpPerspective. cv.warpAffine takes a 2x3 transformation matrix as input, while cv.warpPerspective takes a 3x3 matrix.

Resizing the image:

Scaling is simply resizing the image. The function is pretty straightforward too: cv.resize(). You can specify the new size explicitly, or you can specify a scaling factor.

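height, width = img.shape[:2]
# pass interpolation by keyword; the third positional parameter of cv.resize is dst
resized = cv.resize(img, (2*width, 2*height), interpolation = cv.INTER_LINEAR)
cv.imshow("resized", resized)

The interpolation argument selects the interpolation method.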

Image interpolation occurs when you resize or distort your image from one pixel grid to another. Zooming increases the number of pixels, but it cannot add detail that was not captured in the original; the new pixel values have to be estimated. Interpolation works by using known data to estimate values at unknown points.

There are different types of interpolation methods in the OpenCV library. The preferable methods are cv.INTER_AREA for shrinking, and cv.INTER_CUBIC (slow) or cv.INTER_LINEAR for zooming. By default, cv.INTER_LINEAR is used for all resizing purposes.

Translation

Translation shifts an object’s location.

If you know the shift in the x and y directions, you can put those values into a NumPy array and pass it to the cv.warpAffine() function. The example below shifts the image by (100, 50):

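rows, cols = img.shape[:2] # works for grayscale and color images
M = np.float32([[1,0,100], [0,1,50]]) # marked
trans = cv.warpAffine(img, M, (cols,rows)) # 3rd argument is the size of the output image
cv.imshow("Translated", trans)

The translation matrix is defined as follows (this is the reason for the marked code line), where t_x and t_y are the shifts in the x and y directions:

$$M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}$$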

📍The third argument of the cv.warpAffine() function is the size of the output image, which should be in the form of (width, height). Remember, width is equal to number of columns, and height is equal to number of rows.

Rotation

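OpenCV provides scaled rotation with an adjustable center of rotation, so that you can rotate around any location you prefer. The modified transformation matrix is given by:

$$\begin{bmatrix} \alpha & \beta & (1-\alpha)\cdot \text{center}_x - \beta\cdot \text{center}_y \\ -\beta & \alpha & \beta\cdot \text{center}_x + (1-\alpha)\cdot \text{center}_y \end{bmatrix}$$

where:

$$\alpha = \text{scale}\cdot\cos\theta, \qquad \beta = \text{scale}\cdot\sin\theta$$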

To find this transformation matrix, OpenCV provides a function, cv.getRotationMatrix2D. The example below rotates the image by 90 degrees with respect to its center without any scaling:

M = cv.getRotationMatrix2D(((cols-1)/2.0, (rows-1)/2.0), 90, 1)
rotated = cv.warpAffine(img, M, (cols, rows))
cv.imshow("Rotated", rotated)

5. Image Thresholding

For every pixel, the same threshold value is applied. If the pixel value is less than or equal to the threshold, it is set to 0; otherwise it is set to a maximum value. The function cv.threshold is used to apply the thresholding. The first argument is the source image, which should be a grayscale image. The second argument is the threshold value, which is used to classify the pixel values. The third argument is the maximum value, which is assigned to pixel values exceeding the threshold. OpenCV provides different types of thresholding, selected by the fourth parameter of the function. Basic thresholding as described above is done by using the type cv.THRESH_BINARY.

The method returns two outputs. The first is the threshold that was used and the second output is the thresholded image.

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY) # assumes img was read in color
# binarizing the image: values up to 125 --> black (0), above 125 --> white (255)
ret, thresh = cv.threshold(gray, 125, 255, cv.THRESH_BINARY)
cv.imshow("thresh", thresh)
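The binary rule is easy to reproduce by hand. The sketch below is a NumPy-only imitation of what cv.THRESH_BINARY does on a tiny made-up array; it is meant to show the rule, not how OpenCV implements it internally.

```python
import numpy as np

gray = np.array([[0, 100, 125],
                 [126, 200, 255]], dtype="uint8")

thresh_value, max_value = 125, 255

# THRESH_BINARY rule: strictly greater than the threshold -> max_value, else 0.
binary = np.where(gray > thresh_value, max_value, 0).astype("uint8")
print(binary)
# [[  0   0   0]
#  [255 255 255]]
```

Notice that a pixel exactly equal to the threshold (125 here) ends up black.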

6. Blurring

Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing noise. It actually removes high frequency content from the image. So edges are blurred a little bit in this operation (there are also blurring techniques which don’t blur the edges).

Gaussian Blurring

Gaussian blurring smooths an image with a Gaussian function: the filter takes the neighborhood around each pixel and computes its Gaussian-weighted average.

We should specify the width and height of the kernel, which must be positive and odd. The third argument is the standard deviation in the x direction (sigmaX); sigmaY defaults to the same value. If you pass zero, both are calculated from the kernel size. You can also specify them yourself.

blur = cv.GaussianBlur(img, (5,5), 0) # 0: sigma is derived from the kernel size

Bilateral Filtering

Most blurring techniques blur the edges too, but bilateral filtering removes noise while keeping the edges sharp. The downside is that this operation is slower than the other blurring techniques.

blur = cv.bilateralFilter(img,9,75,75)

7. Canny Edge Detection

Canny edge detection has more than one stage, and I’ll go through each of them.

Step 1:

First, we remove noise from the image with a 5x5 Gaussian filter (Gaussian blur), since edge detection is susceptible to noise.

Step 2:

Then the smoothed image is filtered with a Sobel kernel in both the horizontal and vertical directions to get the first derivative in the horizontal direction (Gx) and the vertical direction (Gy). From these two images we can find the edge gradient and direction for each pixel.

📍Gradient direction is always perpendicular to edges. It is rounded to one of four angles representing vertical, horizontal and two diagonal directions.

Step 3:

In this step, every pixel is checked to see whether it is a local maximum in its neighborhood in the direction of the gradient.

Picture a point A on an edge, with points B and C next to it along the gradient direction. Point A is compared with B and C to see whether it forms a local maximum. If it does, it is kept for the next stage; otherwise it is suppressed (set to zero).
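Here is a simplified NumPy sketch of that check. It assumes every gradient points along the x-axis, so each pixel is only compared with its left and right neighbors; the real algorithm picks the two neighbors per pixel from the rounded gradient direction. The function name and the sample values are made up for the example.

```python
import numpy as np

def suppress_non_maxima_horizontal(magnitude):
    """Keep a pixel only if it is >= both horizontal neighbours; zero the rest."""
    padded = np.pad(magnitude, ((0, 0), (1, 1)), mode="constant")
    left = padded[:, :-2]
    right = padded[:, 2:]
    keep = (magnitude >= left) & (magnitude >= right)
    return np.where(keep, magnitude, 0)

# One row of gradient magnitudes across a blurred edge.
# The neighbours B = 40 and C = 30 sit on either side of A = 90.
magnitude = np.array([[0, 10, 40, 90, 30, 5, 0]])
thinned = suppress_non_maxima_horizontal(magnitude)
print(thinned)
# [[ 0  0  0 90  0  0  0]]
```

Only the peak survives, which is how a thick blurred edge gets thinned down to a line one pixel wide.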

Step 4:

The last step is to decide whether the selected edges are really edges or not. For this, we need two threshold values, minVal and maxVal. Edges with an intensity gradient greater than maxVal are considered sure edges, and those below minVal are considered non-edges and discarded. The ones that lie between these two thresholds are classified based on their connectivity: if they are connected to sure-edge pixels they are kept as edges, otherwise they are discarded.

Point A is above maxVal, so it is classified as a sure edge.

Point B is in between and has no connection to a sure edge, so it is classified as a non-edge.

Point C is in between and is connected to the sure-edge point A, so it is classified as an edge.
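The A, B, C example can be reproduced with a small NumPy sketch of hysteresis thresholding. This is a plain flood fill written for illustration, not OpenCV's implementation; the function name, the 8-neighbor connectivity, and the sample values are assumptions made for the example.

```python
import numpy as np

def hysteresis(magnitude, min_val, max_val):
    """Keep strong pixels plus weak pixels connected (8-neighbourhood) to them."""
    strong = magnitude > max_val
    weak = (magnitude >= min_val) & ~strong
    edges = strong.copy()
    stack = list(zip(*np.nonzero(strong)))
    rows, cols = magnitude.shape
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and weak[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True
                    stack.append((rr, cc))
    return edges

# A = 200 (strong), C = 150 next to A, B = 150 on its own further away.
magnitude = np.array([
    [200, 150, 0,   0, 0],
    [  0,   0, 0,   0, 0],
    [  0,   0, 0, 150, 0],
])
result = hysteresis(magnitude, 125, 175).astype(int)
print(result)
# [[1 1 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]]
```

A and C are kept, while B is dropped even though it has the same value as C, because it is not connected to any sure edge.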

canny = cv.Canny(img, 125, 175)

The first argument is our input image. The second and third arguments are our minVal and maxVal respectively. The optional apertureSize keyword argument is the size of the Sobel kernel used to find image gradients (by default it is 3).

8. Histograms

A histogram gives you an overall idea of the intensity distribution of an image. The x-axis shows pixel values, typically ranging from 0 to 255, and the y-axis shows the number of pixels in the image with each value.

Image: Cambridge in Colour

import matplotlib.pyplot as plt

gray_hist = cv.calcHist([img], [0], None, [256], [0, 256])
plt.figure()
plt.title("Grayscale Histogram")
plt.xlabel("Bins")
plt.ylabel("# of pixels")
plt.plot(gray_hist)
plt.xlim([0, 256])
plt.show()

The first argument is the image whose histogram we want to see, given in square brackets. The second argument is the list of channels, also given in square brackets; we pass 0 because our image is grayscale. You can change it according to your image. The third argument is the mask image. If you want the distribution of the full image, pass None like I did, but if you want the histogram of a particular region of an image, you have to create a mask image. The fourth argument is histSize, which represents our bin count. The fifth argument is ranges; I passed the usual range of [0, 256].
