Crash course on how NumPy Arrays, PyTorch Tensors, PIL, Colab & Computer Images work!
In PyTorch, arrays are written as tensors, which are essentially the same thing as arrays in Python, except that tensors can also run on a GPU. For large numerical workloads this can speed up computation by orders of magnitude, and that speed is critical for machine learning programs to function efficiently in the practical world.
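To make this concrete, here is a minimal sketch (my own example, not from the original notebook) showing a NumPy array, its PyTorch tensor equivalent, and how the tensor can be moved onto a GPU when one is available:

```python
import numpy as np
import torch

# A plain NumPy array lives in CPU memory.
arr = np.array([[1, 2, 3], [4, 5, 6]])

# The equivalent PyTorch tensor: same data, same shape.
t = torch.tensor([[1, 2, 3], [4, 5, 6]])

# If a GPU is available, the tensor can be moved onto it, where large
# numerical operations run much faster than on the CPU.
if torch.cuda.is_available():
    t = t.to("cuda")

print(arr.shape)  # (2, 3)
print(t.shape)    # torch.Size([2, 3])
```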
A black and white computer image is a “2-dimensional array”, or “tensor of rank 2”, of numbers, with each number between 0 and 255. In other words, a tensor of rank 2 is an array with 2 “dimensions” (also known as “axes”) that has the ability to run on a GPU.
See the diagram below for an example of a black and white computer image from the MNIST image dataset:
In the black and white image above, 0 represents a white pixel, 255 represents a black pixel, and the values 1–254 represent shades of gray: the higher the number, the darker the gray.
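As a small illustration (a toy example of my own, not the MNIST image above), here is a rank-2 tensor standing in for a tiny grayscale image, using the same 0-to-255 convention:

```python
import torch

# A toy "image": a rank-2 tensor (2 axes: rows x columns) of values 0-255.
# Following the MNIST example above, 0 is the white background, 255 is the
# darkest ink, and values in between are shades of gray.
img = torch.tensor([
    [  0,   0,   0,   0],
    [  0, 120, 200,   0],
    [  0, 200, 255,   0],
    [  0,   0,   0,   0],
], dtype=torch.uint8)

print(img.ndim)   # 2 -> a tensor of rank 2
print(img.shape)  # torch.Size([4, 4]) -> 4 rows by 4 columns
```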
Next, if you are working with computer vision in PyTorch, you will be using PIL (the Python Imaging Library) frequently. Let’s use crop as an example to learn from, because it is a commonly used and essential operation in computer vision machine learning.
Below is some code taken from a fastai Colab notebook.
Let’s go step by step. First, link your Colab notebook to your Google Drive so you have images to manipulate and process with PIL.
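In Colab this is done with the google.colab drive helper; a minimal sketch (the mount point /content/drive is the Colab convention):

```python
from google.colab import drive

# Mount your Google Drive so its files appear under /content/drive.
drive.mount('/content/drive')
```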
Before you crop a picture, you need to load one. The PILImage.create function from fastai’s vision library (which builds on PIL under the hood) gets a picture, and the show() method displays it so you can confirm you indeed got an image.
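Here is a sketch of that step, assuming a hypothetical image path inside your mounted Drive (replace it with the location of your own picture):

```python
from fastai.vision.all import PILImage

# Hypothetical path: point this at your own image in Google Drive.
img = PILImage.create('/content/drive/MyDrive/images/car.jpg')

# Display the image to confirm it loaded correctly.
img.show()
```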
Before you crop the image, it is a good idea to check its size first, so you have a reference point for where to start cropping. PIL exposes this through the image’s size attribute:
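Continuing the sketch above with the img we just loaded:

```python
# PIL reports size as a (width, height) tuple.
print(img.size)
```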
In this case, our image of a car is 178 pixels high and 202 pixels wide.
Next, let’s say we want to keep the upper left-hand corner and get rid of the white space to the right of and below the car. We can use another PIL method, crop((left, upper, right, lower)), to do this:
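A sketch of the call, using the same crop box as below (and assuming img is the image we loaded earlier):

```python
# crop() keeps the rectangle from (left, upper) to (right, lower),
# measured in pixels from the top-left corner of the image.
cropped = img.crop((0, 0, 100, 140))

print(cropped.size)  # (100, 140): the region we kept
cropped  # in a notebook, the last expression displays the cropped image
```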
In this example the call is crop((0, 0, 100, 140)). The first two zeros are the left and upper coordinates; they tell crop to start the kept region at the upper left-hand corner. And how much of the picture should we keep? The 100 and 140 set the right and lower edges of the box, so we keep a region 100 pixels wide and 140 pixels tall. See below for a comparison between the original image and the cropped image.
Hopefully, this basic example of a PIL function will pique your interest and get you started in learning more about the Python Imaging Library.
Did you find this article helpful? All questions, comments, and discussions are welcome! By the way, if there are any coders out there who are environmentally conscious and who are also passionate about machine learning in the computer vision space, please reach out to me. We are developing a computer vision project to be used with drones to clean up the environment in China!