What is the Computer’s understanding of Image Data?🤔

2D Image Digital Representation

Since I am doing more computer vision practices than any other topics of machine learning, it would be good if I will recap of how computer sees the actual objects . We will have a look to most popular image dataset — MNIST. This is where computer vision tasks, practice starts.

First we include required libraries :

Actually, every popular Machine Learning library has its own “built-in” datasets like MNIST handwritten digits. This is pretty nice chance to see the flow from how image is used in machine learning project up to how it is feeded to the model.

Now, let’s have a look into how I import built-in MNIST dataset :

And then , you must to split your data into segments like training and test sets. Actually you dont have to that mnist.load_data() will do it for you.

Let’s have an example how image looks like in arrays using format options of NumPy.

As you see , image literally is converted to arrays and it seems that we have number “5” here. Since its a grayscale image, every number in array represents how dark that particular picel is.

Now, lets have a look what these numbers look like. For that I use Matplotlib. For the second image(train_image[1]) in our train_images we have number “0” :

Tensors — multidimensial arrays

Multidimensional arrays are also called tensors. The math vocabulary can be mildly overwhelming, but we’re going to show you that it’s a lot simpler than you might think. Shape of a tensor is really the number of dimensions, or, in terms of arrays, the number of different indices that you would use to access them.The most basic tensor you can imagine is a one tensor, which, in programming language is just called an array. It is just an ordered series of individual numbers packed together. Next up is a two tensor. If you look at the Grey scale image (arrays of array) screenshot, each row is of one dimension, and each column is another dimension.

Again, it’s just an array of arrays. And you can see that there are trained images with the bracket zero; we’re actually picking out the first image in an array of images. So, a three tensor, preceding with image data, is actually an array of images, each with an array of columns and rows of pixels. So, a three tensor is our basic way of storing black and white images.

You can see in the image at index one, the Xs and the Ys (the coordinates that are shown following on the digit) are simply the dimensions of the tensor

The see the shape you can call .shape of NumPy array:

Turning images into tensors

This really means is that you take your data (in this case, it’s numbers on the range of 0 to 255) and then divide it by another number so that you squash down the range from 0 to 1:

This is needed for numerical stability in machine learning algorithms. They simply do better, converge faster, and become more accurate when your data is normalized on the range of 0 to 1.

Turning categories into tensors

For this, we’re going to use a data transformation trick. This thing is called one-hot encoding, and it is where you take an array of label possibilities, in this case, the numbers 0 through 9, and turn them into a kind of bitmap, where each option is encoded as a column, and only one column is set to 1 (hence one-hot) for each given data sample:

Now, looking at both an input digit (here, 9), and the output bitmap, where you can see that the forth index has the ninth bit set, you can see that what we’re doing in our data preparation here is having one image as an input and another image as an output. They just happen to be encoded as tensors (multidimensional arrays of floating point numbers):

What we’re going to be doing when we create a machine learning algorithm is have the computer learn or discover a function that transforms the one image (the digit nine) into another image (the bitmap with one bit set on the ninth column), and that is what we mean by machine learning. Remember that tensors are just multidimensional arrays, and that the x and y values are just pixels. We normalize these values, which means we get them from the range of zero to one so that they are useful in machine learning algorithms. The labels, or output classes, are just an array of values that we’re going to map, and we’re going to encode these with one-hot encoding, which again means only one is hot, or set to one.

Thank you for reading! Please hit clap 👏🏻 if you enjoyed reading. Let’s connect on Twitter!🐦

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store