DEEP LEARNING TUTORIAL

Image Classification with TensorFlow

A practical example of how image classification is performed.

Photo by Josh Rose on Unsplash

Computer vision is one of the important research areas in machine learning, and it is used in many fields such as robotics, healthcare, unmanned aerial vehicles, driverless cars, sports, and entertainment.

In this article, I’ll cover the following topics:

  • What is computer vision?
  • Loading dataset
  • Data preprocessing
  • Model building
  • Evaluating the model
  • Predicting the new image

Let’s dive in!

What is Computer Vision?

Computer vision is a branch of artificial intelligence that deals with how computers can be made to gain high-level understanding from digital images or videos. It also covers methods for the reconstruction of a 3D scene from multiple images and video sequences, and in the medical field, it covers the segmentation of organs, tumor detection, and micro-calcification detection in mammograms.

Let’s take a look at some example applications of computer vision:

  • Automated Inspection of products
  • Security and surveillance
  • Optimization of manufacturing processes
  • Autonomous Vehicles
  • Biometrics
  • Content-based image retrieval
  • Medical image analysis
  • Robot vision

Loading Dataset

Fashion MNIST dataset

What is the Fashion MNIST dataset?

The dataset I’m going to use is the Fashion MNIST dataset. Unlike the classical MNIST dataset, it contains images of fashion items instead of handwritten digits. The classes in Fashion MNIST are more varied than those in classic MNIST, so classifying its images is more difficult.

The Fashion MNIST dataset consists of 70,000 grayscale images in 10 categories: 60,000 images are used to train the network and 10,000 images to evaluate how accurately the network learned to classify images. Each image shows a fashion item at 28 x 28 pixels. Fashion MNIST is often called the “Hello World” of computer vision: the dataset is relatively small, making it easy to build and test a computer vision model. You can load this dataset directly with TensorFlow.

import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

When the dataset is loaded, four NumPy arrays are returned. The X_train and y_train arrays are used to train the model, and the X_test and y_test arrays are used to test it. The pixel values are integers from 0 to 255. Let’s look at the shape and data type of the training and test sets.

X_train.shape, X_test.shape   # ((60000, 28, 28), (10000, 28, 28))
X_train.dtype                 # uint8

Data Preprocessing

Data preprocessing is one of the important steps of data analysis. The data must be preprocessed before training the network. The labels in the dataset consist of numbers. Let’s assign the names of fashion items corresponding to these numbers to a variable.

class_names = ["T-shirt / top", "Trouser", "Pullover", "Dress",
"Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

Let’s use the matplotlib library to see the second image.

import matplotlib.pyplot as plt
plt.figure()
plt.imshow(X_train[1])
plt.colorbar()
plt.grid(False)
plt.show()

You can see that the pixel values are between 0 and 255.
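
For a broader look at the data, you can also plot a small grid of the first 25 training images together with their class names (an optional sketch using the class_names list defined above):

plt.figure(figsize=(8, 8))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(X_train[i], cmap=plt.cm.binary)   # grayscale colormap
    plt.xlabel(class_names[y_train[i]])          # class name under each image
plt.show()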

Normalizing the Dataset

Let’s scale the inputs to increase the training speed and performance of the model. You can do this simply by dividing the pixel values in the dataset by 255.

X_train = X_train / 255.0
X_test = X_test / 255.0
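
A quick optional check confirms that the values now lie between 0 and 1:

# After scaling, the pixel values are floats in the range [0, 1]
print(X_train.min(), X_train.max())   # 0.0 1.0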

Model Building

To build a neural network, you first configure the layers of the model and then compile it. Let’s now define the layers of the model using the Sequential API.

The basic building block of a neural network is the layer. Layers extract representations from the data, and you hope these representations are meaningful for the problem you are dealing with. Most deep learning models are formed by chaining layers together. Let’s start building the model.

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28), name = "Input"),
    tf.keras.layers.Dense(128, activation = 'relu', name = "Hidden"),
    tf.keras.layers.Dense(10, name = "Output")
])

Let’s go through this code line by line.

(1) The first line creates a Sequential model, which is Keras’ simplest model. In a Sequential model, the layers are stacked in order.

(2) In the next line, I added a Flatten layer to the model. The Flatten layer converts each 28 x 28 pixel input image into a one-dimensional array of 784 values (28 * 28 = 784). This layer has no parameters to learn; it only reshapes the data.

(3) In the next line, I added a hidden Dense layer with 128 neurons to the model and used the ReLU activation function. A Dense layer is connected to all neurons in the previous layer, and each Dense layer has its own weight matrix containing all the weights between its inputs and outputs.

(4) Finally, I added a Dense layer with 10 neurons, one per class. This last layer returns an array of 10 logits; each neuron outputs a score indicating how strongly the image belongs to the corresponding class. The summary method shows all layers of the model along with their names.

model.summary()

Note that if you do not name a layer, Keras generates a name for it automatically. None in the output shape means that the batch size can be anything. The total number of parameters is shown at the end of the summary. You can easily access the model’s layers by index or name.

model.layers
hidden = model.layers[1]
print(hidden.name)
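
Because the layers were given names when the model was defined, you can also fetch a layer by name with get_layer (a small check using the names chosen above):

# Retrieve the hidden layer by the name given in the Sequential definition
hidden_by_name = model.get_layer("Hidden")
print(hidden_by_name is hidden)   # True: both refer to the same layer object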

All parameters of a layer can be accessed with the get_weights and set_weights methods. Let’s look at both the weights and the biases of the hidden Dense layer.

weights, biases = hidden.get_weights()
print(weights)
print(biases)
weights.shape, biases.shape

Notice that the weights of the Dense layer are initialized randomly and the biases are initialized to zero. You can control how the weights and biases are initialized with the kernel_initializer and bias_initializer arguments, respectively. More information about these arguments can be found in the Keras documentation.
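
For example, a Dense layer with explicit initializers could be defined like this (a minimal sketch; 'he_normal' is just one common choice and is not used in the model above):

init_layer = tf.keras.layers.Dense(
    128, activation='relu',
    kernel_initializer='he_normal',  # draw initial weights from a He-normal distribution
    bias_initializer='zeros')        # start the biases at zero (the Keras default)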

Compiling the Model

Before training the model, it is necessary to compile it with the compile method. When compiling, the loss function and the optimizer are specified. Optionally, extra metrics can be tracked during training and evaluation.

The loss function measures how accurately the model predicts during training; we want to minimize it to steer the model in the right direction. The optimizer updates the model based on the loss function and the data it sees. Metrics are used to monitor the training and testing steps.

model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True),
              optimizer = 'adam',
              metrics = ['accuracy'])

Let’s go through this code.

(1) I used SparseCategoricalCrossentropy as the loss function because the labels are integers from 0 to 9. If you encode the labels with one-hot encoding, you can use the CategoricalCrossentropy loss function instead, as shown in the sketch after this list.

(2) I used Adam as the optimizer, which has become popular in recent years.

(3) Since the problem we are dealing with is classification, I used accuracy as the metric.
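
As mentioned in (1), if the labels were one-hot encoded, the compile step would look roughly like this (a sketch for comparison only; the rest of the article keeps the integer labels):

# One-hot encode the integer labels (shape becomes (num_samples, 10))
y_train_onehot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_onehot = tf.keras.utils.to_categorical(y_test, num_classes=10)

model.compile(loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=['accuracy'])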

Training the Model

The model learns the relationship between images and labels during training. Now we can train the model by calling the fit method.

history = model.fit(X_train, y_train,
                    epochs = 10,
                    validation_split = 0.1)

The training set is used to train the model, and the model is evaluated with the validation data. You can set aside part of the training data for validation with the validation_split argument; setting it to 0.1 means that 10 percent of the training data is used for validation.
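
If you prefer to control the split yourself, you could instead pass an explicit validation set with the validation_data argument (a sketch of an alternative, not what is used above):

# Manually hold out the last 6,000 training examples for validation
X_tr, X_val = X_train[:-6000], X_train[-6000:]
y_tr, y_val = y_train[:-6000], y_train[-6000:]
history_manual = model.fit(X_tr, y_tr, epochs=10, validation_data=(X_val, y_val))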

While the model trains, the loss and accuracy metrics are displayed at the end of each epoch. Monitoring these metrics is useful for seeing the actual performance of the model. If the accuracy on the training set is much better than on the validation set, there may be an overfitting problem.

As you can see, the loss value decreases in each epoch, which means the model is learning from the data. After 10 epochs, the training and validation accuracies are printed to the screen.

The fit method returns a History object containing the training parameters. history.history is a dictionary that includes the loss and metrics measured after each epoch on the training and validation sets. If you convert this dictionary to a Pandas DataFrame and call its plot method, you can plot the training curves.

import pandas as pd
pd.DataFrame(history.history).plot(figsize = (8, 5))
plt.grid(True)
plt.show()

Accuracy and loss for train and validation sets

As you can see from the graph, the accuracy of the model increases in training and validation data, while the loss on training and validation decreases.

If the model is not performing well, you can tune the hyperparameters. The first parameter to check is the learning rate. If changing it doesn’t help, you can try a different optimizer. If the model’s performance still does not improve, you can change the number of layers, the number of neurons in each layer, and the activation function of the hidden layers. You can also set the batch_size argument of the fit method, which defaults to 32.
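
For example, a smaller learning rate and a larger batch size could be tried like this (an illustrative sketch; the specific values are arbitrary examples):

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # Adam's default is 1e-3
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=10,
                    validation_split=0.1, batch_size=64)  # batch_size defaults to 32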

Evaluating the Model

I built the model with the training data. You might want to see how the model predicts data it hasn’t seen before. To evaluate the model, a test set not used during training is used.

Let’s call the evaluate method and evaluate the model using the test set.

test_loss, test_acc = model.evaluate(X_test, y_test, verbose = 2)
print('\nTest accuracy:', test_acc)

The accuracy of the model on the test set is slightly lower than on the training data. This gap between training and test accuracy indicates an overfitting problem: the model has, in effect, memorized the training data. In other words, while the model predicts the training data well, it does not predict data it has not seen before as accurately. Regularization techniques such as L1 or L2 regularization can be used to overcome this problem, as sketched below.
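
As a rough illustration, L2 regularization (and, optionally, a Dropout layer) could be added to the hidden layer like this (a sketch of an alternative model, not the one trained above; the factor 0.001 and the dropout rate are arbitrary examples):

regularized_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.001)),  # L2 penalty on the weights
    tf.keras.layers.Dropout(0.2),   # randomly drop 20% of activations during training
    tf.keras.layers.Dense(10)
])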

Making a Prediction

You may want to predict new images with the trained model. The model’s outputs are logits. You can convert the logits into probabilities by adding a softmax layer, which makes them easier to interpret.

probability_model = tf.keras.Sequential(
    [model, tf.keras.layers.Softmax()])

Let’s make predictions on the test data with this model.

predictions = probability_model.predict(X_test)

So the model has predicted a label for each image in the test set. Let’s look at the first prediction.

predictions[0]

Note that 10 probabilities, one for each fashion item class, were returned. You can find the label with the highest probability using the argmax function in NumPy.

import numpy as np
np.argmax(predictions[0])
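
You can also translate that index back into a human-readable name using the class_names list defined earlier:

predicted_label = np.argmax(predictions[0])
print(predicted_label, "->", class_names[predicted_label])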

The model predicted that the first image is an ankle boot. Let’s take a look at the actual label of the first image.

y_test[0]

As you can see, the model made the correct prediction.
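
If you want to check the predictions for the whole test set at once, you can take the argmax of every row and compare it with the true labels (a quick sketch; the result should match the accuracy reported by evaluate):

predicted_labels = np.argmax(predictions, axis=1)   # one predicted class per test image
accuracy = np.mean(predicted_labels == y_test)      # fraction of correct predictions
print("Accuracy from predictions:", accuracy)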

That’s it. In this blog post, I explained the following topics for image classification using the Fashion MNIST dataset:

  • What is computer vision?
  • Building the model
  • Tuning hyperparameters
  • Evaluating the model
  • Predicting the new image

I hope you enjoyed the post. Thank you for reading my article. You can access the notebook I used in this article on GitHub or Kaggle pages.
