Classifying Images of Clothing Using TensorFlow

Hugo Pires Lage Martins
Published in Analytics Vidhya
7 min read · Apr 12, 2021

How to train a Deep Learning model to classify images of clothing using Convolutional Neural Networks in TensorFlow.

Fashion MNIST Samples by Zalando

Deep Learning is a subfield of machine learning that uses multi-layered neural networks to extract patterns from data. My objective in this project is to show how to apply Deep Learning concepts to an image classification problem. For this, we are going to train a Convolutional Neural Network (CNN) to classify a dataset of clothing images using the TensorFlow library in Python.

In the last few years, Convolutional Neural Networks have achieved superhuman performance on some complex visual tasks. They power image search services, self-driving cars, automatic video classification systems, and more¹.

In case you are interested in the source code for this project, please check it out on my GitHub.

Exploring the Dataset

We are going to use the Fashion MNIST dataset, which contains 70,000 greyscale images in 10 categories. Each image represents a single clothing item at a resolution of 28 x 28 pixels.
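In case you want to follow along, here is a minimal sketch of loading the data (the variable names X_train, y_train, X_test, and y_test are my own):

from tensorflow import keras

# Load Fashion MNIST: 60,000 training and 10,000 test images
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

print(X_train.shape)  # (60000, 28, 28)
print(X_test.shape)   # (10000, 28, 28)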

Each image has a single label in the range [0, 9]. The table below shows the classes in the dataset.

Label   Class
0       T-shirt/top
1       Trouser
2       Pullover
3       Dress
4       Coat
5       Sandal
6       Shirt
7       Sneaker
8       Bag
9       Ankle boot

Classes from Fashion MNIST dataset by Author.

An image is just a matrix of numbers, in our problem a 28 x 28 matrix. Each value is an integer in the range [0, 255], which defines the intensity of each pixel.

Here we can see the first 5 rows from the first sample of our training dataset.
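With the arrays loaded above, inspecting the raw values is a simple slice:

# First 5 rows of pixel values from the first training image
print(X_train[0][:5])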

Example of part of an image array by Author.

Now, let’s plot this single image.
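A minimal Matplotlib sketch for this:

import matplotlib.pyplot as plt

# Display the first training image with a colorbar for the pixel values
plt.imshow(X_train[0], cmap='gray')
plt.colorbar()
plt.show()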

Example of image from Ankle boot class by Author.

We can display a few examples from our training dataset with their respective classes. Plotting the first 24 samples, it is possible to see at least one example of each class.
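One possible sketch of that grid (the class_names list and the 4 x 6 layout are my own choices):

# Human-readable names for labels 0-9
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(10, 8))
for i in range(24):
    plt.subplot(4, 6, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(class_names[y_train[i]])
    plt.axis('off')
plt.show()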

Example of images with respective class by Author.

Preprocessing the Data

Normalizing

The value of each pixel in the image, an integer in the range [0, 255], needs to be normalized for the model to work properly. We can create a function that divides each value by 255.0. When applying this function to our dataset, we will get normalized values in the range [0, 1].
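A sketch of such a function, with an explicit float conversion:

def normalize(images):
    # Convert to float and scale pixel values from [0, 255] to [0, 1]
    return images.astype('float32') / 255.0

X_train = normalize(X_train)
X_test = normalize(X_test)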

Reshaping the Image

The Convolutional Neural Network expects a 4-dimensional input: number of samples (60,000), image height and width (28 x 28), and color channel. Since we are working with greyscale images, there is only a single channel. However, as we have seen before, the shape of our X_train dataset is (60000, 28, 28), and we need (60000, 28, 28, 1) as input. Thus, we need to reshape our datasets.
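A sketch of the reshape, using -1 to let NumPy infer the number of samples:

# Add the single greyscale channel as a fourth dimension
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

print(X_train.shape)  # (60000, 28, 28, 1)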

One-Hot Encoding

Our class data has labels in the range [0, 9], which is called Integer Encoding. However, there is no ordinal relationship between these labels and the classes they represent.

In this case, using the integer encoding may lead the model to assume a natural ordering between categories, which can result in poor performance or unexpected results from the Deep Learning model. To solve this problem, we can use one-hot encoding, which creates a new binary variable for each unique integer value².
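A sketch using the to_categorical utility from Keras:

# One-hot encode the integer labels into binary class vectors
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)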

Now, let's check our label data. Each label changed from a single value to a vector with the value 1 in the position of the respective class.

Class label array before and after applying One-Hot Encoding by Author.

Creating the Model

Building the Layers

The first step in the model creation is to define the layers of our network. A CNN has at least one convolutional layer and typically includes other types of layers, such as pooling layers and fully connected (dense) layers. For this project, we are going to use the typical CNN architecture represented in the image below.

Typical CNN Architecture by Géron, Aurélien (2019).

As shown in the image, we will include a convolutional layer and a pooling layer, followed by another convolutional and pooling pair. Then, we are going to add a flatten layer to transform our 2D feature maps into a 1D array, and add some dense layers. We can also add some dropout layers to reduce overfitting. For the last layer, we add a dense layer with the number of classes in our problem (10) and a softmax activation, which produces a probability distribution over the classes.

model = keras.models.Sequential([
    keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu',
                        padding='same', input_shape=[28, 28, 1]),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(filters=128, kernel_size=3, activation='relu',
                        padding='same'),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(units=64, activation='relu'),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(units=10, activation='softmax'),
])

Compiling the Model

The next step is to compile the model. Here we pass the optimizer, which adjusts the weights to minimize the loss; the loss function, which measures the disparity between the true and predicted values; and the metrics, functions used to measure the performance of the model.

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

Training the Model

The last step is to train our model. Here we need to pass the input data, the target data, and the number of epochs, which defines the number of full passes over the training dataset. We will also pass a parameter to split our data into training (70%) and validation (30%) sets, and a batch_size parameter, which is the number of training examples processed in each pass.

We will save the results of our training in the variable model_history.

model_history = model.fit(X_train, y_train, batch_size=50, epochs=10,
                          validation_split=0.3)

History of training epochs by Author.

Evaluating the Loss

The fit method returns a history object with the results for each epoch. We can plot a chart with the loss and accuracy for the training and validation datasets. From this chart, it is possible to see how the loss goes down and the accuracy goes up over the epochs. This chart is also used to identify evidence of overfitting and underfitting. For our model, there is no strong evidence of either problem. Thus, let's move on and make some predictions with our test dataset.
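One way to sketch this chart, using pandas for convenience:

import pandas as pd

# model_history.history maps each metric to a list of per-epoch values
pd.DataFrame(model_history.history).plot(figsize=(8, 5))
plt.xlabel('Epoch')
plt.grid(True)
plt.show()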

Accuracy and loss plots for training and validation datasets by Author.

Making Predictions and Evaluating the Results

Evaluating the Accuracy in the Test Dataset

Now, let’s see how the model performs with our test dataset.
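A minimal sketch of the evaluation step:

# Evaluate loss and accuracy on the 10,000 test images
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_accuracy:.4f}')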

Model accuracy and loss for test dataset by Author.

The accuracy on the training dataset is lower than the accuracy on the test and validation datasets, which suggests the model is not overfitting. An accuracy of 91.55% on the test dataset is a good result.

Making Predictions

We can use our model to predict a class for each example in our test dataset.
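A sketch of the prediction step (y_prob and y_pred are my own names):

import numpy as np

# Probability distribution over the 10 classes for each test image
y_prob = model.predict(X_test)

# The predicted class is the one with the highest probability
y_pred = np.argmax(y_prob, axis=1)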

Now, let's plot a few images from our test dataset with the true and predicted labels. When the model predicts correctly, the label is displayed in blue; when the prediction is wrong, it is displayed in red. The calculated probability for the predicted class is also displayed.

Example of images with true classes and predicted classes by Author.

From these 20 examples, we can see that our model made wrong predictions for one coat and one shirt. However, it is not feasible to analyze the predictions for all 10,000 examples using this plot. Thus, let's build a crosstab to analyze what our model predicted right and wrong for each class.

Crosstab
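A sketch of how the crosstab can be built with pandas, assuming the y_pred array from the previous step:

# Recover integer labels from the one-hot encoded test targets
y_true = np.argmax(y_test, axis=1)

# Rows are true classes, columns are predicted classes
print(pd.crosstab(pd.Series(y_true, name='True'),
                  pd.Series(y_pred, name='Predicted')))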

Analyzing our crosstab, we can notice that our best accuracy was achieved for the Sandal class (99.1%), while our lowest accuracy was for the Shirt class (76.9%). The crosstab provides a great way to visualize the quantities predicted by our model for each class. We can easily see, for example, that it predicted Shirt as T-shirt/top for 74 examples, or that it never predicted Bag as Ankle boot.

Crosstab for model predictions by Author.

Classification Report

Now, let's print a summary with the precision, recall, and f1-score for each class using the classification report from the scikit-learn library. Here it is possible to see that we don't have a considerable disparity between precision and recall for any class. As we noticed in the crosstab, our worst result is for the Shirt class.
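A sketch using scikit-learn, assuming the class_names list defined earlier:

from sklearn.metrics import classification_report

# Per-class precision, recall, and f1-score
print(classification_report(y_true, y_pred, target_names=class_names))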

Classification report for model predictions by Author.

Conclusions

In this project, I presented how to train a Convolutional Neural Network to classify images of clothing from the Fashion MNIST dataset using TensorFlow and Keras. Using this model, we got an overall accuracy of 91.55% on our test dataset, which is a good result. However, for the Shirt class specifically, we got an accuracy of only 76.90%. We could try to improve the accuracy of this class using some data augmentation techniques. Furthermore, in case you want a model with higher accuracy, you could try changing some hyperparameters or using different network architectures.

Should you have any questions or feedback about this project, feel free to contact me on LinkedIn.

References

[1]: Géron, Aurélien (2019). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow (2nd ed.). Sebastopol, CA: O’Reilly Media, Inc.

[2]: Brownlee, Jason (2020, June 30). Why One-Hot Encode Data in Machine Learning? Retrieved from https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
