Step-by-Step Tutorial: Image Classification with Keras

Golnaz Hosseini
May 2, 2023


An end-to-end guide to implementing image classification in Keras

Image classification is a fundamental task in computer vision that involves assigning an image to a pre-defined category or class. Keras is a widely used deep-learning library that offers extensive support for image classification tasks.

In this blog post, we present a comprehensive guide to performing image classification using the Keras library. We will employ the CIFAR10 dataset, a popular benchmark in image classification, to show the capabilities of Keras for this task.

CIFAR10 dataset

Contents:

  1. Load and visualize dataset
  2. Preprocessing steps
  3. Build a custom model
  4. Train model
  5. Model evaluation

1. Load and visualize dataset

The CIFAR10 dataset is a standard benchmark for image classification models. It consists of 60,000 32x32 color images divided into 10 categories (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks).
Run the following code to load the dataset:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

print(f"X_train: {X_train.shape}") # 50000, 32, 32, 3
print(f"y_train: {y_train.shape}") # 50000, 1
print(f"X_test: {X_test.shape}") # 10000, 32, 32, 3
print(f"y_test: {y_test.shape}") # 10000, 1

We now display random dataset samples:

lbls = ['Airplane', 'Car', 'Bird', 'Cat', 'Deer', 'Dog',
        'Frog', 'Horse', 'Ship', 'Truck']

fig, axes = plt.subplots(5, 5, figsize=(10, 10))
axes = axes.ravel()
for i in np.arange(0, 5*5):
    idx = np.random.randint(0, len(X_train))
    axes[i].imshow(X_train[idx])  # show the full 32x32 image
    lbl_idx = int(y_train[idx])
    axes[i].set_title(lbls[lbl_idx], fontsize=8)
    axes[i].axis('off')

plt.subplots_adjust(hspace=0.4)

Next, we examine the class distribution to check whether the dataset is balanced.

classes, counts = np.unique(y_train, return_counts=True)
plt.barh(lbls, counts)
plt.title('Class distribution in training set')

As you can see, each class has exactly 5,000 examples, so we are working with a perfectly balanced dataset!

Class distribution in the training set

2. Preprocessing steps

The goal of image preprocessing is to prepare the images and labels so that the neural network can extract features from them more effectively. The steps are normalization, one-hot encoding, splitting the training data into train and validation sets, and data augmentation.

2.1 Data normalization

We scale the pixel values of the images to a standard range, such as between 0 and 1, to make the images more uniform and easier for the neural network to analyze.

X_train = X_train / 255.0
X_test = X_test / 255.0

2.2 One-hot encoding

One-hot encoding is a technique for representing categorical data, such as class labels, in a format that neural networks can process easily. Each label is converted into a binary vector whose length equals the number of classes, with a 1 at the index of the true class and 0s elsewhere.

y_train_cat = tf.keras.utils.to_categorical(y_train, 10)
y_test_cat = tf.keras.utils.to_categorical(y_test, 10)
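
To see what to_categorical produces, you can inspect a single label before and after encoding:

# a label k becomes a 10-dim vector with a 1 at index k and 0s elsewhere
print(y_train[0], '->', y_train_cat[0])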

2.3 Train/validation split

Here, we split the training dataset into training and validation sets.

from sklearn.model_selection import train_test_split

X_TRAIN, X_VAL, Y_TRAIN, Y_VAL = train_test_split(X_train,
                                                  y_train_cat,
                                                  test_size=0.2,
                                                  random_state=42)
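
Since the split is random, the class balance in the validation set can drift slightly. If you want to preserve the exact class proportions in both sets, train_test_split accepts an optional stratify argument; a small variation, using the original integer labels:

# optional: a stratified split keeps the class proportions identical in both sets
X_TRAIN, X_VAL, Y_TRAIN, Y_VAL = train_test_split(X_train,
                                                  y_train_cat,
                                                  test_size=0.2,
                                                  random_state=42,
                                                  stratify=y_train)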

2.4 Data Augmentation

We apply augmentation only to the training set; the validation and test sets are left untouched so they reflect the real data distribution.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 64
data_generator = ImageDataGenerator(horizontal_flip=True)

train_generator = data_generator.flow(X_TRAIN, Y_TRAIN, batch_size=batch_size)
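
Horizontal flipping alone is a fairly modest augmentation. ImageDataGenerator supports several other random transforms; the settings below are illustrative, not tuned:

# optional: richer augmentation (illustrative values, worth tuning)
data_generator = ImageDataGenerator(horizontal_flip=True,
                                    rotation_range=15,       # rotate up to 15 degrees
                                    width_shift_range=0.1,   # shift up to 10% horizontally
                                    height_shift_range=0.1)  # shift up to 10% vertically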

3. Build a custom model

Here, we implement a custom CNN using convolutional, max-pooling, batch-normalization, and dropout layers.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPool2D, BatchNormalization,
                                     Dropout, Flatten, Dense)

INPUT_SHAPE = (32, 32, 3)

model = Sequential()
# block 1: two 16-filter conv layers, then downsample and regularize
model.add(Conv2D(filters=16, kernel_size=(3, 3), input_shape=INPUT_SHAPE, activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())

model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# block 2: two 32-filter conv layers, then downsample
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

# classification head
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.summary()

4. Train model

First, we configure the model's training parameters. We select Adam as the optimizer and categorical cross-entropy as the loss function.

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Then we start training...

history = model.fit(train_generator,
                    epochs=10,
                    validation_data=(X_VAL, Y_VAL))
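
If the validation loss plateaus or starts to rise, Keras callbacks can stop training early and lower the learning rate automatically. A minimal sketch with illustrative settings:

# optional: stop early and reduce the learning rate when val_loss stalls
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                         patience=2),
]

history = model.fit(train_generator,
                    epochs=10,
                    validation_data=(X_VAL, Y_VAL),
                    callbacks=callbacks)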

5. Model evaluation

First, we plot the loss and accuracy curves on the training and validation sets:

plt.figure(figsize=(12, 16))

plt.subplot(4, 2, 1)
plt.plot(history.history['loss'], label='Loss')
plt.plot(history.history['val_loss'], label='val_Loss')
plt.title('Loss')
plt.legend()

plt.subplot(4, 2, 2)
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.title('Accuracy')
plt.legend()
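
Beyond the training curves, we can score the model on the held-out test set (using the one-hot labels from step 2.2):

# overall performance on the held-out test set
test_loss, test_acc = model.evaluate(X_test, y_test_cat, verbose=0)
print(f"test loss: {test_loss:.4f} - test accuracy: {test_acc:.4f}")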

Now, we display a confusion matrix:

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_pred = model.predict(X_test)
y_pred = np.argmax(y_pred, axis=1)
cm = confusion_matrix(y_test, y_pred)

con = ConfusionMatrixDisplay(confusion_matrix=cm,
                             display_labels=lbls)

fig, ax = plt.subplots(figsize=(10, 10))
con = con.plot(xticks_rotation='vertical', ax=ax, cmap='summer')

plt.show()
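
For per-class precision, recall, and F1 scores, scikit-learn's classification_report complements the confusion matrix:

from sklearn.metrics import classification_report

# per-class precision, recall, and F1 on the test set
print(classification_report(y_test, y_pred, target_names=lbls))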

Finally, we predict a random input from the test set:

import random

idx = random.randint(0, len(X_test) - 1)  # randint is inclusive on both ends
im = X_test[idx]
plt.imshow(im)

pred_t = np.argmax(model.predict(im.reshape(1, 32, 32, 3)))
print(f"our model predicts that image {idx} is a {lbls[pred_t]}")

Conclusion

In this blog post, we have provided a step-by-step guide on how to load and preprocess the CIFAR10 dataset and how to train and evaluate a basic CNN on it. We have also shown how to make predictions on images the model has never seen before.

However, there is still much room for improvement. Our current model is far from ideal; to boost its performance, you can tune hyperparameters, adjust the number of layers or neurons, or start from a pre-trained model.
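As one example of that last suggestion, here is a minimal transfer-learning sketch (assuming TensorFlow 2.x): it resizes the 32x32 inputs up to 96x96, rescales them to the [-1, 1] range MobileNetV2 expects, and trains only a new classification head on top of frozen ImageNet weights. The backbone choice and input size here are illustrative, not prescriptive.

from tensorflow.keras import layers, Sequential

# frozen ImageNet backbone; only the new head is trained
base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                         include_top=False,
                                         weights='imagenet',
                                         pooling='avg')
base.trainable = False

transfer_model = Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Resizing(96, 96),             # upsample CIFAR10 images
    layers.Rescaling(2.0, offset=-1.0),  # map [0, 1] -> [-1, 1]
    base,
    layers.Dense(10, activation='softmax'),
])

transfer_model.compile(loss='categorical_crossentropy',
                       optimizer='adam',
                       metrics=['accuracy'])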
