Step-by-Step Tutorial: Image Classification with Keras
An end-to-end guide to implementing image classification in Keras
Image classification is a fundamental task in computer vision that involves assigning an image to a pre-defined category or class. Keras is a widely used deep-learning library that offers extensive support for image classification tasks.
In this blog post, we present a comprehensive guide to performing image classification using the Keras library. We will employ the CIFAR10 dataset, a popular benchmark in image classification, to show the capabilities of Keras for this task.
Contents:
- Load and visualize the dataset
- Preprocessing steps
- Build a custom model
- Train the model
- Evaluate the model
1. Load and visualize dataset
The CIFAR10 dataset is one of the most frequently used benchmarks for image classification models. It consists of 60,000 32x32 color images divided into 10 categories (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks).
Run the following code to load the dataset:
import tensorflow as tf
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(f"X_train: {X_train.shape}") # 50000, 32, 32, 3
print(f"y_train: {y_train.shape}") # 50000, 1
print(f"X_test: {X_test.shape}") # 10000, 32, 32, 3
print(f"y_test: {y_test.shape}") # 10000, 1
Next, we display a few random samples from the training set:

import numpy as np
import matplotlib.pyplot as plt

lbls = ['Airplane', 'Car', 'Bird', 'Cat', 'Deer', 'Dog',
        'Frog', 'Horse', 'Ship', 'Truck']

fig, axes = plt.subplots(5, 5, figsize=(10, 10))
axes = axes.ravel()
for i in np.arange(0, 5 * 5):
    idx = np.random.randint(0, len(X_train))  # pick a random training image
    axes[i].imshow(X_train[idx])
    lbl_idx = int(y_train[idx])
    axes[i].set_title(lbls[lbl_idx], fontsize=8)
    axes[i].axis('off')
plt.subplots_adjust(hspace=0.4)
plt.show()
Next, we examine the class distribution to check whether the dataset is balanced:

classes, counts = np.unique(y_train, return_counts=True)
plt.barh(lbls, counts)
plt.title('Class distribution in training set')
plt.show()
As you can see, each class has exactly 5,000 training examples, so we are working with a perfectly balanced dataset!
2. Preprocessing steps
The goal of image preprocessing is to prepare the images and labels so that the neural network can extract features from them more effectively. Our steps are normalization, one-hot encoding, a train/validation split, and data augmentation.
2.1 Data normalization
We scale the pixel values from their original 0-255 range to between 0 and 1, which keeps the inputs uniform and makes optimization easier for the neural network.
X_train = X_train / 255.0
X_test = X_test / 255.0
2.2 One-hot encoding
One-hot encoding is a technique for representing categorical data, such as class labels, in a format that neural networks can process easily. Each integer label is converted into a binary vector with one element per class: the element at the class index is 1 and all others are 0.
y_train_cat = tf.keras.utils.to_categorical(y_train, 10)
y_test_cat = tf.keras.utils.to_categorical(y_test, 10)
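To make the encoding concrete, we can compare an original label with its encoded version:

print(y_train[0])      # an integer class label, e.g. [6]
print(y_train_cat[0])  # the same label one-hot encoded, e.g. [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]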
2.3 Train/validation split
Here, we split the original training data into training and validation sets (80/20):

from sklearn.model_selection import train_test_split

X_TRAIN, X_VAL, Y_TRAIN, Y_VAL = train_test_split(X_train,
                                                  y_train_cat,
                                                  test_size=0.2,
                                                  random_state=42)
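A quick shape check confirms the 80/20 split (50,000 training images become 40,000 for training and 10,000 for validation):

print(f"X_TRAIN: {X_TRAIN.shape}")  # (40000, 32, 32, 3)
print(f"X_VAL: {X_VAL.shape}")      # (10000, 32, 32, 3)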
2.4 Data Augmentation
We apply augmentation only to the training set; here, a simple random horizontal flip:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 64
data_generator = ImageDataGenerator(horizontal_flip=True)
train_generator = data_generator.flow(X_TRAIN, Y_TRAIN, batch_size=batch_size)
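To sanity-check the generator, we can plot a few images from one augmented batch; with horizontal_flip=True, roughly half of them will be mirrored (a minimal sketch reusing the lbls list from earlier):

aug_images, aug_labels = next(train_generator)  # one batch of (images, one-hot labels)
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i in range(5):
    axes[i].imshow(aug_images[i])
    axes[i].set_title(lbls[int(np.argmax(aug_labels[i]))], fontsize=8)
    axes[i].axis('off')
plt.show()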
3. Build model
Here, we implement a custom CNN using convolutional, max-pooling, batch-normalization, and dropout layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPool2D, BatchNormalization,
                                     Dropout, Flatten, Dense)

INPUT_SHAPE = (32, 32, 3)

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(3, 3), input_shape=INPUT_SHAPE, activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=16, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()
4. Train model
First, we configure training: we use Adam as the optimizer and categorical cross-entropy as the loss function, the standard choice for one-hot-encoded multi-class labels.

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Then we start training...
history = model.fit(train_generator,
                    epochs=10,
                    validation_data=(X_VAL, Y_VAL))
5. Model evaluation
We start by plotting the loss and accuracy curves for the training and validation sets:

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.title('Loss')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.title('Accuracy')
plt.legend()
plt.show()
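These curves only cover the training and validation sets; for an unbiased final score, we can also evaluate on the held-out test set (note that evaluate needs the one-hot labels y_test_cat):

test_loss, test_acc = model.evaluate(X_test, y_test_cat, verbose=0)
print(f"Test loss: {test_loss:.4f} | Test accuracy: {test_acc:.4f}")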
Now, we display a confusion matrix:

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_pred = np.argmax(model.predict(X_test), axis=1)
cm = confusion_matrix(y_test, y_pred)
con = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=lbls)
fig, ax = plt.subplots(figsize=(10, 10))
con.plot(xticks_rotation='vertical', ax=ax, cmap='summer')
plt.show()
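For a per-class view that complements the confusion matrix, scikit-learn's classification_report prints precision, recall, and F1-score for each class:

from sklearn.metrics import classification_report

print(classification_report(y_test.ravel(), y_pred, target_names=lbls))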
Finally, we predict a random image from the test set:

import random

idx = random.randint(0, len(X_test) - 1)  # randint is inclusive at both ends
im = X_test[idx]
plt.imshow(im)
plt.show()
pred_t = np.argmax(model.predict(im.reshape(1, 32, 32, 3)))
print(f"Our model predicts that image {idx} is a {lbls[pred_t]}")
Conclusion
In this blog post, we provided a step-by-step guide to loading and preprocessing the CIFAR10 dataset and training a basic CNN on it. We also showed how to make predictions on images the model has never seen before.
There is still plenty of room for improvement, though. Our current model is far from ideal; to boost its performance, you can tune hyperparameters, adjust the number of layers or neurons, or fine-tune a pre-trained model, as sketched below.
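As a sketch of that last idea, here is a minimal transfer-learning setup using MobileNetV2 from tf.keras.applications. This is an illustrative, untuned configuration, not part of the tutorial above: it assumes TensorFlow 2.6+ (for the Rescaling layer), and since 32x32 is smaller than the resolutions the ImageNet weights were trained at, Keras will load the default weights with a warning; upsampling the images first usually works better.

base = tf.keras.applications.MobileNetV2(input_shape=(32, 32, 3),
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False  # freeze the pre-trained backbone

transfer_model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(2.0, offset=-1.0),  # MobileNetV2 expects pixels in [-1, 1]; ours are in [0, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])
transfer_model.compile(loss='categorical_crossentropy',
                       optimizer='adam',
                       metrics=['accuracy'])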