Simple Implementation of InceptionV3 for Image Classification using Tensorflow and Keras

Armielyn Obinguar
Mar 11, 2023

InceptionV3 is a convolutional neural network architecture developed by Google researchers. It was introduced in 2015 and is a successor to the original Inception architecture (InceptionV1) and InceptionV2. InceptionV3 was designed to be computationally efficient while maintaining high accuracy on image classification tasks.

The InceptionV3 architecture stacks convolutional layers, pooling layers, and inception modules to extract features from images. An inception module is a block of parallel branches that apply filters of different sizes to the same input, which lets the network learn features at several scales and resolutions at once.
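To make the idea of parallel filters concrete, the following plain-Python sketch tracks only tensor shapes through a hypothetical inception-style block. It assumes stride 1 and 'same' padding in every branch, so each branch keeps the input height and width and the branch outputs can be concatenated along the channel axis. The branch names and filter counts are invented for illustration and are not the actual InceptionV3 configuration.

```python
def inception_block_output_shape(input_shape, branch_filters):
    """Return the output shape of an inception-style block.

    Assumes every branch uses stride 1 and 'same' padding, so height and
    width are unchanged and only the channel count differs per branch.
    """
    height, width, _ = input_shape
    # Concatenating along the channel axis adds up the branch channel counts.
    return (height, width, sum(branch_filters.values()))


# Hypothetical branch widths, chosen only for illustration.
branches = {"1x1": 64, "3x3": 96, "5x5": 32, "pool_proj": 32}
output_shape = inception_block_output_shape((35, 35, 192), branches)
print(output_shape)
```

The point of the sketch is that the block's output width is simply the sum of its branches, so each filter size contributes its own slice of the resulting feature map.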

InceptionV3 achieved strong results on ImageNet image classification when it was introduced, and it has since been widely reused as a feature extractor for other computer vision tasks such as object detection and visual question answering. It is often used as a base model for transfer learning, where the pre-trained weights are reused, and optionally fine-tuned, on a new dataset to improve performance on a specific task.

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
import tensorflow_datasets as tfds

This code imports the libraries the script needs: os for file operations, numpy for numerical operations, tensorflow and its keras API for building and training the model, the image module from keras.preprocessing for loading and converting images, InceptionV3 and preprocess_input from keras.applications.inception_v3 for the pre-trained model and its matching input pre-processing, and tensorflow_datasets for loading the CIFAR-10 dataset. Note that os, numpy, and image are not used in the snippets shown in this article.

IMG_SIZE = (299, 299)
NUM_CLASSES = 10
BATCH_SIZE = 32
EPOCHS = 10

These are the constant parameters for the script including the input image size, the number of classes in the dataset, batch size, and the number of epochs for training the model.

model = InceptionV3(weights='imagenet', include_top=False, input_shape=IMG_SIZE + (3,))

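This line loads the InceptionV3 model with weights pre-trained on ImageNet and an input shape of (299, 299, 3). Setting include_top=False drops the original 1000-class ImageNet classifier, so the model ends at its last convolutional feature map and a new classification head can be attached.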

for layer in model.layers:
    layer.trainable = False

This loop freezes all the layers in the pre-trained InceptionV3 model.

x = keras.layers.GlobalAveragePooling2D()(model.output)
x = keras.layers.Dense(1024, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
output = keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)

These lines add a classification head on top of the pre-trained InceptionV3 model. The first line adds a global average pooling layer, followed by a dense layer with 1024 units and a ReLU activation function. Then, a dropout layer with a rate of 0.5 is added, and finally, a dense output layer with softmax activation function that outputs the probabilities for each class.
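As a rough illustration of what the first and last layers of this head compute, here is a plain-Python sketch of global average pooling and softmax on a tiny made-up feature map. It leaves out the Dense and Dropout layers, and the helper names are hypothetical; Keras performs the same arithmetic on batched tensors.

```python
import math


def global_average_pool(feature_map):
    """Average a [height][width][channels] feature map over height and width."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A tiny 2x2 feature map with 3 channels (made-up values).
feature_map = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[1.0, 4.0, 2.0], [3.0, 0.0, 2.0]],
]
pooled = global_average_pool(feature_map)  # one value per channel
probabilities = softmax(pooled)
print(pooled, probabilities)
```

Global average pooling collapses each channel to a single number regardless of the spatial size, which is why the head needs no Flatten layer, and softmax guarantees that the final outputs can be read as class probabilities.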

model = keras.models.Model(inputs=model.input, outputs=output)

This line creates the final model that combines the pre-trained InceptionV3 model and the classification head.

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

This line compiles the final model with the Adam optimizer, categorical cross-entropy loss function, and accuracy as the evaluation metric.
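Categorical cross-entropy expects one-hot encoded labels, which is why the labels are one-hot encoded during preprocessing later in the script. The sketch below shows the loss for a single example in plain Python; the helper names and the clipping constant are assumptions made for illustration, not the Keras internals.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for a single example with a one-hot target."""
    # Clip predictions so that log(0) can never occur.
    return -sum(
        t * math.log(min(max(p, eps), 1.0 - eps))
        for t, p in zip(y_true, y_pred)
    )


target = one_hot(2, 4)  # true class is index 2 out of 4 classes
confident_right = categorical_crossentropy(target, [0.05, 0.05, 0.85, 0.05])
confident_wrong = categorical_crossentropy(target, [0.85, 0.05, 0.05, 0.05])
print(confident_right, confident_wrong)
```

Only the probability assigned to the true class contributes to the loss, so a confident wrong prediction is penalized far more than a confident correct one. If the labels were kept as integers instead, 'sparse_categorical_crossentropy' would be the matching Keras loss.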

train_data, test_data = tfds.load('cifar10', split=['train', 'test'], as_supervised=True)

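This line loads the predefined training and test splits of the CIFAR-10 dataset from the tensorflow_datasets module. With as_supervised=True, each element is an (image, label) pair rather than a dictionary of features.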
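It can help to estimate the workload these settings imply before training. The following back-of-the-envelope sketch uses the constants defined earlier together with the standard CIFAR-10 split sizes (50,000 training and 10,000 test images of 32x32 pixels); the variable names are introduced here for illustration only.

```python
import math

IMG_SIZE = (299, 299)
BATCH_SIZE = 32
EPOCHS = 10

# CIFAR-10 split sizes.
NUM_TRAIN = 50_000
NUM_TEST = 10_000

# The last batch is smaller than BATCH_SIZE, so round up.
train_steps_per_epoch = math.ceil(NUM_TRAIN / BATCH_SIZE)
test_steps = math.ceil(NUM_TEST / BATCH_SIZE)
total_train_steps = train_steps_per_epoch * EPOCHS

# One resized float32 batch: height * width * 3 channels * 4 bytes * batch size.
bytes_per_batch = IMG_SIZE[0] * IMG_SIZE[1] * 3 * 4 * BATCH_SIZE
upscale_factor = IMG_SIZE[0] / 32  # CIFAR-10 images are 32 pixels per side

print(train_steps_per_epoch, test_steps, total_train_steps)
print(f"{bytes_per_batch / 2**20:.1f} MiB per batch, "
      f"{upscale_factor:.1f}x upscaling per side")
```

Resizing 32x32 images to 299x299 makes every input much larger than the original, so most of the cost of each step comes from running the frozen base model on upscaled images rather than from training the small head.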

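def preprocess(image, label):
    # Upscale the 32x32 CIFAR-10 image to the 299x299 input size used here.
    image = tf.image.resize(image, IMG_SIZE)
    image = tf.cast(image, tf.float32)
    # Scale pixel values from [0, 255] to [-1, 1], as InceptionV3 expects.
    image = preprocess_input(image)
    # One-hot encode the label to match the categorical cross-entropy loss.
    label = tf.one_hot(label, NUM_CLASSES)
    return image, label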

train_data = train_data.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_data = test_data.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

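These lines define a function to preprocess the input images and their labels. The function resizes the images to IMG_SIZE, casts them to float32, applies the `preprocess_input` function for InceptionV3 (which scales pixel values to the range [-1, 1]), and one-hot encodes the labels. The two map calls then apply this function to the training and test sets, group the results into batches of BATCH_SIZE, and prefetch batches so that data loading overlaps with model execution.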
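With the data pipeline in place, the remaining steps are training and evaluation. A minimal sketch, assuming the standard Keras fit and evaluate methods and the model, train_data, test_data, and EPOCHS names defined earlier, could look like the following. The helper name train_and_evaluate is hypothetical, and using the test set as validation data is a simplification rather than something the article specifies.

```python
def train_and_evaluate(model, train_data, test_data, epochs):
    """Fit a compiled Keras model, then measure loss and accuracy on the test set.

    Hypothetical helper: `model` is expected to be compiled with a single
    metric (accuracy), so `evaluate` returns [loss, accuracy].
    """
    # Using the test set for validation is a shortcut for this sketch;
    # a separate validation split would be cleaner.
    history = model.fit(train_data, epochs=epochs, validation_data=test_data)
    test_loss, test_accuracy = model.evaluate(test_data)
    return history, test_loss, test_accuracy


# Usage with the objects built earlier in the article (left commented out so
# that this snippet does not start a long training run on its own):
# history, test_loss, test_accuracy = train_and_evaluate(
#     model, train_data, test_data, EPOCHS
# )
# print(f"Test accuracy: {test_accuracy:.3f}")
```

Because all InceptionV3 layers are frozen, only the weights of the new classification head are updated during this training run.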

In conclusion, we have seen how to use the InceptionV3 architecture for image classification tasks. We first loaded the pre-trained InceptionV3 model, froze its layers, and added a classification head. We then compiled the model and loaded the CIFAR-10 dataset using TensorFlow Datasets. We preprocessed the data, trained the model, and evaluated its performance.

The InceptionV3 architecture has proven highly effective on a variety of computer vision tasks because it extracts features at different scales and resolutions. Its design makes efficient use of computing resources for the accuracy it delivers, which helps when compute is limited. Additionally, the pre-trained weights of InceptionV3 can be fine-tuned on new datasets, which can greatly improve performance on specific tasks. Overall, InceptionV3 is a powerful and versatile tool in the field of computer vision.

Armielyn Obinguar

Google Developers Lead | AI Communicator | AWS Community Builder