Getting started with TensorFlow 2.0
A practitioner’s guide to building and deploying an image classifier in TensorFlow 2.0
Google has released the newest version of its TensorFlow machine learning framework, which brings major improvements to the way we use TensorFlow. With one of the largest developer communities, TensorFlow has come a long way from being just a machine learning library to a full-fledged machine learning ecosystem.
All the new additions in TensorFlow 2.0 and their tutorials are available on the YouTube channel and the revamped website. In this tutorial, however, we'll cover an end-to-end pipeline for building and deploying an image classifier in TF 2.0, putting together some of the new additions in this latest release. Specifically, we'll cover:
- Downloading and preprocessing data using TensorFlow Datasets
- Building and training an image classifier model using the Keras high-level API
- Downloading and fine-tuning the InceptionV3 convolutional neural network
- Serving the trained model using TensorFlow Serving
All of the code in this tutorial is available as a Jupyter notebook in my GitHub repository. I recommend opening the notebook in a separate window alongside this tutorial.
Before we get started, we need to install the TF nightly preview, which contains the TensorFlow 2.0 preview release, using the following command:
$ pip install -U --pre tensorflow
1. Downloading and pre-processing data using TensorFlow Datasets
TensorFlow Datasets provides a collection of datasets ready to use with TensorFlow. It handles downloading and preparing the data and constructing a tf.data.Dataset. Learn more on how to load an image dataset using tf.data here. We start by installing the TensorFlow Datasets Python package via pip:
$ pip install tfds-nightly
Downloading datasets
There are a variety of datasets available, and it's also possible to add your own dataset by following the guidelines here. To list the available datasets, execute the following Python code:
import tensorflow_datasets as tfds
print(tfds.list_builders())
Before downloading any dataset, it's recommended to know a few details about it, like its features and statistics. For example, in this tutorial we are going to download the tf_flowers dataset, so we go to the TensorFlow Datasets webpage and find its entry. There, we get the following:
- The total downloadable size of the dataset,
- The data type/object which would be returned (by tfds.load()), and
- Whether the dataset already has any standard splits like train, validation, and test.

The tf_flowers dataset is 218MB, gives us a FeaturesDict object and doesn't have any splits. Since tf_flowers doesn't define any standard splits, we use the subsplit feature to divide it into train, validation, and test with 80%, 10%, and 10% of the data respectively. We use the tfds.load() function to download the dataset. Specifying as_supervised=True downloads the dataset with a 2-tuple structure (input, label) instead of a FeaturesDict. Passing with_info=True to tfds.load() gives us the metadata of the downloaded dataset. Here's the Python code:
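The following is a minimal sketch of this step, based on the subsplit feature described above (tfds.Split.TRAIN.subsplit was the TFDS subsplit API at the time of writing; the names raw_train, raw_validation, raw_test and metadata are the ones referenced throughout this tutorial):

SPLIT_WEIGHTS = (8, 1, 1)
splits = tfds.Split.TRAIN.subsplit(weighted=SPLIT_WEIGHTS)

# Download tf_flowers as (image, label) tuples along with its metadata
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'tf_flowers', split=list(splits),
    with_info=True, as_supervised=True)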
Pre-processing the dataset
The images in the downloaded dataset can have different dimensions. We need to resize all the images to a given height and width and normalize the pixel values to a range between 0 and 1. We do this because, in order to train a convolutional neural network, we have to specify the input dimensions. The shape of our final dense layer depends on the input dimensions of the CNN. We define a function format_example() and pass it to the map function of the raw_train, raw_validation and raw_test objects. The arguments of format_example() depend on the parameters passed to tfds.load(). Specifically, if as_supervised=True then (image, label) tuple pairs will be passed; else a single dictionary with keys image and label will be passed.
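Here's a minimal sketch of such a function, assuming a target size of 128x128 pixels (the input size we configure our CNN for later):

import tensorflow as tf

IMG_SIZE = 128  # assumed target height and width

def format_example(image, label):
    # Cast to float and normalize the pixel values to [0, 1]
    image = tf.cast(image, tf.float32) / 255.0
    # Resize every image to the same height and width
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)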
We also shuffle the dataset by calling .shuffle(BUFFER_SIZE) on the train object, so that we don't have any ordering bias in the examples. Setting a shuffle buffer size as large as the dataset ensures that the data is completely shuffled. We then create batches of size 32 by calling .batch(BATCH_SIZE) on the train, validation and test sets. Using .prefetch() lets us fetch batches of the dataset in the background while the model is training.
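A sketch of these steps (the batched dataset names train_batches, validation_batches and test_batches are introduced here for illustration; AUTOTUNE lets tf.data pick the prefetch buffer size):

BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = metadata.splits['train'].num_examples  # full shuffle
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).prefetch(AUTOTUNE)
validation_batches = validation.batch(BATCH_SIZE).prefetch(AUTOTUNE)
test_batches = test.batch(BATCH_SIZE)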
There are a few things to note here:
- The order is important. A .shuffle() before a .repeat() would shuffle items across epoch boundaries (some items will be seen twice before others are seen at all). A .shuffle() after a .batch() would shuffle the order of the batches, but not shuffle the items across batches.
- We can use a buffer_size the same size as the dataset for a full shuffle. Large values provide better randomization but use more memory.
- The shuffle buffer is filled before any elements are pulled from it, so a large buffer_size may cause a delay when your Dataset is starting.
- The shuffled dataset doesn't report the end of a dataset until the shuffle buffer is completely empty. The Dataset is restarted by .repeat(), causing another wait for the shuffle buffer to be filled.
This last point can be addressed by using the tf.data.Dataset.apply() method with the fused tf.data.experimental.shuffle_and_repeat() function:
ds = image_label_ds.apply(
    tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds = ds.batch(BATCH_SIZE)
ds = ds.prefetch(buffer_size=AUTOTUNE)
Performing data augmentation
Data augmentation is an important technique for training robust deep learning models. It prevents over-fitting and helps the model understand the unique features of the classes in the dataset. For example, if we want our model to learn to differentiate between sunflowers and tulips, learning only the color of the flower might not be sufficient. We would want our model to learn the shape and relative size of the petals, the presence or absence of disk florets, and so on. So, if we want to prevent the model from using color as its primary distinguishing parameter, we can use black and white photos or change the brightness parameters. To avoid orientation bias, we can randomly rotate the images in our dataset, and so on.
It would be very useful to have these data augmentations applied to our dataset in real time while training, rather than manually creating and adding these images to our dataset. We use the same map function to apply different augmentations:
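Here's a sketch of a few possible augmentations, applied on the fly via map() (the specific transforms and their ranges are assumptions; choose ones suited to your dataset):

def augment_example(image, label):
    # Randomly flip images horizontally to avoid orientation bias
    image = tf.image.random_flip_left_right(image)
    # Randomly perturb brightness and saturation so that color alone
    # isn't the model's primary distinguishing feature
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, lower=0.75, upper=1.5)
    # Keep pixel values in the valid [0, 1] range
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label

# Applied per example; rebuild the shuffled/batched pipeline afterwards
# so that training uses the augmented examples
train = train.map(augment_example)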
Visualizing the dataset
One of the ways to unearth anomalies or bias in image datasets is by visualizing some random samples. It also tells us how varied or similar the images of a particular class are in the given dataset. Fetching the data is really simple: we can fetch the dataset batch by batch using the train.take() method and convert it to a NumPy array, or we can use tfds.as_numpy(train) instead of train.take() to directly get NumPy arrays.
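A sketch of such a visualization, assuming the metadata object returned by tfds.load() is available (its label feature maps label indices back to class names):

import matplotlib.pyplot as plt

get_label_name = metadata.features['label'].int2str

plt.figure(figsize=(10, 10))
for i, (image, label) in enumerate(train.take(9)):
    plt.subplot(3, 3, i + 1)
    plt.imshow(image)  # pixel values are already in [0, 1]
    plt.title(get_label_name(label.numpy()))
    plt.axis('off')
plt.show()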
The above code snippet displays a grid of sample images from the dataset along with their labels.
2. Building a simple CNN using tf.keras
tf.keras is TensorFlow's implementation of the Keras API specification. It is a high-level API for building and training models that includes first-class support for TensorFlow-specific functionality, such as eager execution and tf.data pipelines. tf.keras makes TensorFlow easier to use without sacrificing flexibility and performance.
The code below defines the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers. As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. Grayscale images have one color channel, whereas color images have three (R, G, B). For our dataset, we will configure our CNN to process inputs of shape (128, 128, 3). We do this by passing the argument shape to our first (input) layer.
To complete our model, we will feed the last output tensor from the convolutional base (of shape (28, 28, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. So first, we will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. Our dataset has 5 classes; we get that value from the metadata of our downloaded dataset. Hence, we add a final Dense layer with 5 outputs and a softmax activation.
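Here's a sketch of the full model, consistent with the shapes quoted above (the exact filter counts and the 64-unit hidden Dense layer are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(128, 128, 3))
# Convolutional base: alternating Conv2D and MaxPooling2D layers
x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)  # -> (126, 126, 32)
x = layers.MaxPooling2D((2, 2))(x)                        # -> (63, 63, 32)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)       # -> (61, 61, 64)
x = layers.MaxPooling2D((2, 2))(x)                        # -> (30, 30, 64)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)       # -> (28, 28, 64)
# Classification head: flatten the 3D features, then Dense layers
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(5, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs)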
The above model was created using Keras's Functional API. Another way to create models in Keras is the Model Subclassing API, which follows an object-oriented structure to build models and define their forward pass. To learn more about it, check out the “Twitter micro-course” by the author of Keras himself.
Compiling and training the model
In Keras, compiling a model simply configures it for training, i.e. it sets the optimizer, loss function, and metrics to be used during training. To train a model for a given number of epochs (iterations over the dataset), we call the .fit() function on the model object. We can directly pass the train and validation objects to the .fit() function by calling .repeat() on them, so that training keeps looping over the dataset for the specified number of epochs. Before we call .fit(), we need to calculate a few parameters to pass to it:
# Calculating number of images in train, val and test sets
num_train, num_val, num_test = (
    metadata.splits['train'].num_examples * weight / 10
    for weight in SPLIT_WEIGHTS
)

steps_per_epoch = round(num_train) // BATCH_SIZE
validation_steps = round(num_val) // BATCH_SIZE
Here, since our downloaded dataset doesn’t define any standard splits we use our subsplit ratio of 8:1:1 to calculate the number of examples in our train, validation, and test splits.
- steps_per_epoch: the number of batches on which we train our model in one epoch. It's calculated by dividing the number of training examples by the size of each batch.
- validation_steps: the same as steps_per_epoch, but applied to the validation dataset.
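A sketch of compiling and training the model (the optimizer, loss and epoch count are assumptions; sparse categorical cross-entropy matches the integer labels returned by as_supervised=True, and train_batches/validation_batches come from the earlier sketch):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_batches.repeat(),
                    epochs=10,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=validation_batches.repeat(),
                    validation_steps=validation_steps)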
Visualizing training metrics
We plot the training and validation metrics returned by the train_model() or manually_train_model() routine. We use Matplotlib to plot the graphs:
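A sketch using the history object returned by model.fit() in the sketch above (with metrics=['accuracy'], Keras records the keys 'accuracy' and 'val_accuracy'):

import matplotlib.pyplot as plt

acc, val_acc = history.history['accuracy'], history.history['val_accuracy']
loss, val_loss = history.history['loss'], history.history['val_loss']

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training accuracy')
plt.plot(val_acc, label='Validation accuracy')
plt.legend()
plt.title('Accuracy')

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training loss')
plt.plot(val_loss, label='Validation loss')
plt.legend()
plt.title('Loss')
plt.show()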
These graphs give us insights into how well our model has trained. It is necessary to ensure that both the training and validation accuracies increase and losses decrease.
- If the training accuracy is high but the validation accuracy is low, then it’s a typical case of overfitting. You may have to increase your training dataset by performing data augmentation or downloading more images from the internet. You can also try out other model architectures which include regularisation techniques like Dropout and BatchNormalisation.
- If, on the other hand, both your training and validation accuracies are high but your validation accuracy is slightly higher, then your validation dataset may comprise ideal (easy to classify) images of the given classes. Sometimes techniques like dropout and BatchNorm add randomness during training, making training more difficult, and hence the model performs better on the validation set. To a lesser extent, it is also because training metrics report the average over an epoch, while validation metrics are evaluated after the epoch, so validation metrics see a model that has trained slightly longer.
Another new feature in TF 2.0 is the ability to use full-fledged TensorBoard inside Jupyter notebooks. We start TensorBoard before starting the model training, so that we can view the metrics as the model trains. Use the following commands (make sure you create the logs/ directory beforehand):
%load_ext tensorboard.notebook
%tensorboard --logdir logs/
3. Using a pre-trained network
In the previous section, we trained a simple CNN which gave us an accuracy of ~70%. We can do much better than this by using larger and more complex architectures. There are many open-source pre-trained networks available for image classification tasks similar to ours. A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. We can either use the pre-trained model as it is or perform transfer learning using the pre-trained convnets. The intuition behind transfer learning is that if a model was trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world. We can leverage these learned feature maps without having to train a new large model on a large dataset.
Downloading a pre-trained model
We will create a base model from the InceptionV3 model developed at Google and pre-trained on ImageNet, a large dataset of 1.4M web images across 1,000 classes. This model has already learned the basic features common to the 1,000 everyday objects it was trained on, and hence has a strong feature extraction capability. We download a network that doesn't include the classification layers at the top, by specifying the include_top=False argument, because we only want to use the feature extraction portion of these pre-trained convnets (the convolutional base), since those features are likely to be generic concepts learned over a picture. The classification part of a pre-trained model is often specific to the original classification task, and subsequently specific to the set of classes on which the model was trained.
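A sketch of this step (the input shape matches the 128x128 images we prepared earlier):

IMG_SHAPE = (128, 128, 3)

# Create the base model from pre-trained InceptionV3,
# without its ImageNet classification head
base_model = tf.keras.applications.InceptionV3(
    input_shape=IMG_SHAPE,
    include_top=False,
    weights='imagenet')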
This base model acts as a feature extractor: it converts each (128, 128, 3) input image into a (2, 2, 2048) block of features. We can think of features as a multi-dimensional representation of the input, understandable by the model, which helps it classify the input image into one of the many classes on which the model was trained. To understand more about these features and their visualizations, check out this post:
Adding a classification head
While downloading the pre-trained model, we removed its classification part by specifying the include_top=False parameter, since it is specific to the set of classes on which the model was trained. We now add a new classification head which is specific to our tf_flowers dataset. We stack these new layers on top of our base model using Keras's Sequential API.
The code is very easy to understand:
- We average the features given by the base model (2x2x2048) over the 2x2 spatial locations using a keras.layers.GlobalAveragePooling2D() layer and convert them to a single 2048-element vector per image.
- On top of it, we apply a keras.layers.Dense() layer to convert these features into a single prediction per image from among the 5 classes in the tf_flowers dataset.
It's important to freeze the convolutional base before we compile and train the model; we do this by setting base_model.trainable = False. Freezing prevents the weights in the base model from being updated during training. We then compile our model to configure it with training parameters. Once the model is compiled, it can be trained on our flowers dataset.
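A sketch of the steps above (the optimizer and loss are assumptions, chosen to match the earlier training setup):

# Freeze the convolutional base so its weights aren't updated
base_model.trainable = False

# Stack the new classification head on top of the base model
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])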
Training the classification head
We train the model using the same steps that we used to train our simple CNN. We plot the training and validation metrics.
As we can see, our validation accuracy is slightly higher than our training accuracy. This is a good sign, as we can conclude that our model performs well on unseen data (the validation set). We can confirm this by using our test set to evaluate the model. However, we can still improve the performance of this model by performing fine-tuning.
Fine-tuning a pre-trained network
In the previous step, we were only training a few layers on top of the InceptionV3 base model; the weights of the pre-trained base network were not updated during training. One way to increase performance even further is to “fine-tune” the weights of the top layers of the pre-trained model alongside the training of the top-level classifier. This training process forces the base model weights to be tuned from generic feature maps to features associated specifically with our dataset. Read more here on the official TensorFlow website.
The code snippet below unfreezes the layers of the base model to make them trainable. Since we have made changes to the model, we need to recompile it before calling the .fit() function.
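A sketch of this step (the lower learning rate is an assumed value; see the note on learning rates at the end of this section):

# Unfreeze the convolutional base
base_model.trainable = True

# Recompile with a much lower learning rate so the pre-trained
# weights are only nudged, not overwritten
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])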
The goal of fine-tuning is to adapt these specialized features to work with the new dataset. If you trained to convergence earlier, this will get you a few percent more accuracy. But, if the training dataset is fairly small, and is similar to the original datasets that Inception V3 was trained on, then fine-tuning may result in overfitting. We plot the training and validation metrics once again after fine-tuning.
Note: This should only be attempted after you have trained the top-level classifier with the pre-trained model set to non-trainable. If you add a randomly initialized classifier on top of a pre-trained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier) and your pre-trained model will just forget everything it has learned.
As we can see, our accuracies have improved for both the training and validation sets. Though the loss shot up after the first epoch of fine-tuning, it eventually came back down. One reason for this could be that the weights were updated slightly more aggressively than needed. That's why it is important to keep the learning rate for fine-tuning lower than the one used for training the classification head.
4. Serving a model using TensorFlow Serving
Using a TensorFlow Serving server, we can deploy our trained flower image classification model by providing a URL endpoint through which anyone can make a POST request and get a JSON response of what the model has inferred, without having to worry about the technicalities. Check out a detailed tutorial about TensorFlow Serving in this post:
Installing TensorFlow Serving
1. Add TensorFlow Serving distribution URI as a package source (one-time setup)
$ echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \$ curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
2. Install and update TensorFlow ModelServer
$ apt-get update && apt-get install tensorflow-model-server
Once installed, the binary can be invoked using the command tensorflow_model_server.
Exporting Keras models to SavedModel format
To load our trained model into TensorFlow Serving server, we first need to export it in the SavedModel format. TensorFlow provides SavedModel as a universal format for exporting models. Under the hood, our Keras model is fully specified in terms of TensorFlow objects, so we can export it just fine.
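A minimal sketch of the export step (the export path is illustrative; the trailing 1 is the version number):

import os
import tensorflow as tf

# Each version of the model goes into its own numbered sub-directory
export_path = os.path.join('SavedModel/inceptionv3_128_tf_flowers', '1')
tf.saved_model.save(model, export_path)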
This will create a protobuf file in a well-defined directory hierarchy and will include a version number. TensorFlow Serving allows us to select which version of a model, or “servable” we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
Starting TensorFlow Serving server
To start TensorFlow Serving server on your local machine, run the following command:
$ tensorflow_model_server --model_base_path=/home/ubuntu/Desktop/Medium/TF2.0/SavedModel/inceptionv3_128_tf_flowers/ --rest_api_port=9000 --model_name=FlowerClassifier
- --model_base_path: This has to be an absolute path, else you will get an error saying:
Failed to start server. Error: Invalid argument: Expected model ImageClassifier to have an absolute path or URI; got base_path()=./inceptionv3_128_tf_flowers
- --rest_api_port: TensorFlow Serving will start a gRPC ModelServer on port 8500, and the REST API will be available on port 9000.
- --model_name: This will be the name of your Serving server, to which you will send POST requests. You can type any name you want here.
Making REST requests to TensorFlow Serving Server
TensorFlow ModelServer supports RESTful APIs. We’ll send a predict request as a POST to our server’s REST endpoint. But before making a POST request we need to load and pre-process our sample image. TensorFlow Serving server expects the input image dimensions to be (1, 128, 128, 3), where ‘1’ is the batch size. We use the image pre-processing utils from the Keras library to load and convert the input image to the required dimensions.
The URL for the server’s REST endpoint follows the following format:
http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict
where /versions/${MODEL_VERSION} is optional. The following code loads and pre-processes the input image and makes a POST request to the above REST endpoint.
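A sketch of this request (the image file name is hypothetical, and the class names are read from the dataset metadata loaded earlier):

import json
import requests
from tensorflow.keras.preprocessing import image

# Load the sample image (file name is hypothetical) and scale pixels to [0, 1]
img = image.load_img('sunflower.jpg', target_size=(128, 128))
img_array = image.img_to_array(img) / 255.0

# TensorFlow Serving expects a batch dimension: (1, 128, 128, 3)
payload = json.dumps({"instances": [img_array.tolist()]})

response = requests.post(
    'http://localhost:9000/v1/models/FlowerClassifier:predict',
    data=payload,
    headers={"content-type": "application/json"})
predictions = json.loads(response.text)['predictions'][0]

# Map class indices back to label names and print the top 3 predictions
class_names = metadata.features['label'].names
top3 = sorted(zip(class_names, predictions), key=lambda p: p[1], reverse=True)[:3]
print('Top 3 predictions:')
print(top3)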
The above code produces the following output:
Top 3 predictions:
[('sunflowers', 0.978735), ('tulips', 0.0145516), ('roses', 0.00366251)]
Summary
In summary, here's what we covered in this tutorial on building and deploying an image classifier in TF 2.0:
- We used TensorFlow Datasets to download publicly available datasets in just a few lines of code. It also enabled us to perform efficient training of our convolutional neural network model.
- Using tf.keras, we not only built a CNN from scratch but were also able to reuse a pre-trained network to get much higher accuracy on our flowers dataset in just a few epochs.
- Finally, we deployed our trained model using a TensorFlow Serving server. This makes it easy to integrate our model into websites and other applications just by calling its URL endpoint.