Basics of TensorFlow 2.0 and Training a Model

Ravi Ranjan Singh
Published in Analytics Vidhya · 6 min read · Apr 13, 2020

Introducing TensorFlow 2.0

TensorFlow is a numerical processing library that was originally developed at Google and is used by researchers and machine learning practitioners to conduct machine learning research. While you can perform any numerical operation with TensorFlow, it is mostly used to train and run deep neural networks.

TensorFlow primarily aims to simplify the deployment of machine learning and deep learning solutions on various platforms: computer CPUs, GPUs, mobile devices, and, more recently, the browser. On top of that, TensorFlow offers many useful functions for creating machine learning models and running them at scale. In 2019, TensorFlow 2 was released with a focus on ease of use while maintaining good performance.

TensorFlow 2.0 allows newcomers to start with a simple API while letting experts create very complex models at the same time. Let's explore those different levels.

TensorFlow 2.0 main architecture

The TensorFlow 2 architecture has several levels of abstraction. Let's first introduce the lowest layer and work our way up to the uppermost layer:

Explaining the layers of the architecture:

C++ Layer

Most deep learning computations are coded in C++. To run operations on the GPU, TensorFlow uses a library developed by NVIDIA called CUDA. This is the reason you need to install CUDA if you want to exploit GPU capabilities, and why you cannot use GPUs from another hardware manufacturer.
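
If you want to check whether TensorFlow can actually see a CUDA-capable GPU, a quick sanity check looks like the following (a minimal sketch; the exact helper has moved between tf.config.experimental and tf.config across TensorFlow versions):

import tensorflow as tf

# Lists the physical GPU devices TensorFlow can use.
# An empty list means computations will fall back to the CPU.
print(tf.config.experimental.list_physical_devices('GPU'))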

Low-level API

The Python low-level API then wraps the C++ sources. When you call a Python method in TensorFlow, it usually invokes C++ code behind the scenes. This wrapper layer allows users to work faster, because Python is considered easier to use than C++ and does not require compilation. The Python wrapper makes it possible to perform extremely basic operations such as matrix multiplication and addition.
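
For example, here is what a couple of those basic operations look like when run eagerly (an illustrative snippet, not from the original article):

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # a small 2 x 2 tensor
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])  # another 2 x 2 tensor

print(tf.add(a, b))     # element-wise addition
print(tf.matmul(a, b))  # matrix multiplication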

High-level API

At the top sits the high-level API, made of two components: Keras and the Estimator API. Keras is a user-friendly, modular, and extensible wrapper for TensorFlow. The Estimator API contains several pre-made components that allow you to build your machine learning model easily. You can consider them building blocks or templates. Pre-made components let you experiment with different model architectures by making only minimal code changes.
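
As a hedged illustration of what a pre-made component looks like, the snippet below instantiates one of the pre-made Estimators; the feature column name 'pixels' and the single hidden layer of 128 units are arbitrary choices for this sketch, not values from this article:

import tensorflow as tf

# One numeric feature column describing a flattened 28 x 28 image.
feature_columns = [tf.feature_column.numeric_column('pixels', shape=[28 * 28])]

# A pre-made fully connected classifier: no layer-by-layer model definition needed.
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[128],   # one hidden layer of 128 units
    n_classes=10)         # ten digit classes

Training it would additionally require an input_fn that feeds batches of features and labels.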

Introducing Keras

Keras was first released in 2015 and was designed as an interface to enable fast experimentation with neural networks. There are several deep learning frameworks out there that help with building deep neural networks. TensorFlow, Theano, and CNTK (Microsoft) are some of the major frameworks used in industry and in research. Keras acts as a wrapper for these frameworks. Known for its user-friendliness, it is the library of choice and the ultimate deep learning tool for many developers.

Architecture of Keras API

Why Keras ?

  • It supports CNNs, RNNs, and combinations of both
  • Fast prototyping
  • Deep enough to build serious models
  • Well-written documentation; refer to http://keras.io

Basically, Keras models go through the following pipeline: define the model, compile it, fit it on the training data, evaluate it, and use it for prediction.

A simple computer vision model using Keras

Let’s start with a classical example of computer vision: digit recognition with the Modified National Institute of Standards and Technology (MNIST) dataset.

To install a TensorFlow 2.x version:

#!pip install tensorflow==2.0.0-alpha0 # TensorFlow 2.0 alpha version
#!pip install tensorflow==2.0.0-beta1  # TensorFlow 2.0 beta version
#print(tf.__version__)                 # Check the version (after importing tensorflow as tf)

Preparing the data

First, we import the data. It is made up of 60,000 images for the training set and 10,000 images for the test set:

import tensorflow as tf  # import tensorflow as tf for faster typing
import numpy as np       # import numerical python as np

num_classes = 10
img_rows, img_cols = 28, 28
num_channels = 1
input_shape = (img_rows, img_cols, num_channels)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()  # load the dataset
x_train, x_test = x_train / 255.0, x_test / 255.0  # data normalization to [0, 1]

The tf.keras.datasets module provides quick access to download and instantiate a number of classical datasets. After importing the data using load_data, we divide the array by 255.0 to get a number in the range [0, 1] instead of [0, 255]. It is common practice to normalize data, either in the [0, 1] range or in the [-1, 1] range.
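
If you want to verify what load_data returned, a quick check of the array shapes (these shapes are a documented property of the MNIST dataset) looks like this:

print(x_train.shape)  # (60000, 28, 28) - training images
print(x_test.shape)   # (10000, 28, 28) - test images
print(y_train.shape)  # (60000,) - integer labels from 0 to 9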

Building the model

Moving on to building the actual model, we will use a very simple architecture composed of two fully connected layers, also called Dense layers. Now, let’s have a look at the code. As you can see, Keras code is very brief and clearly written.

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(img_rows, img_cols)))  # specify the input shape so the model can be built before training
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))

Since our model is a linear stack of layers, we start by calling the Sequential function. We then add each layer one after the other. Our model is composed of two fully connected layers. We build it layer by layer:

  • Flatten: This will take the 2D matrix representing the image pixels and turn it into a 1D array. We need to do this before adding a fully connected layer. The 28 × 28 images are turned into a vector of size 784.
  • Dense of size 128: This will turn the 784 pixel values into 128 activations using a weight matrix of size 128 × 784 and a bias vector of size 128. In total, this means 100,480 parameters.
  • Dense of size 10: This will turn the 128 activations into our final prediction. Notice that because we want probabilities to sum to 1, we will use the softmax activation function. The softmax function takes the output of a layer and returns probabilities that sum up to 1. It is the activation of choice for the last layer of a classification model.
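
As a side note, the same architecture can also be written with the Keras functional API; this is just an equivalent sketch, not part of the original example:

inputs = tf.keras.Input(shape=(img_rows, img_cols))                    # 28 x 28 grayscale images
x = tf.keras.layers.Flatten()(inputs)                                  # -> vector of 784 values
x = tf.keras.layers.Dense(128, activation='relu')(x)                   # 784*128 + 128 = 100,480 parameters
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)  # 128*10 + 10 = 1,290 parameters
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)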

You can get a description of the model, the outputs, and their weights.

model.summary()

Here is the output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 784) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 100480
_________________________________________________________________
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0

Training the model

Keras makes training extremely simple:

model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Calling .compile() on the model we just created is a mandatory step. A few arguments must be specified:

  • optimizer: This is the component that will perform the gradient descent.
  • loss: This is the metric we will optimize. In our case, we choose sparse categorical cross-entropy.
  • metrics: These are additional metric functions evaluated during training to provide further visibility of the model’s performance (unlike loss, they are not used in the optimization process).
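
If you prefer passing explicit objects instead of strings, the same compilation step can be written as follows (an equivalent sketch; the learning rate shown is just an illustrative default, not a value from this article):

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # plain SGD; 0.01 is an arbitrary example value
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),   # integer labels with softmax outputs
    metrics=['accuracy'])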
model.fit(x_train, y_train, epochs=5, verbose=1, validation_data=(x_test, y_test))

Then, we call the .fit() method. We will train for five epochs, meaning that we will iterate over the whole training dataset five times. Notice that we set verbose to 1. This will allow us to get a progress bar with the metrics we chose earlier, the loss, and the Estimated Time of Arrival (ETA). The ETA is an estimate of the remaining time before the end of the epoch.
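
As a side note, fit() also returns a History object whose history dictionary records the per-epoch values of the loss and metrics; here is a small illustrative variant of the same call (not part of the original code):

history = model.fit(x_train, y_train, epochs=5, verbose=1,
                    validation_data=(x_test, y_test))
print(history.history.keys())   # e.g. dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
print(history.history['loss'])  # training loss recorded for each of the five epochs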

Evaluate the model:

model.evaluate(x_test, y_test)
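
Once trained and evaluated, the model can also be used for inference; here is a minimal, illustrative prediction snippet (not from the original article):

predictions = model.predict(x_test[:5])            # class probabilities, shape (5, 10)
predicted_labels = np.argmax(predictions, axis=1)  # most likely digit for each image
print(predicted_labels)                            # predicted digits
print(y_test[:5])                                  # ground-truth labels for comparison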

We followed three main steps:

  1. Loading the data: In this case, the dataset was already available. During future projects, you may need additional steps to gather and clean the data.
  2. Creating the model: This step was made easy by using Keras — we defined the architecture of the model by adding sequential layers. Then, we selected a loss, an optimizer, and a metric to monitor.
  3. Training the model: Our model worked pretty well the first time. On more complex datasets, you will usually need to fine-tune parameters during training.

The whole process was extremely simple thanks to Keras, the high-level API of TensorFlow. Behind this simple API, the library hides a lot of the complexity.
