Understanding TensorFlow: Part 4

Series 4: Implementing our first neural network

Sep 9, 2021

Today’s outline:

Implementing our first neural network
Preparing the data
Building the neural network model
Defining loss and optimizer
Training the neural network

Implementing our first neural network

Great! Now that you’ve learned the most basic operations of TensorFlow 2.x, it’s high time we moved on and implemented something moderately complex. Let’s implement a neural network. More precisely, we will implement a fully connected neural network model.

A classic stepping stone into neural networks is to implement a network that classifies handwritten digits. For this task, we will use the famous MNIST dataset, available at http://yann.lecun.com/exdb/mnist/.

You might be a bit skeptical about our using a computer vision task rather than an NLP task. However, vision tasks need less preprocessing and are easy to understand, which makes them a good first example.

As this is our first encounter with neural networks, I will walk through only the crucial parts of the example.

Preparing the data

In TensorFlow 2.x, model inputs are Tensors that hold concrete values. Once we have downloaded the MNIST dataset, normalized it, converted it to Tensors, and grouped it into batches, the model inputs are ready.

The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/. Download it and we will read it directly from disk. Don’t forget to change the data path to wherever you saved the files.
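The MNIST files use the IDX binary format: a short header followed by raw unsigned bytes. As a rough sketch of what reading them involves, the function below parses the bytes of one such file using only the standard library. The names `parse_idx` and `normalize` are hypothetical helpers introduced for illustration, and in practice you would likely convert the result to a NumPy array or Tensor.

```python
import struct


def parse_idx(raw):
    """Parse the bytes of an IDX file that stores unsigned bytes.

    Returns (shape, data): the dimension sizes as a tuple and the
    values as a flat bytes object in row-major order.
    """
    # Header: two zero bytes, a type code (0x08 = unsigned byte) and the
    # number of dimensions, then one big-endian uint32 per dimension.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, raw[4:header_end])
    return shape, raw[header_end:]


def normalize(data):
    """Scale pixel bytes from [0, 255] to floats in [0.0, 1.0]."""
    return [value / 255.0 for value in data]


# Usage with a downloaded, gzip-compressed file (the path is an assumption):
#   import gzip
#   with gzip.open("data/train-images-idx3-ubyte.gz", "rb") as f:
#       shape, pixels = parse_idx(f.read())
```

For the training images, `shape` would hold the number of images followed by the image height and width, and `normalize` performs the scaling step mentioned above.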

Next, we process the data one step further with tf.data.Dataset. A tf.data pipeline is iterated in a streaming fashion, so in general the full dataset does not need to fit into memory. It is also a convenient way to batch the dataset, using tf.data.Dataset.from_tensor_slices().batch(). from_tensor_slices() slices your data (here, a tuple of image and label arrays) along the first axis into individual items. shuffle(10000) shuffles the training items using a buffer of 10,000 elements, and batch(32) groups consecutive items into batches of 32.

import tensorflow as tf

# Shuffle only the training set; both sets are grouped into batches of 32
train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).shuffle(10000).batch(32)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
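To make the slicing and batching concrete, here is a plain-Python sketch of what the from_tensor_slices().batch() combination does conceptually. It is not how tf.data is implemented, and `slice_and_batch` is a hypothetical name used only for this illustration.

```python
def slice_and_batch(features, labels, batch_size):
    """Yield (feature_batch, label_batch) pairs of up to batch_size items."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        # Slicing past the end is safe, so the last batch may be smaller.
        yield features[start:end], labels[start:end]


for x_batch, y_batch in slice_and_batch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2):
    print(x_batch, y_batch)
```

As in tf.data with its default settings, the final batch is simply smaller when the number of items is not a multiple of the batch size. With the 60,000 MNIST training images and a batch size of 32, this gives 1,875 batches per epoch.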

Building the neural network model

There are various ways to build a neural network model. You can use the lower-level API and build every block yourself, which can get quite complicated, or you can use a higher-level API to assemble and run a model quickly and evaluate your solution. In practice, the second, task-oriented situation is much more common, and the tf.keras API strikes a good balance between general use and customization.

If you are new to TensorFlow and want to witness the magic of a neural network as soon as possible, the tf.keras.Sequential model is a nice choice. It is the easiest and clearest way to build your model. Two steps are needed:

Decide which layers you want to use in your model and in what order.
Put those layers in a list and pass it to tf.keras.Sequential as its argument.

In our example, we want to use two fully connected layers, tf.keras.layers.Dense, to do the job. A Dense layer operates on the last axis of its input, so to treat each image as one feature vector we need a single axis per example. The MNIST images have shape [28, 28], which is two axes, so we first add a tf.keras.layers.Flatten layer that reshapes each image to [28 * 28], a single axis of 784 values. That solves the shape mismatch. The next problem is that a fully connected layer has a large number of parameters, which makes it easy to overfit. tf.keras.layers.Dropout counters this by randomly setting a fraction of its input units to zero during training and scaling up the remaining ones. For example, passing 0.2 drops about 20% of the units on each training step.
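A little arithmetic shows where the parameters come from, and a few lines of plain Python show the idea behind dropout. This is only a sketch: `dense_params` and `dropout` are hypothetical helpers written for this illustration, and the dropout function mirrors the behavior documented for the Keras Dropout layer during training (inputs are zeroed with probability `rate` and the rest are scaled by 1 / (1 - rate)), not its actual implementation.

```python
import random


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


flat_size = 28 * 28                      # Flatten: [28, 28] -> [784]
hidden = dense_params(flat_size, 128)    # 784 * 128 + 128 = 100480
output = dense_params(128, 10)           # 128 * 10 + 10 = 1290
print(hidden + output)                   # 101770 trainable parameters


def dropout(values, rate, rng):
    """Zero each value with probability `rate`, scale the others up."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


# A seeded generator keeps the example reproducible.
dropped = dropout([1.0] * 10, 0.2, random.Random(0))
print(dropped)
```

Almost all of the parameters sit in the first Dense layer, which is why dropout is placed right after it in the model.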

Now we know which layers to use in our model. Let’s put them in order in a list, then pass the list to tf.keras.Sequential.

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

Another way to build a neural network model with the tf.keras API is model subclassing. Define the model as a class that inherits from tf.keras.Model. Two methods must be defined in your class: __init__() and call(). In __init__(), the two steps for building a model that we mentioned above apply with a small change:

Decide which layers you want to use in your model and in what order.
Create those layers, in order, as attributes of your model class.

This time we add a new kind of layer to our model: a convolutional layer, the building block of a convolutional neural network (CNN). CNNs usually perform better on image data. tf.keras.layers.Conv2D is a convolutional layer that expects each example to have shape [height, width, channels] or [channels, height, width], depending on data_format. For example, input_shape=(128, 128, 3) describes 128x128 RGB pictures with data_format="channels_last".

In our example, the convolutional layer can take the image data almost as is: each [28, 28] image only needs a trailing channels axis, giving [28, 28, 1]. It will be the first layer of the model and receive the inputs directly. For each example it outputs a certain number of feature maps (set by the layer parameter `filters`), each with shape [height, width] or a similar shape that depends on the `padding` parameter. We can ignore the effect of different `padding` values for now. What we need to know is that in either case the output has more than one axis per example and must be flattened before it is passed to a fully connected layer.
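For readers who do want to see the effect of `padding`, the output size along one spatial axis follows a simple formula. The sketch below assumes the two padding modes that Conv2D offers, "valid" and "same"; the helper name `conv_output_size`, the 3x3 kernel, and the 32 filters are assumptions chosen for illustration.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis of a convolution."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (size - kernel) // stride + 1
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


side_valid = conv_output_size(28, 3, padding="valid")   # 26
side_same = conv_output_size(28, 3, padding="same")     # 28
filters = 32

# Number of values per example after flattening the convolution output
print(side_valid * side_valid * filters)   # 21632
print(side_same * side_same * filters)     # 25088
```

Either way the result is a stack of two-dimensional feature maps, which is why a Flatten layer sits between the convolution and the first Dense layer.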

The call() method defines how data flows through the layers created in __init__(). We pass the input data to the convolutional layer, pass its result to the next layer, and so on.
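A subclassed model along the lines described above might look like the following sketch. The class name `MyModel`, the 32 filters, the 3x3 kernel, and the layer sizes are assumptions chosen for illustration, not values taken from the text.

```python
import tensorflow as tf


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Layers are created once, in the order they will be used.
        self.conv = tf.keras.layers.Conv2D(32, 3, activation="relu")
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, x, training=False):
        # Each layer's output becomes the next layer's input.
        x = self.conv(x)
        x = self.flatten(x)
        x = self.dense1(x)
        return self.dense2(x)  # raw scores (logits), one per class


model = MyModel()

# A dummy batch of four 28x28 images with one channel
images = tf.zeros([4, 28, 28, 1])
logits = model(images)
print(logits.shape)
```

Calling the model on a batch runs call() and returns one row of ten class scores per image.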

Defining loss and optimizer

No matter which method you use to build your model, you must specify a loss function and an optimizer before you start training.

In our example, the task is a 10-class classification problem in which each label is a single integer class index (0 to 9) rather than a one-hot vector. That is what “sparse” refers to, so we use SparseCategoricalCrossentropy as the model loss. Because the last Dense layer outputs raw scores (logits) without a softmax, we pass from_logits=True.

The optimizer will be Adam. Optimizers have a long development history, from SGD through Momentum, Nesterov Accelerated Gradient, AdaGrad, and RMSProp to Adam, with each method addressing shortcomings of earlier ones. Since optimization is not the focus of this chapter, let’s just take away the conclusion that Adam is a sound default in most cases.

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

optimizer = tf.keras.optimizers.Adam()
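To see what this loss measures, here is a plain-Python sketch of the per-example computation: apply softmax to the logits and take the negative log of the probability assigned to the true class. The function name is hypothetical and the code only illustrates the math. It is not the Keras implementation, which by default also averages the per-example losses over the batch.

```python
import math


def sparse_categorical_crossentropy(label, logits):
    """Loss for one example: -log(softmax(logits)[label])."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


# Two equally likely classes: the loss is log(2)
print(sparse_categorical_crossentropy(0, [0.0, 0.0]))

# A confident prediction gives a small loss when right, a large one when wrong
print(sparse_categorical_crossentropy(1, [0.0, 10.0]))
print(sparse_categorical_crossentropy(0, [0.0, 10.0]))
```

The label is used directly as an index into the logits, which is why integer labels are all this loss needs.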

Training the neural network

Here we list only the training code. You can find the complete code for both training and testing in the tensorflow_introduction.ipynb file in the ch2 folder.

With the tf.keras.Sequential model, training takes only two calls.

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

If you want to define the training process yourself, you need GradientTape, which we introduced in the last section. At each step, GradientTape computes the gradients of the loss with respect to the model parameters. The optimizer then applies those gradients to update the parameters.

At each step, we also want a metric that tells us whether the model is getting better. A common choice is accuracy.

First, define a function that performs a single training step. It carries out the operations mentioned above: calculating the loss, calculating the gradients, applying the gradients, and calculating the accuracy.

Second, start training. Specify how many epochs you want to train for, then feed your data to the train step function, batch by batch, until all epochs are finished.
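The two steps can be sketched as follows. This is a minimal version under stated assumptions: the small model and the random data are stand-ins so that the snippet runs on its own, the names `loss_fn`, `train_step`, and `train_accuracy` are chosen for illustration, and accuracy is tracked with the built-in tf.keras.metrics.SparseCategoricalAccuracy rather than a hand-written function.

```python
import tensorflow as tf

# Stand-in model and random data so the sketch is self-contained
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
x = tf.random.uniform([64, 28, 28])
y = tf.random.uniform([64], maxval=10, dtype=tf.int32)
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()


@tf.function
def train_step(images, labels):
    # Record the forward pass so gradients can be computed afterwards.
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    # Gradients of the loss with respect to every trainable parameter
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_accuracy.update_state(labels, logits)
    return loss


EPOCHS = 2
for epoch in range(EPOCHS):
    train_accuracy.reset_state()  # start each epoch with a fresh metric
    for images, labels in train_ds:
        loss = train_step(images, labels)
    print(epoch + 1, float(loss), float(train_accuracy.result()))
```

On random labels the accuracy stays near chance, of course. With the real MNIST batches in train_ds the same loop applies unchanged.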

In the training code, accuracy is a function that takes some predictions and labels as inputs and returns the accuracy (the fraction of predictions that match the actual labels). It is defined in the exercise file.
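A function of that kind could be as simple as the following plain-Python sketch. It is a hypothetical stand-in, not the version from the exercise file, and it assumes each prediction is a list of per-class scores and each label an integer class index.

```python
def accuracy(predictions, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(predictions, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
print(accuracy(scores, [1, 0, 0]))  # two of three predictions match
```

Because only the position of the largest score matters, the function gives the same result for logits and for softmax probabilities.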

If successful, you should see behavior similar to that shown in Figure 2.10. After 50 epochs, the test accuracy should reach approximately 98%:

Figure 2.10: Training loss and test accuracy for the MNIST digit classification task

Summary

In this series, you took your first steps toward solving NLP tasks by understanding the differences between TensorFlow 1.x and TensorFlow 2.x, the version with which we will implement our algorithms. First, we discussed eager execution, which lets us program in a much more intuitive way. Next, we discussed AutoGraph (tf.function), which automatically constructs graphs when you need them.

Then we discussed the basic operations of TensorFlow 2.x, including Tensor creation, merging and splitting, Tensor comparison, mathematical operations, and neural network-related operations. Finally, we brought all these elements together to implement a neural network that classifies MNIST digits.

Other parts of the Understanding TensorFlow series:

Series 1: https://medium.com/@Adline125/understanding-tensorflow-series-979e71cc5562

Series 2: https://medium.com/@Adline125/understanding-tensorflow-fcc431891d08

Series 3–1: https://medium.com/@Adline125/understanding-tensorflow-ce18f0e1bbbc

Series 3–2: https://medium.com/@Adline125/understanding-tensorflow-2c6496b71368

Series 4: https://medium.com/@Adline125/understanding-tensorflow-94bdea8e1fd9