Understanding TensorFlow: Part 3–1

Series 3–1: Neural network-related operations

dan lee
Sep 9, 2021

Today’s outline:

  1. Introduction
  2. Nonlinear activations used by neural networks
  3. The convolution operation
  4. The pooling operation

1. Introduction

Now let's look at several useful neural network-related operations that we will use heavily in the following chapters. The operations we will discuss range from simple element-wise transformations (that is, activations) to computing the partial derivatives of a value with respect to a set of parameters. We will also implement a simple neural network as an exercise.

TensorFlow 2.x has incorporated Keras into its own system, known as tf.keras, so some operations can be performed in two ways. In practice, tf.keras is quite convenient to use, so for each operation we will introduce both the tf.keras and the plain tf version.

2. Nonlinear activations used by neural networks

Nonlinear activations are what enable neural networks to perform well at numerous tasks. Typically, a nonlinear activation transformation (that is, an activation layer) follows the output of each layer in a neural network, except for the last layer. A nonlinear transformation helps a neural network learn the various nonlinear patterns present in data. This is very useful for complex real-world problems, where data often has complex nonlinear patterns rather than simple linear ones. Without nonlinear activations between layers, a deep neural network would just be a stack of linear layers, and any stack of linear layers can be collapsed into a single equivalent linear layer. In other words, without nonlinear activations, adding more layers would not let the network represent anything that a single linear layer could not.
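To make the collapsing argument concrete, consider two stacked linear layers with no activation in between. The weight matrices $W_1, W_2$ and bias vectors $b_1, b_2$ are symbols introduced here for illustration:

```latex
h = W_1 x + b_1, \qquad
y = W_2 h + b_2 = W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2) = W x + b,
\quad \text{with } W = W_2 W_1,\; b = W_2 b_1 + b_2 .
```

Placing a nonlinearity $\sigma$ between the layers, as in $y = W_2\,\sigma(W_1 x + b_1) + b_2$, breaks this factorization, which is why the extra layer can then add expressive power.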

Now let's look at two commonly used nonlinear activations in neural networks and how they can be implemented in TensorFlow:

# Sigmoid activation of x is given by 1 / (1 + exp(-x))

tf.nn.sigmoid(x, name=None)

or

tf.keras.activations.sigmoid(x)

# ReLU activation of x is given by max(0,x)

tf.nn.relu(x, name=None)

or

tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0)
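The signatures above can be tried on a small tensor to see that the tf.nn and tf.keras versions agree. This is a minimal sketch, assuming TensorFlow 2.x is installed; the input values are arbitrary illustrative choices:

```python
import tensorflow as tf

# A small 1-D tensor with a negative, a zero, and a positive value
x = tf.constant([-2.0, 0.0, 3.0])

# Sigmoid squashes every element into the open interval (0, 1)
sig_nn = tf.nn.sigmoid(x)
sig_keras = tf.keras.activations.sigmoid(x)

# ReLU replaces negative values with 0 and keeps positive values unchanged
relu_nn = tf.nn.relu(x)
relu_keras = tf.keras.activations.relu(x)

print(sig_nn.numpy())   # roughly [0.119 0.5 0.953]
print(relu_nn.numpy())  # [0. 0. 3.]
```

Both APIs apply the same element-wise formula; the tf.keras version simply exposes extra ReLU options (alpha, max_value, threshold) as keyword arguments.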

3. The convolution operation

The convolution operation is a widely used signal-processing technique. For images, convolution can produce various effects, such as blurring, sharpening, or edge detection. An example of edge detection using convolution is shown in Figure 2.6. This is achieved by sliding a convolution filter over an image to produce an output value at each location (see Figure 2.7 later in this section). Specifically, at each location we multiply the elements of the convolution filter element-wise with the image patch it overlaps (a patch of the same size as the filter) and take the sum of the products:

Figure 2.6: Using the convolution operation for edge detection in an image (Source: https://en.wikipedia.org/wiki/Kernel_(image_processing))
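The multiply-and-sum step just described can be sketched in plain Python, without TensorFlow, on the same 4 x 4 input and 2 x 2 filter that the TensorFlow example in this section uses. The helper name conv2d_valid is hypothetical, and the sketch assumes a stride of 1 and no padding. Like tf.nn.conv2d, it does not flip the filter, so strictly speaking it computes a cross-correlation, which is what deep-learning libraries call convolution:

```python
def conv2d_valid(image, kernel):
    """Hypothetical helper: slide `kernel` over `image` (both 2-D lists)
    with stride 1 and no padding. At each position, multiply the overlapping
    elements pairwise and sum the products. The kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [4, 3, 2, 1],
         [5, 6, 7, 8],
         [8, 7, 6, 5]]
kernel = [[0.5, 1],
          [0.5, 1]]

print(conv2d_valid(image, kernel))
# [[7.5, 7.5, 7.5], [13.5, 13.5, 13.5], [19.5, 19.5, 19.5]]
```

For example, the top-left output is 0.5*1 + 1*2 + 0.5*4 + 1*3 = 7.5.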

The following is the implementation of the convolution operation with tf.nn.conv2d:

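import tensorflow as tf

# Input: a batch of one 4x4 single-channel "image", shape [1, 4, 4, 1]
x = tf.constant([[
    [[1], [2], [3], [4]],
    [[4], [3], [2], [1]],
    [[5], [6], [7], [8]],
    [[8], [7], [6], [5]]
]], dtype=tf.float32)

# Filter: a 2x2 window with 1 input and 1 output channel, shape [2, 2, 1, 1]
x_filter = tf.constant([
    [[[0.5]], [[1]]],
    [[[0.5]], [[1]]]
], dtype=tf.float32)

x_stride = [1, 1, 1, 1]  # move the window one element at a time
x_padding = 'VALID'      # no zero padding

x_conv = tf.nn.conv2d(input=x, filters=x_filter, strides=x_stride, padding=x_padding)
print(x_conv)

# Returns (out) =>
tf.Tensor(
[[[[ 7.5]
   [ 7.5]
   [ 7.5]]

  [[13.5]
   [13.5]
   [13.5]]

  [[19.5]
   [19.5]
   [19.5]]]], shape=(1, 3, 3, 1), dtype=float32)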

Here, the apparently excessive number of square brackets might make you think that the example could be made easier to follow by getting rid of these redundant brackets. Unfortunately, that is not the case. For the tf.nn.conv2d(…) operation, TensorFlow requires input, filters, and strides to be in an exact format. We will now go through each argument in tf.nn.conv2d(input, filters, strides, padding) in more detail:

input: This is typically a 4D tensor where the dimensions should be ordered as [batch_size, height, width, channels].

  • batch_size: This is the number of data points (for example, images or words) in a single batch of data. We normally process data in batches because large datasets are often used for learning. At a given training step, we randomly sample a small batch of data that approximately represents the full dataset, and doing this for many steps allows us to approximate the full dataset quite well. This batch_size parameter is the same as the one we discussed in the TensorFlow input pipeline example.
  • height and width: These are the height and the width of the input.
  • channels: This is the depth of the input (for example, for an RGB image, channels will be 3, one channel for each color).

filters: This is a 4D tensor that represents the convolution window of the convolution operation. Its dimensions should be [height, width, in_channels, out_channels]:

  • height and width: These are the height and the width of the filter (often smaller than those of the input).
  • in_channels: This is the number of channels of the input to the layer.
  • out_channels: This is the number of channels to be produced in the output of the layer.

strides: This is a list of four elements, [batch_stride, height_stride, width_stride, channels_stride]. Each element says how many elements the convolution window moves along that dimension in a single shift over the input; for tf.nn.conv2d with the default [batch_size, height, width, channels] layout, the batch and channels strides must be 1. If you do not yet fully understand strides, setting every element to 1, as in the example above, moves the window one element at a time.

padding: This can be either 'SAME' or 'VALID'. It decides how to handle the convolution near the boundaries of the input. VALID performs the convolution without padding: if we convolve an input of length n with a convolution window of size h (using a stride of 1), the output has length n - h + 1, which is smaller than n whenever h > 1. Because the output shrinks after every such layer, this can severely limit the depth of neural networks. SAME pads the boundary with zeros so that, with a stride of 1, the output has the same height and width as the input.
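The input and filters layouts described above do not have to be typed out bracket by bracket. As a sketch, assuming TensorFlow 2.x, a plain 2-D list can be reshaped into the required 4D shapes with tf.reshape; the names image_2d and filter_2d are hypothetical:

```python
import tensorflow as tf

# A 4x4 "image" written as an ordinary 2-D list: [height, width]
image_2d = [[1, 2, 3, 4],
            [4, 3, 2, 1],
            [5, 6, 7, 8],
            [8, 7, 6, 5]]

# Add a batch dimension in front and a channel dimension at the back:
# [height, width] -> [batch_size=1, height, width, channels=1]
x = tf.reshape(tf.constant(image_2d, dtype=tf.float32), [1, 4, 4, 1])

# A 2x2 filter as a 2-D list, reshaped to [height, width, in_channels=1, out_channels=1]
filter_2d = [[0.5, 1.0],
             [0.5, 1.0]]
x_filter = tf.reshape(tf.constant(filter_2d, dtype=tf.float32), [2, 2, 1, 1])

print(x.shape)         # (1, 4, 4, 1)
print(x_filter.shape)  # (2, 2, 1, 1)

# Same call as before: stride 1 everywhere, no padding
y = tf.nn.conv2d(x, x_filter, strides=[1, 1, 1, 1], padding='VALID')
print(y.shape)         # (1, 3, 3, 1)
```

The reshaped tensors hold exactly the same values as the bracket-heavy constants, so the convolution result is unchanged.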

To gain a better understanding of what filter size, stride, and padding are, refer to Figure 2.7:

Figure 2.7: The convolution operation
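How filter size, stride, and padding determine the output size can be sketched with the per-dimension rules given in the TensorFlow convolution documentation, assuming no dilation. The function name conv_output_size is hypothetical:

```python
import math


def conv_output_size(n, h, stride=1, padding="VALID"):
    """Hypothetical helper: output length along one spatial dimension.

    n: input length, h: window (filter) length, stride: step between windows.
    Assumes no dilation.
    """
    if padding == "VALID":
        # Only windows that fit entirely inside the input produce an output.
        return math.ceil((n - h + 1) / stride)
    if padding == "SAME":
        # Zeros are padded so that every stride step produces an output.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


print(conv_output_size(4, 2))                  # 3, as in the tf.nn.conv2d example
print(conv_output_size(4, 2, padding="SAME"))  # 4, same size as the input
print(conv_output_size(4, 2, stride=2))        # 2, a 2x2 window moved 2 at a time
```

With a stride of 1, VALID reduces to n - h + 1 and SAME to n, matching the description of the padding argument.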

As mentioned before, TensorFlow 2.x also provides a tf.keras implementation of the convolution operation, tf.keras.layers.Conv2D. Unlike tf.nn.conv2d, tf.keras.layers.Conv2D accepts different parameters. Its filters argument no longer means the tensor that represents the convolution window, but an integer specifying how many filters to use in the layer, that is, the dimensionality of the output space (the number of output channels). kernel_size is an integer or a tuple/list of 2 integers specifying the height and width of the 2D convolution window. You may ask where the x_filter defined in the code above has gone. In tf.keras, we don't have to supply the filter ourselves: in practice, the filter weights are exactly what we want to learn during training, so tf.keras creates and initializes them automatically (together with a bias term, which is enabled by default).
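To see which weights tf.keras creates behind the scenes, the layer's kernel and bias can be inspected after its first call. This is a sketch assuming TensorFlow 2.x; the random input only serves to give the layer a shape:

```python
import tensorflow as tf

# Any input shaped [batch_size, height, width, channels] works here
x = tf.random.normal([1, 4, 4, 1])

# filters=1 -> one output channel; kernel_size=2 -> a 2x2 window
conv = tf.keras.layers.Conv2D(filters=1, kernel_size=2, padding='valid')
y = conv(x)  # the first call builds the layer and creates its weights

print(conv.kernel.shape)  # (2, 2, 1, 1): [height, width, in_channels, out_channels]
print(conv.bias.shape)    # (1,): one bias per output channel
print(y.shape)            # (1, 3, 3, 1)

# More filters and a tuple kernel_size: 8 output channels from a 3x3 window
conv8 = tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 3), padding='same')
print(conv8(x).shape)      # (1, 4, 4, 8)
print(conv8.kernel.shape)  # (3, 3, 1, 8)
```

The kernel has the same [height, width, in_channels, out_channels] layout as the filters tensor passed to tf.nn.conv2d; tf.keras just derives it from filters, kernel_size, and the input's channel count.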

The following is the implementation of the convolution operation with tf.keras. The result differs from the tf.nn.conv2d output earlier in this section because the filter is now initialized randomly by tf.keras rather than set by us, so the exact numbers will also change from run to run. The shape, however, is the same, because we specified filters=1 and kernel_size=2, matching the single 2 x 2 filter used earlier:

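x = tf.constant([[
    [[1], [2], [3], [4]],
    [[4], [3], [2], [1]],
    [[5], [6], [7], [8]],
    [[8], [7], [6], [5]]
]], dtype=tf.float32)

# One output channel from a 2x2 window; the kernel (and a bias) are created
# and randomly initialized by tf.keras on the first call
x_conv_keras = tf.keras.layers.Conv2D(filters=1, kernel_size=2, padding=x_padding)(x)
print(x_conv_keras)

# Returns (out) => (values differ between runs because of random initialization)
tf.Tensor(
[[[[ 3.890316 ]
   [ 2.5445178]
   [ 1.1987193]]

  [[ 7.4837427]
   [ 8.829541 ]
   [10.17534  ]]

  [[ 8.851723 ]
   [ 7.5059247]
   [ 6.160126 ]]]], shape=(1, 3, 3, 1), dtype=float32)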
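To check that both APIs compute the same operation, the Keras layer's random kernel can be overwritten with the hand-written filter from the tf.nn.conv2d example. This is a sketch assuming TensorFlow 2.x; use_bias=False is chosen here so that the layer computes only the convolution:

```python
import tensorflow as tf

x = tf.constant([[[[1], [2], [3], [4]],
                  [[4], [3], [2], [1]],
                  [[5], [6], [7], [8]],
                  [[8], [7], [6], [5]]]], dtype=tf.float32)

# The hand-written 2x2 filter from the tf.nn.conv2d example, shape [2, 2, 1, 1]
x_filter = tf.constant([[[[0.5]], [[1.0]]],
                        [[[0.5]], [[1.0]]]], dtype=tf.float32)

# No bias, so the output is the plain convolution
layer = tf.keras.layers.Conv2D(filters=1, kernel_size=2, padding='valid', use_bias=False)
layer(x)                               # build the layer (creates a random kernel)
layer.set_weights([x_filter.numpy()])  # replace it with our own filter

x_conv_keras = layer(x)
x_conv_nn = tf.nn.conv2d(x, x_filter, strides=[1, 1, 1, 1], padding='VALID')

# Expected to match the tf.nn.conv2d result: rows of 7.5, 13.5, and 19.5
print(tf.reshape(x_conv_keras, [3, 3]).numpy())
```

In training you would not set the kernel by hand; this only shows that the learned kernel plays the role that x_filter plays in tf.nn.conv2d.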

4. The pooling operation

A pooling operation behaves similarly to the convolution operation, but the final output is different. In max pooling, instead of outputting the sum of the element-wise multiplication of the filter and the image patch, we take the maximum element of the image patch at that location (see Figure 2.8):

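x = tf.constant([[
    [[1], [2], [3], [4]],
    [[4], [3], [2], [1]],
    [[5], [6], [7], [8]],
    [[8], [7], [6], [5]]
]], dtype=tf.float32)

x_ksize = [1, 2, 2, 1]   # 2x2 pooling window over height and width
x_stride = [1, 2, 2, 1]  # move the window 2 elements at a time
x_padding = 'VALID'      # no zero padding

x_pool = tf.nn.max_pool(input=x, ksize=x_ksize, strides=x_stride, padding=x_padding)

# or, with tf.keras (the default pool_size is 2x2 and strides default to pool_size)
x_pool_keras = tf.keras.layers.MaxPool2D()(x)

# Returns (out) =>
[[[[4.]
   [4.]]

  [[8.]
   [8.]]]]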

Figure 2.8: The max-pooling operation
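The max-pooling computation can be sketched in plain Python on the same 4 x 4 input, assuming a square window, no padding, and no TensorFlow. The helper name max_pool2d is hypothetical:

```python
def max_pool2d(image, k=2, stride=2):
    """Hypothetical helper: max pooling over a 2-D list with a k x k window,
    moving `stride` elements at a time, without padding ('VALID')."""
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            window = [image[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(max(window))  # keep only the largest value in the window
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [4, 3, 2, 1],
         [5, 6, 7, 8],
         [8, 7, 6, 5]]

print(max_pool2d(image))  # [[4, 4], [8, 8]]
```

For example, the top-left window holds 1, 2, 4, and 3, so its output is 4. Unlike convolution, pooling has no weights to learn; it only selects a value from each window.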

Other parts of the Understanding TensorFlow series:

Series 1: https://medium.com/@Adline125/understanding-tensorflow-series-979e71cc5562

Series 2: https://medium.com/@Adline125/understanding-tensorflow-fcc431891d08

Series 3–1: https://medium.com/@Adline125/understanding-tensorflow-ce18f0e1bbbc

Series 3–2: https://medium.com/@Adline125/understanding-tensorflow-2c6496b71368

Series 4: https://medium.com/@Adline125/understanding-tensorflow-94bdea8e1fd9


dan lee

NLP Engineer, Google Developer Expert, AI Specialist in Yodo1