Tensorflow 1.x to Tensorflow 2.0 — Coding changes

Imran us Salam · Published in Red Buffer · May 3, 2019 · 10 min read

In this article, we are going to cover the coding differences between TensorFlow 1.x and TensorFlow 2.
In the code snippets below, you will see that TensorFlow 2 adds a lot of new features while keeping plenty from TensorFlow 1.x. It’s in the alpha phase right now. The key difference we will see is how TensorFlow 2.0 uses the power of Keras to reduce the lines of code and how easy it is to switch from TensorFlow 1.x to TensorFlow 2.0-alpha.

We are going to be using the terms TF1.x or TF1 for TensorFlow 1.x and TF2.0 or TF2 for TensorFlow 2.0 in the rest of the article.

We are going to focus more on explaining the TF2.0 code than the TF1.x code. There are a lot of examples for TF1.x available online with enough explanation.

The source code for the article will be provided at the end.
I am going to convert the code snippets to images, so you cannot copy paste and have to write the whole thing by yourself.
Don’t hate me just yet. It’s for your own practice ;)

Let’s get started

The main thing you need to know about TF1.x is that it works on Graphs and Sessions.

Graphs

In simple terms, a graph is where you create your nodes and relate them to each other. For example, in the case of simple Linear Regression, you will create 7 main nodes.
1. The input independent feature node, defined as a placeholder (we will see what a placeholder is a little later)
2. Another input node where we define the dependent feature, or target variable (also a placeholder)
3. A variable node that serves as the parameter (W). Its value changes as training goes on, driven by an optimization algorithm and a defined loss.
4. A variable node called the bias (b), which is also a parameter and is also differentiated in the optimization step.
5. A resultant of the above, called the prediction, given by:
prediction = X * W + b
6. A loss function computed over the predicted value and the actual dependent target value.
7. An optimizer that optimizes the variables to minimize the loss function.

X here is the node defined in the 1st step, W is the node in the 3rd step and b is the node from the 4th step.

In the graph diagram, the blue nodes are Variables and the green ones are placeholders. The red one is the optimizer that is used to optimize the Variables given a loss function.

Session

Now that we’ve made our graph, we are done with most of the work. When we run the graph, we tell it what to execute, in this case, the ‘optimizer’. Since the optimizer depends on the loss function and y, and these depend on the placeholders which in turn depend on the input variables, we’ll have to provide the values for input variables to execute the optimizer. Hence, the input variable is where we plug in our training data.

Basic Types

There are three main basic data types in TF 1.x. They include
1. constant
2. placeholder
3. Variable (Variable is a class though)

A constant is a type that, as the name suggests, cannot change its value over the execution of the graph. An example is given below.
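Roughly, the snippet looks something like this sketch (the graph name basic_addition_graph comes from the description that follows; the actual constant values are just illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

basic_addition_graph = tf.Graph()
with basic_addition_graph.as_default():
    # Constants: a value, plus an optional name, datatype and shape
    a = tf.constant([5.0], name="a", dtype=tf.float32, shape=(1,))
    b = tf.constant([2.0], name="b", dtype=tf.float32, shape=(1,))
    c = tf.constant([1.0, 2.0, 3.0], name="c")   # dtype and shape inferred

    x = tf.add(a, b, name="x")          # a + b
    y = tf.multiply(x, c, name="y")     # broadcasting: (1,) * (3,) -> (3,)

with tf.Session(graph=basic_addition_graph) as sess:
    x_val, y_val = sess.run([x, y])     # ask the graph for x and y
    print(x_val, y_val)                 # [7.] [ 7. 14. 21.]
```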

In the snippet above, we have defined a graph by calling tf.Graph() and we are calling it “basic_addition_graph”. To run this graph, we open a session and pass it the graph we just defined through the graph parameter.

Here we can see that we’ve defined the constants by a value, a name, a datatype, and a shape. We don’t actually need to provide the name, datatype, and shape: the datatype and shape will be inferred from the value(s) provided. The name isn’t necessary either, but it’s good practice to give everything in your graph a proper name.

Here we are summing up two constants “a” and “b” and multiplying their resultant with “c”. Note that we can actually use broadcasting when multiplying or adding.

In the session, we have asked for “x” and “y”. This means that we want them as outputs from the graph.

A placeholder is a type into which you feed your input values at run time, while training or otherwise. Let’s look at the snippet below to see how it works.
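A rough sketch of what that snippet might look like (the specific values fed in, and the second operation, are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

placeholder_graph = tf.Graph()
with placeholder_graph.as_default():
    # Values for the placeholders are supplied at run time via feed_dict
    a = tf.placeholder(tf.float32, shape=(None,), name="a")
    b = tf.placeholder(tf.float32, shape=(None,), name="b")

    x = tf.add(a, b, name="x")
    y = tf.multiply(x, b, name="y")

with tf.Session(graph=placeholder_graph) as sess:
    x_val, y_val = sess.run(
        [x, y],
        feed_dict={a: [1.0, 2.0], b: [3.0, 4.0]})  # inputs go in here
    print(x_val, y_val)  # [4. 6.] [12. 24.]
```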

As you can see above, we have provided the values of “a” and “b” at run time from the session via a “feed_dict”, which is used to send inputs to the graph to be stored in the placeholders. And at the same time, we want the graph to return “x” and “y”.

Variable is the most important type in TensorFlow. It’s not exactly a type; it’s a class with a lot of functionality of its own.
Remember the parameters that get tuned on the training set? We define those optimizable parameters with the Variable class. Let’s see in the snippet below how we can do that.
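A rough sketch of the Variable snippet (the shapes, the extra bias variable, and the example input are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

variable_graph = tf.Graph()
with variable_graph.as_default():
    x = tf.placeholder(tf.float32, shape=(None, 3), name="x")

    # W starts from a random normal initialization; an optimizer would
    # update it later, given a loss function
    W = tf.Variable(tf.random_normal((3, 1)), name="W")
    b = tf.Variable(tf.zeros((1,)), name="b")

    y = tf.matmul(x, W) + b

    # Variables must be initialized before the graph can be run
    init = tf.global_variables_initializer()

with tf.Session(graph=variable_graph) as sess:
    sess.run(init)
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))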

Notice how we’ve initialized the variable “W” with random (normal). This is the starting point from where it will be changed when provided with some loss function and an optimizer. (Do read the comments)

Tensorflow 2.0

Now let’s see what’s changed in TF2.0. The biggest change is that now we can add a mode called “Eager Execution” which let’s you skip all the graph — session.
That means we can now write TensorFlow code like writing simple python code. Like writing a numpy code.
Let’s see how it looks like

That’s amazing, so how do we add trainable variables and other stuff. Well there you go:

It’s as simple as that.

Training a Linear Regression Model

Let’s walk through how we can write a Linear Regression model with Mean Squared error in TF 1.x and TF 2.0. Let’s see the difference in coding style

Code for Linear Regression with Tensorflow 1.x
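A rough sketch of the TF1.x version, assuming the Boston Housing dataset (13 features), which matches the shapes discussed below; the learning rate, feature scaling, and epoch count are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x
from tensorflow.keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
# Feature scaling so plain gradient descent behaves
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train, x_test = (x_train - mean) / std, (x_test - mean) / std

lr_graph = tf.Graph()
with lr_graph.as_default():
    # Placeholders for the independent (X) and dependent (y) variables
    X = tf.placeholder(tf.float32, shape=(None, 13), name="X")
    y = tf.placeholder(tf.float32, shape=(None,), name="y")

    # Trainable parameters
    W = tf.Variable(tf.random_normal((13, 1)), name="W")
    b = tf.Variable(tf.zeros((1,)), name="b")

    prediction = tf.squeeze(tf.matmul(X, W) + b, axis=1)
    loss = tf.reduce_mean(tf.square(prediction - y))              # MSE
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    init = tf.global_variables_initializer()

with tf.Session(graph=lr_graph) as sess:
    sess.run(init)
    for epoch in range(200):
        _, l = sess.run([optimizer, loss],
                        feed_dict={X: x_train, y: y_train})
    print("train loss:", l,
          "test loss:", sess.run(loss, feed_dict={X: x_test, y: y_test}))
```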
Code for Linear Regression with Tensorflow 2.0
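And a rough sketch of the TF2.0 version, under the same assumptions (class name, optimizer, and epoch count are illustrative):

```python
import tensorflow as tf  # TensorFlow 2.0
from tensorflow.keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train = ((x_train - mean) / std).astype("float32")
x_test = ((x_test - mean) / std).astype("float32")

class LinearRegression(tf.keras.Model):
    def __init__(self):
        super(LinearRegression, self).__init__()
        # A single dense unit computes X * W + b, with W of shape (13, 1)
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense(inputs)

model = LinearRegression()
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="mse")
model.fit(x_train, y_train, epochs=100, verbose=0)
print(model.evaluate(x_test, y_test, verbose=0))
```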

Even though we are training over a simple dataset, see how many lines we save by writing it in TF2.0.

We’ve already gone through the details of TF1.x code. The only thing added here is the optimizer step.

In the TF2.0 code, we made a class and took the tf.keras.Model class as the parent class. That way we get all the functionality of a Model while defining our own model class and our own way of calling it. If you look closely, this mimics the coding style of PyTorch, where you define the layers you want to use and then stack them in the forward function. Here we have an __init__ method where we declare all the layers we are going to use.

The dense layer is a fully connected layer, as we’ve seen in the TF1.x example. It computes X * W + b. In this case, the feature size of X is 13 and we’ve defined the dense layer with a units parameter of 1, so the shape of W will be (13, 1) and the bias b will be (1,).

Since tf.keras.layers has almost all the usable layers, we are going to be using them instead of defining our own layers with variables.

The call method is where you define the flow of your model: how the model will forward propagate and, subsequently, how it will backpropagate.

In the compile method, we define our model’s hyperparameters: the kind of loss to use and the optimizer.

The fit method starts training. It takes either NumPy arrays or tf.Tensors as input for both the independent and dependent variables, along with the number of epochs and other parameters like callbacks and validation data.

The evaluate method evaluates over the provided dataset, which again can be NumPy arrays or tf.Tensors.

Training a Logistic Regression

Training a Logistic Regression classifier in TF2.0 is as simple as writing the Linear Regression code above. Let’s look at both the TF1.x code and the TF2.0 code to see how we can do that.

Code for Logistic Regression using Tensorflow 1.x
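A rough sketch of the TF1.x version, assuming MNIST loaded through tf.keras.datasets; the batch size, epoch count, and initializer scale are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train, y_test = y_train.astype("int64"), y_test.astype("int64")

logreg_graph = tf.Graph()
with logreg_graph.as_default():
    X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
    y = tf.placeholder(tf.int64, shape=(None,), name="y")

    W = tf.Variable(tf.random_normal((784, 10), stddev=0.01), name="W")
    b = tf.Variable(tf.zeros((10,)), name="b")

    logits = tf.matmul(X, W) + b
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    optimizer = tf.train.AdamOptimizer().minimize(loss)
    accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.argmax(logits, 1), y), tf.float32))
    init = tf.global_variables_initializer()

with tf.Session(graph=logreg_graph) as sess:
    sess.run(init)
    for epoch in range(5):
        for i in range(0, len(x_train), 128):       # mini-batch training
            sess.run(optimizer,
                     feed_dict={X: x_train[i:i + 128], y: y_train[i:i + 128]})
    print(sess.run(accuracy, feed_dict={X: x_test, y: y_test}))
```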
Code for Logistic Regression using Tensorflow 2.0
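A rough sketch of the TF2.0 version. Since the MNIST labels here are integers, the sketch uses the sparse variant of categorical cross entropy; the class name and epoch count are illustrative:

```python
import tensorflow as tf  # TensorFlow 2.0
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

class LogisticRegression(tf.keras.Model):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.flatten = tf.keras.layers.Flatten()   # (N, 28, 28) -> (N, 784)
        self.dense = tf.keras.layers.Dense(10, activation="softmax")

    def call(self, inputs):
        x = self.flatten(inputs)
        return self.dense(x)

model = LogisticRegression()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))
```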

As we can see, there are a couple of differences from the Linear Regression example. The first one is that we have a flatten layer. Since we are dealing with 2D images (the MNIST dataset consists of 28x28 images), we flatten them out, which gives a shape of 784.

The second difference is that we have a dense layer with a weight shape of (784, 10), because we have 10 labels, and an added softmax activation. This activation converts the logits into the 0–1 range.

The rest is the same except the loss function: since it’s a classification problem, we are going to use a categorical cross entropy loss function here.

Training a Neural Network

Training a Neural Network again is as simple as writing Logistic Regression code in TF 2.0. Let’s see:

Code for Neural Network with Tensorflow 1.x
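A rough sketch of the TF1.x version, with the layers built from raw Variables and the 784 → 256 → 256 → 128 → 10 shapes listed further down; the initializer scale, batch size, and epoch count are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train, y_test = y_train.astype("int64"), y_test.astype("int64")

nn_graph = tf.Graph()
with nn_graph.as_default():
    X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
    y = tf.placeholder(tf.int64, shape=(None,), name="y")

    # 784 -> 256 -> 256 -> 128 -> 10
    W1 = tf.Variable(tf.random_normal((784, 256), stddev=0.05), name="W1")
    b1 = tf.Variable(tf.zeros((256,)), name="b1")
    W2 = tf.Variable(tf.random_normal((256, 256), stddev=0.05), name="W2")
    b2 = tf.Variable(tf.zeros((256,)), name="b2")
    W3 = tf.Variable(tf.random_normal((256, 128), stddev=0.05), name="W3")
    b3 = tf.Variable(tf.zeros((128,)), name="b3")
    W4 = tf.Variable(tf.random_normal((128, 10), stddev=0.05), name="W4")
    b4 = tf.Variable(tf.zeros((10,)), name="b4")

    h1 = tf.nn.relu(tf.matmul(X, W1) + b1)
    h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)
    h3 = tf.nn.relu(tf.matmul(h2, W3) + b3)
    logits = tf.matmul(h3, W4) + b4

    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    optimizer = tf.train.AdamOptimizer().minimize(loss)
    accuracy = tf.reduce_mean(
        tf.cast(tf.equal(tf.argmax(logits, 1), y), tf.float32))
    init = tf.global_variables_initializer()

with tf.Session(graph=nn_graph) as sess:
    sess.run(init)
    for epoch in range(5):
        for i in range(0, len(x_train), 128):
            sess.run(optimizer,
                     feed_dict={X: x_train[i:i + 128], y: y_train[i:i + 128]})
    print(sess.run(accuracy, feed_dict={X: x_test, y: y_test}))
```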
Code for Neural Network with Tensorflow 2.0
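A rough sketch of the TF2.0 version, with the __init__ order deliberately shuffled as described below; the class name and epoch count are illustrative:

```python
import tensorflow as tf  # TensorFlow 2.0
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

class NeuralNetwork(tf.keras.Model):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = tf.keras.layers.Flatten()
        # Declared out of order on purpose; __init__ only defines the layers
        self.dense3 = tf.keras.layers.Dense(128, activation="relu")
        self.dense1 = tf.keras.layers.Dense(256, activation="relu")
        self.dense4 = tf.keras.layers.Dense(10, activation="softmax")
        self.dense2 = tf.keras.layers.Dense(256, activation="relu")

    def call(self, inputs):
        # call stacks the layers in the correct order
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.dense3(x)
        return self.dense4(x)

model = NeuralNetwork()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))
```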

We can see that we’ve defined above the code for a three hidden and one output layered network.

The only difference here is how we have several dense layers working together, each with a ReLU activation (except for the last layer). See how I intentionally messed up the order of these layers in the __init__ method.

The reason for doing that is to show that in __init__ we are only defining the layers we are going to use.

In the call method, I’ve stacked the layers the way they’re supposed to be stacked.

This is the flow of data between layers:
1. dense1 → input: (N, 784), output: (N, 256)
2. dense2 → input: (N, 256), output: (N, 256)
3. dense3 → input: (N, 256), output: (N, 128)
4. dense4 → input: (N, 128), output: (N, 10)

Everything else is the same as Logistic regression.

Training a Convolutional Neural Network

Training a ConvNet is as easy as training the above Neural Network. All we are going to do is change some layers. Let’s look at the same example.

Convolution Neural Network code written in Tensorflow 1.x
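A rough sketch of the TF1.x version; it uses tf.layers for brevity, and the filter counts, kernel sizes, and training loop are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = y_train.astype("int64"), y_test.astype("int64")

cnn_graph = tf.Graph()
with cnn_graph.as_default():
    X = tf.placeholder(tf.float32, shape=(None, 28, 28, 1), name="X")
    y = tf.placeholder(tf.int64, shape=(None,), name="y")

    conv1 = tf.layers.conv2d(X, 32, (3, 3), padding="same",
                             activation=tf.nn.relu, name="conv1")
    pool1 = tf.layers.max_pooling2d(conv1, (2, 2), strides=(2, 2), name="pool1")
    conv2 = tf.layers.conv2d(pool1, 64, (3, 3), padding="same",
                             activation=tf.nn.relu, name="conv2")
    pool2 = tf.layers.max_pooling2d(conv2, (2, 2), strides=(2, 2), name="pool2")

    flat = tf.layers.flatten(pool2)                 # (N, H, W, C) -> (N, M)
    dense = tf.layers.dense(flat, 128, activation=tf.nn.relu)
    logits = tf.layers.dense(dense, 10)

    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    optimizer = tf.train.AdamOptimizer().minimize(loss)
    init = tf.global_variables_initializer()

with tf.Session(graph=cnn_graph) as sess:
    sess.run(init)
    for i in range(0, 12800, 64):   # a few mini-batches, just to show the loop
        sess.run(optimizer,
                 feed_dict={X: x_train[i:i + 64], y: y_train[i:i + 64]})
```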
Convolution Neural Network code written in Tensorflow 2.0
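A rough sketch of the TF2.0 version, with the same illustrative filter counts; the class name and epoch count are also illustrative:

```python
import tensorflow as tf  # TensorFlow 2.0
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

class ConvNet(tf.keras.Model):
    def __init__(self):
        super(ConvNet, self).__init__()
        # Conv2D(filters, kernel_size): stride defaults to (1, 1)
        self.conv1 = tf.keras.layers.Conv2D(32, (3, 3), padding="same",
                                            activation="relu")
        self.pool1 = tf.keras.layers.MaxPool2D((2, 2))
        self.conv2 = tf.keras.layers.Conv2D(64, (3, 3), padding="same",
                                            activation="relu")
        self.pool2 = tf.keras.layers.MaxPool2D((2, 2))
        self.flatten = tf.keras.layers.Flatten()    # (N, W, H, C) -> (N, M)
        self.dense1 = tf.keras.layers.Dense(128, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10, activation="softmax")

    def call(self, inputs):
        x = self.pool1(self.conv1(inputs))
        x = self.pool2(self.conv2(x))
        x = self.flatten(x)
        x = self.dense1(x)
        return self.dense2(x)

model = ConvNet()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2)
print(model.evaluate(x_test, y_test))
```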

Let’s see the differences here:

We don’t have a flatten layer because we are using a 2D Convolution layer. The first parameter is the number of output channels. The next parameter is the kernel size that will convolve spatially. We haven’t mentioned stride here. But by default it’s (1,1). Padding can be either same or valid. The difference is valid drops what’s left at the end. And same will add a padding of zeros but will not lose any values.

The max-pooling layer here will pool the input it receives spatially, with a kernel of some shape. The same stride and padding parameters apply here as well.

Further down we have a flatten layer. This converts the 4D shape (N, W, H, C) into a 2D shape (N, M).

At the end we have a few dense layers that will finally end up producing an output of the shape (N, 10). The rest is the same.

Saving and Restoring Model Weights

Saving model weights in TensorFlow has always been an easy task. Let’s find out how.

Saving a model in Tensorflow 1.x
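A rough sketch of saving in TF1.x with tf.train.Saver (the variable shapes and the checkpoint path are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

save_graph = tf.Graph()
with save_graph.as_default():
    W = tf.Variable(tf.random_normal((13, 1)), name="W")
    b = tf.Variable(tf.zeros((1,)), name="b")
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()   # by default saves every variable in the graph

with tf.Session(graph=save_graph) as sess:
    sess.run(init)
    # ... train here ...
    # Writes model.ckpt.data-00000-of-00001, model.ckpt.index and
    # model.ckpt.meta, plus a "checkpoint" bookkeeping file
    saver.save(sess, "./model.ckpt")
```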
Saving a model in Tensorflow 2.0
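A rough sketch of saving in TF2.0, reusing the subclassed LinearRegression model from the earlier sketch (the path is illustrative):

```python
import tensorflow as tf  # TensorFlow 2.0

# Reusing the subclassed LinearRegression model from the sketch above
model = LinearRegression()
model(tf.zeros((1, 13)))              # run once so the weights exist
model.save_weights("./lr_weights")    # TensorFlow checkpoint format
```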

Saving a model’s weights isn’t a problem.

Restoring them… well, it’s not as straightforward as saving them, but it is fairly easy. Let’s see how we can do that.

Restoring a model in Tensorflow 1.x
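A rough sketch of restoring in TF1.x; the rebuilt graph must match the one that was saved (here, the saving sketch above):

```python
import tensorflow as tf  # TensorFlow 1.x

# Rebuild exactly the same graph structure first
restore_graph = tf.Graph()
with restore_graph.as_default():
    W = tf.Variable(tf.random_normal((13, 1)), name="W")
    b = tf.Variable(tf.zeros((1,)), name="b")
    saver = tf.train.Saver()

with tf.Session(graph=restore_graph) as sess:
    # restore() replaces the random initial values with the saved ones,
    # so there is no need to run the initializer
    saver.restore(sess, "./model.ckpt")
    print(sess.run(W))
```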
Restoring a model in Tensorflow 2.0
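And a rough sketch of restoring in TF2.0, again reusing the LinearRegression class from the earlier sketch:

```python
import tensorflow as tf  # TensorFlow 2.0

# Rebuild the same model structure, then load the saved weights into it
model = LinearRegression()
model(tf.zeros((1, 13)))              # build the weights first
model.load_weights("./lr_weights")
```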

In both cases, we have to rebuild the structure of the network first. In the case of TF1.x, we recreate the same graph and then load the weights inside a session. In TF2.0, we use the “load_weights” method to load the weights. Please read the comments in the TF1.x code for a better understanding of which files get saved in TF1.x.

Now, as promised, here is the link to the code. Note that the code is written from a beginner’s perspective and is intentionally repetitive in places. Also, this guide is far from complete; it is a very introductory look at how to get started with TF2.0. There is a lot of new and exciting stuff in the docs, so do check them out, especially the gradient tape. That is a very powerful feature they’ve added.
https://github.com/imransalam/basic-tf-intro-notebooks

To see the power of gradient tape, check out this style transfer repository I made using it.
https://github.com/imransalam/style-transfer-tensorflow-2.0

Thank you for reading, I hope you enjoyed it. For questions, leave a comment, and if there is something wrong here, let me know and I’ll fix it.

Happy Learning!
