Getting Started with TensorFlow: Basics of the TensorFlow API

Tim Wroge
Jul 22, 2017 · 10 min read

In my last post, I talked about the mathematical underpinnings of Artificial Neural Network models and how they arose. In this post, I am going to be giving a detailed explanation of the TensorFlow API, using the MNIST data set to train a simple Neural Network.

The basis for TensorFlow is a computational graph. By using this computational graph, TensorFlow is able to run computations in parallel on multiple GPUs and CPUs, or across entire servers. This allows for an especially efficient method of computation for machine learning applications, or for any project that can be reduced to a graph of computations. The main benefits of this approach are the ability of TensorFlow to automatically compute gradients for optimization algorithms, the parallel computation of models, and the ability to run a model on multiple devices (TensorFlow can currently be run on Linux, macOS, Windows, iOS, and Android).

Example of a TensorFlow Computational Graph (source)

To understand TensorFlow, we must begin by understanding the two phases of creating a TensorFlow model: the construction phase and the execution phase. In the construction phase, the graph is built, and in the execution phase, the graph is run. Generally, the execution phase consists of running a loop that iteratively improves the parameters of the model using some form of gradient descent algorithm.

Another key aspect of TensorFlow is that, because the computations are executed as a graph, a session must be run to execute the code and determine the values of the variables in that code. A session initializes the graph and executes it using optimized C++ code, which is what makes TensorFlow so efficient. All the computations run outside of Python, similar to the way NumPy operations are computed.

TensorFlow Basics

So, let’s begin by creating a really simple graph, two values added together:

import tensorflow as tf

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adding_node = a+b

Here, a placeholder is TensorFlow’s way of creating something like a variable. By creating a placeholder, you are telling TensorFlow that you will give it a value later, for it to execute operations (called ops) on. If we were to visualize this graph using TensorBoard, it would look a little like this:

Adder Node from TensorBoard

Here, the lines represent the flow of tensors (n-dimensional arrays of numbers). As you can see, by running the code, TensorFlow only creates the graph of computation for generic tensors. Now, let’s go into how to define tensors within TensorFlow and conduct ops on them.

Suppose we wish to create this array (like the last tutorial) and multiply it, then add to it:

Generic matrix operations

The code would look like this:

import tensorflow as tf

Weight_Matrix = tf.constant([1, 2, 3, 4, 5, 6, 7,
                             2, 3, 4, 5, 6, 7, 8,
                             3, 4, 5, 6, 7, 8, 9], shape=[3, 7])
Vertical_Vector = tf.constant([1, 2, 3, 4, 5, 6, 7], shape=[7, 1])
Bias_Vector = tf.constant([1, 2, 3], shape=[3, 1])

When reading this, it is important to notice that every array of numbers in TensorFlow is a tensor. Because they are all tensors, we have to define their shape. A good reference guide to see how ranks of tensors’ shapes are defined can be found here:

Tensor Ranks, Shapes, and Types
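As a quick analogy, NumPy arrays use the same rank-and-shape vocabulary as TensorFlow tensors, so you can build intuition for ranks and shapes without a graph or a session:

```python
import numpy as np

scalar = np.array(5)                  # rank 0, shape ()
vector = np.array([1, 2, 3])          # rank 1, shape (3,)
matrix = np.array([[1, 2], [3, 4]])   # rank 2, shape (2, 2)

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
print(matrix.shape)                           # (2, 2)
```

The rank is the number of dimensions; the shape lists the size along each dimension.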

In order to produce the computation shown above, we can just run the variables through a few TensorFlow ops like this:

output = tf.add(tf.matmul(Weight_Matrix, Vertical_Vector), Bias_Vector)

Now, if we ran all this code, we would simply have a graph of the operations. In order to evaluate the value of “output”, we need to run a session. A session is the way that TensorFlow translates the graph to the optimized operations outside of Python and computes all the operations. A session holds on to resources while it is open, so it is important to close it after it is run, like this:

sess = tf.Session()
print(sess.run(output))
sess.close()

Equivalently, we can run the session in a with block. A with block automatically closes the session when the block exits:

with tf.Session() as sess:
    print(sess.run(output))

And in either case, we can see the answer is:

[[141]
[170]
[199]]
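We can double-check this result in plain NumPy, which performs the same matrix multiply and add eagerly, without a graph or a session:

```python
import numpy as np

# The same matrices as in the TensorFlow code above
weight_matrix = np.array([[1, 2, 3, 4, 5, 6, 7],
                          [2, 3, 4, 5, 6, 7, 8],
                          [3, 4, 5, 6, 7, 8, 9]])
vertical_vector = np.array([[1], [2], [3], [4], [5], [6], [7]])
bias_vector = np.array([[1], [2], [3]])

# (3, 7) @ (7, 1) + (3, 1) -> (3, 1)
output = weight_matrix @ vertical_vector + bias_vector
print(output)  # [[141] [170] [199]]
```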

Neural Network Model using TensorFlow

Now that we know a good deal about the basic operations of TensorFlow, we can go out and build our first neural network to classify MNIST digits!

To begin, we need to import a few libraries and load the data for the model:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import datetime as dt

First, we must read in the data and encode the training and testing sets using the one-hot encoding I talked about in the previous tutorial.

#this is a dataset of numbers between zero and nine that is squashed down into a vector of
#size 1x784
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
#this should create a symbolic vector input for the data set
x = tf.placeholder(tf.float32, [None, 784], name="X")
#None means the dimension can be any length

The original images are 28 x 28 pixels, so in order to feed each image in as a vector, the images are flattened into a 1 x 784 vector by concatenating the rows of the image. Luckily for us, this was already done by the TensorFlow input_data call.
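Here is a quick NumPy sketch of that flattening, using a dummy image (input_data already does this for the real MNIST data, so this is just for illustration):

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)  # stand-in for one 28 x 28 MNIST image

# Flatten row by row into a single 1 x 784 vector
flat = image.reshape(1, 784)
print(flat.shape)  # (1, 784)

# The first pixel of row 1 ends up at position 28 of the flattened vector
print(flat[0, 28] == image[1, 0])  # True
```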

The “x” is a placeholder for the input image vector. The first dimension is initialized as None so that the data can be run in batches during training. I added name="X" as a way for us to access the node in a clear fashion later in TensorBoard.

For the model, we will be using a basic neural network with two hidden layers. We will denote K and L as the sizes of the hidden layers. Here is what that would look like:

K = 320
L = 110

As you may remember from a Linear Algebra course, the product of an M x N matrix and an N x P matrix is an M x P matrix. Because the inner layers of a Neural Network are fundamentally just matrix operations (an idea explored beautifully on Chris Olah’s amazing blog), we have to keep this property in mind when we construct the hidden-layer weight matrices.
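We can sanity-check those shape rules with NumPy before building the graph; the shapes below mirror the weight matrices we are about to create:

```python
import numpy as np

K, L = 320, 110  # hidden layer sizes, as defined above
x = np.zeros((1, 784))          # one flattened input image

W1 = np.zeros((784, K))
W2 = np.zeros((K, L))
W_out = np.zeros((L, 10))

# Each matmul contracts the inner dimension: (M, N) @ (N, P) -> (M, P)
print((x @ W1).shape)               # (1, 320)
print((x @ W1 @ W2).shape)          # (1, 110)
print((x @ W1 @ W2 @ W_out).shape)  # (1, 10) -- one score per digit class
```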

To construct the hidden layers, we will start by creating the matrices and biases. It is important to initialize the weight matrices with random weights (hence, truncated_normal) instead of just zeros, because otherwise the model would not be trainable (with all-zero weights, every neuron in a layer computes the same output and receives the same gradient, so the neurons never differentiate). With this in mind, we will construct the matrices and biases for the layers like so:

with tf.name_scope("Weights"):
    W1 = tf.Variable(tf.truncated_normal([784, K], stddev=.1), name="W1")
    tf.summary.histogram("Weights_1", W1)

    W2 = tf.Variable(tf.truncated_normal([K, L], stddev=.1), name="W2")
    tf.summary.histogram("Weights_2", W2)

    W_out = tf.Variable(tf.truncated_normal([L, 10], stddev=.1), name="W_out")
    tf.summary.histogram("Weights_Out", W_out)

with tf.name_scope("Biases"):
    b1 = tf.Variable(tf.zeros([K]), name="b1")
    tf.summary.histogram("Biases_1", b1)

    b2 = tf.Variable(tf.zeros([L]), name="b2")
    tf.summary.histogram("Biases_2", b2)

    b_out = tf.Variable(tf.zeros([10]), name="b_out")
    tf.summary.histogram("Biases_Out", b_out)

In order to be trainable, all the weights and biases are initialized as Variables. It is important to note that all Variables must be initialized before the graph is executed, but more on this later. After initializing the weights and the biases, I added a histogram summary operation for each one so that we can see how the weights and the biases change during training.
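For intuition, truncated_normal draws from a normal distribution and redraws any sample that falls more than two standard deviations from the mean. Here is a rough NumPy sketch of that behavior (an approximation for illustration, not TensorFlow’s actual implementation):

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    # Draw normal samples, then redraw any that land outside two standard deviations
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, stddev, size=shape)
    bad = np.abs(samples) > 2 * stddev
    while bad.any():
        samples[bad] = rng.normal(0.0, stddev, size=bad.sum())
        bad = np.abs(samples) > 2 * stddev
    return samples

W1 = truncated_normal((784, 320))
print(np.abs(W1).max() <= 0.2)  # True: no weight exceeds 2 * stddev
```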

Now that we have all the parameters defined, we can create the model! I will enclose the model in a name_scope so that it is clear when viewing the graph in TensorBoard.

with tf.name_scope("MultiLayer_NN"):
    # implementing a multilayer neural network
    y1 = tf.nn.elu(tf.matmul(x, W1) + b1, name="y1")
    y2 = tf.nn.elu(tf.matmul(y1, W2) + b2, name="y2")
    Output = tf.nn.softmax(tf.matmul(y2, W_out) + b_out, name="Output")

As you can see, we are taking the matrix product of the input and the first weight matrix W1, and adding a bias, b1. We then wrap this in an Exponential Linear Unit activation function (the tf.nn.elu function). To add layers, we just take the output of the previous layer as the input to the next. Finally, we apply a softmax to the final layer (similar to the last tutorial) to output class probabilities for each of the digits.
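To make the activation functions concrete, here is a small NumPy sketch of what ELU and softmax compute; the model itself uses TensorFlow’s tf.nn.elu and tf.nn.softmax, so these versions are just for intuition:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for positive inputs; alpha * (exp(x) - 1) for negative inputs,
    # which saturates smoothly at -alpha instead of cutting off at zero like ReLU
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def softmax(z):
    # Exponentiate and normalize so the outputs form a probability distribution
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(elu(np.array([-2.0, 0.0, 3.0])))       # [-0.8647  0.  3.]
print(softmax(np.array([1.0, 2.0, 3.0])).sum())  # 1.0
```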

Now, we have to find a way of evaluating the cost of the model. We will use something called Cross-Entropy, which essentially measures how different two probability distributions are. Mathematically, it is defined like this:

H(p, q) = -Σ p(x) log q(x)

In this formula, p is the true label distribution (the one-hot encoded labels) and q is the output probabilities of the network. An easy way to remember which is which: log(0) is undefined, the one-hot labels contain a lot of zeros, and the softmax output of the network is always strictly positive, so the network’s output q must be the one inside the logarithm.
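As a sanity check on that formula, here it is in plain NumPy (a toy illustration only; the model below uses TensorFlow’s tf.nn.softmax_cross_entropy_with_logits):

```python
import numpy as np

def cross_entropy(p, q):
    # p: one-hot true label distribution; q: predicted probabilities
    q = np.clip(q, 1e-12, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(q))

p = np.array([0, 0, 1, 0])                   # one-hot label for class 2
q_good = np.array([0.05, 0.05, 0.85, 0.05])  # confident, correct prediction
q_bad = np.array([0.85, 0.05, 0.05, 0.05])   # confident, wrong prediction

# A correct prediction gives a much lower cost than an incorrect one
print(cross_entropy(p, q_good) < cross_entropy(p, q_bad))  # True
```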

Now, we can define the output of the network and evaluate the cost:

Output_labels = tf.placeholder(tf.float32, [None, 10], name="Output_labels")

with tf.name_scope("Cross-Entropy"):
    # softmax_cross_entropy_with_logits expects the raw, pre-softmax activations,
    # so we use the logits rather than the softmax Output
    logits = tf.matmul(y2, W_out) + b_out
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=Output_labels, logits=logits),
        name="Cross_Entropy")

training_rate = 0.0005
train_step = tf.train.AdamOptimizer(training_rate).minimize(cross_entropy)

First, I defined what the actual labels of the digits should be with Output_labels. Then (under the name scope Cross-Entropy), I defined the cost. Next, I defined the training rate (a hyper-parameter that controls how fast your model adapts to new data). Then, I defined what a single training step would be, using the AdamOptimizer (this will be helpful when we go through the training iteration).

Ok, so now I will throw up a lot of code for the training process, and I will try my best to explain what I did here.

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    epochs = 10000

    now = dt.datetime.utcnow().strftime("%B.%d.%y@%H.%M.%S.%f")
    filewrite_out = tf.summary.FileWriter("/tmp/MNIST_MultiLayer_ANN/{}".format(now))
    filewrite_out.add_graph(sess.graph)

    tf.summary.scalar("Cost", cross_entropy)

    correct_prediction = tf.equal(tf.argmax(Output, 1), tf.argmax(Output_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("Accuracy", accuracy)
    merged_summaries = tf.summary.merge_all()

    with tf.name_scope("Training"):
        for i in range(epochs):
            # using small batches of 100 data points gives us stochastic gradient descent
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={x: batch_xs, Output_labels: batch_ys})

            if i % 5 == 0:
                sum_op = sess.run(merged_summaries,
                                  feed_dict={x: mnist.test.images,
                                             Output_labels: mnist.test.labels})
                filewrite_out.add_summary(sum_op, i)

            if i % 10 == 0:
                print("The accuracy for run", i, "in", epochs, "is",
                      sess.run(accuracy, feed_dict={x: mnist.test.images,
                                                    Output_labels: mnist.test.labels}))

    # Test the trained model on the test set
    print("Accuracy of trained model:",
          sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        Output_labels: mnist.test.labels}))

Alright, so to begin, we have to create the session so that we can change the values of the variables and execute computations on the graph. Then, we define init as the global_variables_initializer op and run it. This is a crucial step in every TensorFlow model: before the variables can be updated, they have to be initialized in the session.

Next, we define another hyper-parameter, epochs, that defines how many times we will run though the training batches.

Next, I define some variables for the file writer in TensorBoard (more information about this here). Basically, this creates a file writer that can output information about the model as it runs (through summary operations) so that it can be inspected later. I use the datetime library to create a unique directory every time I run the model. Then, on that filewrite_out object, I call the add_graph method and pass it the graph of the session. Using the file writer, we can use summary operations to record the cost, and we do the same for the accuracy. For the accuracy, we first check whether the index of the network’s largest output matches the index of the largest value in the label, then compute the accuracy by taking the mean of these comparisons over the entire testing set.
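The accuracy calculation can be mirrored in plain NumPy: np.argmax, the == comparison, and .mean() play the roles of tf.argmax, tf.equal, and tf.cast plus tf.reduce_mean. A toy example with three predictions:

```python
import numpy as np

# Toy network outputs (probabilities) and one-hot labels for three examples
outputs = np.array([[0.1, 0.8, 0.1],
                    [0.7, 0.2, 0.1],
                    [0.2, 0.3, 0.5]])
labels = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# Predicted class = index of the largest output (like tf.argmax(Output, 1))
correct = np.argmax(outputs, 1) == np.argmax(labels, 1)  # [True, False, True]

# Cast booleans to floats and average (like tf.cast + tf.reduce_mean)
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 2 correct out of 3, about 0.667
```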

After adding a few summary operations and merging them in merged_summaries, we run the training iteration. We simply loop over the number of epochs, then feed the batch data into the placeholders (x and Output_labels). It’s that simple!

We record the summaries every 5 iterations and print progress to the shell every 10 iterations, so we can see that the model is working.

So let’s see how the model works. After the code is run, we find that the final model achieves an accuracy of ~97.5%. We can analyze the model we made in TensorBoard by finding the log directory (for me, it ended up being C:\tmp\MNIST_MultiLayer_ANN\July.20.17@20.53.21.608113). Now, all we need to do is go to the command prompt and pass this command, all as one line:

C:\Users\MyAccountName> tensorboard
--logdir C:\tmp\MNIST_MultiLayer_ANN\July.20.17@20.53.21.608113

You can find the exact path by going to the tmp folder under C:\tmp\ and finding the latest folder added. Then, go to either localhost:6006 or 0.0.0.0:6006 in your preferred web browser. Once we do all this, we can see that the model looks as we expected:

TensorBoard Visualization

We can also monitor the values of the weights and biases with the histogram summary operation:

Weights and Biases

Likewise, under the scalar tab, we can see how the accuracy and cost changed over time:

Accuracy and Cost over time

One thing you may notice about this graph is the sudden jump at training step ~3000, where the cost and the accuracy change rapidly. This may indicate that we chose an unstable learning rate with bad convergence properties (the cost should decrease and the accuracy should increase at a gradual rate). If you want more information about the TensorBoard package, you can check out this great demo given at the 2017 TensorFlow Dev Summit. You can check out all the code from this tutorial on my GitHub.

Thanks for reading :)

Written by

Tim Wroge

Computer Engineering @ University of Pittsburgh | AI is cool
