[Tensorflow] Core Concepts and Common Confusions

From a beginner’s point of view (Understanding Tensorflow Part 1)

Photo by Luca Bravo on Unsplash

A Little Bit about Myself

  1. I’ve been using Keras since its pre-1.0 era, and have mostly used it to train some standard MLP/CNN/RNN models.
  2. Last July I started to learn PyTorch, and have since been able to work with some more advanced or customized models using PyTorch.
  3. I had worked through the course and exercises when Google’s Udacity deep learning course came out, so I can read a Tensorflow project and get the gist of what’s going on. But I’ve never done any real project directly in Tensorflow before. It’s still very foreign to me.

This post will be written from my personal perspective, so the readers should already have some basic idea about what deep learning is, and preferably are familiar with PyTorch.

20180528 Update (Github repo with links to all posts and notebooks):

Why Tensorflow

I’ve been considering picking up Tensorflow for a while, and have finally decided to do it. The main appeals of learning Tensorflow, compared to using PyTorch exclusively (Keras, as a high-level library, is not really comparable), include:

  1. Tensorflow is still arguably the most used deep learning library among researchers and engineers. It’s impractical to port every interesting piece of research published in Tensorflow to PyTorch by myself, and waiting for others to release a port can be a risky business. A level of Tensorflow fluency that lets you build directly on others’ Tensorflow results should be very beneficial.
  2. Tensorboard is awesome. Granted, there are already Tensorboard integrations for other deep learning libraries. But using it with Tensorflow will be more well-documented and well-supported.
  3. Production readiness. Other libraries are catching up, but Tensorflow already has it all figured out for you (Tensorflow Serving and TensorFlow Lite).
  4. There have been some voices asking people to boycott Facebook’s research ecosystem given the recent turmoil at Facebook (the company behind PyTorch). Although I personally still feel conflicted about the idea of boycotting a very fine piece of open source software, learning the other major alternative can do no harm.
Tensorboard Graph Interface

The Tensorflow Programming Stack

Source

Tensorflow started with only the kernel and low-level APIs, but has since accumulated a lot of modules and become a behemoth. The official documentation recommends using Estimators and Datasets, but I personally chose to start from the Layers and low-level APIs, which give the kind of access PyTorch offers, and work my way up to Estimators and Datasets. There will be more posts coming up on these topics.

I find the complexity of its stack one of Tensorflow’s main turn-offs. The vast amount of modules to choose from can also overwhelm beginners. It’ll become a lot easier once you find a set of modules you can work with most comfortably.

Graphs and Sessions

This is one of the most important concepts for those who come from PyTorch. PyTorch creates a dynamic computational graph on the fly to do automatic differentiation. Tensorflow, on the other hand, requires you to define the graph first.

With Tensorflow, it’s like building a system of pipes first (a graph), then pumping water into it and receiving the processed water at the other end (session.run). PyTorch lets you pump water into the system while you are still building it, so it’s easier to find any sub-unit that malfunctions (debugging).

Data Flow in a Graph

There are some upsides to static graphs, despite the obvious downsides. Because the graph is static, Tensorflow can infer some parameters, such as input sizes, for you when compiling the graph. The trained model will also be more portable to other platforms.

(Tensorflow has started to offer its own version of dynamic computation graphs, Eager Execution, but it is still in a pre-alpha stage.)

Usually you only need one graph. Tensorflow implicitly defines a default graph for you, but I prefer to define it explicitly and group all graph definitions in a context:

import tensorflow as tf

# Graph definition
graph = tf.Graph()
with graph.as_default():
    # Operations created in this scope will be added to `graph`.
    c = tf.constant("Node in graph")

# Session
with tf.Session(graph=graph) as sess:
    print(sess.run(c).decode("utf8"))

Session.run

This method is basically the whole point of creating a session. It starts the data flow in the graph (pumps water into the piping system). The two most important parameters are fetches (outputs) and feeds (inputs).

By passing a list of nodes (which can be operations, tensors, or variables) as fetches, you tell Session.run you want the data to flow to those nodes. Session.run will prune away the subgraphs that are not required to reach those nodes, which saves execution time.

By passing a map from tensors to values (a feed dictionary) as feeds, you tell Session.run to fill those tensors with the given values when running the graph. This is how you feed information into the graph. (You can also use variables instead of tensors in feeds, although it’s not very common.)

In the following example from the official tutorial, y is passed as the sole element of fetches, and values for the placeholder x are passed in a dictionary:

# Define a placeholder that expects a vector of three floating-point values,
# and a computation that depends on it.
x = tf.placeholder(tf.float32, shape=[3])
y = tf.square(x)

with tf.Session() as sess:
    # Feeding a value changes the result that is returned when you evaluate `y`.
    print(sess.run(y, {x: [1.0, 2.0, 3.0]}))  # => "[1.0, 4.0, 9.0]"
    print(sess.run(y, {x: [0.0, 0.0, 5.0]}))  # => "[0.0, 0.0, 25.0]"
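
The fetches argument can also be a list of nodes, in which case Session.run returns one value per fetched node. Here is a minimal sketch building on the snippet above (the extra node z is my own illustration, not part of the official tutorial):

z = y * 2.0  # another node that depends on x through y

with tf.Session() as sess:
    # Both y and z are computed in a single run; only the required subgraph executes.
    y_val, z_val = sess.run([y, z], {x: [1.0, 2.0, 3.0]})
    print(y_val, z_val)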

Tensors vs. Variables

In PyTorch, a variable is part of the automatic differentiation module and a wrapper around a tensor. It represents a node in the computation graph, and stores its parent in the graph and optionally its gradients. So essentially every tensor in the graph is wrapped in a variable.

A variable in Tensorflow is also a wrapper around a tensor, but has a different meaning. A variable contains a tensor that is persistent and changeable across different Session.run calls. So variables are usually the ones that are updated in back-propagation (e.g. the weights of a model), as well as any other state we want to keep between runs. Also, all variables need to be initialized (usually through a tf.global_variables_initializer operation) before they can be used.
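
As a quick illustration of that persistence within a single session, here is a minimal sketch (the counter variable is my own example, not from the docs):

counter = tf.Variable(0, name="counter")
increment = tf.assign_add(counter, 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(increment))  # 1
    print(sess.run(increment))  # 2 (the variable kept its value between runs)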

Variables do not persist after a session is closed, so you have to remember to save them before closing the session. (The official documentation mentions that modifications to variables are visible across multiple sessions, but that seems to apply only to concurrent sessions running on multiple workers.)

Saving variables/weights in Tensorflow usually involves serializing all variables into a file, while PyTorch relies on the state_dict method to extract the Parameters (a subclass of Variable) and persistent buffers to be serialized.

Saving models in Tensorflow involves defining a Saver in the graph definition and invoking the save method in a session:

# Variables and ops assumed by this snippet (taken from the official example):
v1 = tf.get_variable("v1", shape=[3], initializer=tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer=tf.zeros_initializer)
inc_v1 = v1.assign(v1 + 1)
dec_v2 = v2.assign(v2 - 1)
init_op = tf.global_variables_initializer()

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    inc_v1.op.run()
    dec_v2.op.run()
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
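
Restoring is the mirror image: define (or import) the same graph and let the Saver load the checkpoint. A minimal sketch, assuming the variables above have already been defined:

saver = tf.train.Saver()
with tf.Session() as sess:
    # restore() assigns the saved values, so no init_op is needed for these variables.
    saver.restore(sess, "/tmp/model.ckpt")
    print(sess.run(v1))  # the values saved earlier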

There’s also a SavedModel class that not only saves variables, but also the graph and the metadata of the graph for you. It is useful when you want to export your model.
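
Here is a minimal sketch of exporting with SavedModelBuilder (the export directory and tag are just examples; a real export would typically also attach input/output signatures):

builder = tf.saved_model.builder.SavedModelBuilder("/tmp/saved_model_demo")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Record the graph and the current variable values under the SERVING tag.
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()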

Name Scope vs. Variable Scope

PyTorch names the parameters in a quite Pythonic way:

PyTorch Parameter Naming

Tensorflow, however, uses namespaces to organize tensors/variables and operations. Tensorboard groups operations according to the namespaces they belong to, and generates a nice visual representation of the graph for you:

The Graph of One of My Models
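
To get a visualization like this, you write the graph to a log directory and point Tensorboard at it. A minimal sketch (the log path is just an example):

with tf.Session() as sess:
    # Passing sess.graph lets Tensorboard render the graph structure.
    writer = tf.summary.FileWriter("/tmp/tf_logs", sess.graph)
    writer.close()

# Then launch it with: tensorboard --logdir /tmp/tf_logs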

An example of a tensor name is scope_outer/scope_inner/tensor_a:0. Here tensor_a is the actual name of the tensor. The suffix :0 is an endpoint that gives the tensors returned from an operation unique identifiers, i.e. :0 is the first tensor, :1 is the second, and so on.
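
A small sketch of how such names come about (the scope and tensor names here are purely illustrative):

with tf.name_scope("scope_outer"):
    with tf.name_scope("scope_inner"):
        a = tf.constant(1.0, name="tensor_a")

print(a.name)  # => "scope_outer/scope_inner/tensor_a:0"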

Some medium- or high-level APIs like tf.layers will handle some of the scope naming for you, but I’ve found them not smart enough sometimes, so I had to do it manually. Which brings me to the question: which one to use, tf.name_scope or tf.variable_scope?

This stackoverflow answer gives a brilliant explanation of the differences between the two, as illustrated in the following figure:

Left: tf.variable_scope Right: tf.name_scope (Source)

It turns out there is only one difference: tf.variable_scope affects tf.get_variable, while tf.name_scope doesn’t. tf.variable_scope also has a reuse parameter, which allows you to reuse the same variable (with the same name in the same namespace) in different parts of the code without having to pass a reference to that variable around.
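
For example, reusing a variable looks roughly like this (adapted from the official documentation):

with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
    v1 = tf.get_variable("v", [1])
assert v1 == v  # the same variable is returned in both scopes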

Here’s an example of mixing tf.variable_scope and tf.name_scope from the official documentation:

with tf.variable_scope("foo"):
with tf.name_scope("bar"):
v = tf.get_variable("v", [1])
x = 1.0 + v
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"

IMO, usually you’d want to use variable_scope unless there is a need to put operations and variables in different levels of namespaces.

Time to Get Our Hands Dirty

That’s pretty much it! In this post we have covered 4 topics that I’ve found most confusing to beginners:

  1. The programming stack
  2. Graphs and sessions
  3. Tensors and variables
  4. Name scope and variable scope

Now we can go on and write some actual Tensorflow code. In the next two or three posts I’ll share some of the exercises I set for myself, using low-to-medium level APIs. After that maybe I’ll do a piece on how to integrate higher level APIs like Estimators and Datasets, so we can comfortably move between different levels of APIs to suit our requirements.

Quick Links