Building an Iris classifier with eager execution
(Reposted from the TensorFlow blog)
One of the newest additions to TensorFlow in recent months has been eager execution, an additional low-level interface that promises to make development a lot simpler and easier to debug. When eager execution is enabled, operations are executed immediately instead of going through a separate graph execution step. In a nutshell, this means that writing TF code can (potentially) be as simple as writing pure NumPy code!
As it was a long-requested feature, I was looking forward to testing it. After playing around with the new interface for a while, I can say the new workflow feels much simpler and more intuitive, the methods integrate almost seamlessly with the rest of TF and, despite still being at an experimental stage, most of the functionality is already stable.
At the same time, eager execution is yet another addition to a framework that is already complex for many newcomers. That is the reason for this post: a guided tutorial for navigating this new interface, showing all the steps needed to build a simple classification model from scratch on the Iris dataset. By the end of this article, you will have implemented a fully working model in eager execution from which you can continue experimenting.
This post is also intended as a self-contained introduction to eager execution: if you are new to low-level programming in TensorFlow, you will learn how to implement your own algorithms and run them using this new interface. And if you already consider yourself an expert in TF, this post will highlight how things change when eager execution is enabled, as opposed to the standard graph construction.
Everybody ready? Let's go!
Note: if you want to experiment with eager execution, I also wrote a Jupyter notebook to serve as an extended companion to this post. And if you don’t have a working TensorFlow installation? Remember you can run the notebook using the (completely free) Google Colaboratory service in the cloud!
Why do we need eager execution?
Nowadays TF includes several high-level APIs, such as Estimators and Keras. But many times you may also want to explore and run some low-level code, or dig deeper into the framework. At that point, it helps to start by writing and running something simple.
To see how eager execution works, let's start by adding and multiplying some numbers, and look at what the code is like without eager execution:
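A minimal sketch of what this looks like in graph mode (the specific numbers are only for illustration):

import tensorflow as tf

# Building the graph: no computation happens at this point
a = tf.constant(2.0)
b = a + 3.0          # b is a symbolic tensor, not the number 5.0
c = b * b

# To get an actual value, we need an external session object to run the graph
with tf.Session() as sess:
    print(sess.run(c))   # 25.0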
If you (like me) have been here since the early days of Theano, it is hard to remember just how strange this code feels if you are not used to it: why do you even need an external object just to print a number? And why can’t you find the value of b anywhere? The reason is that, internally, TensorFlow is building a computational graph describing all the operations, which is then compiled and optimized behind the scenes — optimization that, in the initial development of TF, was supposed to happen only after the full specification of the model.
Now compare the same example, but after enabling eager execution:
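The same sketch with eager execution turned on:

import tensorflow as tf
tf.enable_eager_execution()

# Operations now run immediately and return concrete values
a = tf.constant(2.0)
b = a + 3.0
c = b * b

print(b)            # tf.Tensor(5.0, shape=(), dtype=float32)
print(c.numpy())    # 25.0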
This is pretty much NumPy code! In programming terms, the default API for TF is declarative: execution only happens when we request the output of a specific node, and it returns only that particular result. The new eager execution, instead, is imperative: execution immediately follows definition, with all of its benefits (immediate visual debugging, for one). Let's see what this means for the optimization of our models. But first, let's take a small detour to see how you can install and enable eager execution.
Enabling eager execution in your code
Eager execution has been included as an experimental feature since TF v1.5, and for the purposes of this post I will use the latest v1.7rc0 release (mentioning where you might need to change the code for backward compatibility).
If you want to play with the latest version, the recommended way is to set up a separate virtual environment (you can find detailed instructions in the notebook and in the official guide) and install the nightly build of TF from the console:
pip install tf-nightly[-gpu]
Eager execution can be enabled with a single line of code:
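Concretely (the tfe alias for the contrib module is a convention we will reuse below for the eager-specific utilities):

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()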
If you are working with v1.5 or v1.6, replace tf.enable_eager_execution() with tfe.enable_eager_execution().
Importantly, you need to run this command at the very beginning of the program, otherwise it will throw an error and refuse to execute. The reason is that, by enabling eager execution, you are changing the default inner mechanisms of TF described before, meaning that several low-level commands are only available in one of the two modalities (graph construction or eager execution), as described in the documentation.
What we will do in this tutorial
In the rest of this tutorial, we are going to build a classification model (with eager execution enabled) for the Iris dataset, a simple classification dataset that you should already have encountered when following the getting started guide for TF. The task is to classify a set of Irises into three different classes, depending on four numerical features describing their geometrical shape.
For the purpose of this article I am simply loading the dataset’s version from the scikit-learn library, normalizing its inputs, and splitting it in two for training and testing. If you are curious, here is the full code for loading the dataset:
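A sketch of this loading step, using scikit-learn's standard utilities (the 75/25 split and the fixed random seed are my own choices):

import numpy as np
from sklearn import datasets, model_selection, preprocessing

# Load the Iris dataset: 150 examples, 4 features, 3 classes
iris = datasets.load_iris()

# Normalize the inputs to zero mean and unit variance
X = preprocessing.scale(iris.data).astype(np.float32)
y = iris.target

# Split into a training and a test portion
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=0)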
We will proceed in five steps. The first part is just to get some familiarity with the basic eager objects such as variables and functions, and how they interoperate with NumPy arrays. Then, we will learn how to build the classification model (part 2), optimize it (part 3), load the data in TF (part 4), and debug the overall process (part 5).
Building the classifier (1/5): functions, gradients, and variables
The basic operation provided by eager execution is automatic differentiation: given some function operating on tensors, we can compute the gradient with respect to one or more of these tensors automatically (and, it goes without saying, efficiently).
If you are constructing a TF graph, functions and gradients are defined as nodes inside the computational graph. With eager execution enabled, instead, both of them are standard Python functions. For example, here is some code defining a simple squaring function and computing its gradients:
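A sketch of such a snippet (assuming tf and tfe have been imported and eager execution enabled, as shown earlier):

def square(x):
    return x ** 2

# gradients_function wraps square and returns a function computing d(square)/dx
grad_square = tfe.gradients_function(square)

x = tf.constant(3.0)
print(square(x))        # tf.Tensor(9.0, shape=(), dtype=float32)
print(grad_square(x))   # a list with a single tensor: the gradient 6.0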
Instead of having to build the graph, all operations are internally saved inside a “tape”, transparently to the user. The tfe.gradients_function is where the magic happens: you can pass any Python function (having as input at least one tensor) as argument, and it will return another Python function for computing the gradient with respect to any of its arguments.
You might have noticed that in the code above we used tf.constant to build a constant tensor: this step is optional, since passing a NumPy array or a float value would trigger an automatic casting operation.
You can also chain the calls to get higher-order derivatives:
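For instance, a second derivative can be obtained by wrapping the gradient function once more (a sketch reusing the square function above):

grad_square = tfe.gradients_function(square)
# grad_square returns a list, so we index its single element before differentiating again
second_derivative = tfe.gradients_function(lambda x: grad_square(x)[0])

print(second_derivative(tf.constant(3.0)))   # the second derivative of x**2 is 2.0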
A fundamental building block of TF are variables: stateful arrays of numbers that compose our models and that we need to optimize in the training phase. Eager has a custom implementation of variables:
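A minimal sketch of how these variables behave:

# The variable is created and initialized immediately
w = tfe.Variable(3.0)

print(w.numpy())      # 3.0: direct conversion to a NumPy value
w.assign(w + 1.0)     # in-place update
print(w.numpy())      # 4.0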
As you can see, variables in eager are fully interoperable with NumPy arrays, and their value is automatically initialized as soon as they are requested (while with graph construction you would need to explicitly call an initialization routine). Using tfe.implicit_gradients, you can also automatically compute the gradients of a function with respect to all the variables it uses. For example, this is equivalent to the gradient computation above, but using variables and implicit gradients:
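A sketch of the squaring example rewritten with a variable and implicit gradients:

w = tfe.Variable(3.0)

def square_w():
    # The function uses the variable w instead of taking a tensor as input
    return w ** 2

# implicit_gradients differentiates with respect to all variables used by the function
grad_fn = tfe.implicit_gradients(square_w)
print(grad_fn())   # a list with a single (gradient, variable) pair; the gradient is 6.0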
Now that we have all the building blocks in hand, we can move to a higher level of abstraction, starting with the definition of our neural network.
Building the classifier (2/5): layers and networks
In theory, we already have everything we need for optimizing a model: we can instantiate a bunch of variables (the parameters of our models), and encapsulate the logic of the model itself inside a function. For example, this is a one-dimensional logistic regression model:
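Something along these lines (a sketch; the zero initializations are arbitrary):

# Parameters of the logistic regression model
W = tfe.Variable(0.0)
b = tfe.Variable(0.0)

def logistic_model(x):
    # Probability of the positive class for a one-dimensional input x
    return tf.sigmoid(W * x + b)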
However, as soon as we start adding more operations (e.g., new hidden layers, dropout, …) this approach does not scale. A simple way to build more complicated models is to use layers, which should be quite familiar to anyone using the high-level interface of TF:
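A sketch of such a layer-based model (the hidden layer size and the dropout rate are my own choices):

hidden_layer = tf.layers.Dense(10, activation=tf.nn.relu)
dropout_layer = tf.layers.Dropout(0.5)
output_layer = tf.layers.Dense(3)    # three output classes for the Iris dataset

def network(x, training=False):
    # Dropout is only active when the training flag is True
    h = hidden_layer(x)
    h = dropout_layer(h, training=training)
    return output_layer(h)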
In the above example, we are building a model with a single hidden layer and a dropout operation in the middle: the definition of variables is taken care of inside each layer, and we can define our model by chaining together the layers inside a simple (once more, Python) function. The model is quite simple to interpret, even if you are not used to TF, and we can implement the dropout logic by just passing another Boolean flag to the function, adding to the readability.
The tf.layers module is an implementation of tf.keras.layers, which contains a set of high-level layers defined inside the Keras library. The implementation is only partial at this moment, with new layers being added continuously: in the meantime, you can also directly use the layers defined in tf.keras.layers.
Since building networks in this fashion is a common workflow, TF also provides a further level of abstraction with the Model object, a custom class that encapsulates the model's logic, providing a few advanced tools for debugging and checkpointing the training:
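A sketch of the same network wrapped inside a model class (subclassing tf.keras.Model, as in v1.7):

class SimpleModel(tf.keras.Model):

    def __init__(self):
        super(SimpleModel, self).__init__()
        # Layers are declared once in the constructor
        self.hidden_layer = tf.keras.layers.Dense(10, activation=tf.nn.relu)
        self.dropout_layer = tf.keras.layers.Dropout(0.5)
        self.output_layer = tf.keras.layers.Dense(3)

    def call(self, x, training=False):
        # The forward logic of the model
        h = self.hidden_layer(x)
        h = self.dropout_layer(h, training=training)
        return self.output_layer(h)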
If you are working with v1.5 or v1.6, you need to use tfe.Network instead of tf.keras.Model, and you need to explicitly call self.track_layer() for each layer added in the initialization.
A major benefit of networks is that they encapsulate all the logic concerning the variables of the model. To understand it, let’s play around with our model a little bit:
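A sketch of this kind of exploration (assuming the SimpleModel class above and the X_train array from the data-loading step):

# Instantiating the model does not create any variable yet
model = SimpleModel()
print(model.variables)    # [] (an empty list)

# Calling the model on some data triggers the creation of its variables
predictions = model(tf.constant(X_train[:5]))
print(len(model.variables))   # 4: the weights and biases of the two dense layers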
The first statement is just the initialization of the model. Printing the model's variables immediately afterwards shows their lazy nature: even though eager execution has a dynamic workflow, variables are only initialized when the model is run for the first time, because prior to that eager has no indication of the shape of its input tensors. Predictions can be obtained by simply calling the object with some data: after this, the variables are correctly initialized, as the final print shows.
Building the classifier (3/5): losses and optimizers
Since we are going to use our model to do some classification, we need to define a proper cost function to train it. Similarly to before, we can use the standard components of TF, by encapsulating them in a Python function:
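A sketch of such a loss function (assuming integer class labels, as returned by scikit-learn, and the model defined above):

def loss(model, x, y):
    # Forward pass (logits) and one-hot encoding of the labels
    logits = model(x, training=True)
    labels = tf.one_hot(y, depth=3)
    # Average cross-entropy over the mini-batch
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))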
As a quick sidenote, we are using softmax_cross_entropy_with_logits_v2, as the old softmax_cross_entropy_with_logits will get deprecated soon! The two are entirely identical here, the difference being that the new version allows for gradient computation with respect to the labels (not needed for this simple example). By removing the one-hot encoding, we can also use sparse_softmax_cross_entropy equivalently.
For the optimization, we can define our custom logic or (more probably) use one of the many optimizers already defined inside TF. At this point, you will probably guess what our code will look like:
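A sketch of a single optimization step (the learning rate is arbitrary; model, loss, X_train and y_train come from the previous steps):

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

# With eager enabled, minimize accepts a zero-argument function returning the loss
optimizer.minimize(
    lambda: loss(model, tf.constant(X_train), tf.constant(y_train)))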
The only syntactic sugar here is using an anonymous function with no arguments to call the minimization routine: this is needed because the optimizers were designed to work with graph construction, and have been adapted to also support the new eager interface. Another possibility is to explicitly compute the gradients of our model and use the apply_gradients function of the optimizer: if you are interested, I describe this alternative in the notebook.
Building the classifier (4/5): loading data and iterators
If you have kept up-to-date with TF, you will know that the preferred way to feed data to our models is through the high-level Dataset API, which can be used to load data from multiple sources and connects with all the high-level APIs. A nice feature of eager execution is that you can encapsulate a dataset inside a Pythonesque iterator, and use it to cycle over the data as you would with any other Python iterable.
Let’s loop over the training portion of the Iris dataset we loaded before using an iterator:
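A sketch of the loop, using the tfe.Iterator wrapper (the batch size of 32 matches the output below):

train_dataset = tf.data.Dataset.from_tensor_slices(
    (tf.constant(X_train), tf.constant(y_train)))

for xb, yb in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
    # Fraction of examples belonging to the first class in this mini-batch
    frac = np.mean(yb.numpy() == 0)
    print('Percentage of class [0]: {} %'.format(100.0 * frac))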
In the above code we are loading the data inside a Dataset object, and then using the iterator to loop over mini-batches of 32 elements, computing for each batch the percentage of examples belonging to the first class, e.g.:
Percentage of class [0]: 31.25 %
Percentage of class [0]: 34.375 %
Percentage of class [0]: 37.5 %
Percentage of class [0]: 25.0 %
Building the classifier (5/5): Debugging tools
The last things we need for our task are some tools for debugging and visualizing the results. Because in the standard graph construction the execution is hidden behind the session's dynamics, debugging has always been a sore point for newcomers to TF, and the main reason why TensorBoard was developed. In eager execution, however, everything runs dynamically and you can use any tool in your Python toolbox: from Matplotlib plots to console outputs, you name it.
Eager also has a few utilities to simplify debugging. For example, metrics can be used to compute and accumulate values across each epoch:
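A sketch of such an accuracy metric, accumulated over one pass through the training set (reusing the model and dataset from before):

accuracy = tfe.metrics.Accuracy()

for xb, yb in tfe.Iterator(train_dataset.batch(32)):
    # The predicted class is the index of the largest logit
    predictions = tf.argmax(model(xb), axis=1)
    accuracy(yb, predictions)    # accumulate (labels, predictions) pairs

print('Accuracy: {} %'.format(100.0 * accuracy.result().numpy()))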
The code above loops through a single epoch, and for each batch it computes the accuracy of the model. As we have not trained it yet, the accuracy will probably hover around 33%, i.e. random chance.
And if you are fond of the good ol’ TensorBoard, you can save values for visualization on the dashboard using the experimental implementation of the summaries:
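A sketch of the summary setup (the log directory and the interval of 10 steps are my own choices; loss, model and the data come from the previous steps):

# Create a writer and a global step (the optimizer will increment the step)
writer = tf.contrib.summary.create_file_writer('logs')
global_step = tf.train.get_or_create_global_step()

# Summaries are written to disk only every 10 optimization steps
with writer.as_default(), tf.contrib.summary.record_summaries_every_n_global_steps(10):
    tf.contrib.summary.scalar('loss', loss(model, tf.constant(X_train), tf.constant(y_train)))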
The syntax here is slightly more involved: after creating a writer, we need to set it as the default one and choose an interval for saving the values to disk (based on the global step of TF, which is increased every time we make an optimization step).
This was only a cursory look at some debugging tools with the eager execution enabled: check out the notebook for a description of other techniques for visualization and debugging.
Putting it all together
We are finally ready to put it all together! We are going to train our network for a few epochs, using a metric to compute the accuracy and store it in a NumPy array, and a summary to keep track of the loss by saving it on the disk. At the end we use Matplotlib to plot the accuracy across all epochs.
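A sketch of how the pieces might fit together (the optimizer and the number of epochs are arbitrary choices; everything else is reused from the previous steps):

import matplotlib.pyplot as plt

optimizer = tf.train.AdamOptimizer()
num_epochs = 50
accuracy_history = np.zeros(num_epochs)

with writer.as_default(), tf.contrib.summary.always_record_summaries():
    for epoch in range(num_epochs):
        accuracy = tfe.metrics.Accuracy()
        for xb, yb in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
            # One optimization step, incrementing the global step used by the summaries
            optimizer.minimize(lambda: loss(model, xb, yb),
                               global_step=tf.train.get_or_create_global_step())
            # Track the loss on disk and the accuracy in memory
            tf.contrib.summary.scalar('loss', loss(model, xb, yb))
            accuracy(yb, tf.argmax(model(xb), axis=1))
        accuracy_history[epoch] = accuracy.result().numpy()

plt.plot(accuracy_history)
plt.xlabel('Epoch')
plt.ylabel('Training accuracy')
plt.show()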
The network easily reaches 100% accuracy, this being a simple benchmark.
And if you open up TensorBoard, you can check the progress of the loss.
You have reached the end of my tutorial on eager execution. By now, I hope you are convinced that using eager allows for extremely fast prototyping while making debugging much simpler. Once again, you can check the Jupyter notebook if you want more details on all the methods we used in this tutorial, which was only intended as a concise introduction to the interface.
There are many things we have not seen here, such as how to enable GPU support, or how to use control flow operations inside your model. Nonetheless, I hope this is enough boilerplate code to kick-start you on your journey to the low-level programming beauty of TF!
Originally published at medium.com on April 13, 2018.