Our Experience Implementing a State-of-the-Art Model Using TensorFlow 2.0
In this post, Carlos Timoteo (my co-author on this post) and I share, in some detail, our experience implementing a deep learning model with TensorFlow 2.0.
TensorFlow (TF) is an open-source machine learning framework for research and production use cases. The TF project is maintained by several companies in the industry, including Google. It is the second-largest open-source project on GitHub by number of contributors, and it provides easy-to-read documentation along with several tutorials and guides.
We decided to adopt TensorFlow because its huge ecosystem helps turn a research problem into a real solution. TF offers a higher-level abstraction for building and training deep learning models through the Keras API. TF.Keras is an implementation of the Keras API optimized within TensorFlow. The Keras API is built on three key principles: it is user-friendly, modular and composable, and easy to extend. More details on TensorFlow and TF.Keras can be found online.
Besides, TensorFlow enables researchers to build and train state-of-the-art models even with scarce resources. It lets you use the Keras Functional API and the Model Subclassing API to create complex topologies. If you need easy prototyping and fast debugging, use eager execution. TensorFlow also supports an ecosystem of powerful add-on libraries and models to experiment with, including TFHub, Ragged Tensors, TensorFlow Probability, Tensor2Tensor and BERT. Here is a guide on how you can effectively use TensorFlow 2.0.
Arthur’s Journey on Machine Learning
Before working with TensorFlow, my journey in neural networks started with Keras. At that time, I was in my early days of learning machine learning, and other researchers recommended Keras to me, describing it as TensorFlow 1.x made easy. So why not? I decided to go with Keras. The curious thing is that, even though I knew nothing about the area at first, after reading a lot of articles and tutorials I could implement a nice applied research project (check the repo here). It was really easy and practical to work with Keras.
Why start by talking about Keras? Wasn’t this post about TensorFlow 2.0?
And you are right! I just wanted to give some context about what came before my first contact with TF 1.x and TF 2.0. Carlos Timoteo introduced me to TensorFlow 1.x and 2.0 when we worked together in January 2019. On that occasion, we organized a Google Cloud Study Jam on Machine Learning in Recife. I ended up choosing TF 2.0 (still a beta release) as the framework for my dissertation project, in February 2019, for a few reasons. First, what I found was the same Keras framework, so I had zero downtime for adaptation. Second, it had been cleaned up and improved, and it was maintained by the TensorFlow team, keeping the same easy logic for building ML solutions. Last, I’m not an old-school researcher: I will simply adopt any tool that gives me ease of use and good performance (Figure 1). Wonderful, isn’t it?
Another quick note: with TF 2.0 Beta I hit no critical bugs, but of course you may still find a few. If you find any, please file an issue with the maintainers on the GitHub repository (this is very important for our community).
Building a model using TF 2.0
If you’re used to building models with TF 1.x, dealing with sessions and operations… just forget it and embrace the Pythonic way of programming. The new version gives us an environment that is much easier to understand (especially for beginners). We now have two high-level APIs, the Estimator API and Keras, which give us these options to build a model:
- Using the Estimator API, for when you need an out-of-the-box model, such as a linear classifier or regressor. Even though these are generic models, they are a really good solution for fast prototyping and for establishing baseline performance;
- Using the Keras Sequential class, in which we stack layers one after another. Imagine this as a linear stack of layers: the input data flows from one layer to the next, and every layer except the first infers its input shape from the previous one. This is the easiest way to build simple models, but it offers less flexibility;
- Using the Keras Model class, which allows greater flexibility by connecting layers in a functional way. Here we can connect layers non-sequentially, as in the deep neural network known as U-Net. This post exemplifies how to code a U-Net using TensorFlow 2.0.
In the last two options, at the end of construction you must compile the model, which means configuring learning-process parameters such as the optimizer, loss function and evaluation metrics. Very easy, isn’t it? In addition, TF 2.0 still maintains the `tf.nn` API if you want to get into lower-level, hardcore-style code. In this blog post, you can find a better discussion of all the options.
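To make the last two options concrete, here is a minimal sketch of both styles followed by the compile step. The layer sizes, the toy input shape of 8 features and the `adam`/`mse`/`mae` choices are our own assumptions for illustration, not the configuration used in the project.

```python
import tensorflow as tf

# Sequential class: a plain linear stack of layers.
# Only the first layer needs to be told the input shape.
sequential_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

# Model class (functional style): layers are called like functions,
# so non-sequential connections such as this skip connection are possible.
inputs = tf.keras.Input(shape=(8,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
merged = tf.keras.layers.Concatenate()([inputs, hidden])  # skip connection
outputs = tf.keras.layers.Dense(1)(merged)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Compile: configure optimizer, loss function and evaluation metrics.
for model in (sequential_model, functional_model):
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])

print(sequential_model.output_shape, functional_model.output_shape)
```

The skip connection is the kind of wiring a U-Net relies on, and it is exactly what the Sequential class cannot express.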
In any case, `tf.keras` is the recommended development option, since it offers great flexibility to create any kind of model by composing functions, while TF 2.0 automatically executes and manages all the low-level details.
Another feature provided by TF 2.0 is Eager Execution, which evaluates operations immediately instead of requiring a static computational graph to be built first. You can use Eager Execution to easily test model building and model training. In this tutorial you can find examples of how to use Eager Execution. PyTorch early adopters no longer have a reason to convince you to migrate.
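As a small illustration of what eager execution feels like in practice, the sketch below runs a matrix product and a gradient computation and inspects the results right away, with no session involved. The values are arbitrary toy numbers chosen for the example.

```python
import tensorflow as tf

# Eager execution is the default mode in TF 2.x.
print(tf.executing_eagerly())

# Operations return concrete values immediately.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())  # rows: [7, 10] and [15, 22]

# Gradients are recorded on a tape while the code runs.
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w * w
grad = tape.gradient(loss, w)
print(float(grad))  # d(w^2)/dw at w = 3 is 6
```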
For those who are in love with Transfer Learning, TensorFlow Hub is a library for the publication, discovery, and consumption of reusable pre-trained machine learning models and layers. Here you can find a tutorial on how to use TF Hub with the TF 2.0 Keras API to perform transfer learning.
If you are a researcher and are still not convinced that you should use TensorFlow for research purposes, check out TensorFlow Research Cloud (TFRC) and TensorFlow Models. The TFRC program enables researchers to apply for access to a cluster of more than 1,000 Cloud TPUs. The TensorFlow Models repository contains machine learning models implemented by researchers, including Google Brain and DeepMind researchers.
Feeding data to the training routine
There are two ways to train a model built with the Keras Model class: by calling the `fit()` method, which trains for a fixed number of epochs; or by calling the `fit_generator()` method, which takes a Python generator or the Dataset API (`tf.data.Dataset`) and trains on data yielded batch by batch. Just like Keras! Oh, by the way, TF Keras!
In Arthur’s project, there was a lot of image data to load, so `fit_generator()` was used to avoid loading all the data at once and exceeding the available memory. However, what we find fascinating here is the possibility of configuring the whole training through parameters and callback functions: splitting training and validation data, and defining learning rate schedules, early stopping, warm restarts, TensorBoard, model checkpoints and many other options.
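Here is a minimal sketch of generator-based training driven by callbacks. The toy data, the `batch_generator` helper and the callback settings are assumptions made for this example. Note also that later TF 2.x releases deprecate `fit_generator()` and let `fit()` consume generators directly, so the sketch calls `fit()`.

```python
import numpy as np
import tensorflow as tf

def batch_generator(features, labels, batch_size):
    """Hypothetical helper: yield (inputs, targets) batches forever."""
    while True:
        for start in range(0, len(features), batch_size):
            stop = start + batch_size
            yield features[start:stop], labels[start:stop]

# Toy regression data standing in for the real image dataset.
rng = np.random.RandomState(0)
x = rng.rand(64, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)
x_train, y_train = x[:48], y[:48]
x_val, y_val = x[48:], y[48:]

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Training behaviour is configured declaratively through callbacks.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

history = model.fit(
    batch_generator(x_train, y_train, batch_size=16),
    steps_per_epoch=3,  # 48 samples / 16 per batch
    validation_data=(x_val, y_val),
    epochs=5,
    callbacks=callbacks,
    verbose=0,
)
print(sorted(history.history.keys()))
```

Only one batch needs to be in memory at a time, which is the reason a generator was attractive for a large image dataset.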
We could also use `ImageDataGenerator()`, a generator class that applies random variations to the input data (images), providing real-time training data augmentation without holding the augmented images in memory. That would be great, but not for now.
As mentioned before, another approach is to use the Dataset API. It’s important to mention that the TensorFlow team designed this API based on performance analysis, and it uses multithreading at the C++ level. So it is really nice to work with, because it makes ingesting and manipulating data from different sources easier than with QueueRunner. It also enables you to build complex data-feeding pipelines.
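A small Dataset API pipeline could look like the sketch below. The random 8x8 "images", the `normalize` function and the buffer and batch sizes are placeholders we made up for the example; a real pipeline would read files instead.

```python
import numpy as np
import tensorflow as tf

# Ten fake grayscale images standing in for a real dataset.
images = np.random.rand(10, 8, 8, 1).astype("float32")
labels = np.arange(10, dtype="int32")

def normalize(image, label):
    """Illustrative preprocessing step: rescale pixels to [-1, 1]."""
    return (image - 0.5) / 0.5, label

dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(buffer_size=10)               # shuffle the examples
    .map(normalize, num_parallel_calls=2)  # preprocess in parallel
    .batch(4)                              # group into mini-batches
    .prefetch(1)                           # overlap preprocessing and training
)

batch_shapes = [tuple(batch_images.shape) for batch_images, _ in dataset]
print(batch_shapes)  # three batches holding 4, 4 and 2 images
```

A dataset built this way can be passed straight to the Keras training methods in place of a Python generator.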
Now we come to a limitation that we ended up facing, or was it a lack of performance? Anyway, when using `ImageDataGenerator()` to train a model, we depend on the libraries it uses under the hood for image transformations (SciPy and Pillow) in the data augmentation function. The problem we found was that training was taking longer than expected. Therefore, we decided to write a small transform function using OpenCV (following the same logic as TF Keras, but simplified). With this hand-written implementation, training now takes 20 seconds less per epoch. That may not look like much, but over 1,000 or 10,000 training epochs with the same amount of data, it already adds up to a difference of several hours.
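To put the saving in perspective, the arithmetic can be spelled out. The only input taken from our measurements is the 20 seconds per epoch; the `hours_saved` helper is just for illustration.

```python
SECONDS_SAVED_PER_EPOCH = 20  # measured difference reported above

def hours_saved(epochs, seconds_per_epoch=SECONDS_SAVED_PER_EPOCH):
    """Total time saved, in hours, over a number of training epochs."""
    return epochs * seconds_per_epoch / 3600

for epochs in (1_000, 10_000):
    print(f"{epochs} epochs: {hours_saved(epochs):.1f} hours saved")
# 1000 epochs: 5.6 hours saved
# 10000 epochs: 55.6 hours saved
```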
Not convinced by all these benefits, Arthur decided to suffer a little by working with TF 1.x, dry-running and analyzing Handwritten Text Recognition (HTR) implementations, and it was painful (oh God). HTR models are quite complex because they have four inputs and two outputs once the Connectionist Temporal Classification (CTC) loss is added. Training such an optical model with TF 1.x takes too long and requires a manual implementation of the training routine. In Figure 3, we show how to manually implement a training routine in which you feed data to the model batch by batch, update the weights of the neural network, check whether the model has improved, and implement early stopping.
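To give a feel for the bookkeeping such a manual routine involves, here is a minimal sketch of the same steps. It is not the TF 1.x code from Figure 3: it uses TF 2.x eager style with `tf.GradientTape`, and the toy data, model, learning rate and patience value are all assumptions made for the example.

```python
import numpy as np
import tensorflow as tf

# Toy regression data standing in for the real HTR dataset.
rng = np.random.RandomState(0)
x = rng.rand(64, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)
x_train, y_train, x_val, y_val = x[:48], y[:48], x[48:], y[48:]

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

best_val_loss, patience, wait, epochs_run = float("inf"), 3, 0, 0
for epoch in range(50):
    # 1. Feed the data batch by batch and update the weights.
    for start in range(0, len(x_train), 16):
        xb = tf.constant(x_train[start:start + 16])
        yb = tf.constant(y_train[start:start + 16])
        with tf.GradientTape() as tape:
            loss = loss_fn(yb, model(xb, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    epochs_run += 1

    # 2. Check whether the model improved on the validation data.
    val_loss = float(loss_fn(tf.constant(y_val), model(tf.constant(x_val), training=False)))
    if val_loss < best_val_loss:
        best_val_loss, wait = val_loss, 0
    else:
        wait += 1
        # 3. Stop early once there is no improvement for `patience` epochs.
        if wait >= patience:
            break

print(f"stopped after {epochs_run} epochs")
```

Everything inside the outer loop is what the built-in training methods and their callbacks take care of for you.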
Using TF 2.0, you can create the same training routine for the same optical model of an HTR system by calling the `fit_generator()` method with the appropriate callbacks (Figure 4) and, of course, by extending the Model class from TensorFlow 2.0.
Training at scale using TF 2.0
TensorFlow’s Distribution Strategy API (`tf.distribute.Strategy`) distributes training, synchronously or asynchronously, across multiple GPUs, CPUs or TPUs. Using this API, you can scale existing models and training routines implemented with TF 2.0 Keras or the Estimator API with minimal code changes.
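As a hedged illustration of the "minimal code changes" claim, the sketch below wraps model creation in a `tf.distribute.MirroredStrategy` scope, which performs synchronous training on the GPUs of one machine. The tiny model is a placeholder, and we did not benchmark this exact snippet.

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU
# (it falls back to a single replica when no GPU is available).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# The only change to the usual Keras code: build and compile
# the model inside the strategy scope.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")

print(model.count_params())
```

After this, training is started with the usual Keras call, and the strategy takes care of splitting each batch across the replicas.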
To be honest, having this capability was fantastic! We could perform scalable training using Google Colab, a free cloud service based on Jupyter Notebooks that provides free GPUs and TPUs. Thus, by using the `fit_generator()` method and GPU hardware on Colab, we could train on a large amount of data without worrying about hardware allocation (as we would have to with TF 1.x). In Figure 5, you can check the training metrics on TensorBoard.
In Arthur’s project, the CPU was used for data ingestion and preprocessing, such as train/validation/test splitting, and the GPU for model training. Thus, it was possible to optimize the use of resources during training.
What about TPUs? Are they worth it? Well, we haven’t used them. A TPU is an application-specific hardware accelerator developed by Google for machine learning use cases. To give an idea, a CPU can handle tens of operations per cycle, whereas a GPU can handle tens of thousands of operations per cycle, and a TPU can handle up to a hundred thousand operations per cycle. Figure 6 puts those hardware acceleration technologies in perspective.
A final tip for those who will venture into ML frameworks. Check the API documentation on the website!
There we can find descriptions, method parameters, tutorials, and references to the original technical papers by the authors who proposed the methods. Besides, for every parameter there are brief suggestions on its use and on the context in which it should be applied. It really helps a lot if you don’t know how to apply things.
For instance, for the BatchNormalization class, we can find some interesting information in the docs:
- Short Description: Base class of Batch normalization layer (Ioffe and Szegedy, 2014);
- Research Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift;
- Parameter fused: if True, use a faster, fused implementation, or raise a ValueError if the fused implementation cannot be used. If None, use the faster implementation if possible. If False, do not use the fused implementation;
- Parameter renorm: whether to use Batch Renormalization. This adds extra variables during training. The inference is the same for either value of this parameter.
So, had you ever heard about the fused and renorm implementations? Did you know that Batch Renormalization is most useful with small mini-batches? Or that simply setting the fused parameter to True makes training faster, because it uses an optimized implementation?
Even though TensorFlow 2.0 is still in Beta, we didn’t see any major problems using it. On the contrary, it made it possible to develop a project in the simplest and most effective way for me. TF 2.0 still has much to offer regarding TPU support, bug fixes, and documentation updates.
We hope this post helps those who are just starting out in this area of study and conveys some of our experience with the framework. We’ll certainly use TPUs in future projects, but GPUs are already sufficient for most studies.