Combining multiple TensorFlow Hub modules into one ensemble network with AdaNet

Posted by Sara Robinson

Have you ever started building an ML model, only to realize you’re not sure which model architecture will yield the best results? Enter the TensorFlow-based AdaNet framework. With AdaNet, you can feed multiple models into AdaNet’s algorithm and it’ll find the optimal combination of all of them as part of the training process. I’ve been playing with it recently and have been particularly impressed with the accuracy of an ensemble compared to individual models.

Hold up. Before we go any further: how does AdaNet fit into the growing ML space? It is an open source implementation of the AdaNet paper. This paper outlines a concept called neural architecture search, which involves automating the process of designing the optimal ML model architecture for a particular task. AdaNet comes with theoretically backed performance guarantees, and it is fast to run.

How does this relate to AutoML? AutoML involves, but is not limited to, data preprocessing, feature engineering, model family search and hyperparameter tuning. You can think of AutoML as an umbrella concept, and AdaNet falls under the model architecture-search facet of AutoML. Also note that the AutoML research is different from Google Cloud AutoML, which uses AutoML concepts under the hood, and is aimed at developers who want to build custom ML models without writing model code (I’ve written lots of blog posts on Cloud AutoML).

In this post I’ll walk you through building an ensemble network using AdaNet’s AutoEnsembleEstimator. You can build any type of network with AdaNet (images, text, structured data, etc.). For this example, I’ll build a text classification model to predict the author given a few sentences of text they’ve written. In addition to AdaNet, here are the tools we’ll be using to build this model:

I’ll also show you how to train the model at scale using Cloud ML Engine. In honor of new works entering the public domain for the first time in 20 years, I’ve chosen some authors who had books published in 1923 to use as training data. To jump to the model code for this, check it out on GitHub.

This example uses AdaNet 0.5.0, TensorFlow 1.12.0, and TF Hub 0.2.0.

Here are the packages we’ll be importing for our model:

Downloading data

I’ve downloaded the literature data from Project Gutenberg, and done some preprocessing to split each text into many 1–2 sentence segments tagged with their particular author. Here’s a preview:

With the following code, we’ll download the CSV using urllib, convert it to a Pandas DataFrame, shuffle the data, and preview it:

Then we’ll split it into train and test sets, using 80% of the data for training:
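As a rough illustration of the shuffle-and-split step, here is a minimal plain-Python sketch. It uses a hypothetical list of (text, author) rows in place of the real DataFrame, and the helper name shuffle_and_split is my own, not something from the original notebook:

```python
import random


def shuffle_and_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows, then split them into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train_size = int(len(shuffled) * train_fraction)
    return shuffled[:train_size], shuffled[train_size:]


# Hypothetical (text, author) rows standing in for the real CSV contents.
rows = [("sentence %d" % i, "author_%d" % (i % 3)) for i in range(10)]
train_rows, test_rows = shuffle_and_split(rows)
print(len(train_rows), len(test_rows))
```

With a Pandas DataFrame the same idea applies: compute the row count that corresponds to 80% of the data and slice at that index.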

The labels are strings for each author, and I’ve encoded them as integer class indices (one per author) using the Scikit Learn LabelEncoder utility. We can do this in just a few lines of code:
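To show what this encoding produces, here is a plain-Python stand-in for the kind of mapping LabelEncoder builds: each distinct label gets an integer index, assigned in sorted order. The function name and the author labels below are hypothetical, and this sketch is not the scikit-learn API itself:

```python
def fit_label_encoder(labels):
    """Return the sorted unique classes and a label -> index lookup."""
    classes = sorted(set(labels))
    index_of = {label: index for index, label in enumerate(classes)}
    return classes, index_of


# Hypothetical author labels; the real ones come from the dataset.
train_labels = ["author_b", "author_a", "author_c", "author_a"]
classes, index_of = fit_label_encoder(train_labels)
train_encoded = [index_of[label] for label in train_labels]

print(classes)        # plays the role of encoder.classes_
print(train_encoded)  # one integer class index per training example
```

Keeping the sorted class list around matters later: it is what lets you map a predicted index back to an author name.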

Embedding columns with TF Hub

My favorite part about TF Hub text modules is that you can instantiate an embedding feature column with one line of code, and you don’t need to do any preprocessing on your text inputs to turn them into embeddings. TF Hub takes care of the heavy lifting, so you can feed raw text directly into your model. I’ve written more on TF Hub in this post, so I won’t go into too much detail on it here. Suffice it to say that building word embeddings from scratch would take lots of time and training data, and TF Hub gives us plenty of choices of models to start with to make this easier.

There are many different TF Hub text embedding modules we could start with. (TF Hub also provides modules for images and video.) Which one should we choose? It’s hard to say which one will lead to the highest accuracy on our text, and that’s where AdaNet comes in handy. We can build multiple TF Estimators using different TF Hub modules — then we’ll feed them both into the same AdaNet model and let AdaNet ensemble them to find the optimal model.

First, let’s define our TF Hub embedding columns. We’ll use these when we set up our models to tell TensorFlow the format of data it should expect for our features:

Now we can define both Estimators that we’ll feed into our AdaNet model. Since this is a classification problem, we’ll use a DNNEstimator for both:

What’s happening here? hidden_units tells TensorFlow the number of neurons our network will have in each layer. For each of these, it’ll have 64 in the first layer and 10 in the second. feature_columns is a list of the features for our model. In this example we have only one (the sentence of the book).

Building our AdaNet model

Now that we’ve got two Estimators, we’re ready to feed them into an AdaNet model. For this example I’m using AdaNet’s AutoEnsembleEstimator which makes this pretty simple. It will take both estimators I’ve created, and incrementally create an ensemble by averaging the predictions of each model. For more customization, check out the adanet.subnetwork Builder and Generator classes. With AutoEnsembleEstimator, we can feed both of the models we’ve defined above into the ensemble in the candidate_pool param:
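To make the averaging idea concrete, here is a minimal plain-Python sketch of combining two models’ class probabilities element-wise. It is only an illustration of the concept, not AdaNet’s implementation, and the probability values are made up:

```python
def average_predictions(per_model_probs):
    """Average class probabilities element-wise across models."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


# Hypothetical per-class probabilities from two candidate models.
model_a_probs = [0.5, 0.25, 0.25]
model_b_probs = [0.75, 0.25, 0.0]

ensemble_probs = average_predictions([model_a_probs, model_b_probs])
print(ensemble_probs)
```

When the two models disagree, the averaged output sits between them, which is one intuition for why an ensemble can be more robust than either model alone.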

There’s a lot going on there, let’s break it down:

  • The head is an instance of tf.contrib.estimator.Head, and it tells our model how to compute loss and evaluation metrics for each possible ensemble. AdaNet calls these potential ensemble networks “candidates”. There are many different types of heads (for regression, multi-class classification, etc.). Here we’re using the multi_class_head since there are more than 2 possible label classes in our model. For a model assigning multiple labels to one particular input, we’d use multi_label_head.
  • config sets up some parameters for running our training job: how often we want to save model summaries and checkpoints, and the directory where TF should save them to. Keep in mind that if you’re training a model in Colab, saving checkpoints too frequently could eat up your available disk space.
  • max_iteration_steps tells AdaNet how many training steps to perform in a single iteration. An iteration refers to training for a group of candidates, so this number (along with total training steps which we’ll define later) tells AdaNet how often to generate new ensemble candidates.

We’ll use TensorFlow’s handy train_and_evaluate function for this, which will run training and evaluation at the same time. In order to set this up, we need to write our training and evaluation input functions. Input functions handle feeding the data into our model. We’ll use the tf.data API in our input functions. Even though we have two separate models with different feature columns, we can put both features in the same dict so we only need to write one input function:
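The "both features in the same dict" idea can be sketched without TensorFlow. The feature keys below are hypothetical placeholders for whatever keys the two embedding columns were created with; in the real input function, a dict like this would be handed to the tf.data API together with the labels:

```python
def build_features(sentences, feature_keys):
    """Map every feature key to the same list of raw sentences."""
    return {key: list(sentences) for key in feature_keys}


# Hypothetical keys, one per TF Hub embedding column.
feature_keys = ["embedding_a", "embedding_b"]
train_sentences = ["First example sentence.", "Second example sentence."]

train_features = build_features(train_sentences, feature_keys)
print(sorted(train_features))
```

Each Estimator only reads the key its own feature column refers to, so feeding both keys the same text lets one input function serve both models.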

Our evaluation features and input function look very similar:

We’re getting close now! The last thing we need to do before training is create our train and eval specs. You can think of this as wiring everything together: since we’re running training and evaluation in one go, these specs will tell our estimator which input function to run for each job:

Remember when we defined max_iteration_steps above? The max_steps parameter in our TrainSpec refers to the total number of steps to train for. Dividing max_steps by max_iteration_steps gives the number of iterations, which means we’ll have 8 iterations total (8 groups of ensemble candidates).
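The arithmetic behind the iteration count can be sketched as follows. The figure of 5000 steps per iteration comes from the TensorBoard section of this post; the max_steps value of 40000 is inferred from 8 × 5000 rather than quoted from the original code:

```python
def num_iterations(max_steps, max_iteration_steps):
    """Number of AdaNet iterations, rounding up for a partial last one."""
    return -(-max_steps // max_iteration_steps)  # ceiling division


def iteration_boundaries(max_steps, max_iteration_steps):
    """Global steps at which an iteration ends."""
    return list(range(max_iteration_steps, max_steps + 1, max_iteration_steps))


max_iteration_steps = 5000  # steps per iteration (stated in the post)
max_steps = 40000           # assumption: inferred from 8 iterations x 5000

print(num_iterations(max_steps, max_iteration_steps))
print(iteration_boundaries(max_steps, max_iteration_steps))
```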

Now it’s time to run training and evaluation:

Training our AdaNet model on Cloud ML Engine

If you try to run the above cells in Colab, you might hit a memory limit. That’s where the cloud comes in very handy. We’ll use Cloud ML Engine to train our model. To do this, you’ll need to create a GCP project and enable billing. To get the model we’ve defined above ready for the cloud, all we need to do is package up our application locally in the following format:

You can call the trainer directory anything you like — this is the name of the Python package we’ll be uploading to ML Engine with our model. __init__.py is an empty file, and model.py contains all of the code above. setup.py contains the name and version of our package, along with any Python package dependencies we’re using to create the model.

config.yaml is where you specify any Cloud-specific parameters for training. These are things like whether you’ll make use of GPUs or TPUs for training, and how many workers you’ll need for your training job. All of the configuration options are listed here.
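Here is a small sketch that creates an empty skeleton with the files described above, inside a temporary directory. Placing setup.py and config.yaml next to the trainer directory (rather than inside it) is my assumption about the layout; adjust it to match your own project:

```python
import os
import tempfile


def create_package_skeleton(root, package_name="trainer"):
    """Create empty placeholder files for a trainer package under root."""
    os.makedirs(os.path.join(root, package_name))
    relative_paths = [
        os.path.join(package_name, "__init__.py"),  # empty marker file
        os.path.join(package_name, "model.py"),     # the model code goes here
        "setup.py",                                  # package name, version, deps
        "config.yaml",                               # Cloud-specific settings
    ]
    for relative_path in relative_paths:
        open(os.path.join(root, relative_path), "w").close()
    return relative_paths


with tempfile.TemporaryDirectory() as root:
    created = create_package_skeleton(root)
    all_exist = all(os.path.isfile(os.path.join(root, p)) for p in created)

print(created)
```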

Exporting your model for serving

Before I kick off the training job, I want to mention that you can add some code to the model.py file mentioned above to export your model to Cloud Storage when it’s done training. If you don’t care about this right now, you can skip to the next section.

We’ll export our model using the LatestExporter class. To create an exporter, we’ll need to define a serving input function. This confused me at first, but it’s not too different from the other input functions we defined. It should return two things: the format of inputs our model should expect when it’s served, and the format of inputs the server should expect. In our model these are the same, but in some cases you may want to do some preprocessing on inputs before they’re fed into the model. Because ours are the same, the serving input function is pretty straightforward:

If the TF Hub modules we were using didn’t let us pass raw text directly and instead required that text be converted to integers, our input function would return two different objects here.

With that function ready to go, we can define our exporter:

To call export(), we’ll also need our model’s last checkpoint and the eval results from that checkpoint. We can get those with the following:

Woohoo! When this runs in ML Engine, it’ll save our final model.

Starting a training job with gcloud

To train our model in the cloud, we’ll create a Cloud Storage bucket. This is where the checkpoints for our model will be stored. We’ll also point TensorBoard to this bucket, so we can see metrics for our model as it’s training. My favorite way to kick off an ML Engine training job is through the gcloud CLI. First, define some environment variables for the job:

Replace the strings above with the variables specific to your project. Then you’re ready to train with the following gcloud command:
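As a sketch of how the job settings fit together, the helper below assembles a gcloud ml-engine jobs submit training command as an argument list without running it. The job name, bucket path, and region are hypothetical placeholders, and the runtime version and module name are assumptions based on the TensorFlow version and package layout mentioned in this post:

```python
import shlex


def build_submit_command(job_id, job_dir, region,
                         package_path="trainer/",
                         module_name="trainer.model",
                         runtime_version="1.12",
                         config="config.yaml"):
    """Assemble (but do not run) a gcloud training-job submission command."""
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_id,
        "--package-path", package_path,
        "--module-name", module_name,
        "--job-dir", job_dir,
        "--region", region,
        "--runtime-version", runtime_version,
        "--config", config,
    ]


# Hypothetical values; replace them with your own project's settings.
command = build_submit_command(
    job_id="adanet_job_1",
    job_dir="gs://your-bucket-name/adanet-job",
    region="us-central1",
)
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list keeps each value in one place, and the same list could be passed to subprocess.run if you prefer launching jobs from Python.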

If this executes correctly, you should see a message in the console that your job is queued. You can stream your logs from the command line, or navigate to the Jobs tab in ML Engine on your Cloud console:

Visualizing AdaNet training in TensorBoard

Your training job is running, now what? Luckily you don’t need to wait for it to complete before evaluating the results. You can make use of TensorBoard, which reads the summary (event) files your training job writes alongside its checkpoints to visualize accuracy, loss, and other metrics as training is running. If you’ve got TensorFlow installed locally, good news: you already have access to TensorBoard via the command line.

Run the following command to point TensorBoard to your log directory on Cloud Storage:

Then point your browser to localhost:6006 to view training progress, and navigate to the scalars tab:

Confession: I had avoided using TensorBoard until now (so many graphs can be intimidating!). But as you’ll soon see, TensorBoard makes it much easier to understand how your model is performing and it’s especially useful for AdaNet. We’ll focus only on the accuracy and adanet_loss graphs here. Let’s start with accuracy, looking at the adanet_weighted_ensemble graph:

Remember that our model has 5000 steps per iteration, meaning every 5000 steps AdaNet will generate new candidate ensembles (with the exception of the first iteration, which includes only the individual networks). If you hover over the graph you can see which iteration and ensemble each line refers to:

We can see that at this point in training, the second ensemble from iteration 7 (t6_DNNEstimator1/eval; the t prefix counts iterations from 0) has the best accuracy. TensorBoard really shows us the power of combining models with AdaNet: as training continues, ensemble accuracy improves and is much higher than the accuracy of the individual networks on their own (the pink and light blue lines on the left in the graph above).

The loss (or error) graph reveals similar trends: error steadily decreases as AdaNet generates and trains new ensembles.

Using your exported model

If you followed the steps above to create a serving input function and export your model, you should see it in your specified GCS bucket after training completes. Under the hood, AdaNet will export the candidate with the lowest loss (error) for the given iteration. Within the export folder, you should see these files:

If you’d like to serve your model on ML Engine (I’ll cover that in a follow-up post), you can point ML Engine to this bucket following the deploy steps here. You can also download these files locally and serve the model however you’d like.

It would be a shame to finish without making a prediction, so let’s use ML Engine’s local predict to run one against our trained model from the command line. All we need to do is create a newline-delimited JSON file with an input we want a prediction for, following the same format as our serving input function. Here’s an example:
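One way to produce such a file is with Python’s json module, writing one JSON object per line. The feature keys and the sentence below are hypothetical; the keys must match whatever your serving input function expects:

```python
import json
import os
import tempfile


def write_instances(path, instances):
    """Write one JSON object per line (newline-delimited JSON)."""
    with open(path, "w") as handle:
        for instance in instances:
            handle.write(json.dumps(instance) + "\n")


# Hypothetical feature keys and text; both keys carry the same raw sentence.
sentence = "A made-up sentence standing in for a passage from one of the books."
instances = [{"embedding_a": sentence, "embedding_b": sentence}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test-instances.json")
    write_instances(path, instances)
    with open(path) as handle:
        lines = handle.read().splitlines()

print(lines[0])
```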

And then we can run the following command:

This is the response:

This means our model has predicted there’s an 83% chance this was written by the author at the first index of our label array (we can see the ordering by logging encoder.classes_ above), which is Churchill. That’s correct!
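Mapping a probability vector back to an author can be sketched like this. Only the 83% figure and the Churchill label at the first index come from the prediction described above; the remaining probabilities and author names are made-up placeholders:

```python
def top_prediction(probabilities, classes):
    """Return the (label, probability) pair with the highest probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best_index], probabilities[best_index]


# "Churchill" and 0.83 are from the post; everything else is hypothetical.
classes = ["Churchill", "author_b", "author_c"]
probabilities = [0.83, 0.10, 0.07]

label, confidence = top_prediction(probabilities, classes)
print(label, confidence)
```

The class list must be in the same order the label encoder produced during training, otherwise the index will point at the wrong author.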

What’s next?

Now you know how to build a model with AdaNet’s AutoEnsembleEstimator and train it on Cloud ML Engine. Want to learn more about what I covered here? Check out these resources:

One more thing: Are you a fan of Keras? The AdaNet team is currently working on adding Keras support! You can track the progress here.