Introduction to AdaNet

AdaNet (Adaptive Structural Learning of Artificial Neural Networks) was announced on Google's blog as a new AutoML framework. It is built on TensorFlow, and its goal is to create ensembles of networks that achieve better performance.

AdaNet

AdaNet relies on ensemble learning: the final model is composed of simpler ones. This makes the model more complex, but it can also deliver better accuracy.

To build the ensemble, at each iteration the algorithm trains a set of candidate networks, evaluates which one produces the smallest loss, and adds it to the ensemble. The candidate network architectures must be provided by the user.

Example of AdaNet iterations. At each iteration a set of candidate networks is generated and the one with the smallest loss is added to the ensemble. Source: https://ai.googleblog.com/2018/10/introducing-adanet-fast-and-flexible.html

To prevent the ensemble from overfitting the training data, the algorithm balances loss reduction against the model's ability to generalize. To achieve this, we can assign a complexity measure to each subnetwork and use some of AdaNet's hyperparameters to penalize the more complex ones.
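
For reference, the objective AdaNet optimizes, as described in the AdaNet paper (stated here in a simplified binary-classification form, so the exact expression may differ slightly from the implementation), is roughly:

F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi\big(1 - y_i f(x_i)\big) + \sum_{j=1}^{N} (\lambda r_j + \beta)\,|w_j|

where w_j is the mixture weight of subnetwork j, r_j is its complexity measure, and \lambda and \beta are the penalty hyperparameters.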

After this brief description of AdaNet's behavior, let's build a model using the framework.

AdaNet in practice

In this example we will use the CIFAR-10 dataset, and our goal is to classify images into one of its 10 categories.

As mentioned before, AdaNet is built on TensorFlow, so our package dependencies are:

  • tensorflow[-gpu];
  • adanet.

AdaNet is implemented on top of TensorFlow's Estimator API, so our base network (for comparison purposes) will also be built with this API.
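
For reference, the code below assumes a TF 1.x-style environment with the following imports at the top of the script (a minimal setup sketch; the exact versions are not stated in the original):

import numpy as np
import tensorflow as tf  # TF 1.x-era API (tf.estimator, tf.contrib)
import adanet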

Dataset

The CIFAR-10 dataset can be loaded with TensorFlow's Keras helper:

import tensorflow as tf

(x_train, labels_train), (x_test, labels_test) = \
    tf.keras.datasets.cifar10.load_data()

The dataset is composed of 50,000 images with shape (32, 32, 3) for training and 10,000 images for testing.
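
A quick sanity check of the shapes (the expected output in the comments follows from the standard CIFAR-10 split):

print(x_train.shape, labels_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, labels_test.shape)    # (10000, 32, 32, 3) (10000, 1)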

Let's normalize the images, scaling their values between 0 and 1.

x_train = x_train / 255 # map values between 0 and 1
x_test = x_test / 255 # map values between 0 and 1

x_train = x_train.astype(np.float32) # cast values to float32
x_test = x_test.astype(np.float32) # cast values to float32

labels_train = labels_train.astype(np.int32) # cast values to int32
labels_test = labels_test.astype(np.int32) # cast values to int32

Since we are using the Estimator API, the inputs must be provided as functions. The simplest way to do this is with tf.estimator.inputs.numpy_input_fn, although there are other options (a tf.data-based sketch follows the code below).

EPOCHS = 10
BATCH_SIZE = 32

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=labels_train,
    batch_size=BATCH_SIZE,
    num_epochs=EPOCHS,
    shuffle=False)

adanet_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=labels_train,
    batch_size=BATCH_SIZE,
    num_epochs=1,
    shuffle=False)

test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
    y=labels_test,
    batch_size=BATCH_SIZE,
    num_epochs=1,
    shuffle=False)
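
As mentioned above, numpy_input_fn is not the only option. One alternative, sketched here only for illustration (the rest of the article keeps using numpy_input_fn), is to build the input function on top of tf.data:

def make_input_fn(images, labels, epochs, batch_size, shuffle=True):
    """Returns an Estimator-compatible input_fn backed by tf.data."""
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices(({"x": images}, labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=len(images))
        return ds.repeat(epochs).batch(batch_size)
    return input_fn

# e.g. train_input_fn = make_input_fn(x_train, labels_train, EPOCHS, BATCH_SIZE, shuffle=False)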

Now we can define our network architecture.

Base model

Our base model is a CNN (Convolutional Neural Network) with:

  • one convolutional layer with 32 filters, kernel size equal to 7 and ReLU activation;
  • one Max Pooling layer with reduction factor equal to 2;
  • one flatten layer;
  • one fully connected layer with 100 units and ReLU activation;
  • and one fully connected layer with 10 units that produces the logits (the softmax is applied in the loss).

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def cnn_model(features, labels, mode, params):
    images = list(features.values())[0]  # get values from dict

    x = tf.keras.layers.Conv2D(32,
                               kernel_size=7,
                               activation='relu')(images)
    x = tf.keras.layers.MaxPooling2D(strides=2)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(100, activation='relu')(x)
    logits = tf.keras.layers.Dense(10)(x)
    ...
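
The article elides the rest of cnn_model. One possible continuation (an assumption, not the original code; it is indented to sit inside cnn_model and omits PREDICT mode for brevity) could look like this:

    # Hypothetical completion of cnn_model (not the article's original code).
    labels = tf.reshape(labels, [-1])  # flatten (batch, 1) -> (batch,)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    # EVAL mode: expose "accuracy" so results["accuracy"] is available later.
    eval_metric_ops = {"accuracy": tf.metrics.accuracy(labels, predictions)}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)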

Let's initialize our Estimator and begin training:

classifier = tf.estimator.Estimator(model_fn=cnn_model)

# MAX_STEPS (the total number of training steps) is assumed to be defined beforehand.
results, _ = tf.estimator.train_and_evaluate(
    classifier,
    train_spec=tf.estimator.TrainSpec(
        input_fn=train_input_fn,
        max_steps=MAX_STEPS),
    eval_spec=tf.estimator.EvalSpec(
        input_fn=test_input_fn,
        steps=None))

print("Accuracy:", results["accuracy"])
print("Loss:", results["loss"])

This training yields an accuracy of about 37.15% on the test set and a loss of about 4.91.

AdaNet Ensemble

Now we are going to build a model with AdaNet, which requires extending two abstract classes:

  • adanet.subnetwork.Builder;
  • adanet.subnetwork.Generator.

The Generator class uses the Builder class to generate a set of candidate networks for the ensemble.

class CNNBuilder(adanet.subnetwork.Builder):
    def __init__(self, n_convs):
        self._n_convs = n_convs

    def build_subnetwork(self,
                         features,
                         logits_dimension,
                         training,
                         iteration_step,
                         summary,
                         previous_ensemble=None):
        images = list(features.values())[0]
        x = images

        for i in range(self._n_convs):
            x = Conv2D(32, kernel_size=7, activation='relu')(x)
            x = MaxPooling2D(strides=2)(x)

        x = Flatten()(x)
        x = Dense(100, activation='relu')(x)
        logits = Dense(10)(x)

        complexity = tf.constant(1)
        persisted_tensors = {'n_convs': tf.constant(self._n_convs)}

        return adanet.Subnetwork(
            last_layer=x,
            logits=logits,
            complexity=complexity,
            persisted_tensors=persisted_tensors)

    def build_subnetwork_train_op(self,
                                  subnetwork,
                                  loss,
                                  var_list,
                                  labels,
                                  iteration_step,
                                  summary,
                                  previous_ensemble=None):
        optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=0.0)
        return optimizer.minimize(loss=loss, var_list=var_list)

    def build_mixture_weights_train_op(self,
                                       loss,
                                       var_list,
                                       logits,
                                       labels,
                                       iteration_step,
                                       summary):
        return tf.no_op("mixture_weights_train_op")

    @property
    def name(self):
        return f'cnn_{self._n_convs}'

The build_subnetwork method passes a parameter called complexity to adanet.Subnetwork(). The algorithm uses this value to balance each network's complexity against its loss reduction.
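
In the code above, complexity is simply fixed at 1 for every candidate. A rough alternative that grows with the number of convolutions, shown here only as an illustration (not what the article uses), could be:

# Illustrative only: penalize deeper candidates more heavily.
complexity = tf.sqrt(tf.to_float(self._n_convs + 1))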

The Builder class also has a method for training the mixture weights of each subnetwork in the ensemble, so one specific network can have more influence on the final result. Here we return a no-op, which keeps the same weight for every subnetwork.

Lastly, the name property identifies the subnetwork. In our code it is composed of "cnn_" followed by the number of convolutions.
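
For example, given the CNNBuilder class defined above:

CNNBuilder(n_convs=2).name  # -> 'cnn_2'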

The code below implements the Generator class:

class CNNGenerator(adanet.subnetwork.Generator):
    def __init__(self):
        self._cnn_builder_fn = CNNBuilder

    def generate_candidates(self,
                            previous_ensemble,
                            iteration_number,
                            previous_ensemble_reports,
                            all_reports):
        n_convs = 0
        if previous_ensemble:
            n_convs = tf.contrib.util.constant_value(
                previous_ensemble.weighted_subnetworks[-1]
                .subnetwork
                .persisted_tensors['n_convs'])

        return [
            self._cnn_builder_fn(n_convs=n_convs),
            self._cnn_builder_fn(n_convs=n_convs + 1)
        ]

The generate_candidates method returns the set of candidate networks for each iteration. In our code, this set contains two networks with different numbers of convolutions. During the first iteration (iteration 0) we have a special case where n_convs = 0, so one of the candidates has no convolutions at all.

In this example we use 3 AdaNet iterations, which means AdaNet will evaluate 3 sets of 2 subnetworks each.

head = tf.contrib.estimator.multi_class_head(10)

estimator = adanet.Estimator(
    head=head,
    subnetwork_generator=CNNGenerator(),
    max_iteration_steps=max_iteration_steps,
    evaluator=adanet.Evaluator(
        input_fn=adanet_input_fn,
        steps=None),
    adanet_loss_decay=.99)
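
The variable max_iteration_steps is not defined in the snippet above; it controls how many training steps each AdaNet iteration gets. A plausible way to obtain 3 iterations out of MAX_STEPS total steps (an assumption, not shown in the original) is:

ADANET_ITERATIONS = 3  # hypothetical constant matching the 3 iterations mentioned above
max_iteration_steps = MAX_STEPS // ADANET_ITERATIONS  # steps spent on each ensemble iteration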

The head variable applies a softmax over the ensemble's output and defines the multi-class loss and evaluation metrics.

After training this model, the results show an accuracy of about 41.56% and a loss of about 1.79.

results, _ = tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(
        input_fn=train_input_fn,
        max_steps=MAX_STEPS),
    eval_spec=tf.estimator.EvalSpec(
        input_fn=test_input_fn,
        steps=None))

print("Accuracy:", results["accuracy"])
print("Loss:", results["average_loss"])
print(ensemble_architecture(results))
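
The helper ensemble_architecture is not defined in the article. A version along the lines of the one used in the official AdaNet tutorials (an assumption here) reads the architecture summary that adanet.Estimator reports during evaluation:

def ensemble_architecture(result):
    """Extracts the ensemble architecture from evaluation results."""
    architecture = result["architecture/adanet/ensembles"]
    # The architecture is a serialized Summary proto for TensorBoard.
    summary_proto = tf.summary.Summary.FromString(architecture)
    return summary_proto.value[0].tensor.string_val[0]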

The final model is an ensemble of:

  • one network without convolutions;
  • two networks with one convolution each.
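
Tracing this back through generate_candidates, the search presumably went as follows (inferred from the reported final ensemble; not logged explicitly in the article):

# iteration 0: candidates cnn_0, cnn_1 -> cnn_0 is added (previous_ensemble is None, so n_convs starts at 0)
# iteration 1: candidates cnn_0, cnn_1 -> cnn_1 is added (the last subnetwork had 0 convolutions)
# iteration 2: candidates cnn_1, cnn_2 -> cnn_1 is added again (the last subnetwork had 1 convolution)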

Conclusion

In this article, we built a model using the AdaNet framework. The model improved performance compared with a simple CNN architecture. AdaNet is a good option for finding non-trivial models while requiring less knowledge of network architectures.

A drawback of this solution is that the interface is limited to the Estimator API, which is less intuitive than Keras, for example. But since AdaNet is a young project, we can expect significant development in its functionality and interface.