Hyperparameter Optimization for AllenNLP Using Optuna

himkt · Published in Optuna · Jun 15, 2020

Introduction to Optuna integration for AllenNLP.

TL;DR

  • An AllenNLP integration, AllenNLPExecutor, has been added to Optuna, enabling users to reuse an AllenNLP Jsonnet configuration file to optimize hyperparameters.
  • We will continue to enhance the AllenNLP integration (e.g., support for the Optuna pruning API), which will allow users to prune unpromising trials with sophisticated algorithms such as Hyperband.
  • The sample I created for this article's demonstration is available on GitHub. Please check it out!

Introduction

In this article, I introduce how to use Optuna, a hyperparameter optimization library, to estimate hyperparameters of a model implemented with AllenNLP, a neural network library for natural language processing.

Optuna

Optuna is a library for hyperparameter optimization that provides flexible ways to optimize hyperparameters in machine learning. It offers many search algorithms, including the Tree-structured Parzen Estimator (TPE) [1], the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [2], and multi-objective optimization [3].

In Optuna, we define an objective function to perform hyperparameter optimization. A simple example is shown below.

An example of Optuna, from https://optuna.readthedocs.io/en/latest/

The search space of each parameter is defined using the suggest APIs. Passing the objective function to study.optimize makes Optuna start optimizing the hyperparameters. For more information, please check the official tutorial.
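
As a rough sketch in the spirit of that example (minimizing a simple quadratic function of one parameter), an Optuna script looks like this:

```python
import optuna


def objective(trial):
    # Define the search space with a suggest API call and sample a value.
    x = trial.suggest_float("x", -10, 10)
    # Return the value that Optuna should minimize.
    return (x - 2) ** 2


# Create a study and run 100 trials of optimization.
study = optuna.create_study()
study.optimize(objective, n_trials=100)
print(study.best_params)  # parameters close to {'x': 2.0}
```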

AllenNLP

AllenNLP is a library for natural language processing with neural networks, developed by the Allen Institute for Artificial Intelligence. The team has presented papers and tutorials at top NLP conferences, which are useful resources for NLP research. In addition, a variety of tutorials and demonstrations are available online, allowing beginners to experience cutting-edge NLP techniques.

There are two ways to implement a model with AllenNLP: (a) writing a Python script and executing it directly, or (b) preparing a configuration file written in Jsonnet and running it with the allennlp command-line interface. This article covers the latter. If you want to use Optuna from your own Python scripts, please see the official sample on GitHub.

With a Jsonnet configuration file, users can train models by writing only the configuration of an experiment. This eliminates the need to write a training script and lets users focus on their model architecture, hyperparameters, and training configuration.

One of the most popular hyperparameter optimization tools for AllenNLP is AllenTune. AllenTune also works with Jsonnet-style configuration files and supports simple random search and grid search algorithms. Users can define the search space and start optimizing by changing only a few lines of an existing Jsonnet file.

Optuna + AllenNLP

With AllenNLP, hyperparameters are typically defined in a Jsonnet file, while Optuna expects the hyperparameters to be optimized to be defined in a Python script. To bridge this gap, we’ve created AllenNLPExecutor, which lets users keep their AllenNLP Jsonnet configuration and optimize the hyperparameters it contains with Optuna.

The AllenNLPExecutor performs parameter optimization as follows (a conceptual sketch follows the list).

  • Edit the configuration file in Jsonnet format and mask the hyperparameters with std.extVar.
  • Sample parameters from the search space defined with Optuna’s suggest API, then evaluate the Jsonnet file to create an AllenNLP Params object.
  • Pass the Params object to allennlp.commands.train.train_model and execute model training.
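
The snippet below is a simplified, conceptual sketch of these steps, not the actual AllenNLPExecutor implementation (see the pull request for that). It assumes the masked parameters have already been sampled on the trial and that AllenNLP writes its usual metrics.json into the serialization directory.

```python
import json
import os

from allennlp.commands.train import train_model
from allennlp.common.params import Params


def run_trial(trial, config_file, serialization_dir, metric="best_validation_accuracy"):
    """Conceptual sketch of what AllenNLPExecutor does for a single trial."""
    # 1. Feed the sampled hyperparameters to Jsonnet as external variables,
    #    which fill the std.extVar placeholders in the config.
    ext_vars = {name: str(value) for name, value in trial.params.items()}
    params = Params.from_file(config_file, ext_vars=ext_vars)

    # 2. Hand the resulting Params object to AllenNLP's training entry point.
    train_model(params, serialization_dir)

    # 3. Read back the target metric recorded by AllenNLP and return it to Optuna.
    with open(os.path.join(serialization_dir, "metrics.json")) as f:
        metrics = json.load(f)
    return metrics[metric]
```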

For details of the implementation, please see the pull request.

Previously, it was necessary to create a module like this for each project. Now, with AllenNLPExecutor, you can optimize hyperparameters with much less effort. The PR has been merged, and the feature has been available since Optuna v1.4.0, released on May 11.

AllenNLPExecutor Demonstration

Task: IMDb

To demonstrate Optuna’s AllenNLP integration, we tackle sentiment analysis on the IMDb review data [3]. The IMDb dataset contains 20,000 training examples and 5,000 test examples; each record consists of a review of a movie or TV show and a label indicating whether the review is positive or negative. The task is to predict this polarity from the text of the review.

Preparation

We first create a configuration file for AllenNLP in Jsonnet format (the full file is included in the GitHub sample). The configuration and its parameters are based on the official AllenTune sample; the default value of each parameter is the median of the parameter space defined there. We call this model the baseline.

First, mask the values of the hyperparameters in the Jsonnet config by calling the Jsonnet function std.extVar('{param_name}'), wrapped in std.parseInt for integers or std.parseJson for floating-point values. [edited 2020/07/28: please use std.parseInt or std.parseJson to cast parameters to the desired value types.]

The resulting config looks like the following.
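
For illustration, here is an abbreviated excerpt of such a masked config. Only dropout and embedding_dim are parameter names taken from this article; the model structure and the lr parameter are illustrative and depend on your own model.

```jsonnet
// Abbreviated, illustrative excerpt: hyperparameter values are replaced with
// std.extVar lookups, cast with std.parseInt (integers) or std.parseJson (floats).
{
  model: {
    type: 'basic_classifier',  // illustrative model type
    dropout: std.parseJson(std.extVar('dropout')),
    text_field_embedder: {
      token_embedders: {
        tokens: {
          type: 'embedding',
          embedding_dim: std.parseInt(std.extVar('embedding_dim')),
        },
      },
    },
  },
  trainer: {
    optimizer: {
      type: 'adam',
      lr: std.parseJson(std.extVar('lr')),
    },
  },
}
```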

Now that you have created the config, you can define the search space in Optuna. Note that the parameter names must match those masked in the config earlier. The objective function looks like the following.
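
A sketch of such an objective is shown below. The parameter names must match the std.extVar names in the config; the ranges, the config path, and the snapshot directory are illustrative placeholders.

```python
import optuna
from optuna.integration import AllenNLPExecutor


def objective(trial):
    # Search space: these names must match the std.extVar names in the config.
    trial.suggest_int("embedding_dim", 32, 256)
    trial.suggest_float("dropout", 0.0, 0.8)
    trial.suggest_float("lr", 1e-4, 1e-1, log=True)

    # The executor fills in the masked values and trains the model with AllenNLP.
    executor = AllenNLPExecutor(
        trial,                         # current Optuna trial
        "config/imdb_optuna.jsonnet",  # path to the masked config (placeholder)
        "result/optuna",               # path to the snapshot directory (placeholder)
        "best_validation_accuracy",    # target metric to optimize
    )
    return executor.run()
```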

Once the search space has been defined, we pass the trial object to AllenNLPExecutor. It’s time to create the executor! AllenNLPExecutor takes a trial, a path to the config, a path to a snapshot directory, and the target metric to be optimized as input arguments (executor = AllenNLPExecutor(trial, config, snapshot, metric)). Calling executor.run starts training. In each trial of the optimization, objective is called and performs the following steps: (1) train a model, (2) get the target metric on the validation data, and (3) return that metric.

Finally, creating a study and calling study.optimize starts the parameter optimization.
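
Continuing the sketch above (objective is the function defined earlier; the number of trials is illustrative):

```python
import optuna

# Maximize validation accuracy over, e.g., 30 trials.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)

print(study.best_trial)  # best hyperparameters and the corresponding accuracy
```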

Results

The results of the hyperparameter optimization are shown below. The evaluation metric is accuracy on the validation data. Baseline is the model described in the preparation section; Optuna+AllenNLP is the result of optimization with AllenNLPExecutor. We ran the optimization five times with different seed values and averaged the accuracy. Because the baseline is trained with fixed hyperparameters, its average accuracy stays constant across trials. With Optuna, the average accuracy improves as the number of trials increases; in the end, it is about 2.7 points higher on average than with the original hyperparameters.

Performance comparison between baseline and AllenNLP+Optuna

Optuna also has a feature to dump a configuration file with the optimized hyperparameters: call dump_best_config with a path to the input config, a path to the output config, and the study that has already been optimized.
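
A minimal usage sketch, assuming study is the optimized study from above and the paths are placeholders:

```python
from optuna.integration.allennlp import dump_best_config

# Write a new config in which the std.extVar placeholders are replaced
# with the best values found by the study.
dump_best_config(
    "config/imdb_optuna.jsonnet",  # input config with masked values (placeholder path)
    "config/imdb_best.json",       # output config with optimized values (placeholder path)
    study,                         # the optimized optuna.Study
)
```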

In the output of dump_best_config, the values of parameters such as dropout and embedding_dim that were masked with std.extVar are rewritten with the actual optimized values. The output file can be passed directly to the allennlp train command, allowing the user to retrain the model with the optimized parameters.

Conclusion

In this article, I introduced how to combine AllenNLP, a neural network library for natural language processing, with Optuna to optimize hyperparameters. The integration is easy to use and requires only a few modifications to AllenNLP’s Jsonnet file. As a demo, I worked on sentiment analysis of IMDb review data.

The sample I created for this demo is available on GitHub. If you want to run it, please give it a try!
