Hyperparameter Optimization for AllenNLP Using Optuna
Introduction to Optuna integration for AllenNLP.
- An AllenNLP integration, AllenNLPExecutor, has been added to Optuna, enabling users to reuse an AllenNLP Jsonnet configuration file to optimize hyperparameters.
- We will continue to enhance the AllenNLP integration (e.g., support for Optuna’s pruning API), which will allow users to optimize models with more sophisticated algorithms such as Hyperband.
- The sample code for this article’s demonstration is available on GitHub. Please check it out!
In this article, I introduce how to use Optuna, a hyperparameter optimization library, to estimate hyperparameters of a model implemented with AllenNLP, a neural network library for natural language processing.
Optuna is an automatic hyperparameter optimization framework designed particularly for machine learning, and it provides flexibility in how hyperparameters are optimized. Optuna offers many search algorithms, including Tree-structured Parzen Estimator (TPE), CMA Evolution Strategy (CMA-ES), and multi-objective optimization.
In Optuna, we define an objective function to perform hyperparameter optimization. The search space of each parameter is defined with the suggest APIs, and passing the objective function to the method study.optimize makes Optuna start optimizing the hyperparameters. For more information, please see the official Optuna tutorial.
AllenNLP is an Apache 2.0-licensed library, built on PyTorch, for natural language processing with neural networks. It is developed by the Allen Institute for Artificial Intelligence, which has presented a paper and a tutorial on it at top NLP conferences that are useful for NLP research. In addition, a variety of tutorials and demonstrations are available online, allowing beginners to experience cutting-edge NLP techniques.
There are two ways to implement a model with AllenNLP: (a) writing a Python script and executing it directly, and (b) preparing a configuration file written in Jsonnet and running it with the allennlp command-line interface. This article covers the latter. If you want to use Optuna in your Python scripts, please see the official sample on GitHub.
Using a Jsonnet configuration file, users can train models by only writing configurations for the experiments. This eliminates the need to write a script to train a model and allows users to focus on their model architecture, hyperparameters, and training configuration.
One of the best-known hyperparameter optimization tools for AllenNLP is AllenTune. AllenTune supports optimization with a Jsonnet configuration file and provides simple random search and grid search algorithms. Users can define the search space by changing only a few lines of an existing Jsonnet file.
Optuna + AllenNLP
In AllenNLP, hyperparameters are typically defined in a Jsonnet file, whereas in Optuna the hyperparameters to be optimized are defined in a Python script. To bridge this gap, we’ve created AllenNLPExecutor, which allows the hyperparameters in an AllenNLP Jsonnet file to be optimized with Optuna.
AllenNLPExecutor performs parameter optimization as follows.
- Edit the configuration file in Jsonnet format and mask the hyperparameters with std.extVar.
- Sample parameters from the search space defined with Optuna’s suggest API, and set them up in the Jsonnet file to create a Params object for AllenNLP.
- Pass the Params object to allennlp.commands.train.train_model and execute model training.
For details of the implementation, please see pull request #1086 on optuna/optuna, "Add AllenNLP integration." by himkt, which introduces the AllenNLP integration for running HPO with AllenNLP configuration files.
Previously, it was necessary to create a module for each project to do the above. Now, with AllenNLPExecutor, you can optimize hyperparameters with less effort. The PR was merged and has been available since v1.4.0, which was released on May 11.
To demonstrate Optuna’s AllenNLP integration, we tackle sentiment analysis of the IMDb review data. The IMDb dataset contains 20,000 training examples and 5,000 test examples. Each example consists of a review of a movie or TV show and a label indicating whether the review is positive or negative. The task is to predict whether a review is positive or negative from the text of the review body.
We first prepare an AllenNLP configuration file in Jsonnet format for this task. The configuration file and parameters are based on the official AllenTune sample. The default value of each parameter is the median of the corresponding search space defined in that sample. We call this model the baseline.
First, mask the hyperparameter values in the Jsonnet config with std.extVar, and wrap each call in std.parseInt for integers or std.parseJson for floating-point values. [Edited on 2020/07/28: please use std.parseJson to cast parameters to the desired value types.] In the resulting config, each tuned value is replaced by an expression such as std.parseJson(std.extVar('embedding_dim')).
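The cast is needed because Jsonnet external variables are always strings, so the config has to convert each one back into a number. The sketch below emulates this in Python. It uses int() as a rough stand-in for std.parseInt and json.loads as a stand-in for std.parseJson. The variable values are made up for illustration. Only the JSON-style parse handles both integers and floats, which is consistent with the advice to use std.parseJson everywhere.

```python
import json

# Jsonnet external variables are always strings. These example values are
# hypothetical; in practice they would be the stringified sampled values.
ext_vars = {"embedding_dim": "64", "dropout": "0.25", "lr": "0.001"}


def parse_int(text: str) -> int:
    # Rough stand-in for std.parseInt: accepts decimal integers only.
    return int(text)


def parse_json(text: str):
    # Stand-in for std.parseJson: yields an int or a float depending on the text.
    return json.loads(text)


# Casting every masked value with the JSON-style parser works for both types.
config = {name: parse_json(value) for name, value in ext_vars.items()}
print(config)  # {'embedding_dim': 64, 'dropout': 0.25, 'lr': 0.001}

# The integer-only parser rejects a floating-point value such as "0.25".
try:
    parse_int(ext_vars["dropout"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```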
Now that you have created the config, you can define the search space in Optuna. Note that the parameter names must be the same as the names masked in the config. The search space is defined inside the objective function.
Once the search space is defined, we pass the trial object to AllenNLPExecutor. AllenNLPExecutor takes a trial, a path to the config, a path to a snapshot directory, and the target metric to be optimized as input arguments (executor = AllenNLPExecutor(trial, config, snapshot, metric)). Then call executor.run to train the model. In each trial of the optimization, objective is called and (1) trains a model, (2) gets the target metric on the validation data, and (3) returns that metric.
Finally, creating a study and calling study.optimize starts the parameter optimization.
The results of the hyperparameter optimization are summarized below. The evaluation metric is accuracy (the percentage of correct predictions) on the validation data.
Baseline is the model described in the preparation section, trained with the default hyperparameters.
Optuna+AllenNLP is the result of optimization with AllenNLPExecutor. We ran the optimization five times with different seed values and calculated the average accuracy. Because the baseline is trained with fixed hyperparameters, its average accuracy stays constant regardless of the number of trials. With Optuna, the average accuracy improves as the number of trials increases. In the end, the accuracy improved by about 2.7 points on average compared with the original hyperparameters.
Optuna can also dump a configuration file with the optimized hyperparameters. Call dump_best_config with a path to the input config, a path to the output config, and the already optimized study.
In the output of dump_best_config, the values of parameters such as embedding_dim that were masked with std.extVar are rewritten with the actual values. The output file can also be passed directly to the allennlp train command, which allows the user to retrain the model with the optimized parameters.
In this article, I introduced how to combine AllenNLP, a neural network library for natural language processing, with Optuna to optimize hyperparameters, which requires only a few modifications to AllenNLP’s Jsonnet file. As a demo, I worked on polarity analysis of IMDb review data.
The sample code for this demo is available on GitHub. If you want to run it, please give it a try!