HOLA Optimization: A Lightweight Hyperparameter Optimization Software Package
AI Labs Researcher Gabriel and Quant Engineer Cristian share an open-source package for multi-objective hyperparameter optimization.
By: Gabriel Maher, Data Scientist, AI Labs
Cristian Matache, Quant Engineer, BlackRock Systematic
Stephen Boyd, Senior Advisor, co-head AI Labs
Mykel Kochenderfer, Senior Advisor, AI Labs
Alex Ulitsky, Managing Director, Portfolio Modeling
Slava Yukhymuk, Director, Portfolio Modeling
Leonid Kopman, Vice President, Portfolio Modeling
We built a lightweight hyperparameter optimization software package (called HOLA) by combining a few simple algorithms and models. Our design has a simple interface, works surprisingly well, and handles multiple objectives. Check out the code, and to learn more, read our paper, “A Light-Weight Multi-Objective Asynchronous Hyper-Parameter Optimizer.”
Engineering and quantitative solutions typically contain several parameters that influence their performance. Some of these can be found using standard procedures: the coefficients in a linear regression or the weights and biases in a deep neural network, for example, can be readily found (although not always exactly) via ordinary least squares or gradient descent.
However, it can be difficult to find the values for other parameters, such as the regularization strength of a Ridge regression, or an appropriate learning rate or dropout value for neural networks. While we can evaluate the performance of our solution for a given choice of these parameters, there is no standard method to find values that result in particularly good performance. Often our objective function is not differentiable with respect to these parameters. Finally, it is typically computationally expensive to try out different values of these parameters, e.g. running a full train/test evaluation for a machine learning model. Parameters with these properties are typically dubbed “hyperparameters”.
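To make this concrete, here is a minimal sketch (not part of HOLA) of what evaluating one such hyperparameter looks like, using scikit-learn's Ridge regression: each candidate value for the regularization strength `alpha` requires a full fit-and-score cycle, and we treat the held-out score as a black-box function of `alpha`.

```python
# Sketch: evaluating a hyperparameter (Ridge's regularization strength)
# by running a full train/test cycle per candidate value.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def objective(alpha):
    # Each call is a full train/test evaluation -- this is why trying
    # many hyperparameter values is computationally expensive.
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    return model.score(X_te, y_te)  # R^2 on held-out data

scores = {a: objective(a) for a in [0.01, 0.1, 1.0, 10.0]}
best_alpha = max(scores, key=scores.get)
```

We have no closed-form rule for which `alpha` wins in advance; we can only try values and compare the resulting scores.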
Due to its importance, the hyperparameter optimization problem has received considerable study. The simplest approach is to try many different values for each hyperparameter, either on a grid or at random. With many hyperparameters, however, an exhaustive grid suffers from the curse of dimensionality: the number of combinations grows exponentially with the number of hyperparameters. Other approaches, such as Bayesian optimization or the Tree-structured Parzen Estimator (TPE), try to learn how hyperparameters affect performance and use that information to select the next values to try. A downside of these approaches is that they are less parallelizable: they must wait for previous runs to finish before starting the next ones. See our paper for a longer list of references.
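The exponential blowup of grid search is easy to see with a little arithmetic. With k candidate values per hyperparameter and d hyperparameters, a full grid needs k to the power d evaluations:

```python
# Sketch: the curse of dimensionality for an exhaustive grid.
# With k candidate values per hyperparameter and d hyperparameters,
# a full grid requires k**d training runs.
import itertools

values_per_param = 10   # k
num_params = 6          # d
grid = itertools.product(*[range(values_per_param)] * num_params)
num_runs = sum(1 for _ in grid)   # counts every grid combination
```

Here a modest 10 values across 6 hyperparameters already requires a million training runs, which is why smarter sampling strategies matter.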
Given that selecting good values for hyperparameters is important yet difficult, what is a good way to go about it? That is the problem we set out to solve with the HOLA software package. We wanted to create a method that was simple and easy to use, yet performed well. Could we keep the method highly parallelizable while still outperforming plain random sampling?
We settled on a hybrid approach. Since random sampling is inherently parallel, we decided to have our algorithm start with random sampling. We modified the random sampling method to use Sobol sequences, which cover the space of possible hyperparameter values more evenly and thereby improve performance. Next, we set a threshold at which our method switches to a learned model to better identify promising hyperparameter values. For this we used a simple Gaussian mixture model (GMM), which lets us propose new hyperparameter values and accept reported results asynchronously, in parallel.
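The exploration phase can be sketched with SciPy's quasi-Monte Carlo module (this is an illustration of the idea, not HOLA's actual code): Sobol points fill the search box more evenly than i.i.d. uniform draws, and every point can be evaluated in parallel since none depends on another.

```python
# Sketch: Sobol-sequence exploration of a 2-D hyperparameter space,
# using SciPy's quasi-Monte Carlo tools (illustrative, not HOLA's API).
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=True, seed=0)
unit_points = sampler.random(64)          # 64 low-discrepancy points in [0, 1)^2

# Rescale the unit hypercube to the actual hyperparameter ranges,
# e.g. a learning rate in [1e-4, 1e-1] and a dropout rate in [0, 0.5].
lo = np.array([1e-4, 0.0])
hi = np.array([1e-1, 0.5])
candidates = qmc.scale(unit_points, lo, hi)
```

All 64 candidates are known up front, so they can be farmed out to workers at once, which is what makes this phase embarrassingly parallel.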
Conceptually, HOLA first selects a wide range of possible hyperparameter values to get an idea of where the most promising samples could lie in the hyperparameter space. The GMM is then fit to the best performing 20–30% of hyperparameter samples seen so far, effectively learning the distribution of “good” hyperparameter values. As we collect more samples, the GMM improves and recommends even better samples. Here’s an example of HOLA in action on a 1-dimensional test case.
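The exploitation phase described above can be sketched with scikit-learn's GaussianMixture (the variable names and the toy objective here are illustrative assumptions, not HOLA's implementation): fit a GMM to the best-scoring fraction of trials so far, then draw new candidates from the fitted mixture.

```python
# Sketch: fit a GMM to the elite (best ~25%) hyperparameter samples
# and draw new candidates from it (illustrative, not HOLA's API).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
samples = rng.uniform(-5, 5, size=(200, 2))   # past hyperparameter trials
scores = (samples ** 2).sum(axis=1)           # toy objective (lower is better)

elite_frac = 0.25
n_elite = int(len(samples) * elite_frac)
elite = samples[np.argsort(scores)[:n_elite]] # keep the best 25% seen so far

# The GMM learns the distribution of "good" hyperparameter values...
gmm = GaussianMixture(n_components=3, random_state=0).fit(elite)
# ...and proposes new candidates concentrated in promising regions.
new_candidates, _ = gmm.sample(16)
```

As more results come back, the elite set is refreshed and the GMM is refit, so the proposals sharpen over time.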
We compared HOLA against existing methods, from simple to complex, on several different test cases. HOLA often ranked among the top 2–3 methods by performance, which was quite surprising given its simple design. In particular, it performed significantly better than TPE from hyperopt, a popular method from the Bayesian class of optimizers.
The paper contains much more detail on how we evaluated HOLA. Below is a plot summarizing the results of one of the benchmarks (Gradient Boosted Regressor on the Diabetes dataset). Since the benchmark seeks a minimum, lower found values are better.
We have already successfully used HOLA for several different applications within BlackRock’s AI Labs such as returns forecasting and portfolio optimization. Next, we are excited to continue working on HOLA and see what problems it can solve for you. We’d love to hear from you if you have any suggestions or any other feedback — stop by and say “hola” (Spanish for “hello”)!