Optuna: Turbocharging Models with Hyper-parameter Magic

mayank khulbe
The Good Food Economy
6 min read · Mar 23, 2024


Tuning

Imagine a symphony without tuning. The result? Discordant notes, lost melodies, and a cacophony that fails to inspire. Just as tuning is essential to make music harmonious, hyperparameter tuning is the backbone of optimizing model performance in the realm of machine learning.

Traditionally, hyperparameter tuning has been akin to searching for a needle in a haystack—a laborious process of trial and error, often yielding suboptimal results. But fear not, for Optuna has emerged as the conductor of this chaotic orchestra, orchestrating harmony with unparalleled finesse.

In this blog, we’ll be diving into the exciting world of hyperparameter tuning with Optuna. No need to worry about the complexities of previous methods — today, we’re all about Optuna and its game-changing approach to optimizing model performance. In the end, we’ll unveil the Optuna dashboard — an indispensable tool for visualizing our results. With its intuitive interface, we’ll gain profound insights into our hyperparameter tuning efforts, transforming data into actionable intelligence.

Here is the detailed official documentation on Optuna to help you understand it better.

Hyper-parameter Tuning

Hyperparameter tuning is a critical aspect of machine learning model development, involving the optimization of parameters that are not learned directly from the data but rather control the learning process itself. These parameters, often referred to as hyperparameters, include learning rates, regularization strengths, and model architectures. By fine-tuning these hyperparameters, we aim to strike the optimal balance between model complexity and generalization ability.

Hyper-parameter tuning

Essentially, hyperparameter tuning seeks to minimize the model’s error on unseen data while preventing overfitting on the training set. This process is essential as it directly impacts the model’s performance metrics such as accuracy, precision, and recall. Through systematic exploration and optimization of hyperparameters, we can unlock the full potential of our models, enabling them to achieve superior performance across various tasks and datasets.

Next, we’re about to unveil the unparalleled power of Optuna in hyperparameter tuning, and what better way to showcase its capabilities than with the classic MNIST digit dataset? Get ready to witness Optuna’s prowess as we navigate through the intricate landscape of hyperparameter optimization, fine-tuning our model to achieve remarkable accuracy and performance.

Assuming that you have imported all the necessary libraries, let’s first create a simple CNN architecture for the classification of images.

CNN architecture
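The original code screenshot isn't reproduced here, so below is a minimal sketch of what such a CNN might look like. The layer sizes and names (`Net`, `conv1`, `fc1`, etc.) are illustrative assumptions, not the post's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """A small CNN for 28x28 grayscale MNIST digits (sketch, not the post's exact model)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # 1x28x28 -> 16x28x28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 16x14x14 -> 32x14x14
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)  # 10 digit classes

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 16x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 32x7x7
        x = torch.flatten(x, 1)                     # -> (batch, 32*7*7)
        x = F.relu(self.fc1(x))
        return self.fc2(x)                          # raw logits per class
```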

Once a CNN architecture is defined, we need to define the train and test data loaders to load the images as tensors for training and validation of the model.

Defining the data loaders
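As a stand-in for the original screenshot, here is a sketch of the loader setup. To keep it self-contained, random tensors with MNIST-like shapes (1×28×28 images, labels 0-9) replace the real dataset; the actual post would load MNIST itself, e.g. via torchvision.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loaders(batch_size=64):
    """Build train/test DataLoaders with MNIST-shaped random data (illustrative stand-in)."""
    train_ds = TensorDataset(torch.randn(256, 1, 28, 28),
                             torch.randint(0, 10, (256,)))
    test_ds = TensorDataset(torch.randn(64, 1, 28, 28),
                            torch.randint(0, 10, (64,)))
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_ds, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader
```

With the real dataset you would swap the `TensorDataset`s for `torchvision.datasets.MNIST(root=..., transform=transforms.ToTensor(), download=True)`.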

Now, since we have the data loaders defined, it’s time to create the functions to train and validate the model.

Model training and validation

The code above seems familiar, doesn’t it? Surprisingly, there’s little deviation even with the introduction of Optuna. That’s the beauty of it — Optuna seamlessly integrates into our existing codebase, offering power and comfort without the need for extensive rewrites.

Now, with our CNN training and validation functions primed and ready, we’re on the cusp of witnessing the formidable power of Optuna in action. But before we dive into the heart of hyperparameter optimization, there’s one crucial piece missing: the objective function. This function will serve as our guide, evaluating the performance of different hyperparameter configurations and guiding Optuna towards the optimal solution.

In Optuna, the function to be optimized is conventionally named objective. It takes one very important parameter, trial, which corresponds to a single execution of the objective function and is instantiated internally upon each invocation of the function.

In simpler terms, think of a Trial object as a record of each time we run our objective function. Every time we call the function, a Trial object is created to keep track of that specific execution. It’s like having a notepad where we jot down the details of each attempt at finding the best hyperparameters for our model. This way, Optuna can learn from each trial and gradually steer towards the optimal configuration.

Within the objective function, the focal point lies in the way model configurations are defined.

Model configurations

The trial.suggest_float method suggests a floating-point value for a hyperparameter from a predefined range. The ‘step’ parameter used for momentum discretizes that range, so candidate values are sampled at fixed intervals rather than continuously.

Note: The metric that we aim to minimize/maximize during hyperparameter tuning, test accuracy in this case, must be returned by the objective function.

Now, it’s time to set the optimization process in motion by creating a Study object within Optuna.

Optimizing the study

This Study object serves as the orchestrator of our hyperparameter tuning endeavour, overseeing the iterative exploration of parameter configurations. The optimize method on our Study object refines its search, gradually converging towards the most promising hyperparameter settings.

Storage: Defines the storage backend for recording and managing optimization results. In this case, it’s set to “sqlite:///db.sqlite3”, indicating the use of SQLite as the storage mechanism. This enables Optuna to store trial results persistently in a database.

Sampler: Specifies the strategy used for sampling hyperparameters during optimization. TPESampler stands for the Tree-structured Parzen Estimator (TPE) sampler, a popular method for efficient exploration of the hyperparameter space. Here is a list of the different samplers available in Optuna.

Direction: Determines the optimization direction, indicating whether we aim to maximize or minimize the objective function. In this case, ‘maximize’ is specified, implying that we seek to maximize the objective function’s output.

Finally, running the above code creates a series of trials, with each subsequent trial aiming, in our case, to maximize the test accuracy.

Model training

The above snippet captures the concluding moments of our hyperparameter tuning journey. The final trial ended with an accuracy of ~95%, while trial 12 was the best trial, achieving ~99% accuracy on the test data, i.e. the output of our objective function.

Optuna-Dashboard

Now that our hyperparameter tuning process using Optuna is complete, we have at our disposal the Optuna dashboard. This dashboard serves as a comprehensive tool for reviewing all the trials conducted throughout our optimization journey. It offers a detailed overview of each trial’s results, including objective function values and corresponding hyperparameter configurations.

With the Optuna dashboard, we’re granted a bird’s-eye view of our optimization efforts, facilitating a deeper understanding of the performance landscape and aiding in the selection of the most effective hyperparameter settings.

Simply running the optuna-dashboard command in the terminal, pointed at our SQLite storage (optuna-dashboard sqlite:///db.sqlite3), will produce the output below.

optuna-dashboard

Navigating to the link mentioned will take you to an Optuna dashboard where you can explore the results of hyperparameter optimization experiments. This dashboard provides visualizations and metrics to help analyze the performance of different hyperparameter configurations across your machine-learning models. It allows you to track trials, compare performance metrics, and gain insights into the effectiveness of various hyperparameter settings.

Optuna-Dashboard

Finally, I’ve embedded a walkthrough video demonstrating the features and functionalities of the Optuna dashboard. By watching the video, you’ll get a feel for navigating it yourself.

This brings us to the end of the blog. I hope this blog post has been informative and helpful in your journey towards mastering hyperparameter optimization with Optuna. Feel free to leave any questions or comments below, and happy optimizing!

If you’d like to explore the full code, feel free to click the Repo link.

Also, feel free to check out more of my articles and blogs.
