Introducing the Stabilized JournalStorage in Optuna 4.0: From Mechanism to Use Case
Introduction
The default storage class in Optuna is InMemoryStorage. However, InMemoryStorage does not persist trial histories and cannot be used for distributed optimization, so Optuna provides several other storage classes.
In Optuna 4.0, two of these storage classes, JournalStorage and JournalFileBackend, are now officially supported. In this blog post, we explain how they work, where they are useful, and how to use them.
About JournalStorage
JournalStorage is one of the storage classes in Optuna. Its name comes from the way it records Optuna's operation logs by stacking them one after another, like a journal. The primary motivation for introducing it was to make it easier to implement various backends (such as databases) as storage for Optuna. To achieve this, the design separates responsibilities: the JournalStorage class acts as Optuna's storage, while a separate class handles reading from and writing to the backend. At initialization, JournalStorage accepts an object of the class prepared for the chosen backend (see Fig. 1).
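To make this separation of responsibilities concrete, here is a minimal toy sketch in plain Python. It is not Optuna's implementation: the names ToyFileBackend, ToyJournalStorage, record_trial, and replay are hypothetical and exist only for illustration. The backend knows only how to append and read log entries, while the storage decides what the entries mean and rebuilds its state by replaying them.

```python
import json
import os
import tempfile


class ToyFileBackend:
    """Hypothetical backend: appends and reads JSON log entries, one per line."""

    def __init__(self, path):
        self._path = path

    def append_logs(self, logs):
        # Append each entry to the end of the file; earlier lines are never rewritten.
        with open(self._path, "a", encoding="utf-8") as f:
            for log in logs:
                f.write(json.dumps(log) + "\n")

    def read_logs(self, log_number_from=0):
        # Return all entries starting from the given position in the journal.
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as f:
            lines = f.readlines()
        return [json.loads(line) for line in lines[log_number_from:]]


class ToyJournalStorage:
    """Hypothetical storage: turns operations into log entries and replays them."""

    def __init__(self, backend):
        self._backend = backend

    def record_trial(self, trial_id, value):
        self._backend.append_logs(
            [{"op": "finish_trial", "trial_id": trial_id, "value": value}]
        )

    def replay(self):
        # Rebuild the current state purely from the journal.
        values = {}
        for log in self._backend.read_logs():
            if log["op"] == "finish_trial":
                values[log["trial_id"]] = log["value"]
        return values


log_path = os.path.join(tempfile.mkdtemp(), "toy_journal.log")

writer = ToyJournalStorage(ToyFileBackend(log_path))
writer.record_trial(0, 0.25)
writer.record_trial(1, 0.04)

# A second storage object, sharing nothing but the file, sees the same state.
reader = ToyJournalStorage(ToyFileBackend(log_path))
state = reader.replay()
```

In this sketch, supporting a different backend would mean writing only another class with the same two methods; the storage class itself would stay unchanged.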
JournalStorage was introduced as an experimental feature in Optuna v3.1 and has been officially supported since v4.0. With official support, backward compatibility of log files is guaranteed. In addition, class names and module paths have been reorganized, and stability has been improved.
A simple code example using JournalStorage can be written as follows:
import optuna
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend


def objective(trial):
    # Minimize the sum of squares of three parameters in [-1, 1].
    xs = [trial.suggest_float(f"x{n}", -1.0, 1.0) for n in range(3)]
    return sum(x ** 2 for x in xs)


# Record every storage operation in a local journal file.
storage = JournalStorage(JournalFileBackend("./optuna_journal_storage.log"))
study = optuna.create_study(storage=storage)
study.optimize(objective, n_trials=20)
As mentioned earlier, the JournalStorage class takes, as an argument during initialization, an object that implements access to the backend (in this code, a JournalFileBackend object). This is one of the key features of JournalStorage, and it makes adding new backends to Optuna easy. As of Optuna 4.0, in addition to JournalFileBackend, the JournalRedisBackend class is available for using Redis, a widely used NoSQL database, as a backend. (However, the stabilization in this release covers only JournalFileBackend.)
For further details on JournalStorage, please also refer to the past blog post.
About JournalFileBackend
The JournalFileBackend class provides storage functionality compatible with distributed optimization. To use it, pass a JournalFileBackend object to the JournalStorage initializer, as shown in the sample code above. The greatest advantage of JournalFileBackend is that it enables distributed optimization over a Network File System (NFS). This is achieved by implementing mutual exclusion with system calls that the NFS specification defines as atomic. Two lock-acquisition methods are implemented as separate classes, and you can switch between them by passing an optional argument when initializing JournalFileBackend. For usage instructions, please refer to the documentation; for details on the mechanism, please refer to the past blog post, which explains it alongside JournalStorage.
In addition to JournalStorage, another way to store Optuna's studies is RDBStorage. RDBStorage supports SQLite3 as well as MySQL. Like JournalFileBackend, SQLite3 uses a single file as storage, making it a convenient option. However, SQLite3 is known not to work reliably when its file is located on NFS and is accessed simultaneously from multiple nodes or processes. This issue is mentioned in the official SQLite FAQ. Given these considerations, JournalFileBackend is the only option in Optuna 4.0 for distributed optimization that uses a single file on NFS as storage.
A Use-Case
While executing large-scale distributed optimization using RDBStorage
and MySQL, the load on the MySQL server became a bottleneck during the analysis of the optimization results. To streamline the analysis process and reduce the load, we considered migrating the studies stored in MySQL to another storage using the optuna.copy_study
function. While we also looked into converting to storage options like Redis and SQLite3, we found that converting to JournalStorage
with JournalFileBackend
was the fastest and most suitable for our workload. This allowed us to avoid placing additional load on the MySQL server and perform the analysis more quickly.
Changes for Stabilization in Optuna 4.0
In Optuna 4.0, class names and module paths were reorganized to make the API more intuitive. The code in the example above already follows the v4.0 API. The relevant lines are repeated below:
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
storage = JournalStorage(JournalFileBackend("./optuna_journal_storage.log"))
Code written with the v3.x class names and module paths will continue to work for the time being for backward compatibility, but it is deprecated. For details about the new module paths, please refer to the documentation and migration guide.
All log files created since the release of v3.1 will remain usable in v4.0 and beyond! To reuse an existing file, pass its path to the JournalFileBackend class in the same way as shown in the code example above.
Conclusions
In this article, we introduced the usage and a practical use case of the storage features stabilized in Optuna 4.0. JournalStorage with JournalFileBackend is a powerful option that supports distributed optimization while relying only on NFS. We encourage you to give it a try!
Optuna 4.0 has made significant progress in many areas, in addition to stabilizing JournalStorage. For more details, check out the Optuna 4.0 release blog!