Distributed Optimization via NFS Using Optuna’s New Operation-Based Logging Storage

Riki Otaki · Published in Optuna · Sep 28, 2022 · 9 min read

Hi, my name is Riki Otaki, and I worked as a part-time engineer on the AutoML team at Preferred Networks. I implemented a new storage called JournalStorage, which has been incorporated into Optuna and will be released in v3.1.0. This article is an introduction to it.

What Is JournalStorage?

JournalStorage is a new operation-based log storage that has been added to the storages of Optuna, a hyperparameter optimization library. It will be released in the next minor version, Optuna v3.1.0, and is currently available on the master branch. In conventional storages, when a change is made to a record, the value written before the change is overwritten with the value after the change. JournalStorage does not overwrite the “value”; instead, it appends the “operation” that caused the value to change to the backend. By logging the more abstract operations rather than the values themselves, it minimizes the number of I/Os and provides a simpler interface than the traditional storage APIs.

In this article, we first describe the challenges that Optuna Storage has been facing. Next, we describe how introducing JournalStorage mitigates those issues. Finally, we describe how we implemented NFS-compliant file storage as its backend and evaluated the performance of several locking implementations in an NFS environment.

JournalStorage Usage
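Here is a minimal usage sketch (assuming the v3.1 API, in which JournalFileStorage is the file-based backend; the objective function and file path are just illustrations):

```python
import optuna
from optuna.storages import JournalFileStorage, JournalStorage


def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


# The log file can live on an NFS mount shared by multiple workers.
storage = JournalStorage(JournalFileStorage("./optuna_journal.log"))
study = optuna.create_study(study_name="example", storage=storage, load_if_exists=True)
study.optimize(objective, n_trials=100)
```

Multiple workers pointing at the same log file cooperate on one study, which is what enables the distributed use case discussed below.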

Optuna Storages: The Three Challenges

Custom Backend Implementation Is Hard

Optuna Storage is designed to support custom backends in addition to the existing ones (RDB, in-memory, Redis). In practice, however, this is challenging for the following reasons.

  1. The base class of storages (BaseStorage) has 19 abstract methods, which imposes a large burden on implementers.
  2. The specifications that each abstract method of BaseStorage must satisfy are complicated, and there are many points to consider, especially when dealing with concurrent processing.

As a result, it is difficult to integrate your own backend unless you are familiar with the specification details of Optuna Storage.

CachedStorage Is Complicated

Optuna has a class called CachedStorage that caches the results of storage API calls in memory. However, caching in a distributed application such as Optuna is inherently difficult, and the implementation has become a non-trivial and unwieldy component for developers. Also, while the RDB and Redis storages rely on caching to achieve practical latency, CachedStorage places unnecessary restrictions on the data that can be modified, which is sometimes problematic for users.

Parallel Optimization on NFS Is Not Possible

Some Optuna users do not have access to Postgres, MySQL, or Redis. For such users, SQLite3 is the only persistence method available, but SQLite3 does not support the Network File System (NFS) [1], making it impossible to achieve large-scale parallel optimization over the network.

With the above background, JournalStorage was incorporated into Optuna. By providing NFS-compliant file storage as its backend, parallel optimization in such environments can now be realized.

Design of JournalStorage

JournalStorage was designed and implemented based on optjournal, a prototype developed by Takeru Ohta (@sile). It will be supported by Optuna starting with the next minor release, Optuna v3.1. Below is a description of how JournalStorage differs from traditional storage.

JournalStorage Logs the Operations on the Database

Optuna’s BaseStorage methods can be broadly classified into three types in terms of how they access storage: create, get, and set. In Optuna Storage, create methods store study and trial records in the storage, get methods retrieve the field information of a record, and set methods overwrite the field information of a record.

19 Abstract Methods in BaseStorage Class
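As a rough, abridged sketch of that interface (method names taken from the v3.1-era BaseStorage, with signatures simplified here), the three method types look like this:

```python
from abc import ABC, abstractmethod

# Abridged sketch of the BaseStorage interface (v3.1-era names, signatures
# simplified; the real class has 19 abstract methods in total).
class BaseStorage(ABC):
    # create
    @abstractmethod
    def create_new_study(self, study_name=None) -> int: ...
    @abstractmethod
    def create_new_trial(self, study_id, template_trial=None) -> int: ...

    # set
    @abstractmethod
    def set_trial_param(self, trial_id, param_name, param_value_internal, distribution) -> None: ...
    @abstractmethod
    def set_trial_state_values(self, trial_id, state, values=None) -> None: ...

    # get
    @abstractmethod
    def get_study_id_from_name(self, study_name) -> int: ...
    @abstractmethod
    def get_all_trials(self, study_id, deepcopy=True, states=None) -> list: ...

    # ... plus the remaining abstract methods ...
```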

In JournalStorage, every create and set method reduces to persisting a single line of JSON. For example, calling JournalStorage.create_new_study(study_name="foo") causes the backend to append {"operation": "create_new_study", "study_name": "foo"}, and JournalStorage.set_study_user_attrs(study_id=1, key="key", value="value") causes the backend to append {"operation": "set_study_user_attrs", "study_id": 1, "key": "key", "value": "value"}. (The actual structure of the JSON is more complex than this.) We call this serialization result of a BaseStorage method an operation log, or simply a log. By executing (replaying) the operation logs in order from oldest to newest, a snapshot of the database can be built in memory. When a get method is called, JournalStorage consults the replayed in-memory snapshot after applying any logs that have not yet been applied.
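To make the replay idea concrete, here is a hypothetical, heavily simplified sketch (not Optuna’s actual classes or log schema):

```python
import json

# Hypothetical replay loop: each backend line is one JSON-encoded operation,
# applied in order from oldest to newest to rebuild an in-memory snapshot.
def replay(log_lines, snapshot=None):
    snapshot = snapshot or {"studies": {}}
    for line in log_lines:
        op = json.loads(line)
        if op["operation"] == "create_new_study":
            study_id = len(snapshot["studies"])
            snapshot["studies"][study_id] = {"name": op["study_name"], "user_attrs": {}}
        elif op["operation"] == "set_study_user_attrs":
            snapshot["studies"][op["study_id"]]["user_attrs"][op["key"]] = op["value"]
        # ... one branch per operation type ...
    return snapshot


log = [
    '{"operation": "create_new_study", "study_name": "foo"}',
    '{"operation": "set_study_user_attrs", "study_id": 0, "key": "key", "value": "value"}',
]
print(replay(log))  # {'studies': {0: {'name': 'foo', 'user_attrs': {'key': 'value'}}}}
```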

Actual Operation Log in JournalStorage

Benefits of JournalStorage

1. Simpler Backend API

The backend API can be simplified using JournalStorage. BaseStorage has so many methods that implementing a custom backend is difficult. This is somewhat inevitable, since Optuna has two entities, Study and Trial, each of which has record-creation operations as well as set and get operations for various fields. The problem can be alleviated by introducing JournalStorage, which stores data in an operation-log format. As mentioned earlier, in JournalStorage any data write operation (create, set) to the database is treated as “appending a dictionary-type JSON”. Therefore, only two APIs are needed for the backend: a method to append JSON and a method to read it.

Two Abstract Methods in BaseJournalStorage Class
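As a sketch, the backend interface boils down to something like the following (modeled on the v3.1-era BaseJournalLogStorage; exact names and signatures may differ in your version):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Sketch of the two-method journal backend interface.
class BaseJournalLogStorage(ABC):
    @abstractmethod
    def read_logs(self, log_number_from: int) -> List[Dict[str, Any]]:
        """Return all logs whose sequence number is >= log_number_from."""
        ...

    @abstractmethod
    def append_logs(self, logs: List[Dict[str, Any]]) -> None:
        """Durably append the given logs to the end of the journal."""
        ...
```

Implementing these two methods over any append-capable medium (a file, an object store, a message queue) is enough to get a working backend.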

2. No Need for an Internal Cache

Optuna accelerates the RDB and Redis storages by wrapping the database in a class called CachedStorage. The use of internal caching is unavoidable from a practical standpoint, since retrieving results from the database on every call would drastically increase the latency of BaseStorage.get_trial() and BaseStorage.get_all_trials(). However, caching in a distributed system generally requires dealing with issues such as “what happens when one worker’s cache line is updated by another worker and becomes dirty?” Optuna is no exception. Optuna mitigates these problems by prohibiting changes to a trial once it has reached the COMPLETE state, and by adding restrictions to Optuna’s functionality to keep the cache implementation lightweight. Even so, CachedStorage remains a complex implementation, making it a difficult module to refactor and maintain. JournalStorage, on the other hand, needs no cache: it keeps a snapshot of the database in memory and updates it whenever newly persisted logs are applied. Therefore, get methods can read data faster with less I/O.

Limitations of JournalStorage

Initial Loading of Data Takes Time

JournalStorage has some disadvantages. In JournalStorage, to restore the database from persistent storage, the log must be traversed from the beginning and applied in memory one entry at a time to reconstruct the database. Thus, the initial load time is longer than that of a traditional backend. For example, suppose a key is updated with the values 1 -> 10 -> 100 -> 1000. In a traditional database, all you have to do is read the final value, 1000, but in JournalStorage you must first apply the 1 -> 10 log, then the 10 -> 100 log, then the 100 -> 1000 log. While a traditional database only needs to restore the latest values in memory, with JournalStorage the more logs there are, the longer it takes to restore the data in memory.

Also, the current JournalFileStorage stores all logs in a single file. While having fewer files is nice and simple to handle, there is a drawback: for example, if there are 1000 studies and only one study’s information is needed, the data cannot be read without restoring all the studies to memory.

These problems can be alleviated by persisting snapshots in addition to the operation logs. For example, when the logs exceed a certain amount, a snapshot can be persisted in the traditional way, and logs that are no longer needed can be deleted. In this way, in-memory snapshots could be built up quickly. Snapshot functionality will be developed as part of future work.

For more information on JournalStorage, please refer to the article written by Takeru Ohta [2].

Support for Network File System (NFS)

We have implemented file storage as the default backend for JournalStorage. Until now, if an RDB or Redis environment was not available, the only way to perform parallel optimization using Optuna was to use SQLite3. SQLite3 is not recommended for use in an NFS environment, and Optuna has not provided a means of performing parallel optimization over the network without using an RDB or Redis. The JournalStorage file backend we have introduced supports NFS, so it covers these use cases.

In order to read and write files on NFS from multiple processes, file locking is required to coordinate the processes. Normally, flock(2) is used to obtain a file lock; however, as described in the flock(2) manual, flock(2) is not available in older versions of NFS.

Several methods for implementing file locking on NFS have been proposed, but we could not find a single clear, correct solution. (One reason is that lock implementations depend heavily on the NFS version.) In the following, we describe the design of three file locking methods (using symlink(2), using open(2), and using link(2)), the results of their safety verification, and a comparison of their performance.

File Lock Using symlink(2)

Since rename(2) and symlink(2) are executed atomically on NFS v2 [3], they can be used to implement file locking. We implemented mutual exclusion by having multiple processes try to create a symbolic link with the same name: the process that successfully creates the link acquires the lock, and the others retry. This mechanism takes advantage of the fact that only one symbolic link with a given name can exist. For example, for the file log_file, all processes try to create a symbolic link log_file.lock. The successful process enters the critical section. If the link already exists, the process receives an error (EEXIST) and retries. When releasing the lock, the process renames (rename(2)) the created symbolic link to a uniquely named file and then deletes it.

SymlinkLock Pseudocode
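A sketch of this procedure in Python (hypothetical helper functions, not Optuna’s actual implementation):

```python
import os
import time
import uuid

def acquire(lock_path: str) -> None:
    while True:
        try:
            # symlink(2) is atomic even on NFS v2; only one process can win.
            os.symlink("LOCK", lock_path)
            return
        except FileExistsError:
            time.sleep(0.001)  # another process holds the lock; retry

def release(lock_path: str) -> None:
    # rename(2) is atomic on NFS, so moving the link to a unique name before
    # unlinking avoids deleting a lock that another process has re-acquired.
    unique = f"{lock_path}.{uuid.uuid4().hex}"
    os.rename(lock_path, unique)
    os.unlink(unique)
```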

File Lock Using open(2)

In addition to rename(2) and symlink(2), NFS v3 allows file creation to be executed atomically [3]. open(2) with the O_CREAT and O_EXCL flags creates the file if it does not exist and fails with an error (EEXIST) if it does. This can be used for mutual exclusion: if the process creates the file, it enters the critical section; if not, it retries. Like locking with symbolic links, file locking with open(2) takes advantage of the fact that only one file with a given name can exist. It is simple and straightforward, but it is not supported on NFS v2 or earlier.

OpenLock Pseudocode
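A sketch of the open(2)-based lock (again, hypothetical helpers rather than Optuna’s actual code):

```python
import os
import time

def acquire(lock_path: str) -> None:
    while True:
        try:
            # O_CREAT | O_EXCL is atomic on NFS v3 and later.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            time.sleep(0.001)  # lock held elsewhere; retry

def release(lock_path: str) -> None:
    os.unlink(lock_path)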

File Lock Using link(2)

Although we could not find any formal documentation, we did find references [4] stating that link(2) is atomic on many NFS implementations and that file locking can be achieved using hard links. For a given file, each process creates a hard link with its own distinct name. After creation, the process counts the number of hard links to the file using stat(2); if exactly one process holds a link, that process may enter the critical section. Otherwise, it removes its hard link and retries. For example, for a file log_file, process 1 creates the link log_file.lock1 and process 2 creates the link log_file.lock2. In this case link(2) does not fail with EEXIST, because each process creates a hard link with a different name. After creating its link, a process counts the file’s hard links with stat(2), and if there are exactly two, it enters the critical section. A count of two corresponds to the original file plus the single hard link created by that process.

LinkLock Pseudocode
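A sketch of the link(2)-based lock (hypothetical helpers):

```python
import os
import time

def acquire(target_path: str) -> str:
    my_link = f"{target_path}.lock{os.getpid()}"  # per-process link name
    while True:
        os.link(target_path, my_link)  # reported to be atomic on many NFS setups
        # st_nlink == 2 means: the original file plus our link, and no one else's.
        if os.stat(target_path).st_nlink == 2:
            return my_link
        os.unlink(my_link)  # contention detected; back off and retry
        time.sleep(0.001)

def release(my_link: str) -> None:
    os.unlink(my_link)
```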

Lock Verification Scenarios and Experimental Results

Using the above three locking methods, we tested their safety and compared their performance on our in-house NFS. The experiment is as follows (a rough sketch of the benchmark appears after this list):

  1. Create a single log file in the NFS-mounted directory.
  2. Write 0 to the log file.
  3. Start 10 processes; each reads the value written at the end of the log file and appends the incremented value to the end of the log file. For mutual exclusion, a file lock is taken using one of the methods described above before the read and released after the append.
  4. Measure the time taken to write the value 1000 times in total. Also, verify that the file ends with 1000.
  5. Perform steps 1 to 4 a total of 10 times and measure the average elapsed time.
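For reference, a rough, hypothetical sketch of one benchmark run using the symlink(2) lock (release is simplified to a plain unlink; the unique-rename trick from the earlier sketch is omitted):

```python
import multiprocessing
import os
import time

def lock(path):
    while True:
        try:
            os.symlink("LOCK", path)  # atomic acquire
            return
        except FileExistsError:
            time.sleep(0.0001)

def worker(log_path, lock_path, n_total):
    while True:
        lock(lock_path)
        try:
            with open(log_path, "r+") as f:
                last = int(f.readlines()[-1])  # read the value at the end
                if last >= n_total:
                    return
                f.write(f"{last + 1}\n")  # append the incremented value
        finally:
            os.unlink(lock_path)  # simplified release

if __name__ == "__main__":
    log_path, lock_path = "log_file", "log_file.lock"
    with open(log_path, "w") as f:
        f.write("0\n")  # step 2: seed the log with 0
    start = time.time()
    procs = [multiprocessing.Process(target=worker, args=(log_path, lock_path, 1000))
             for _ in range(10)]  # step 3: 10 processes
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"elapsed: {time.time() - start:.3f}s")  # step 4
    assert open(log_path).read().splitlines()[-1] == "1000"
```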

The results are below:

+=========================+======================+
| File Lock Type          | Average Elapsed Time |
+=========================+======================+
| Locking with symlink(2) | 0.318 (+/- 0.029) s  |
+-------------------------+----------------------+
| Locking with open(2)    | 3.517 (+/- 2.131) s  |
+-------------------------+----------------------+
| Locking with link(2)    | More than 10 s       |
+-------------------------+----------------------+

Locking with symlink(2) was consistently the fastest, followed by locking with open(2); locking with link(2) was the slowest, probably because link(2) has more stringent conditions for acquiring the lock than the other two. For both the symlink(2) and open(2) locks, whether a lock is acquired depends only on whether a single file could be created. In contrast, with link(2) locking, each process can create its own differently named file at any time, which makes it difficult to hit the moment when the number of hard links is exactly two. For reference, we experimented with link(2) locking with fewer processes: with one process it took 0.417 (+/- 0.064) s, while with two processes it took 3.941 (+/- 2.242) s.

Thus, JournalStorage supports symlink(2) locking by default and can use open(2) locking as needed (see the sketch below). link(2) locking was not adopted because its performance is not as stable as the other two and because link(2) atomicity is not guaranteed on NFS.
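Assuming the v3.1 API, selecting the lock looks like this (JournalFileOpenLock is the open(2)-based variant):

```python
from optuna.storages import JournalFileOpenLock, JournalFileStorage, JournalStorage

# Default: symlink(2)-based locking.
storage = JournalStorage(JournalFileStorage("./journal.log"))

# Opt into open(2)-based locking, e.g. where symlinks are unavailable:
storage = JournalStorage(
    JournalFileStorage("./journal.log", lock_obj=JournalFileOpenLock("./journal.log"))
)
```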

Conclusion

We implemented JournalStorage and added file storage as its backend. This makes it possible to easily perform concurrent and distributed optimization even in environments without RDB or Redis. In addition, since the file storage uses file locking with symlink(2) and open(2), it supports NFS, enabling concurrent and distributed optimization via NFS.

References

[1] SQLite Frequently Asked Questions

[2] Implementing a storage for Optuna using an operation-log approach (操作ログ方式でOptuna用ストレージを実装してみた話, in Japanese)

[3] NFS illustrated by Brent Callaghan

[4] PHP: flock — Manual
