File Management during LLM (Large Language Model) Training with the Optuna v4.0.0 Artifact Store

Shuhei Watanabe
Published in Optuna
Aug 20, 2024

TL;DR

  • Optuna v4.0.0 officially supports Artifact Store, which manages files generated during optimization on a variety of storage backends such as the local file system and Amazon S3.
  • Artifact Store lets users view a wide range of file formats, such as image and audio files, on Optuna Dashboard.
  • Optuna v4.0.0 extends the Python API to make downloading artifacts from Artifact Store easier.
  • Optuna Dashboard now has viewers for CSV and JSONL files, which are displayed in tabular format.
  • This article demonstrates the improved usability through an experiment with a large language model.

For more details, please refer to a simpler example and the Artifact Store Tutorial. Note that Optuna v4.0.0 is a beta version at the time of writing, so readers need to install the beta version explicitly to reproduce the experiment in this article.

# Optuna Dashboard must be v0.16.0 or later.
$ pip install optuna==4.0.0b0 "optuna-dashboard>=0.16.0"

What is Artifact/Artifact Store?

An artifact is a file generated or used during an optimization. For example, as shown in the red rectangle in Figure 1, each Trial in the hyperparameter optimization of a large language model may generate files such as a learning curve plot, inference results in CSV format, and model snapshot files. Artifact Store manages such artifacts and lets Optuna Dashboard visualize them.

Figure 1. A conceptual view of artifacts in Optuna. A Trial object is passed to the objective function, which is then evaluated. The objective function returns a set of objective values (values) based on the parameters (params) suggested by Optuna. Users can store short strings and simple dicts as user_attrs, together with params and values, in a storage such as an RDB (see the blue part). Artifact Store manages the files generated during the objective evaluation, called artifacts. Because Artifact Store uses a file system or cloud storage, it can handle large files, such as learning curve plots and model snapshot files, that a storage such as an RDB cannot handle efficiently.

Artifact Store, officially supported from v4.0.0, lets users manage files associated with a Trial or a Study. Because the save destination is configurable, artifacts can be stored not only in the local file system but also in object storage such as Google Cloud Storage (GCS) and Amazon S3 compatible storage.
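To make the division of labor in Figure 1 concrete, here is a toy sketch written with the standard library only. It is not Optuna's implementation: the class ToyArtifactStore, the trial_records dict standing in for the RDB, and the file names are all hypothetical. It only illustrates the idea that small records (params, values, artifact metadata) live in one place while the file contents live in a file store keyed by an ID.

```python
import os
import shutil
import tempfile
import uuid


class ToyArtifactStore:
    """Hypothetical file store: copies files into base_path under a generated ID."""

    def __init__(self, base_path):
        self.base_path = base_path
        os.makedirs(base_path, exist_ok=True)

    def upload(self, file_path):
        # The ID, not the original file name, is the key inside the store.
        artifact_id = str(uuid.uuid4())
        shutil.copyfile(file_path, os.path.join(self.base_path, artifact_id))
        return artifact_id

    def download(self, artifact_id, file_path):
        shutil.copyfile(os.path.join(self.base_path, artifact_id), file_path)


# Stand-in for the RDB: it keeps only small per-trial records.
trial_records = {}

with tempfile.TemporaryDirectory() as workdir:
    store = ToyArtifactStore(os.path.join(workdir, "artifacts"))

    # A file produced during a (pretend) trial.
    src = os.path.join(workdir, "learning-curve.csv")
    with open(src, "w") as f:
        f.write("step,loss\n0,1.5\n1,0.9\n")

    artifact_id = store.upload(src)
    trial_records[0] = {
        "params": {"lr": 1e-4},
        "value": 0.9,
        # Only the metadata is recorded here, never the file contents.
        "artifacts": {artifact_id: {"filename": "learning-curve.csv"}},
    }

    # Later, the ID found in the record is enough to fetch the file back.
    dst = os.path.join(workdir, "restored.csv")
    store.download(artifact_id, dst)
    with open(dst) as f:
        restored = f.read()
```

The point of the split is that the record store stays small and queryable, while the file store can be swapped for whatever backend suits large files.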

As mentioned earlier, one advantage of Artifact Store is that the contents of artifacts can be viewed directly on Optuna Dashboard. For example, as shown in Figure 2, a JSONL (or CSV) file is displayed in tabular format. It is also possible to play audio and video files on Optuna Dashboard.

Figure 2. An example of the tabular format visualization on Optuna Dashboard for a JSONL file associated with Trial. This figure was created based on a hyperparameter optimization of an LLM. Left: A Trial detail page. For each Trial, inference-results.jsonl (orange highlight), which lists the responses of the trained LLM given a pair of context and question, was uploaded to Artifact Store. Right: The table artifact viewer for the JSONL file. The viewer is opened by clicking the expand icon on the file of interest.
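For readers unfamiliar with JSONL, the following is a minimal sketch of how a file like inference-results.jsonl could be written: one JSON object per line. The helper names write_jsonl and read_jsonl, the field names (context, question, response), and the sample rows are assumptions made for illustration; the actual fields used in the experiment are defined in the Gist.

```python
import json
import os
import tempfile


def write_jsonl(rows, file_path):
    """Write one JSON object per line (the JSONL convention)."""
    with open(file_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(file_path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(file_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical inference results; the field names are illustrative only.
rows = [
    {
        "context": "Optuna is a hyperparameter optimization framework.",
        "question": "What is Optuna?",
        "response": "A hyperparameter optimization framework.",
    },
    {
        "context": "Artifact Store manages files generated during optimization.",
        "question": "What does Artifact Store manage?",
        "response": "Files generated during optimization.",
    },
]

with tempfile.TemporaryDirectory() as workdir:
    inference_path = os.path.join(workdir, "inference-results.jsonl")
    write_jsonl(rows, inference_path)
    loaded = read_jsonl(inference_path)
```

Keeping the same keys in every row is a sensible default for any file that is meant to be read as a table.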

Modifications Made in Optuna v4.0.0

Optuna v4.0.0 improved not only the visualization on Optuna Dashboard, such as the table artifact viewer shown in Figure 2, but also the Python API. Specifically, we stabilized the artifact upload API and added two new APIs: the artifact download API, and an API that lists all the artifact metadata associated with a specific Trial or Study, which is needed for the download.

These changes make it much easier to use artifacts from user scripts, for example for post-hoc analysis or for retrieving the artifacts of the best Trial. If each Trial uploads a compressed file of LLM snapshots to Artifact Store, the snapshots of the best Trial can be downloaded to the local file system via the new API. The next section demonstrates the API usage with actual code.

Use Case of Artifact Store: Hyperparameter Optimization of LLM

This section walks through a use case of Artifact Store backed by a local file system. We first optimize the hyperparameters of an LLM using Optuna and then show the results on Optuna Dashboard. The actual code is available on Gist.

In this example, each Trial uploads the following Artifact files:

  1. The training log of LLM (CSV file)
  2. The responses by the trained LLM to each question (JSONL file)
  3. The learning curve plot (PNG file)
  4. The model snapshot file (gzip file)
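As a rough illustration of the last item, a snapshot directory can be packed into a single gzip-compressed tar archive with Python's standard tarfile module before it is uploaded. This is a sketch under assumptions: the helper names compress_snapshot and list_archive, the directory layout, and the dummy files are hypothetical, and the Gist may package snapshots differently.

```python
import os
import tarfile
import tempfile


def compress_snapshot(snapshot_dir, archive_path):
    """Pack snapshot_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the paths inside the archive relative to the snapshot root.
        tar.add(snapshot_dir, arcname=os.path.basename(snapshot_dir))
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as workdir:
    # Dummy snapshot directory standing in for real model files.
    snapshot_dir = os.path.join(workdir, "snapshot")
    os.makedirs(snapshot_dir)
    with open(os.path.join(snapshot_dir, "weights.bin"), "wb") as f:
        f.write(b"\x00" * 16)
    with open(os.path.join(snapshot_dir, "config.json"), "w") as f:
        f.write("{}")

    archive_path = compress_snapshot(snapshot_dir, os.path.join(workdir, "model.tar.gz"))
    names = list_archive(archive_path)
```

Bundling the snapshot into one file means each Trial has a single artifact to upload and, later, a single artifact to download.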

We first run the code on Gist and launch Optuna Dashboard based on the results obtained by the script:

# Launch Optuna Dashboard with the URL of RDB and the Artifact base_path.
$ optuna-dashboard sqlite:///demo.db --artifact-dir artifacts

The following video shows how Optuna Dashboard will look when we launch it:

Figure 3. The visualization by Optuna Dashboard for the experiment. The Trial detail page lists the uploaded artifacts. The table artifact viewer is available for CSV and JSONL files, and it will open by clicking the expand icon. The thumbnail will be displayed for an image file. In this example, the training log (CSV) and the response list by the trained LLM (JSONL) are shown by the table artifact viewer.

To illustrate the Python API usage, the snippet below is adapted from the code on Gist:

import os

import optuna

# Artifacts will be stored in this directory.
base_path = "artifacts"
# Create the directory.
os.makedirs(base_path, exist_ok=True)
# Instantiate an Artifact Store with the directory path.
artifact_store = optuna.artifacts.FileSystemArtifactStore(base_path)


def objective(trial):
    # Note: each `...` marks a part elided from the code on Gist.
    # Suggest hyperparameters by Optuna.
    train_params = suggest_train_params(...)
    # Train an LLM using the hyperparameters suggested by Optuna.
    trainer = ...
    trainer.train()

    # Record the responses by the LLM to each question as a JSONL file.
    inference(...)
    # Upload the JSONL file to Artifact Store.
    optuna.artifacts.upload_artifact(
        study_or_trial=trial, file_path=inference_path, artifact_store=artifact_store
    )

    # Upload the learning curve plot, log, and snapshots in the same way.
    ...

    valid_loss = ...
    return valid_loss


storage = optuna.storages.RDBStorage("sqlite:///demo.db")
study = optuna.create_study(storage=storage, study_name="demo")
study.optimize(objective, n_trials=10)

In this example, each artifact is uploaded with upload_artifact, one of the Python APIs, to the base_path specified in FileSystemArtifactStore. As the example shows, the upload takes only one call as long as the file already exists.

Additionally, the model snapshot of the best Trial can be easily downloaded from user scripts using the new APIs:

# Get the best Trial.
best_trial = study.best_trial

# The file name used for the uploads of model snapshots in each Trial.
model_file_name = "model.tar.gz"

# Get all the artifact metadata associated with the best Trial.
artifact_meta = optuna.artifacts.get_all_artifact_meta(best_trial, storage=storage)
# Get the Artifact ID of the model snapshot file.
artifact_id_for_model = [am.artifact_id for am in artifact_meta if am.filename == model_file_name][0]

# Download the model snapshot trained in the best Trial to download_file_path.
download_file_path = "./best_model.tar.gz"
optuna.artifacts.download_artifact(
    artifact_store=artifact_store,
    file_path=download_file_path,
    artifact_id=artifact_id_for_model,
)

As shown above, the model snapshot can be downloaded with the new APIs simply by specifying a download path. The new APIs make reusing artifacts much simpler.

Conclusion

Optuna v4.0.0 enhanced the Python APIs of Artifact Store, Optuna's file management mechanism. As demonstrated in this article, reusing artifacts from user scripts has become much simpler. In addition, the visualization of artifacts on Optuna Dashboard has improved, and users can now view CSV and JSONL files in tabular format. Last but not least, a tutorial and a simpler example are available for Artifact Store, so please check them out as well!
