Python Configuration Management using Hydra by Meta

Published in Coinmonks · 3 min read · Feb 5, 2024

This article is a concise explanation of the Hydra open-source configuration management package by Meta/Facebook Research.

Main Idea

“Facebook AI’s open source Hydra framework lets users compose and override configurations in a type-safe way (validated against user-provided schemas). Hydra also offers abstractions for launching to different clusters and running sweeps and hyperparameter optimization without changes to the application’s code. This greatly reduces the need for boilerplate code and allows researchers and engineers to focus on what really matters.” — Meta AI Blog

Main advantages of Hydra

Hydra has some distinct advantages over traditional configuration management in Python:

With Hydra, developers do not need to set up boilerplate code for command-line flags, configuration file loading, directory paths, logging, and so on.
Configurations can be set dynamically and can be overridden from the command line as needed.
It has a pluggable architecture that allows developers to integrate Hydra with other infrastructures.

How to set up Hydra to handle configuration?

Here’s a quick breakdown of how to set up Hydra to handle configurations:

1. Install Hydra using the following command:

pip install hydra-core --upgrade

2. Next, create a YAML file that will hold all the necessary configuration values. As a best practice, keep all of your configuration files inside a conf folder.

Here’s an example conf/config.yaml file:

hyperparameters:
  N_EPOCHS: 20
  BATCH_SIZE: 128
  N_LAYERS: 3

3. Import hydra and wrap the main function with the hydra.main() decorator. The decorator expects config_path to be the folder holding all of your configuration files (relative to the Python file that declares it) and config_name to be the name of the configuration YAML file, without the .yaml extension.

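# main.py
import hydra
from omegaconf import DictConfig


# config_path is the folder with the YAML files, config_name is the file name without ".yaml".
# version_base=None (available from Hydra 1.2) uses the current defaults and avoids a warning.
@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra loads conf/config.yaml and passes it in as cfg
    print(cfg)


if __name__ == "__main__":
    main()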

Output:

{'hyperparameters': {'N_EPOCHS': 20, 'BATCH_SIZE': 128, 'N_LAYERS': 3}}
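With this script in place, any value can be changed at launch time without editing the YAML file, for example by running python main.py hyperparameters.N_EPOCHS=50. Inside main(), individual values are read with attribute access such as cfg.hyperparameters.N_EPOCHS.

To make the idea of a dotted override concrete, here is a minimal plain-Python sketch of what such an override does to a nested configuration. It is not Hydra's implementation and does not use Hydra at all. The apply_override helper is a hypothetical name introduced only for this illustration.

```python
import ast
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied.

    Hypothetical helper for illustration only; Hydra has its own override parser.
    """
    dotted_key, _, raw_value = override.partition("=")
    try:
        # Turn "50" into 50, "0.1" into 0.1, and so on.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything that is not a Python literal stays a plain string.
        value = raw_value

    result = copy.deepcopy(config)  # leave the base configuration untouched
    *parents, leaf = dotted_key.split(".")
    node = result
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = value
    return result


base = {"hyperparameters": {"N_EPOCHS": 20, "BATCH_SIZE": 128, "N_LAYERS": 3}}
overridden = apply_override(base, "hyperparameters.N_EPOCHS=50")
print(overridden["hyperparameters"]["N_EPOCHS"])  # 50
print(base["hyperparameters"]["N_EPOCHS"])        # 20
```

The base dictionary is copied before it is modified, so the values from the file stay available as defaults while the override only affects the current run.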

If you run into errors, check your folder structure: main.py should sit next to the conf folder, and the conf folder should contain config.yaml.

Handling configuration variables using dataclasses

Since Hydra integrates directly with Python, we can describe the configuration with Python dataclasses, which give it a typed schema.

1. Create a config.py file

# config.py
from dataclasses import dataclass


@dataclass
class Hyperparameters:
    N_EPOCHS: int
    BATCH_SIZE: int
    N_LAYERS: int


@dataclass
class AllConfig:
    # Mirrors the top-level "hyperparameters" key in conf/config.yaml
    hyperparameters: Hyperparameters

2. Import the file and make use of Hydra’s config store in your main.py

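# main.py
import hydra
from hydra.core.config_store import ConfigStore

from config import AllConfig

# Register the dataclass schema with Hydra under the name "all_config"
cs = ConfigStore.instance()
cs.store(name="all_config", node=AllConfig)
# Note: for Hydra to validate conf/config.yaml against this schema, config.yaml
# also needs a defaults list that includes all_config (followed by _self_).


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: AllConfig) -> None:
    # cfg is annotated as AllConfig for editor support; at runtime it is a DictConfig
    print(cfg)


if __name__ == "__main__":
    main()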

Output:

{'hyperparameters': {'N_EPOCHS': 20, 'BATCH_SIZE': 128, 'N_LAYERS': 3}}
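The printed output looks the same as before, so the benefit of a dataclass schema is easy to miss: it lets wrong value types be caught before any training code runs. The sketch below illustrates that kind of check in plain Python, without Hydra. The build function is a hypothetical helper written for this example, and the way Hydra performs and reports validation differs from it.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Hyperparameters:
    N_EPOCHS: int
    BATCH_SIZE: int
    N_LAYERS: int


@dataclass
class AllConfig:
    hyperparameters: Hyperparameters


def build(schema, data: dict):
    """Build a `schema` instance from a nested dict, checking each field's type.

    Hypothetical helper for illustration only; it is not part of Hydra.
    """
    kwargs = {}
    for field in fields(schema):
        value = data[field.name]
        if is_dataclass(field.type):
            # Nested section: validate it against the nested dataclass
            value = build(field.type, value)
        elif not isinstance(value, field.type):
            raise TypeError(
                f"{field.name}: expected {field.type.__name__}, got {value!r}"
            )
        kwargs[field.name] = value
    return schema(**kwargs)


good = build(
    AllConfig,
    {"hyperparameters": {"N_EPOCHS": 20, "BATCH_SIZE": 128, "N_LAYERS": 3}},
)
print(good.hyperparameters.N_EPOCHS)  # 20

try:
    build(
        AllConfig,
        {"hyperparameters": {"N_EPOCHS": "twenty", "BATCH_SIZE": 128, "N_LAYERS": 3}},
    )
except TypeError as error:
    print(error)  # N_EPOCHS: expected int, got 'twenty'
```

A typo such as N_EPOCHS: twenty in the YAML file would then fail at startup with a message that names the offending field, instead of surfacing later inside the training loop.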