Python Configuration Management using Hydra by Meta
This article is a concise explanation of the Hydra open-source configuration management package by Meta/Facebook Research.
Main Idea
“Facebook AI’s open source Hydra framework lets users compose and override configurations in a type-safe way (validated against user-provided schemas). Hydra also offers abstractions for launching to different clusters and running sweeps and hyperparameter optimization without changes to the application’s code. This greatly reduces the need for boilerplate code and allows researchers and engineers to focus on what really matters.” — Meta AI Blog
Main advantages of Hydra
Hydra has some distinct advantages over traditional configuration management in Python:
- With Hydra, developers do not need to set up boilerplate code for command-line flags, loading configuration files, setting directory paths, logging, etc.
- Configurations can be set dynamically and can be overridden from the command line as needed.
- It has a pluggable architecture that allows developers to integrate Hydra with other infrastructures.
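To make the command-line override point concrete, here is a minimal sketch of what a dotted override such as hyperparameters.N_EPOCHS=50 does conceptually. It uses plain Python dictionaries rather than Hydra itself. The helper name apply_override is hypothetical, and Hydra's real override grammar is considerably richer than this.

```python
import ast
from copy import deepcopy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one 'a.b.c=value' override applied.

    Hypothetical helper for illustration only; this is not Hydra's code.
    """
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")

    result = deepcopy(config)  # leave the base configuration untouched
    node = result
    for key in parents:
        # Walk down the nested dictionaries, creating levels as needed.
        node = node.setdefault(key, {})

    try:
        # Interpret numbers, booleans, lists, etc. as Python literals.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything else is kept as a plain string.
        value = raw_value

    node[leaf] = value
    return result


base = {"hyperparameters": {"N_EPOCHS": 20, "BATCH_SIZE": 128, "N_LAYERS": 3}}
overridden = apply_override(base, "hyperparameters.N_EPOCHS=50")

print(overridden["hyperparameters"]["N_EPOCHS"])  # 50
print(base["hyperparameters"]["N_EPOCHS"])        # 20
```

With Hydra itself, none of this code is needed: the same key=value pair is simply appended to the command that runs your script, and Hydra merges it into the loaded configuration.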
How to set up Hydra to handle configuration?
Here’s a quick breakdown of how to set up Hydra to handle configurations:
1. First, install Hydra using the following command:
pip install hydra-core --upgrade
2. Next, create a YAML file that will hold all the necessary configuration values. As a best practice, keep all your configuration files inside a conf folder.
Here’s an example conf/config.yaml file:
hyperparameters:
  N_EPOCHS: 20
  BATCH_SIZE: 128
  N_LAYERS: 3
3. Import hydra and wrap the main function with the hydra.main() decorator. The decorator expects config_path as the folder holding all of your configuration files and config_name as the configuration YAML filename (given here without the .yaml extension).
# main.py
import hydra


@hydra.main(config_path="conf", config_name="config")
def main(cfg):
    # Access the cfg variable here
    print(cfg)


if __name__ == "__main__":
    main()
Output:
{'hyperparameters': {'N_EPOCHS': 20, 'BATCH_SIZE': 128, 'N_LAYERS': 3}}
If you run into errors, check your folder structure: main.py sits in the project root, next to a conf folder that contains config.yaml.
Handling configuration variables using dataclasses
Since Hydra integrates directly with Python, we can use Python’s dataclasses to give the configuration variables explicit types.
1. Create a config.py file:
# config.py
from dataclasses import dataclass


@dataclass
class Hyperparameters:
    N_EPOCHS: int
    BATCH_SIZE: int
    N_LAYERS: int


@dataclass
class AllConfig:
    hyperparameters: Hyperparameters
2. Import the file and make use of Hydra’s config store in your main.py
# main.py
import hydra
from hydra.core.config_store import ConfigStore

from config import AllConfig

# Register the dataclass schema with Hydra's config store
cs = ConfigStore.instance()
cs.store(name="all_config", node=AllConfig)


@hydra.main(config_path="conf", config_name="config")
def main(cfg: AllConfig):
    # Access the cfg variable here
    print(cfg)


if __name__ == "__main__":
    main()
Output:
{'hyperparameters': {'N_EPOCHS': 20, 'BATCH_SIZE': 128, 'N_LAYERS': 3}}
If you run into errors, check your folder structure: main.py and config.py sit in the project root, next to a conf folder that contains config.yaml.
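To show what a typed schema buys you, here is a minimal sketch of the kind of check a dataclass schema makes possible. It uses only the standard library, with the same two dataclasses as config.py. The build function is a hypothetical helper written for this illustration and is not how Hydra performs validation internally.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Hyperparameters:
    N_EPOCHS: int
    BATCH_SIZE: int
    N_LAYERS: int


@dataclass
class AllConfig:
    hyperparameters: Hyperparameters


def build(schema, data: dict):
    """Instantiate a dataclass schema from a nested dict, checking field types.

    Hypothetical helper for illustration only.
    """
    kwargs = {}
    for f in fields(schema):
        if f.name not in data:
            raise KeyError(f"{schema.__name__} is missing key: {f.name}")
        value = data[f.name]
        if is_dataclass(f.type):
            # Nested schema: validate the sub-dictionary recursively.
            value = build(f.type, value)
        elif not isinstance(value, f.type):
            raise TypeError(
                f"{schema.__name__}.{f.name} expects {f.type.__name__}, "
                f"got {type(value).__name__}"
            )
        kwargs[f.name] = value
    return schema(**kwargs)


good = {"hyperparameters": {"N_EPOCHS": 20, "BATCH_SIZE": 128, "N_LAYERS": 3}}
cfg = build(AllConfig, good)
print(cfg.hyperparameters.BATCH_SIZE)  # 128

bad = {"hyperparameters": {"N_EPOCHS": "twenty", "BATCH_SIZE": 128, "N_LAYERS": 3}}
try:
    build(AllConfig, bad)
except TypeError as err:
    print(f"rejected: {err}")
```

The point of the sketch is that a mistyped value such as N_EPOCHS: "twenty" is caught when the configuration is loaded, instead of surfacing later in the middle of a training run.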