
Structure learning using Benchpress

Felix Rios
7 min read · Aug 29, 2022

benchpressdocs.readthedocs.io

Probabilistic graphical models

Probabilistic graphical models have grown in popularity in recent years, driven by the increased interest in causal inference in statistics and machine learning. A (probabilistic) graphical model is a representation of the dependencies among a set of random variables. It is graphical in the sense that the random variables are represented as nodes in a graph, and the edges represent dependencies between them. Different types of graphical models exist; one of the most popular is the Bayesian network, in which the graph encodes directed relationships with no cycles (a directed acyclic graph). Figure 1 illustrates the WetGrass Bayesian network, where the probability of the grass being wet on a given day depends directly on two other random variables: whether it rained and whether the sprinklers were on, both of which in turn depend on whether it is cloudy. Without knowing the graph, one might assume that the sprinklers are activated depending on whether it rained, leading to a wrong quantification of that probability. From Figure 1, we instead see that the sprinklers are activated with a probability of 0.1 on cloudy days.

Figure 1. Directed acyclic graph (DAG) and probability tables for the WetGrass Bayesian network.
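To get a feel for how a Bayesian network like this is used, here is a minimal forward-sampling sketch of the WetGrass network. Only P(sprinkler on | cloudy) = 0.1 is stated above; the remaining probability-table entries below are illustrative assumptions, not the values from Figure 1.

```python
import random

# Illustrative conditional probability tables; only P(sprinkler | cloudy) = 0.1
# comes from the article, the rest are assumed for the sake of the example.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}              # P(sprinkler on | cloudy)
P_RAIN = {True: 0.8, False: 0.2}                   # P(rain | cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.9,   # P(wet | sprinkler, rain)
         (False, True): 0.9, (False, False): 0.0}

def sample(rng=random):
    """Draw one joint sample by visiting the nodes in topological order."""
    cloudy = rng.random() < P_CLOUDY
    sprinkler = rng.random() < P_SPRINKLER[cloudy]
    rain = rng.random() < P_RAIN[cloudy]
    wet = rng.random() < P_WET[(sprinkler, rain)]
    return cloudy, sprinkler, rain, wet

# Monte Carlo estimate of the marginal probability that the grass is wet
random.seed(0)
draws = [sample() for _ in range(100_000)]
print(sum(w for *_, w in draws) / len(draws))
```

Sampling parents before children is exactly what the DAG buys us: each node only needs its parents' values, not the full joint distribution.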

Graphical models can encode other types of pairwise relationships as well, for example graphs with undirected, bidirected, or mixed types of edges, each representing a specific form of association. The story series by Prasoon Goyal and Devin Sony are great introductions to this topic. For more about Bayesian networks, see e.g. the book Probabilistic Graphical Models: Principles and Techniques by Prof. Daphne Koller.

Structure learning

Unlike the WetGrass example (Figure 1), in most real-world problems we do not know the underlying association structure between the random variables. Structure learning is the process of approximating this structure from collected data. Structure learning algorithms have been developed under various setups and are usually divided into three categories: score-based methods, which aim to maximise a score function defined on the space of graphs; constraint-based methods, which estimate the edges of the graph by means of hypothesis testing; and hybrid methods, which combine the two. A more recent fourth category, sometimes called gradient-based methods, turns the structure learning problem into a continuous optimisation program solved by gradient-based techniques. While the strategies differ, the main objective is usually the same: to find the graph that best represents the dependence structure. Comparing all such learning algorithms is daunting, as they are implemented in different formats and languages, and for differing objectives.
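To make the score-based idea concrete, here is a toy sketch: greedy forward selection of a node's parents under a BIC-style score for linear-Gaussian data. This is not any of the actual algorithms discussed later; restricting candidate parents to variables earlier in a fixed order is a simplification that sidesteps the acyclicity constraint real methods must handle.

```python
import numpy as np

def bic(X, node, parents):
    """BIC-style score of a linear-Gaussian node given its parents
    (log-likelihood up to constants, minus a complexity penalty)."""
    n = X.shape[0]
    y = X[:, node]
    A = np.column_stack([X[:, parents], np.ones(n)]) if parents else np.ones((n, 1))
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    ll = -0.5 * n * np.log(np.mean(resid ** 2))
    return ll - 0.5 * A.shape[1] * np.log(n)

def greedy_parents(X, node, candidates):
    """Repeatedly add the candidate parent that most improves the score,
    stopping when no addition helps -- the essence of score-based search."""
    parents = []
    best = bic(X, node, parents)
    while True:
        scores = {c: bic(X, node, parents + [c])
                  for c in candidates if c not in parents}
        if not scores:
            break
        c = max(scores, key=scores.get)
        if scores[c] <= best:
            break
        parents.append(c)
        best = scores[c]
    return parents

# Data generated from the chain 0 -> 1 -> 2
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 2 * x0 + rng.normal(size=2000)
x2 = -1.5 * x1 + rng.normal(size=2000)
X = np.column_stack([x0, x1, x2])
print(greedy_parents(X, 2, [0, 1]))
```

With enough data the penalty term keeps node 2 from also picking up node 0, since x0 carries no information about x2 beyond what x1 already provides.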

What is Benchpress?

Benchpress is a Snakemake workflow that enables the execution of, and seamless comparison between, state-of-the-art structure learning algorithms. Benchpress follows the philosophy that nothing already available should have to be reimplemented. Accordingly, at the time of writing, Benchpress contains more than 40 algorithms, coming from packages such as bnlearn, pcalg, and BiDAG in R; gCastle, trilearn, and parallelDG in Python; and GOBNILP in C. The architecture of Benchpress is divided into modules that specify the data source, the structure learning algorithms, and the benchmark metrics. Thanks to this modular structure, it is easy to add new functionality in each category. The list of available modules is constantly growing, so check out the docs for the up-to-date list. For a more detailed description of the architecture, take a look at the associated paper.

To run a study, Benchpress ingests a JSON configuration file from the user, which specifies the modules (i.e. the data source, algorithms, and evaluation metrics) to use in that study. Based on the JSON file, Benchpress loads and runs the modules from predefined Docker images. The output is stored in files and figures that can be analysed by the user. The JSON config can be shared with anyone, who can then replicate the results by running it through Benchpress.
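Schematically, a config ties the three module categories together along these lines. The fragment below is only an illustration of the idea, not Benchpress's actual schema (see the docs for that); the field names, the pcalg_pc entry, and the alpha parameter are placeholders, while module names such as bnlearn_tabu and graph_plots appear later in this post.

```json
{
  "study_name": "example_study",
  "data": [
    { "module": "iid", "graph": "pcalg_randdag", "parameters": "sem_params" }
  ],
  "structure_learning_algorithms": [
    { "module": "bnlearn_tabu" },
    { "module": "pcalg_pc", "alpha": 0.05 }
  ],
  "evaluation": [
    { "module": "graph_plots" },
    { "module": "benchmarks" }
  ]
}
```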

How does Benchpress work?

Benchpress utilises two technologies, Snakemake and Apptainer (formerly Singularity). Snakemake is a Python-based workflow management system that has become very popular in bioinformatics in recent years and is perfectly suited for large, parallel computations involving many different files. It enables Benchpress to run each module in an independent container, which, loosely speaking, can be viewed as a lightweight virtual machine. Snakemake handles this using the container management system Apptainer, which is designed for high-performance computing (HPC).

Installation

The first step to getting results from Benchpress is to install Snakemake and Apptainer. Once these are installed, there is no extra installation step for Benchpress itself. On a Linux system you can install Snakemake and Apptainer following the instructions on their respective homepages. In this post, however, we will use an interactive Docker shell. Once Docker is installed, you only have to clone (or download) the Benchpress repository:

git clone https://github.com/felixleopoldo/benchpress.git

Benchmark with Benchpress

To start benchmarking multiple algorithms, we can use the JSON config file shipped with Benchpress at config/config.json. Don’t worry too much about the structure of the config file for now; it will be covered in another story.

The config file defines a small study based on three realisations of a random Gaussian structural equation model (a popular type of Bayesian network model), with one dataset sampled from each model. For each of the three datasets, it runs the following four structure learning algorithms: fast greedy equivalence search (FGES) from TETRAD, PC from pcalg, Tabu from bnlearn, and Iterative MCMC from BiDAG. The estimated graphs are then stored and benchmarked using the evaluation modules specified in config/config.json.
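To see what "sampling a dataset from a Gaussian structural equation model" means, here is a minimal hand-rolled analogue. In Benchpress this job is done by modules such as pcalg_randdag, sem_params, and iid (discussed below); the code here is only an independent sketch, with the chain graph and weight ranges chosen arbitrarily.

```python
import numpy as np

def sample_gaussian_sem(B, n, rng):
    """Draw n i.i.d. samples from a linear-Gaussian SEM, where B[i, j] is the
    weight of edge i -> j; an upper-triangular B guarantees a DAG."""
    d = B.shape[0]
    X = np.zeros((n, d))
    for j in range(d):                        # columns in topological order
        X[:, j] = X @ B[:, j] + rng.normal(size=n)
    return X

rng = np.random.default_rng(1)
# A 3-node chain 0 -> 1 -> 2 with random edge weights
B = np.zeros((3, 3))
B[0, 1] = rng.uniform(0.5, 2.0)
B[1, 2] = rng.uniform(0.5, 2.0)
data = sample_gaussian_sem(B, n=1000, rng=rng)
print(data.shape)  # (1000, 3)
```

A structure learning algorithm then gets only `data` and tries to recover the DAG encoded in `B`.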

Snakemake command

To run the above study, head into the benchpress directory and start an interactive Docker shell:

cd benchpress
docker run -it -w /mnt --privileged -v $(pwd):/mnt snakemake/snakemake:stable

then simply run the following Snakemake command in the interactive shell:

snakemake --cores all --use-singularity --configfile config/config.json

The files produced by this command will be found in the benchpress folder. The flag --cores tells Snakemake how many cores to use, --use-singularity tells it to use Apptainer, and --configfile specifies the config file. Check out Snakemake's docs for other flags; for example, with the proper flags you can run Snakemake on a cluster or a grid.

Some results

Benchpress systematically stores all the sampled Bayesian networks and datasets from the simulation setting, along with the results of the algorithm executions, in the results/ folder, organised by the modules used and their parametrisations. For quick access, the output of the evaluation modules is also copied into results/output and structured by module name. Here we show some of the output generated by the evaluation modules graph_plots and benchmarks.

Figure 2, produced by the graph_plots module, shows both the true graph and the graph estimated by the Tabu algorithm in one adjacency matrix. A coloured square at position (a,b) corresponds to an edge a->b, where correct edges are black, incorrect edges are red, and missing edges are blue. Thus the black and blue (red) squares together give the true (estimated) graph.

The title tells us which modules were used in the data sampling procedure (i.e. which graph, parameters, and data). In this case, the pcalg_randdag module was used to generate the graph, the sem_params module to generate the Bayesian network parameters, and the iid module to sample the data. The text to the left of the plot tells us which structure learning algorithm was used along with its arguments, in this case the Tabu algorithm from the bnlearn_tabu module.

Figure 2. Difference adjacency matrix for a Tabu estimate and the true DAG showing correctly estimated edges (black), wrongly estimated edges (red), and missing edges (blue).
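The three colours in Figure 2 come straight from comparing the two adjacency matrices. As a sketch (with toy matrices, not the ones behind Figure 2):

```python
import numpy as np

# Toy adjacency matrices; A[i, j] = 1 means an edge i -> j.
true_adj = np.array([[0, 1, 1],
                     [0, 0, 1],
                     [0, 0, 0]])
est_adj  = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [1, 0, 0]])

black = (true_adj == 1) & (est_adj == 1)   # correctly estimated edges
red   = (true_adj == 0) & (est_adj == 1)   # wrongly estimated edges
blue  = (true_adj == 1) & (est_adj == 0)   # missing edges

print(black.sum(), red.sum(), blue.sum())  # 2 1 1
```

Colouring each cell by which mask it falls in reproduces the kind of difference matrix shown in Figure 2.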

Figure 3 is a difference plot of the so-called pattern graphs of the true and estimated DAGs (in a pattern graph, the directions of the edges in so-called v-structures are kept while the remaining edges are made undirected). It is produced by the graph_plots module, which uses the graphviz.compare function from bnlearn.

Figure 3.

The plot in Figure 4 summarises the results (produced by the benchmarks module) in terms of true and false positive rates of the estimated edges (TPR/FPRp) for all three datasets, with different parameter settings for each algorithm. Each colour corresponds to an algorithm, and each dot (barely visible) is the TPR/FPRp of that algorithm under one of the parameter settings. The receiver operating characteristic (ROC) style curves are based on the median TPR and FPRp over the different parameter settings, and the error bars show the estimated 5% and 95% quantiles.

Figure 4.

To the left in Figure 5 we see the total computational time, averaged over the datasets and grouped by algorithm and parametrisation. To the right is the average structural Hamming distance (SHD), i.e. the number of edge additions, deletions, and flips needed to turn an estimated graph into the true graph.

Figure 5. Average total time and SHD.
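The SHD can be sketched in a few lines by comparing each unordered node pair of the two DAGs. This follows the definition above, counting a flipped edge as one operation; the exact convention used by the benchmarks module may differ in details.

```python
import numpy as np

def shd(true_adj, est_adj):
    """Structural Hamming distance between two DAG adjacency matrices:
    each added, deleted, or flipped edge counts as one operation."""
    d = true_adj.shape[0]
    dist = 0
    for i in range(d):
        for j in range(i + 1, d):
            t = (true_adj[i, j], true_adj[j, i])   # pair status in true graph
            e = (est_adj[i, j], est_adj[j, i])     # pair status in estimate
            if t != e:
                dist += 1
    return dist

true_adj = np.array([[0, 1, 1],
                     [0, 0, 1],
                     [0, 0, 0]])
est_adj  = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [1, 0, 0]])
print(shd(true_adj, est_adj))  # 1: only the edge 0 -> 2 is flipped
```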

Note that apart from the plots shown above, all the underlying results and data are also available and can be analysed in any program of your choice.

Benchpress as a project was initiated and developed during my time as a postdoc in the statistical science group at the University of Basel, in a collaboration between Giusi Moffa (University of Basel), Jack Kuipers (ETH Zürich), and myself (KTH Stockholm). Since then, it has been further developed by James Cussens (University of Bristol) and Mohamad Elmasri (University of Toronto).


Felix Rios

I’m a researcher in the department of mathematics (mathematics of data and AI) at the Royal Institute of Technology Stockholm (KTH).