Installing CDT (Causal Discovery Toolbox) on Windows

M.M.Islam (Mishu)
Oct 12, 2022


Photo by DeepMind on Unsplash

Causal discovery aims to infer causal structure from data. To run the different causal discovery algorithms (implemented in R and Python), there is a Python library called CDT (Causal Discovery Toolbox). CDT is a package for causal inference in graphs and in the pairwise setting for Python>=3.5, and it includes tools for graph structure recovery and dependencies. The package is based on NumPy, Scikit-learn, PyTorch, and R.

Below, I will briefly walk you through installing CDT on Windows using Anaconda.

Python and R versions used

Python == 3.10
R == 4.2.1
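Before going further, it can help to confirm both versions from Python. The sketch below is my own helper and not part of CDT. It checks the running interpreter against CDT's stated Python>=3.5 requirement, and it asks the `Rscript` found on PATH for its version through R's `R.version.string`. It assumes `Rscript` is reachable on PATH. If it is not, `installed_r_version()` returns `None`.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 5)  # CDT's stated minimum; this post used Python 3.10


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) Python version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def parse_r_version(text):
    """Extract (major, minor, patch) from text like 'R version 4.2.1 (2022-06-23 ucrt)'."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no R version found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_r_version(rscript="Rscript"):
    """Return the version of the Rscript on PATH as a tuple, or None if Rscript is not found."""
    exe = shutil.which(rscript)
    if exe is None:
        return None
    # cat() prints R.version.string without the [1] prefix that print() would add
    result = subprocess.run(
        [exe, "-e", "cat(R.version.string)"], capture_output=True, text=True, check=True
    )
    return parse_r_version(result.stdout)
```

For the setup above, `python_ok()` should return `True` and `installed_r_version()` should return `(4, 2, 1)`.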

Now, the simplest way to install CDT is

pip install cdt
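If you run pip as `python -m pip install cdt`, the package goes into the same interpreter you will later run it from. You can confirm this afterwards without actually importing CDT by using the standard library's `importlib.util.find_spec`. Here is a small sketch; the helper name `is_installed` is my own.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter.

    find_spec only locates the module; it does not import it, so no
    import-time setup code of the package is executed.
    """
    return importlib.util.find_spec(module_name) is not None


# Check the two Python packages this post relies on
for name in ("cdt", "rpy2"):
    print(f"{name}: {'installed' if is_installed(name) else 'missing'}")
```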

However, to use the algorithms implemented in R (e.g., the PC algorithm, the GES algorithm, and many more) and the bnlearn-based algorithms, you need to install a handful of dependencies. If they are not installed properly, you could encounter errors like

ImportError: R Package (k)pcalg/RCIT is not available. RCIT has to be installed from https://github.com/Diviyan-Kalainathan/RCIT

or an error caused by one of the dependencies, the R package igraph.

The Causal Discovery Toolbox documentation provides an installation script for the R packages on Ubuntu 20.04. On Windows with Anaconda, however, the process is somewhat trickier. (I recommend reading the documentation page first, because this post is intended to be very brief.)

Let’s begin then!

To run R code from Python, you need to install the rpy2 library. For that, run

pip install rpy2

Straightforward, right?
But before any R code can be loaded in the script, we need to set R_HOME (and add R's bin folder to PATH) from within the script, as in the snippet below:

### set R_HOME dynamically
import os
os.environ['R_HOME'] = r'C:\Users\Mishu\anaconda3\Lib\R'  # for my case
### add R's bin folder to PATH
os.environ['PATH'] += os.pathsep + r'C:\Users\Mishu\anaconda3\Lib\R\bin'  # for my case
### load the rpy2 IPython extension (only after R_HOME and PATH are set)
%load_ext rpy2.ipython
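The snippet above hard-codes my user path. Below is a minimal sketch of deriving it at runtime instead. It assumes R was installed into the active conda environment, so that it lives under `<sys.prefix>\Lib\R`, which matches the Anaconda layout shown above. The names `conda_r_paths` and `configure_r_env` are hypothetical helpers, not part of rpy2 or CDT.

```python
import os
import sys
from pathlib import Path


def conda_r_paths(prefix=sys.prefix):
    """Return (R_HOME, R bin directory) for R installed inside a conda environment.

    Assumes the Windows Anaconda layout: <prefix>/Lib/R and <prefix>/Lib/R/bin.
    """
    r_home = Path(prefix) / "Lib" / "R"
    return r_home, r_home / "bin"


def configure_r_env(prefix=sys.prefix, environ=os.environ):
    """Set R_HOME and append R's bin directory to PATH once (empty PATH entries are dropped).

    Call this before loading rpy2, because rpy2 reads R_HOME when it initializes R.
    """
    r_home, r_bin = conda_r_paths(prefix)
    environ["R_HOME"] = str(r_home)
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if str(r_bin) not in entries:
        entries.append(str(r_bin))
    environ["PATH"] = os.pathsep.join(entries)
    return r_home
```

In a notebook, call `configure_r_env()` in the first cell and run `%load_ext rpy2.ipython` after it. Calling it twice does not add R's bin folder to PATH a second time.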

Next come the R dependencies. In a Jupyter Notebook, you can install an R package from an R cell using

%%R
install.packages(c("package_name"), repos="https://cloud.r-project.org")
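Copied install commands most often break because of curly quotes (R only accepts straight `"` or `'` as string delimiters) or an empty `repos` value. As a sketch, a tiny Python helper can generate the call with straight quotes and a real CRAN mirror. The helper names `r_string_vector` and `install_packages_expr` are hypothetical, and the mirror is assumed to be `https://cloud.r-project.org`.

```python
def r_string_vector(items):
    """Render Python strings as an R character vector, e.g. c("a", "b")."""
    # Escape backslashes and double quotes so each item stays a valid R string literal
    quoted = ", ".join(
        '"' + item.replace("\\", "\\\\").replace('"', '\\"') + '"' for item in items
    )
    return f"c({quoted})"


def install_packages_expr(packages, repos="https://cloud.r-project.org", ncpus=4):
    """Build an install.packages(...) call with straight quotes and an explicit mirror."""
    return f'install.packages({r_string_vector(packages)}, repos="{repos}", Ncpus={ncpus})'


print(install_packages_expr(["Rcpp", "sfsmisc"]))
```

You can paste the printed line into a `%%R` cell.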

To install all the dependencies, run

%%R
# NOTE: the apt-get calls below install Linux system libraries and only work on
# Debian/Ubuntu. apt-get does not exist on Windows, so skip them there.
system("apt-get install -t buster-cran40 r-base -y --allow-unauthenticated")
system("apt-get install libssl-dev -y")
system("apt-get install libgmp3-dev -y --allow-unauthenticated")
system("apt-get install git -y")
system("apt-get install build-essential -y --allow-unauthenticated")
system("apt-get install libv8-dev -y --allow-unauthenticated")
system("apt-get install libcurl4-openssl-dev -y --allow-unauthenticated")
system("apt-get install libgsl-dev -y")
system("apt-get install libxml2-dev -y --allow-unauthenticated")
system("apt-get install libharfbuzz-dev libfribidi-dev libfontconfig1-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev -y --allow-unauthenticated")
# R packages from CRAN
install.packages(c("usethis", "shiny"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages(c("Rcpp"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages(c("V8"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages(c("sfsmisc"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages(c("clue"), repos="http://cran.irsn.fr", Ncpus=4)
# Archived version built from a source tarball (on Windows, source builds need Rtools)
install.packages("https://cran.irsn.fr/src/contrib/Archive/randomForest/randomForest_4.6-14.tar.gz", repos=NULL, type="source", Ncpus=4)
install.packages(c("lattice"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages(c("devtools"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages(c("MASS"), repos="http://cran.irsn.fr", Ncpus=4)
install.packages("BiocManager", repos="http://cran.irsn.fr", Ncpus=4)
# Archived version built from a source tarball (also needs Rtools on Windows)
install.packages("https://cran.irsn.fr/src/contrib/Archive/fastICA/fastICA_1.2-2.tar.gz", repos=NULL, type="source", Ncpus=4)
# BiocManager::install() installs CRAN packages plus any Bioconductor dependencies they need
BiocManager::install(c("bnlearn", "pcalg", "kpcalg", "glmnet", "mboost"), Ncpus=4)
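If you would rather not run this through Jupyter, you can save the same R lines (without the `%%R` line) to a file and run it with `Rscript` from the Anaconda Prompt. Below is a minimal Python sketch. The file name `install_deps.R` and the helper names are assumptions for illustration only.

```python
import shutil
import subprocess
from pathlib import Path


def find_rscript(name="Rscript"):
    """Locate Rscript on PATH, or raise if R's bin folder has not been added to PATH."""
    exe = shutil.which(name)
    if exe is None:
        raise FileNotFoundError(f"{name} not found on PATH; add R's bin folder to PATH first")
    return exe


def run_r_file(script_path, rscript="Rscript"):
    """Run an R script with Rscript; raises CalledProcessError on a non-zero exit status."""
    path = Path(script_path)
    if not path.is_file():
        raise FileNotFoundError(f"R script not found: {path}")
    exe = find_rscript(rscript)
    return subprocess.run([exe, str(path)], check=True)


# Example (hypothetical file name):
# run_r_file("install_deps.R")
```

`install.packages()` usually reports a failed installation only as a warning, so `Rscript` can still exit with status 0. Loading each package afterwards is the more reliable check.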

Keep in mind that if you install the packages this way, the latest version of each package is installed by default. In some cases, R 3.6 and R 4.2.1 need different versions of the same package, for example BiocManager. BiocManager version 3.16 (compatible with R 4.2.1) is no longer available from CRAN, so you must install it first in order to obtain the other dependencies.

My main struggle was with the igraph R package, which is a required dependency for running the R-based algorithms (e.g., the PC and GES algorithms).

So, if you run

BiocManager::install(c("igraph"))

it will probably not work on Windows (Anaconda).

The reason is that an igraph build compiled for CRAN R cannot be used in Anaconda R. The CRAN build of igraph expects 'Rlapack.dll' to provide a function named 'quadmath_snprintf', which, for some reason, is not present in Anaconda R. So we need to install igraph for Anaconda R from conda-forge instead. That build is compiled with and for Anaconda R, so presumably it will work without problems.

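Install it with the command below. This is a conda command, not R code, so run it in the Anaconda Prompt (with the same environment activated) rather than in a %%R cell:

conda install -c conda-forge r-igraph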
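To confirm that conda really installed the conda-forge build into the active environment, here is a minimal Python sketch. It runs `conda list --json r-igraph` and looks for the exact package name. It assumes `conda` is on PATH (as it is in the Anaconda Prompt) and that the JSON output is a list of objects with `name` and `channel` keys. The helper names are my own.

```python
import json
import shutil
import subprocess


def find_conda_package(conda_list_json, name):
    """Return the entry whose "name" equals `name` in `conda list --json` output, else None."""
    for entry in json.loads(conda_list_json):
        if entry.get("name") == name:
            return entry
    return None


def conda_package_info(name="r-igraph", conda="conda"):
    """Run `conda list --json <name>` in the active environment and return the matching entry."""
    exe = shutil.which(conda)
    if exe is None:
        raise FileNotFoundError(f"{conda} not found on PATH; try the Anaconda Prompt")
    # conda treats the name as a pattern, so an exact-name match is done in Python
    result = subprocess.run(
        [exe, "list", "--json", name], capture_output=True, text=True, check=True
    )
    return find_conda_package(result.stdout, name)
```

If `conda_package_info()` returns `None`, igraph is not installed in that environment. Otherwise, its `"channel"` field should mention conda-forge.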

After that, RCIT needs to be installed using devtools. You can run

%%R
install.packages("devtools", repos="https://cloud.r-project.org")
library(devtools)
# github.com/cran hosts read-only mirrors of CRAN packages
install_github("cran/CAM")
install_github("cran/momentchi2")
install_github("Diviyan-Kalainathan/RCIT", quiet=TRUE, verbose=FALSE)
install.packages("https://cran.r-project.org/src/contrib/Archive/sparsebn/sparsebn_0.1.2.tar.gz", repos=NULL, type="source")

Last but not least, it is always a good idea to check that your R dependencies are installed correctly by running

%%R
library("package_name")

If the package is installed properly, this won't throw any errors.
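To check several packages at once from Python, you can start a fresh `Rscript` process and call R's `requireNamespace(..., quietly = TRUE)` for each package. That call returns `FALSE` instead of raising when a package is missing or fails to load, for example because of a broken DLL. Below is a minimal sketch, assuming `Rscript` is on PATH. The helper names are hypothetical.

```python
import shutil
import subprocess


def check_expr(packages):
    """Build R code that prints '<package> TRUE|FALSE' for each package."""
    # Single quotes avoid escaping problems when the code is passed on the Windows command line
    vec = ", ".join(f"'{p}'" for p in packages)
    return (
        f"for (p in c({vec})) "
        "cat(sprintf('%s %s\\n', p, requireNamespace(p, quietly = TRUE)))"
    )


def parse_check_output(stdout):
    """Turn lines like 'pcalg TRUE' into {'pcalg': True}; other lines are ignored."""
    status = {}
    for line in stdout.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in ("TRUE", "FALSE"):
            status[parts[0]] = parts[1] == "TRUE"
    return status


def check_r_packages(packages, rscript="Rscript"):
    """Return {package: loadable} using a separate Rscript process."""
    exe = shutil.which(rscript)
    if exe is None:
        raise FileNotFoundError(f"{rscript} not found on PATH")
    result = subprocess.run(
        [exe, "-e", check_expr(packages)], capture_output=True, text=True, check=True
    )
    return parse_check_output(result.stdout)
```

For example, `check_r_packages(["pcalg", "kpcalg", "bnlearn", "RCIT", "igraph", "CAM", "sparsebn"])` covers the packages from this post. Any package that maps to `False` needs to be reinstalled.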

With these steps, I hope CDT will work for you. If it doesn't, I suggest carefully checking all requirements and versions. In the worst case, you would have to download all the dependencies (as tar files) manually, which is also daunting because you need to find the exact version of each one. The list can be obtained here.

I have been using CDT for a few days, and this library is especially useful for anyone tinkering with causal models, structure learning, causal analysis, and inference. I will probably link a notebook in this article later.
