[updated 2015-11-14 to reflect the latest developments in mxnet]
In case you missed it, the amazing group of committers behind the dmlc project just released mxnet, a library for running deep learning experiments. As a reminder, this is the same crowd that maintains the excellent xgboost, a favorite among Kaggle competitors.
Integrating mxnet into your data science practice should be pretty straightforward, as there are wrappers for Python, Julia, C++ and R. Most of the research I do in the context of my Cross Gradient data science practice is based on the R statistical environment, so this is fantastic news: R users can finally use a powerful package to efficiently train deep neural networks or LSTMs on multiple GPUs, all in R. Below is my procedure for installing the mxnet wrapper for R, compiled with OpenMP support for multi-core parallelism, in a fairly vanilla Mac OS X Yosemite environment.
You will need:
- recent R with latest Rcpp
- clang C++ compiler with OpenMP support
- GFortran libs and compiler
- mxnet repository, recursively cloned
Don’t have git yet? Try typing “git” in a Terminal; if it is missing, you will be prompted to install it through the Developer Tools.
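Before starting, it can help to see which tools are already installed. This is a minimal sketch for a POSIX shell; check_tools is a hypothetical helper, and the tool names you pass to it (for example git, R, brew, pkg-config, clang-omp, gfortran) are assumptions based on what the steps below install, so adjust the list to your setup.

```shell
# check_tools NAME...: print "found" or "missing" for each command on PATH.
# Returns a non-zero status if at least one command is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_tools git R brew pkg-config clang-omp gfortran` lists what still needs to be installed.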
R and Rcpp
I use fairly recent versions of R and RStudio. The entire installation is done with R on the command line, so we can safely ignore RStudio for the whole procedure. Spinning up a Terminal and issuing the R command gives:
R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Make sure you have the most recent version of Rcpp: I had an older version that caused problems when compiling the mxnet package. Installing and upgrading use the same command in R; in my case it pulled version 0.12.1:
> install.packages("Rcpp")
The compilation of Rcpp initially failed for me because the X11 libraries couldn’t be found where they were expected. If that happens to you too, exit R. Some users suggest creating a symbolic link (sudo ln -s /usr/X11 /opt/X11) to fix this; others suggest installing the latest version of XQuartz, which does that too. I personally went the XQuartz route because I know I will need it for other software (Inkscape, for instance). So download and install XQuartz; at the end of the installation it tells you it won’t be the default X11 server until you log out and log back in. This is fine, and you do not need to log out at this point, as all the expected libraries are already in /opt/X11. Now go back into R and install Rcpp again, just like before. If it completes, you’re good to go.
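Both checks from this section can also be run from a Terminal. This is an illustrative sketch, not part of the official procedure: version_ge and check_x11 are hypothetical helpers, 0.12.1 is only the Rcpp version this post ended up with (an assumed minimum, not a documented one), and /opt/X11 is the location that both the symlink and XQuartz provide.

```shell
# version_ge A B: succeed if dotted version A >= B, comparing numerically
# component by component; missing components count as 0 (0.12 == 0.12.0).
version_ge() {
  a=$1 b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*} y=${b%%.*}
    x=${x:-0} y=${y:-0}
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
    # Drop the leading component and its dot.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# check_x11 [DIR]: verify that the X11 tree (default /opt/X11) has lib/ and include/.
check_x11() {
  dir=${1:-/opt/X11}
  if [ -d "$dir/lib" ] && [ -d "$dir/include" ]; then
    echo "ok: $dir"
  else
    echo "missing: $dir"
    return 1
  fi
}
```

For example, `version_ge "$(Rscript -e 'cat(as.character(packageVersion("Rcpp")))')" 0.12.1 && echo "Rcpp is recent enough"`, and `check_x11 || echo "install XQuartz or create the /opt/X11 symlink"`. R normalizes package versions to dotted form, which is why a purely numeric comparison is enough here.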
If you haven’t done so already, install Homebrew: it comes in very handy for managing hundreds (thousands?) of third-party packages on your Mac. Here is the classic one-liner to install it from a Terminal prompt:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Now let’s make sure our C++ compiler supports OpenMP to enable parallelization in mxnet. On my old Mac Pro this provided a huge performance boost for xgboost by allowing me to parallelize training on 16 hyper-threads. I am shamelessly assuming it remains true for mxnet.
Open up a Terminal and use Homebrew to install clang-omp, an OpenMP-enabled build of clang, since the clang compiler that comes by default with Yosemite does not support OpenMP. We also take this opportunity to install pkg-config.
brew install clang-omp
brew install pkg-config
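To confirm that the new compiler really handles OpenMP, you can compile a tiny parallel program with it. This is a hedged sketch: check_openmp is a hypothetical helper, it assumes clang-omp accepts the -fopenmp flag, and depending on your install you may need extra include or library flags (any arguments after the compiler name are passed through to it).

```shell
# check_openmp [CC [FLAGS...]]: compile and run a tiny OpenMP program.
# CC defaults to clang-omp; extra FLAGS are passed to the compiler.
check_openmp() {
  cc=${1:-clang-omp}
  if [ $# -gt 0 ]; then shift; fi
  tmp=$(mktemp -d) || return 1
  cat > "$tmp/omp.c" <<'EOF'
#include <stdio.h>
int main(void) {
  int threads = 0;
  /* Each thread of the parallel region increments the counter once. */
#pragma omp parallel
  {
#pragma omp atomic
    threads++;
  }
  printf("%d\n", threads);
  return 0;
}
EOF
  if "$cc" -fopenmp "$@" -o "$tmp/omp" "$tmp/omp.c" >/dev/null 2>&1; then
    echo "OpenMP threads: $("$tmp/omp")"
    status=0
  else
    echo "could not compile with $cc -fopenmp"
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```

Running `check_openmp clang-omp` should report more than one thread on a multi-core Mac; a count of 1 suggests the pragmas are being ignored (or that OMP_NUM_THREADS is set to 1).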
GFortran 4.9.2 for Yosemite
Compiling R packages sometimes requires a Fortran library, and this is one of those times. Download it from here and install it using the wizard.
OpenCV is a well-known library for computer vision tasks. Install it with Homebrew from the science tap:
brew tap homebrew/science
brew info opencv
brew install opencv
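Since pkg-config was installed earlier, it offers a quick way to confirm that Homebrew’s OpenCV is visible to build tools. A minimal sketch: check_pkgconfig is a hypothetical helper, and the package name opencv is an assumption about the .pc file your OpenCV version installs.

```shell
# check_pkgconfig [PKG]: report whether pkg-config knows PKG (default opencv)
# and, if so, which version it resolves to.
check_pkgconfig() {
  pkg=${1:-opencv}
  if ! command -v pkg-config >/dev/null 2>&1; then
    echo "pkg-config not installed"
    return 1
  fi
  if pkg-config --exists "$pkg"; then
    echo "$pkg $(pkg-config --modversion "$pkg")"
  else
    echo "$pkg: not found by pkg-config"
    return 1
  fi
}
```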
The R package essentially relies on the mxnet library to execute parallelized code: a GitHub search through the repository shows that the Rcpp code itself does not contain any parallelization directives. Therefore we have to build this library first, with OpenMP enabled.
Open a Terminal and recursively clone the mxnet repo:
git clone --recursive https://github.com/dmlc/mxnet
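Once the repository is cloned, the GitHub search for parallelization directives can be reproduced locally by counting OpenMP pragmas. count_omp_pragmas is a hypothetical helper, and the directory names in the example are assumptions about the repository layout at the time of writing.

```shell
# count_omp_pragmas DIR: count "#pragma omp" directives in the files under DIR.
count_omp_pragmas() {
  grep -rE '^[[:space:]]*#[[:space:]]*pragma[[:space:]]+omp' "$1" 2>/dev/null |
    wc -l | tr -d ' '
}
```

For example, compare `count_omp_pragmas mxnet/R-package/src` with `count_omp_pragmas mxnet/src`.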
Then go into the cloned directory and copy the OS X build template to config.mk:
cd mxnet
cp make/osx.mk config.mk
Now edit config.mk with your favorite text editor and change it so that it contains:
export CC = clang-omp
export CXX = clang-omp++
USE_OPENMP = 1
Save and close the file, then launch the compilation:
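The three config.mk edits and the compilation can also be scripted. This is an assumption-laden alternative to the manual steps, not the official build procedure: set_make_var, mxnet_jobs and build_mxnet are hypothetical helpers, the sketch assumes config.mk uses plain NAME = value lines (keeping a leading export when present), and it expects the library to land in lib/libmxnet.so. It runs one make job per online logical CPU.

```shell
# set_make_var FILE NAME VALUE: rewrite the "NAME = ..." line of FILE,
# keeping a leading "export" if present, or append "NAME = VALUE" if absent.
set_make_var() {
  file=$1 name=$2 value=$3
  awk -v name="$name" -v value="$value" '
    {
      line = $0
      prefix = ""
      if (line ~ /^export[ \t]+/) { prefix = "export "; sub(/^export[ \t]+/, "", line) }
      split(line, parts, "[ \t]*=")
      if (parts[1] == name) { print prefix name " = " value; found = 1; next }
      print
    }
    END { if (!found) print name " = " value }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# mxnet_jobs: number of online logical CPUs, falling back to 1.
mxnet_jobs() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case $n in
    ''|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}

# build_mxnet: run from the top of the mxnet checkout,
# after "cp make/osx.mk config.mk".
build_mxnet() {
  set_make_var config.mk CC clang-omp &&
    set_make_var config.mk CXX clang-omp++ &&
    set_make_var config.mk USE_OPENMP 1 &&
    make -j"$(mxnet_jobs)" &&
    ls lib/libmxnet.so
}
```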
Finally, installing the R package
Now that libmxnet.so is built, you can build and install the R package.
[update 2015-11-14] The installation procedure has evolved a bit: it now uses an “rpkg” target in the main Makefile and handles package dependencies better, which makes these instructions clearly redundant with the official docs.
Assuming you are still in the mxnet directory, edit the package’s Makevars file:
vi R-package/src/Makevars
Add a library path so that the package can link against the Fortran libraries. The last line should look like:
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) \
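To find which directory to add to that line, you can search the GFortran installation for its runtime library. A minimal sketch: gfortran_libdir_flag is a hypothetical helper, and the default search root /usr/local/gfortran is an assumption about where the GFortran installer puts its files, so pass your own path if it differs.

```shell
# gfortran_libdir_flag [ROOT]: print -L<dir> for the first directory under
# ROOT (default /usr/local/gfortran) that contains a libgfortran library.
gfortran_libdir_flag() {
  root=${1:-/usr/local/gfortran}
  lib=$(find "$root" \( -name 'libgfortran*.dylib' -o -name 'libgfortran*.a' \) 2>/dev/null | head -n 1)
  if [ -z "$lib" ]; then
    echo "no libgfortran found under $root" >&2
    return 1
  fi
  printf '%s\n' "-L$(dirname "$lib")"
}
```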
Now, still from the mxnet directory, install devtools, install the R package’s dependencies, then build the package with the rpkg target and install it from its sources:
Rscript -e "install.packages('devtools', repos = 'https://cran.rstudio.com')"
cd R-package
Rscript -e "library(devtools); library(methods); options(repos = c(CRAN = 'https://cran.rstudio.com')); install_deps(dependencies = TRUE)"
cd ..
make rpkg
R CMD INSTALL mxnet_0.5.tar.gz
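Before opening R, you can confirm from the shell that the freshly installed package loads. A sketch using Rscript and R’s requireNamespace; check_r_pkg is a hypothetical helper name.

```shell
# check_r_pkg [PKG]: succeed if R can load the namespace of PKG (default mxnet).
check_r_pkg() {
  pkg=${1:-mxnet}
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found"
    return 1
  fi
  if Rscript -e "quit(save = 'no', status = if (requireNamespace('$pkg', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1; then
    echo "$pkg: ok"
  else
    echo "$pkg: not installed or fails to load"
    return 1
  fi
}
```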
If all is well, the package installs properly and is ready to use! Try the benchmark demo from R:
> demo(topic = "basic_bench", package = "mxnet")
At this point you should have a functional library and the bench demo should work fine.
[update 2015-11-14] The problem with the demos mentioned in an earlier version of this post has been fixed.
The package comes with a basic neural net demo that requires the MNIST dataset, which you can download from Yann LeCun’s website (http://yann.lecun.com/exdb/mnist/).
Decompress the files into a “data” folder in your R working directory and launch the demo:
> demo(topic = "basic_model", package = "mxnet")
[...]
> print(paste0("Finish prediction... accuracy=", accuracy(label, pred)))
[1] "Finish prediction... accuracy=0.9573"
> print(paste0("Finish prediction... accuracy2=", accuracy(label, pred2)))
[1] "Finish prediction... accuracy2=0.9573"
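Fetching and unpacking the MNIST files can be scripted too. This sketch assumes the four standard file names from Yann LeCun’s MNIST page, that curl and gunzip are available, and that the demo reads the decompressed files from data/ under their original names; fetch_mnist and decompress_gz_dir are hypothetical helpers, and MNIST_BASE can be overridden if the files move.

```shell
# Base URL and file names of the MNIST dataset (override MNIST_BASE if needed).
MNIST_BASE=${MNIST_BASE:-http://yann.lecun.com/exdb/mnist}
MNIST_FILES="train-images-idx3-ubyte train-labels-idx1-ubyte t10k-images-idx3-ubyte t10k-labels-idx1-ubyte"

# decompress_gz_dir DIR: gunzip every .gz file in DIR in place (x.gz -> x).
decompress_gz_dir() {
  for gz in "$1"/*.gz; do
    [ -e "$gz" ] || continue   # no .gz files: the glob stays unexpanded
    gunzip -f "$gz" || return 1
  done
  return 0
}

# fetch_mnist [DIR]: download the four files into DIR (default data) and unpack them.
fetch_mnist() {
  dir=${1:-data}
  mkdir -p "$dir" || return 1
  for f in $MNIST_FILES; do
    curl -fsSL -o "$dir/$f.gz" "$MNIST_BASE/$f.gz" || return 1
  done
  decompress_gz_dir "$dir"
}
```

Run `fetch_mnist data` from your R working directory before launching the demo.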
In my next post, I will show how to create a Docker image that performs this setup on the latest version of RStudio Server on Ubuntu Trusty. Stay tuned!