Create a Virtualenv for Data Science Projects with Only One Command

Build a conda virtual environment with a Makefile template

Yuxiang Gong
4 min read · May 25, 2020
Image from Pixabay

- Do you need to help your new colleagues set up the virtual environment?

- Do you have to check your notes or Google every time you create a virtual environment?

- Have you ever been overwhelmed by dozens of packages in the requirements.txt file?

If any of these sounds familiar, it is worth spending 4 minutes reading this post. Here, I will provide a Makefile template that lets you build the virtual environment with a single command in the terminal. I will also introduce a package that helps you collect only the top-level packages into requirements.txt.

Part I:

UNIX users who have ever built and installed software from source should be no strangers to Makefiles and the command make install. We can apply the same trick to our data science projects.

Let’s start with a new project.

First, create a Makefile and a requirements.txt file in the project repository, as shown in the screenshot below.

Project repository

Let’s put the latest versions (as of May 25, 2020) of ipykernel and pipdeptree in the requirements.txt file. ipykernel provides the IPython kernel, the Python backend for Jupyter. pipdeptree lists the installed packages as a dependency hierarchy (introduced in detail in Part II).

build-conda-env

To build a conda virtual environment, set two variables in the Makefile:

  • Assign a Python version (PY_VERSION)
  • Give a name to your virtualenv (CONDA_ENV_NAME)

That’s basically all you have to change for your project. The name listed after .PHONY is the target you call from the terminal; declaring it phony tells make to run its recipe even if a file with the same name happens to exist.
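Since the Makefile itself survives here only as a screenshot caption, the Python sketch below spells out, under stated assumptions, the two commands a build-conda-env target of this kind typically runs: create the environment with the chosen Python version, then install requirements.txt into it. The function names build_env_commands and run, and the default values standing in for PY_VERSION and CONDA_ENV_NAME, are hypothetical; only the conda and pip commands are real CLI calls. It prints the commands by default (a dry run) and executes them only when asked.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical values standing in for the Makefile variables PY_VERSION and CONDA_ENV_NAME.
PY_VERSION = "3.8"
CONDA_ENV_NAME = "my-project"


def build_env_commands(py_version: str, env_name: str,
                       requirements: str = "requirements.txt") -> list[list[str]]:
    """Return, in order, the commands a build-conda-env target might run (hypothetical helper)."""
    return [
        # Create the environment pinned to the requested Python version, without prompting.
        ["conda", "create", "--name", env_name, f"python={py_version}", "--yes"],
        # Install the project requirements with the environment's own pip.
        ["conda", "run", "--name", env_name,
         "python", "-m", "pip", "install", "-r", requirements],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_env_commands(PY_VERSION, CONDA_ENV_NAME))
```

In a Makefile, the same two command lines would form the recipe of the build-conda-env target, with $(PY_VERSION) and $(CONDA_ENV_NAME) substituted for the Python variables.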

What if we want to remove this environment? Simply append the following lines to the Makefile.

clean-conda-env
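Removing the environment needs just one conda call. The sketch below assumes conda remove --name NAME --all, which deletes the whole environment (conda env remove --name NAME is an equivalent spelling); the helper name clean_env_commands is hypothetical and only mirrors what a clean-conda-env recipe line might contain.

```python
from __future__ import annotations

import shlex


def clean_env_commands(env_name: str) -> list[list[str]]:
    """Return the command a clean-conda-env target might run (hypothetical helper)."""
    # --all removes the entire environment; --yes skips the confirmation prompt.
    return [["conda", "remove", "--name", env_name, "--all", "--yes"]]


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    for cmd in clean_env_commands("my-project"):
        print(shlex.join(cmd))
```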

Jupyter Notebook is widely used by data scientists, so we also want to register our virtualenv as a Jupyter kernel, and to remove it again when we no longer need it.

add to / remove from Jupyter

Calling make add-to-jupyter registers the virtual environment as a kernel in Jupyter Notebook.
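Registering a conda environment with Jupyter usually comes down to running ipykernel’s installer from inside that environment, and unregistering it to deleting the resulting kernel spec. The sketch below assumes exactly those two real commands (python -m ipykernel install and jupyter kernelspec uninstall); the helper names and the display-name format are hypothetical, and the uninstall command will ask for confirmation when run.

```python
from __future__ import annotations

import shlex


def add_to_jupyter_commands(env_name: str) -> list[list[str]]:
    """Register the environment's interpreter as a Jupyter kernel (hypothetical helper)."""
    # ipykernel must already be installed in the environment; it is listed in requirements.txt.
    # Kernel spec names should stick to letters, digits, '.', '-' and '_'.
    return [["conda", "run", "--name", env_name,
             "python", "-m", "ipykernel", "install", "--user",
             "--name", env_name, "--display-name", f"Python ({env_name})"]]


def remove_from_jupyter_commands(env_name: str) -> list[list[str]]:
    """Delete the kernel spec created by add_to_jupyter_commands (hypothetical helper)."""
    return [["jupyter", "kernelspec", "uninstall", env_name]]


if __name__ == "__main__":
    # Dry run: print both commands.
    for cmd in add_to_jupyter_commands("my-project") + remove_from_jupyter_commands("my-project"):
        print(shlex.join(cmd))
```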

We now have four commands. They are simple enough, but we still shouldn’t have to remember them, so let’s give them a catalog.

catalog

Now, if we simply run make in the terminal, the commands are listed, highlighted and each followed by its description.
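A common way to build such a catalog, assumed here since the template’s own version is only shown as a screenshot, is to annotate each target line with a "## description" comment and let a default help target extract those annotations (in a Makefile this is often done with grep and awk). The Python sketch below shows the parsing and alignment step on a small hypothetical Makefile excerpt; catalog, format_catalog and SAMPLE are illustrative names.

```python
from __future__ import annotations

import re

# Matches target lines such as "build-conda-env: ## Build the conda environment".
TARGET_RE = re.compile(r"^([A-Za-z0-9_.-]+):.*?##\s*(.+)$")


def catalog(makefile_text: str) -> list[tuple[str, str]]:
    """Return (target, description) pairs for targets annotated with '##'."""
    entries = []
    for line in makefile_text.splitlines():
        match = TARGET_RE.match(line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return entries


def format_catalog(entries: list[tuple[str, str]]) -> str:
    """Pad target names to a common width so the descriptions line up."""
    width = max((len(name) for name, _ in entries), default=0)
    return "\n".join(f"{name.ljust(width)}  {desc}" for name, desc in entries)


# Hypothetical Makefile excerpt: .PHONY and recipe lines carry no '##' and are skipped.
SAMPLE = """\
.PHONY: build-conda-env clean-conda-env
build-conda-env: ## Build the conda environment
\t@echo this recipe line is ignored
clean-conda-env: ## Remove the conda environment
"""

if __name__ == "__main__":
    print(format_catalog(catalog(SAMPLE)))
```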

To summarise, we can build the virtualenv in the following steps:

- Create a repository

- Put the Makefile and requirements.txt in the repository

- Modify the Makefile according to your project

- Open a terminal, run make and check the commands

- Run any command in the terminal with make followed by the command name (e.g. make build-conda-env)

Part II:

After working on a project for a while, we have installed more packages and want to update the requirements.txt file. The usual approach is to run pip freeze > requirements.txt in the terminal. This works, but it records dozens of packages, most of which are dependencies installed automatically alongside the packages you actually need.

Let’s look at how many packages are installed when building the virtualenv from the requirements.txt file mentioned earlier.
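To count them yourself without pip, a rough equivalent of the flat list pip freeze produces can be sketched with the standard-library importlib.metadata module (Python 3.8+). This is an approximation, not pip’s implementation: pip freeze may hide packaging tools such as pip by default and treats editable installs specially, so its count can differ slightly. The function name installed_pins is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import distributions


def installed_pins() -> list[str]:
    """Return 'name==version' for every installed distribution, roughly what pip freeze lists."""
    pins = set()  # a set, because one distribution can be visible on more than one path
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    pins = installed_pins()
    print(f"{len(pins)} packages installed")
    print("\n".join(pins))
```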

How many of them do you actually recognise? Can we group them and extract only the top-level packages? Yes, and it turns out to be rather easy with pipdeptree. Let’s have a look.

The packages are grouped hierarchically, which makes it clear that certifi, ipykernel, pipdeptree and wheel are the top-level packages, i.e. packages that no other installed package depends on. We therefore only need to put these packages into the requirements.txt file. Here is a shell script that updates the requirements.txt file automatically.
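The idea behind the top-level list can also be sketched in plain Python, as an alternative to (not a reproduction of) the shell script: a package is top-level if no other installed package declares it as a requirement, which is, in simplified form, how the roots of pipdeptree’s tree are chosen. The sketch below uses importlib.metadata; all function names and the script name top_level.py are hypothetical, and it deliberately ignores environment markers other than extras.

```python
from __future__ import annotations

import re
from importlib.metadata import distributions


def canonical(name: str) -> str:
    """Normalize a project name (PEP 503): lowercase, with runs of '-', '_', '.' collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a string like 'Foo_Bar[x]>=1.0; python_version<"3.8"'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return canonical(match.group(1)) if match else ""


def top_level(dependency_map: dict[str, list[str]]) -> list[str]:
    """Return the packages that no other package in the map depends on."""
    required = {canonical(dep) for deps in dependency_map.values() for dep in deps}
    return sorted((name for name in dependency_map if canonical(name) not in required),
                  key=str.lower)


def installed_packages() -> tuple[dict[str, list[str]], dict[str, str]]:
    """Return ({package: required project names}, {package: version}) for this environment."""
    dependencies: dict[str, list[str]] = {}
    versions: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        # Ignore optional requirements that only apply when an extra is requested.
        # Simplification: other environment markers are not evaluated.
        reqs = [r for r in (dist.requires or []) if not re.search(r"\bextra\s*==", r)]
        dependencies[name] = [n for n in map(requirement_name, reqs) if n]
        versions[name] = dist.version
    return dependencies, versions


def top_level_pins() -> list[str]:
    """Return 'name==version' lines for the top-level installed packages only."""
    dependencies, versions = installed_packages()
    return [f"{name}=={versions[name]}" for name in top_level(dependencies)]


if __name__ == "__main__":
    # Redirect to update the file, e.g.: python top_level.py > requirements.txt
    print("\n".join(top_level_pins()))
```

Keeping top_level as a pure function over a plain dictionary makes the rule easy to check on a hand-written dependency map before trusting it on a real environment.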

That’s it! Enjoy!
