Best Practices for Python Dependency Management

Written by Paul Kernfeld

Dependency management is like your city’s sewage system. When it’s working well, it’s easy to forget that it even exists. The only time you’ll remember it is when you experience the agony induced by its failure.

Here’s what we want to accomplish with dependency management at Knewton:

  • Builds should be stable across environments. If a project builds on my machine, it should build on others’ machines and on our build server.
  • Builds should be stable over time. If a project builds now, it shouldn’t break in the future.[1]
  • Anyone at Knewton should be able to easily download, build, and make changes to any Knewton project.
  • We should be able to have many different projects with large dependency trees without running into dependency hell.

The items below reflect how we do Python dependency management at Knewton. You may not need everything in this list, so items are introduced in order of increasing complexity.

Easily install your dependencies with pip

When you want to use a Python library from your code, you’ll need to download the library, put it somewhere on your computer, and possibly build any external routines (e.g., C, C++, Fortran!?) that the library uses. It’s possible to do this all by hand, but there’s a much better way: pip.[2] Pip is a Python tool that specializes in installing Python packages. For example, just run pip install numpy to install numpy and its dependencies. Pip also helps you to keep your version control repositories small by giving you a reproducible way to install packages without needing to include them in your source code repo.

Not only does pip let you install normal source packages, but it can also install packages from source control repos, wheels, and legacy binary distribution formats.
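
For illustration, here are a few of the forms pip accepts (the repo URL and wheel filename below are placeholders, not real packages):

pip install numpy                                         # a package from PyPI
pip install numpy==1.9.2                                  # a specific pinned version
pip install git+https://github.com/example/mypackage.git  # from a git repo
pip install ./mypackage-1.0-py2-none-any.whl              # from a local wheel file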

The instructions for installing Python from The Hitchhiker’s Guide to Python will also tell you how to install pip.[3] Pip’s user guide is a good way to get started with using pip, and the pip install documentation is helpful if you need to dive deeper.

Pin your requirements with a requirements.txt file

It’s easy to get a Python project off the ground by just using pip to install dependent packages as you go. This works fine as long as you’re the only one working on the project, but as soon as someone else wants to run your code, they’ll need to figure out which dependencies the project needs and install them all by hand. Worse yet, if they install a different version of a dependency than you used, they could end up with some very mysterious errors.

To prevent this, you can define a requirements.txt file that records all of your project’s dependencies, versions included. This way, others can run pip install -r requirements.txt and all the project’s dependencies will be installed automatically! Placing this file into version control alongside the source code makes it easy for others to use and edit it. In order to ensure complete reproducibility, your requirements.txt file should include all of your project’s transitive (indirect) dependencies, not just your direct dependencies. Note that pip does not use requirements.txt when your project is installed as a dependency by others — see below for more on this.

SAMPLE FILE

requests==2.3.0
six==1.4.1
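
If you’ve been installing packages as you go, one way to capture all of them (transitive dependencies included) at their exact versions is pip freeze, run from inside the project’s environment:

pip freeze > requirements.txt  # writes every installed package, pinned to its exact version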

The pip user guide has a good section on requirements files.

Isolate your Python environments with virtualenvs

As a result of how Python paths work, pip installs all packages globally by default. This may be confusing if you’re used to Maven or npm, which install packages into your project directory. It may seem like an irrelevant detail, but it becomes very frustrating once you have two different projects that need different versions of the same library. Python requires some extra tooling to install separate dependencies per project.

project_1 and project_2 depend on different versions of the requests library. This is bad because only one version of requests can be installed at a time.

The solution for this problem is to use virtual environments. A virtual environment consists of a separate copy of Python, along with tools and installed packages. Creating a virtualenv for each project isolates dependencies for different projects. Once you have made a virtualenv for your project, you can install all of that project’s dependencies into the virtualenv instead of into your global Python environment. This makes your setup look more like something you would create with Maven.

Now you can install a different version of requests into each virtualenv, eliminating the conflict.
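
Here’s a minimal sketch of that workflow (the environment name is arbitrary):

virtualenv project_1_env            # create an isolated environment
source project_1_env/bin/activate   # activate it in the current shell
pip install -r requirements.txt     # now installs into the virtualenv, not globally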

I try to keep the number of packages I install to a minimum, both in my global Python environment, and in each virtualenv. I’ll be doing a follow-up post on how to handle virtualenvs with large numbers of packages installed.

A good virtualenv tutorial is A non-magical introduction to Pip and Virtualenv for Python beginners. The Python Packaging Guide provides a high-level overview that ties together pip and virtualenvs.

Build and rebuild virtualenvs easily with tox

Now that you’re using virtualenvs for all your projects, you’ll want an easy way to build the virtualenv and install all the dependencies from your requirements.txt file. An automatic way to set up virtualenvs helps new users get started with your project, and also lets you quickly rebuild a virtualenv when it breaks.

Tox is a Python tool for managing virtualenvs. It lets you quickly and easily build virtualenvs and automate running additional build steps like unit tests, documentation generation, and linting. When I download a new Python project at Knewton, I can just run tox, and it’ll build a new virtualenv, install all the dependencies, and run the unit tests. This really reduces setup friction, making it easy to contribute to any Python project at Knewton.

A tox.ini file at Knewton might look something like this:

[tox]
envlist = py27  # We use only Python 2.7
indexserver =
    # We host our own PyPI (see below)
    default = https://python.internal.knewton.com/simple

[testenv]
deps =
    -rrequirements.txt  # Pinned requirements (yes, no space)
commands =
    pipconflictchecker   # Check for any version conflicts
    py.test . {posargs}  # Run unit tests
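
With a tox.ini like this in place, the whole build is one command; the -r flag tells tox to recreate the virtualenv from scratch:

tox     # build the virtualenv, install dependencies, and run the commands
tox -r  # the same, but recreate the virtualenv first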

Get started with tox at its home page.

Indicate transitive dependencies using install_requires

At some point, you may want to package your Python project with sdist or as a wheel, so that others can depend on it by installing it with pip. Dependency management gets a bit more complicated at this point, because pip actually doesn’t look at your requirements.txt file when installing your packaged project.

Instead, pip looks at the install_requires field in setup.py, so you should be sure to fill this out in order to make a project that others can easily install. In contrast to requirements.txt, this field should list only your direct dependencies. Although requirements in requirements.txt should generally be pinned to exact versions, requirements in install_requires should permit the largest possible ranges. If you’d like to understand these differences, “The Package Dependency Blues” does a great job of explaining requirements.txt and install_requires.[4]
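
Here’s a minimal setup.py sketch (the project name and version ranges are illustrative); note the loose ranges here versus the exact pins in requirements.txt:

from setuptools import setup, find_packages

setup(
    name='myproject',  # illustrative name
    version='1.0.0',
    packages=find_packages(),
    install_requires=[
        # Direct dependencies only, with ranges as wide as you can support
        'requests>=2.3,<3.0',
        'six>=1.4',
    ],
)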

The way tox handles requirements.txt and install_requires can be a bit confusing. First, tox installs requirements from the deps section of tox.ini. Then tox runs python setup.py install, which will install dependencies from your install_requires. Since your requirements.txt file should contain a superset of the packages in your install_requires, this second step should not install any requirements if you’ve filled out your deps section correctly.

Of course, now you have two different lists of requirements to maintain. If only there were a simple way to do so! Pip-compile, from pip-tools, is the most promising tool for keeping your requirements.txt and install_requires in sync. It’s not yet fully mature, but it’s very helpful for projects with many transitive dependencies.
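
The usual pip-tools workflow, sketched here, is to keep your direct dependencies in a requirements.in file and generate the pinned requirements.txt from it:

# requirements.in contains only direct dependencies, e.g.:
#   requests>=2.3,<3.0
pip-compile requirements.in  # writes requirements.txt with all transitive deps pinned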

Specify which versions of Python tools you want to support

If you’re using pip, virtualenv, and tox, then anyone with those tools should be able to build your project, right? Unfortunately, the answer is, “almost.” If someone is running a different version of pip, virtualenv, or tox, their build may work differently than yours. As an example, tox 1.x passes all environment variables through to the commands it’s running, but tox 2.x runs its tasks in an environment with only a whitelist of environment variables. This means that, if you had a script that tried to read the $EDITOR environment variable, it might work fine when built with tox 1.x, but fail with tox 2.x.

At Knewton, we take the approach of restricting the allowed versions of these tools. We have a script called “Python Doctor” that will check your versions of Python, pip, virtualenv, and tox to ensure that they’re within our band of accepted ranges.
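
Python Doctor itself is internal to Knewton, but the core idea is just comparing the output of version checks like these against accepted ranges:

python --version
pip --version
virtualenv --version
tox --version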

For an open source project, this is a little more complicated because you can’t restrict the versions of the tools running on your contributors’ workstations. In this case, it’s a good idea to mention the versions of these tools with which your project can be built.[5] Note that this only applies to tools that are installed in your global Python environment, which will not appear in your requirements.txt or install_requires. For example, tox or pip would not generally appear in a requirements.txt file.

Example README snippet:

To build this project, run `tox -r`. This project has been tested with tox >=1.8,<2. If you want to make your own virtualenv instead, we recommend using virtualenv >=13.

Control your packages with a PyPI server

By default, pip will install packages from the official PyPI server at python.org. If you work at a place with proprietary code, you may wish to run your own PyPI server. This will allow you to install your own packages as easily as those from the main PyPI server.

It’s actually much easier to set this up than you might think: your PyPI server can be as simple as an HTTP server serving a folder that contains sdist’ed tarballs of your Python project!
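
To point pip at your server, you can pass --index-url on the command line or set it once in pip’s configuration file (the URL below is the internal one from the tox.ini example above):

pip install --index-url https://python.internal.knewton.com/simple mypackage

# Or, equivalently, in ~/.pip/pip.conf:
[global]
index-url = https://python.internal.knewton.com/simple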

By hosting your own PyPI server, you can make it easy to maintain forked versions of external libraries.

You can also use a PyPI server to encourage consistent builds and reduce version conflicts by limiting the ability to add new libraries to your organization’s PyPI server.

Learn more about setting up a PyPI server here.

Examples

I’ve added two Python project templates to GitHub that illustrate how to tie all of this together.

Conclusion

This is our strategy, but you’ll probably need to modify it to suit your own circumstances. Additionally, the Python community has been growing quickly recently, so it’s likely that some of these practices will be replaced in the next few years. If you’re reading this in 2018, hopefully there will be some easier ways to manage Python dependencies!

Notes

  1. If you’re used to other dependency management systems, this may sound trivial. With Python, it’s not!
  2. “Pip” stands for “pip installs packages.” The older easy_install tool was formerly used for this, but nowadays pip is superior.
  3. Pip is now included with Python 2 versions starting with 2.7.9, as well as Python 3 versions starting with 3.4.
  4. A nagging aside: make sure to follow semantic versioning to make it easier for other projects to restrict the version of your project in their install_requires.
  5. If you want to take this to the next level, you can specify your build tools programmatically too! Make a file called requirements-meta.txt that contains pinned versions of your build tools like tox. Then you’ll have a two-step build process:
     1. Install your per-project build system. To do this, use your global tox or virtualenvwrapper to make a virtualenv with this pinned version of tox in it.
     2. Use your per-project build system to build your project. To do this, run the tox that you just installed to run the project’s primary builds. If you understood this, great job!