Simplifying Python Builds

A Single Source of Truth for install_requires

Mason Egger
Expedia Group Technology
6 min read · May 14, 2019


tl;dr

Why put your required dependencies in multiple locations when one will suffice? Use the following code to gather the packages from your Pipfile and pass them as the install_requires keyword argument in a Python setup.py.

Female Northern African Rock Python (Python sebae) brooding eggs at Tropicario — The Tropical Animal House, Helsinki, Finland. Licensed under CC BY-SA 3.0.

setup.py

import toml
from setuptools import setup


def get_install_requirements():
    try:
        # read my Pipfile
        with open('Pipfile', 'r') as fh:
            pipfile = fh.read()
        # parse the TOML
        pipfile_toml = toml.loads(pipfile)
    except FileNotFoundError:
        return []
    # if the [packages] key isn't there, just return an empty list
    try:
        required_packages = pipfile_toml['packages'].items()
    except KeyError:
        return []
    # If a version/range is specified in the Pipfile, honor it;
    # otherwise just list the package
    return ["{0}{1}".format(pkg, ver) if ver != "*"
            else pkg for pkg, ver in required_packages]


setup(
    # other keyword arguments omitted
    install_requires=get_install_requirements(),
)

You will also need to add the Pipfile to your MANIFEST.in so that it is included when you build an sdist.

MANIFEST.in

include Pipfile

Background

Dependencies are hard, and there are lots of ideas about how to handle them. In particular, libraries and applications differ in how they should declare required packages. This article proposes a solution for both libraries and applications. We'll use the standard setup.py for building Python libraries and pipenv for managing dependencies. With pipenv we can pin dependency versions precisely when we need to, or relax them and allow the latest versions, while the Pipfile.lock still lets us recreate a known working environment.

Managing Library Requirements

Python library dependencies, also known as abstract dependencies, typically are not pinned to a specific version in setup.py (here is a wonderful writeup). These abstract dependencies have to be compatible with other libraries, and by pinning them to a specific version you can create a situation where libraries that share dependencies become incompatible with each other. An example setup.py might look like this:

setup.py

setup(
    # other keyword arguments omitted
    install_requires=[
        "flask",             # no version specified, latest will be grabbed - OK
        "requests>1.0",      # latest version greater than 1.0 - OK
        "toml>=1.0,<5.0.0",  # range of acceptable versions - OK
        "pylint==2.2.0",     # pinned to a specific version - typically NOT OK
    ],
)

You can put requirements in setup.py as demonstrated above, but you would also need to keep these requirements in a pip requirements.txt file for setting up development and test environments.
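To make the pinning problem concrete, here is a deliberately simplified sketch. It is not how pip's resolver works, and real tools apply the full PEP 440 rules through the packaging library. The sketch checks whether any release in a hypothetical list satisfies the constraints that two libraries place on a shared dependency. The release list and both constraints are made up for illustration, and only purely numeric versions with the ==, >=, <=, > and < operators are handled.

```python
import operator

# Operators understood by this simplified sketch (no ~=, !=, wildcards, pre-releases).
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '2.2.0' into (2, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=1.0,<5.0.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before their one-character prefixes.
        for op in ("==", ">=", "<=", ">", "<"):
            if clause.startswith(op):
                target = clause[len(op):]
                if not _OPS[op](parse_version(version), parse_version(target)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def common_versions(available, *specs):
    """Return the available versions that satisfy every spec at once."""
    return [v for v in available if all(satisfies(v, s) for s in specs)]


# Hypothetical releases of a shared dependency and two libraries' constraints on it.
available = ["2.2.0", "2.3.0", "2.3.1"]
library_a = "==2.2.0"  # pinned exactly
library_b = ">=2.3.0"  # a range

print(common_versions(available, library_a, library_b))  # []
print(common_versions(available, ">=2.2.0", library_b))  # ['2.3.0', '2.3.1']
```

Because library A pins exactly, no release can satisfy both libraries, so they cannot be installed together. Relaxing A's pin to a range makes two releases acceptable.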

pipenv instead of pip

pipenv is an evolution of pip that installs packages, manages both direct and transitive dependencies, and manages virtual environments. Pipenv solves the nasty problem of your dependencies' dependencies (packages not directly listed in your Pipfile) being installed at different versions between builds even when your code hasn't changed. For this reason, and others not discussed in this blog, we decided to go with pipenv.
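Pipfile.lock is a JSON file. Its "default" section records an exact "==" version for every package in the environment, including transitive dependencies, and its "develop" section does the same for [dev-packages]. The sketch below flattens one section into pinned requirement strings. The lock content is hand-written and trimmed: real lock files also carry hashes, markers and more metadata. Entries without a "version" key, such as editable or VCS installs, are skipped by this sketch.

```python
import json


def pinned_requirements(lock_text, section="default"):
    """Return sorted 'name==version' strings from one Pipfile.lock section."""
    lock = json.loads(lock_text)
    pins = []
    for name, entry in lock.get(section, {}).items():
        # Editable or VCS entries have no "version" key; this sketch skips them.
        version = entry.get("version")
        if version:
            pins.append(f"{name}{version}")
    return sorted(pins)


# Trimmed, hand-written example: only requests is a direct dependency,
# but its transitive dependencies are pinned as well.
sample_lock = """
{
    "_meta": {"pipfile-spec": 6},
    "default": {
        "requests": {"version": "==2.21.0"},
        "urllib3": {"version": "==1.24.3"},
        "certifi": {"version": "==2019.3.9"}
    },
    "develop": {
        "pylint": {"version": "==2.3.1"}
    }
}
"""

print(pinned_requirements(sample_lock))
# ['certifi==2019.3.9', 'requests==2.21.0', 'urllib3==1.24.3']
```

urllib3 and certifi appear with exact versions even though only requests would be listed in the Pipfile. This is what keeps transitive dependencies from shifting between builds.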

No one wants to manage dependencies in more than one file

Consider this scenario:

I'm a developer building the next great library, my-awesome-library. I currently use pipenv to provision my developer environments. When I need a new package for development, I add it to my Pipfile and continue about my day. This tracks my dependencies for me and even protects me from nasty transitive dependencies. These dependencies have been upgraded underneath me before and completely broken my-awesome-library. Thankfully, the Pipfile.lock prevents this until I'm ready to update these libraries. The issue here is that all of my dependencies are currently stored in my Pipfile, but my setup.py also needs an install_requires keyword argument that declares what external dependencies my library has. How do I solve this?

Requirements:

  • I want to have a single file to manage my dependencies
  • I want to be able to easily create an environment with these dependencies installed

Attempted Solution 1 — Rejected

Maintain my dependencies in both the Pipfile and the setup.py

This is probably the most common implementation. There isn’t anything particularly wrong with this implementation, except my dependencies are split between two files.

You can experience an error if you specify a specific version in your Pipfile...

Pipfile

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
"requests" = "==1.0.0"

…and then specify a range in your setup.py that is incompatible with it...

setup.py with package range defined

setup(
    # other keyword arguments omitted
    install_requires=["requests>1.0.0"],
)

If you were to run pipenv install -e . to test the package locally, dependency resolution would fail, because no version of requests can be both ==1.0.0 and >1.0.0. The error wouldn't be too difficult to fix, but it would probably elicit some brief profanity. Our goal is to avoid that friction.
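One way to see this drift is to compare the two sources directly. The hypothetical helper below takes the [packages] table as a Python dict (what a TOML parser returns, assuming plain string constraints) and the install_requires list. It reports names that appear in only one place and constraints that are spelled differently. It compares constraint text only and does not decide whether two constraints overlap. Extras and environment markers are out of scope.

```python
import re

# Split "requests>1.0.0" into a name and the constraint text that follows it.
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*$")


def split_requirement(req):
    match = _REQ.match(req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    name, spec = match.groups()
    # Normalize names as PEP 503 does: case-insensitive, runs of -_. become "-".
    return re.sub(r"[-_.]+", "-", name).lower(), spec or "*"


def find_drift(pipfile_packages, install_requires):
    """List human-readable differences between Pipfile [packages] and setup.py."""
    pipfile = {split_requirement(name)[0]: spec for name, spec in pipfile_packages.items()}
    setup = dict(split_requirement(req) for req in install_requires)
    problems = []
    for name in sorted(pipfile.keys() | setup.keys()):
        if name not in setup:
            problems.append(f"{name}: only in Pipfile")
        elif name not in pipfile:
            problems.append(f"{name}: only in setup.py")
        elif pipfile[name] != setup[name]:
            problems.append(f"{name}: Pipfile {pipfile[name]!r} vs setup.py {setup[name]!r}")
    return problems


print(find_drift({"requests": "==1.0.0"}, ["requests>1.0.0"]))
# ["requests: Pipfile '==1.0.0' vs setup.py '>1.0.0'"]
```

A check like this only detects drift after it has happened. Solution 3 below removes the second list altogether.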

Attempted Solution 2 — Rejected

Have my Pipfile install my package’s dependencies by installing my-awesome-library itself

With pipenv it is possible to install the library you are working on, so that installing the library pulls in its dependencies.

Pipfile with editable installation

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
"e1839a8" = {editable = true, path = "."}

This can work, but I've had it lead to chicken-and-egg circular dependencies during installation, where my setup.py imports a package that is itself one of the requirements being installed.

setup.py with toml

import toml  # can be any arbitrary package you need
from setuptools import setup

setup(
    # other keyword arguments omitted
    install_requires=["toml"],
)

With a setup.py like this, pipenv attempts to install my-awesome-library, but running its setup.py requires toml, which isn't installed yet. And we can't run setup.py to install toml without toml. This setup is likely to lead to heartaches by the number (bonus points if you get that reference). If you aren't doing any sort of processing in your setup.py that requires external libraries, then this approach would probably be acceptable. But as soon as you add an external dependency, all bets are off.

More importantly, I am still managing two files for all of my dependencies, and that just won’t do.
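The accepted solution below also imports toml in setup.py, so toml has to be installed in whatever environment runs setup.py. Python 3.11 and newer, which postdate this post, ship a read-only TOML parser in the standard library called tomllib. One hedged option is to prefer tomllib and fall back to the third-party toml package only on older interpreters. This is a sketch, not the approach used in this post. load_pipfile_text is a hypothetical helper name.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib as _toml_parser  # standard library on Python 3.11+
else:
    try:
        import toml as _toml_parser  # third-party; must already be installed
    except ImportError as exc:
        raise SystemExit(
            "setup.py needs the 'toml' package to read the Pipfile on Python < 3.11"
        ) from exc


def load_pipfile_text(text):
    """Parse Pipfile contents with whichever TOML parser is available."""
    # Both tomllib.loads and toml.loads take a str and return a dict.
    return _toml_parser.loads(text)


print(load_pipfile_text('[packages]\nrequests = "==1.0.0"\n'))
# {'packages': {'requests': '==1.0.0'}}
```

On 3.11+ this removes the external import entirely. On older interpreters it still depends on toml, but it fails with a readable message instead of a bare ImportError.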

Attempted Solution 3 — Accepted

Only have dependencies in my Pipfile and read them into my setup.py at build time

With this solution, we have to enforce these standards:

  • Only packages that are direct dependencies of the library are placed in the [packages] section of the Pipfile
  • Any other packages, tools, libraries we need for development are placed in the [dev-packages] section of the Pipfile
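These standards are conventions, so nothing enforces them on their own. A small check run in CI or a test suite can catch the most obvious slips. The sketch below works on an already-parsed Pipfile dict. DEV_ONLY is a hypothetical, project-specific set of tools you never want shipped as runtime dependencies.

```python
# Hypothetical policy: tools that belong in [dev-packages], never in [packages].
DEV_ONLY = {"pylint", "pytest", "tox", "flake8"}


def check_sections(pipfile):
    """Return policy violations for a parsed Pipfile dict."""
    runtime = {name.lower() for name in pipfile.get("packages", {})}
    dev = {name.lower() for name in pipfile.get("dev-packages", {})}
    problems = []
    for name in sorted(runtime & DEV_ONLY):
        problems.append(f"{name} is a dev tool but is listed in [packages]")
    for name in sorted(runtime & dev):
        problems.append(f"{name} is listed in both [packages] and [dev-packages]")
    return problems


pipfile = {
    "packages": {"requests": "==1.0.0", "pylint": "*"},
    "dev-packages": {"pylint": "*", "tox": "*"},
}
for problem in check_sections(pipfile):
    print(problem)
# pylint is a dev tool but is listed in [packages]
# pylint is listed in both [packages] and [dev-packages]
```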

By adding the following code to your setup.py, the install_requires keyword argument is populated whenever the file is executed (i.e., at build time or during a local installation):

setup.py

import toml
from setuptools import setup


def get_install_requirements():
    try:
        # read my Pipfile
        with open('Pipfile', 'r') as fh:
            pipfile = fh.read()
        # parse the TOML
        pipfile_toml = toml.loads(pipfile)
    except FileNotFoundError:
        return []
    # if the [packages] key isn't there, just return an empty list
    try:
        required_packages = pipfile_toml['packages'].items()
    except KeyError:
        return []
    # If a version/range is specified in the Pipfile, honor it;
    # otherwise just list the package
    return ["{0}{1}".format(pkg, ver) if ver != "*"
            else pkg for pkg, ver in required_packages]


setup(
    # other keyword arguments omitted
    install_requires=get_install_requirements(),
)

By doing this, we can specify all our dependencies in the Pipfile, including specific versions and ranges. If a package has no version constraint ("*" in the Pipfile), it is listed in install_requires without one.

Pipfile

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
"requests" = "==1.0.0"
"toml" = "*"
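The get_install_requirements function above assumes every value in [packages] is a string. A Pipfile can also use an inline table, for example requests = {version = ">=2.0", extras = ["security"]}, and formatting that table with "{0}{1}" would produce an invalid requirement. Below is a hedged sketch of a conversion that also handles the version and extras keys. It works on the already-parsed [packages] dict. It deliberately raises on any other key (path, git, editable, markers and so on) rather than guessing how to translate it. to_requirement and install_requires_from are hypothetical names.

```python
SUPPORTED_KEYS = {"version", "extras"}


def to_requirement(name, spec):
    """Convert one Pipfile [packages] entry into a PEP 508 requirement string."""
    if isinstance(spec, str):
        version, extras = spec, []
    elif isinstance(spec, dict):
        unsupported = set(spec) - SUPPORTED_KEYS
        if unsupported:
            # path/git/editable/markers etc. need their own handling; fail loudly.
            raise ValueError(f"{name}: unsupported keys {sorted(unsupported)}")
        version, extras = spec.get("version", "*"), spec.get("extras", [])
    else:
        raise TypeError(f"{name}: unexpected spec {spec!r}")
    requirement = name
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    if version != "*":
        requirement += version
    return requirement


def install_requires_from(packages):
    return [to_requirement(name, spec) for name, spec in packages.items()]


packages = {
    "requests": {"version": ">=2.0", "extras": ["security"]},
    "toml": "*",
    "flask": "==1.0.2",
}
print(install_requires_from(packages))
# ['requests[security]>=2.0', 'toml', 'flask==1.0.2']
```

Raising on unknown keys keeps a setup.py build from silently dropping a constraint it does not understand.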

You will also need to add the Pipfile to your MANIFEST.in so that it is included when you build an sdist.

MANIFEST.in

include Pipfile
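This MANIFEST.in line matters more than it looks. When pip installs from the sdist, setup.py runs again. If the Pipfile is missing, the FileNotFoundError branch quietly returns an empty list, and the package installs with no dependencies at all. Here is a hedged sketch of a check you could run against a built .tar.gz sdist. The archive path in the usage comment is a hypothetical example of where a build might place it.

```python
import tarfile


def sdist_contains(sdist_path, filename="Pipfile"):
    """Return True if a .tar.gz sdist has `filename` in its top-level project directory."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getnames():
            # sdists unpack into a single "<name>-<version>/" directory.
            parts = member.split("/")
            if len(parts) == 2 and parts[1] == filename:
                return True
    return False


# Hypothetical usage after building the sdist:
# assert sdist_contains("dist/my-awesome-library-0.1.0.tar.gz")
```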

Now a single file holds your library's dependencies and can also be used to create a development environment.

If you need other packages for development, testing, or linting, add them to the [dev-packages] section. Because get_install_requirements only reads [packages], these never end up in install_requires:

dev-packages

[dev-packages]
"pylint" = "*"
"tox" = "*"

Conclusion

A single source of truth for managing our Python projects has made the development process noticeably smoother. Using pipenv still allows us to determine and guarantee the same version of a library is installed every time. And since we store the Pipfile.lock in version control, it is easy to roll back to a previous version if dependency creep rears its ugly head. I no longer have to worry about package dependencies and can focus my full attention on my code. Which is how it should be.
