Sane Dependency Management for Python: Migrating to Poetry

Miya · Published in Super.com · 8 min read · Jul 26, 2023

Authors: Alex Ianus and Ye Shao

One Thursday afternoon, a developer at Super.com needed to ship an urgent one-line hotfix to one of our Python services. Unfortunately, the simple hotfix was causing all of our tests to fail, despite being perfectly correct.

It wasn’t the first time this had happened, and the failures turned out to have nothing to do with the hotfix or our code.

What happened, exactly?

The problem was that our builds were not deterministic. Specifically, the dependency versions we downloaded from PyPI were not fixed from one build to the next, and something had changed on PyPI since our last build that caused our tests to fail.

That meant none of our tests would pass until we figured out what had changed on PyPI and how to fix it. Sadly, the hotfix would have to wait.

This happens a lot when Pip is the dependency management tool, for several reasons, and it can drive a Python developer insane.

In this post, we’ll share the challenges we faced at Super.com when working with Pip for Python, why we decided to migrate to Poetry to solve the problem, and what we’ve learned after going through the process — all to help you keep your sanity in check.

How we used Pip to bundle dependencies in the past

Like most Python developers, we used requirements.txt to list all of our service’s direct dependencies.

We even specified an exact version for most of them to reduce the chances that a newly released dependency version would break our builds.

On every build (ignoring caching), we would run pip install -r requirements.txt to install all the direct and transitive dependencies into our Docker image.

Problems with Pip, including the one that tripped us up

New developers are taught to use Pip and requirements.txt as the canonical tool for Python dependency management.

However, this approach has three major problems, which led us to seek another solution at Super.com.

1. Transitive dependencies are not locked

Pip does not lock transitive dependencies, which is extremely frustrating.

It means that bugs and backwards-incompatible code in new upstream packages can be pulled in automatically. That’s what happened to us in the story we described earlier.

In our case, it was a transitive dependency (a dependency of one of our dependencies with an insufficient version constraint) that was causing the issue and breaking our builds.
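To make the failure mode concrete, here is a minimal sketch of how an unlocked transitive dependency can change between two builds. The package name, the version lists, and the helper functions are hypothetical, and the resolution logic is deliberately simplified; it is not how Pip is implemented.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.2' or '1.2.3' into a comparable 3-part tuple such as (1, 2, 0)."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def resolve_unlocked(published: list[str], minimum: str) -> str:
    """Pick the newest published version >= minimum, as an install without a lock would."""
    candidates = [v for v in published if parse(v) >= parse(minimum)]
    return max(candidates, key=parse)


# One of our dependencies declares "transitive-lib >= 1.0" (hypothetical, too loose).
index_last_week = ["1.0.0", "1.1.0"]
index_today = ["1.0.0", "1.1.0", "2.0.0"]  # upstream published a breaking release

last_week = resolve_unlocked(index_last_week, "1.0")
today = resolve_unlocked(index_today, "1.0")
print(f"last week: {last_week}, today: {today}")

# A lock file records the resolved version once, so later builds ignore new uploads.
lock = {"transitive-lib": last_week}
locked_today = lock["transitive-lib"]
print(f"with a lock file: {locked_today}")
```

Nothing in our own repository changed between the two builds, yet the unlocked install picks a different version the second time.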

The root cause of the failure was an innocent mistake by an open-source developer. However, we knew we were also vulnerable to malicious actions if PyPI or one of the developers publishing to PyPI were ever compromised.

A malicious open-source author, perhaps angry or disgruntled with the Python community, could push a new version of a transitive dependency, and it would automatically be included in our builds without our knowledge.

They might even remove their packages from the public repository altogether.

In some cases, OSS contributors will even upload malware themselves because they want to make a political statement. In March 2022, the author of the npm package ‘node-ipc’ uploaded sabotaged versions of the library to protest Russian attacks on Ukraine.

BleepingComputer reported that “newer versions of the ‘node-ipc’ package began deleting all data and overwriting all files on developer’s machines, in addition to creating new text files with ‘peace’ messages.”

The developer aimed to “overwrite or delete arbitrary files on a system for users based in Russia and Belarus.”

Unfortunately, it caused even more chaos when “select npm versions of the famous ‘node-ipc’ library began launching a destructive payload to delete all data by overwriting files of users installing the package.”

2. Version constraints are not enforced

Next, Pip doesn’t actually enforce the version constraints specified by us or by our dependencies.

Consider:

  • One of our services might depend on Package A and Package B
  • Package A depends on Package C > 1.0
  • Package B depends on Package C < 1.0

With Pip’s legacy resolver (the default before Pip 20.3), the version of our transitive dependency Package C that gets installed is effectively arbitrary: it depends on the order in which A and B are processed, for example their order in requirements.txt.

There is no error or warning to alert us that whichever version of C is installed, it will be incompatible with at least one of the other packages in the environment.
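The check we wanted can be sketched in a few lines: collect every constraint placed on Package C and fail loudly if no available version satisfies all of them. The package names, the version list, and the constraint syntax ("operator, space, version") are assumptions made for illustration, not a real resolver.

```python
import operator

# Comparison operators supported by this simplified sketch.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.0' or '1.0.0' into a comparable 3-part tuple such as (1, 0, 0)."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, constraint: str) -> bool:
    """Check one version against one constraint such as '> 1.0'."""
    op, bound = constraint.split()
    return OPS[op](parse(version), parse(bound))


def compatible_versions(available: list[str], constraints: dict[str, str]) -> list[str]:
    """Return the versions that satisfy every constraint at once."""
    return [v for v in available if all(satisfies(v, c) for c in constraints.values())]


def resolve(available: list[str], constraints: dict[str, str]) -> str:
    """Pick the newest compatible version, or raise instead of guessing."""
    candidates = compatible_versions(available, constraints)
    if not candidates:
        raise ValueError(f"no version satisfies all constraints: {constraints}")
    return max(candidates, key=parse)


available_c = ["0.9.0", "1.0.0", "1.1.0"]
constraints_on_c = {"package-a": "> 1.0", "package-b": "< 1.0"}

try:
    resolve(available_c, constraints_on_c)
    conflict_detected = False
except ValueError as exc:
    conflict_detected = True
    print(exc)
```

Raising an error here is the behaviour we were missing: the build stops instead of silently installing a version that breaks either A or B.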

3. Cryptographic hashes of packages are not stored or checked

Finally, a plain requirements.txt records no package hashes, so Pip does not cryptographically verify that dependency code is unchanged from the time it was first added to the project.

If PyPI itself were taken over, or if our build servers were tricked into downloading packages from somewhere else, we might download malicious artifacts without noticing and deploy them in our services.
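The protection we were looking for amounts to recording a hash of each artifact when it is first locked and comparing it on every later download. Below is a minimal sketch of that idea using SHA-256 from the standard library. The file name, the artifact bytes, and the verify_artifact helper are hypothetical stand-ins, not part of Pip or Poetry.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name: str, data: bytes, locked_hashes: dict[str, str]) -> None:
    """Raise if the downloaded bytes do not match the hash recorded at lock time."""
    expected = locked_hashes[name]
    actual = sha256_of(data)
    if actual != expected:
        raise ValueError(f"hash mismatch for {name}: expected {expected}, got {actual}")


# At lock time we record the hash of the artifact we reviewed and tested.
original = b"print('hello')\n"
locked_hashes = {"some-package-1.0.0.tar.gz": sha256_of(original)}

# An unchanged download passes silently.
verify_artifact("some-package-1.0.0.tar.gz", original, locked_hashes)

# A tampered download with the same name and version is rejected.
tampered = original + b"import os\n"
try:
    verify_artifact("some-package-1.0.0.tar.gz", tampered, locked_hashes)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(f"tampering detected: {tamper_detected}")
```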

As we scaled the company from a monolith to 30+ microservices, each with tens of direct dependencies, it was clear we needed a real solution to make our Python dependency bundling deterministic and consistent with the specified version constraints.

Our Python dependency management migration process

Why Poetry for Python dependency management?

To evaluate which dependency management tool would work best, we met with our team to develop a list of priority needs versus nice-to-haves.

Our priority Python dependency management tool needs included:

  • The ability to lock dependencies, including transitive dependencies. Unless the lock file has changed, all dependencies should be exactly the same on every build, regardless of any new versions appearing on PyPI.
  • The ability to specify semantic version constraints. For example, in Ruby, ~>1.0.2 means 1.0.X where X >= 2, and ~>1.1 means 1.X.Y where X >= 1, with the highest available Y preferred.
  • The tool should verify that the lock file is still valid during deployment and that the desired list of dependencies (Gemfile or equivalent) hasn’t changed since the lock file was generated.
  • The ability to install dependencies directly from the lock file without having to resolve dependencies again during a build.
  • It should work well with the latest version of Pip.
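The third requirement above (detecting a lock file that no longer matches the declared dependencies) can be sketched by storing a hash of the declared dependency list inside the lock and comparing it at install time. Poetry's lock file carries a content hash in its metadata for a similar purpose, but this sketch does not reproduce how Poetry computes it. The dictionaries and function names here are hypothetical.

```python
import hashlib
import json


def content_hash(dependencies: dict[str, str]) -> str:
    """Hash the declared dependencies in a canonical (sorted-key) form."""
    canonical = json.dumps(dependencies, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_lock(dependencies: dict[str, str], resolved: dict[str, str]) -> dict:
    """Build a lock record that remembers which declaration it was resolved from."""
    return {"content-hash": content_hash(dependencies), "packages": resolved}


def lock_is_fresh(dependencies: dict[str, str], lock: dict) -> bool:
    """True if the lock was generated from exactly these declared dependencies."""
    return lock["content-hash"] == content_hash(dependencies)


declared = {"requests": "~2.31.0", "flask": "~2.2.5"}  # illustrative constraints
lock = write_lock(declared, {"requests": "2.31.0", "flask": "2.2.5"})

fresh_before_edit = lock_is_fresh(declared, lock)

# Someone edits the declaration but forgets to regenerate the lock file.
declared["requests"] = "~2.32.0"
fresh_after_edit = lock_is_fresh(declared, lock)
print(fresh_before_edit, fresh_after_edit)
```

A deployment step can refuse to continue when the check returns False, which is the behaviour we wanted from the tool.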

Our nice-to-have wants (from most to least important) included:

  • The ability to upgrade a single dependency and its transitive dependencies if necessary while leaving the rest of the lock file alone. For example, the equivalent of a JavaScript “yarn upgrade some-package.”
  • When upgrading a single dependency or adding a new dependency to the desired list of dependencies (Gemfile or equivalent), it should take less than one minute to regenerate the lock file.
  • If transitive dependencies conflict (for example, we explicitly depend on package A and package B, package A depends on package C ~>1.1.0, and B depends on C ~>1.2.0), we want to be able to specify that 1.2.0 is fine and to override the ~>1.1.0 constraint. This is common with Yarn for JavaScript.
  • The tool should cryptographically verify that the content of upstream packages has not changed since the lock file was generated. For example, via hashes in the lock file.

Our Python dependency management tool functionality matrix

To evaluate our best options, we compared Pip to Pipenv, Poetry, and Pip-tools based on the feature needs we outlined earlier.

As you can see in the feature matrix screenshot below, Poetry met the majority of our needs, including:

  • Locks dependencies
  • Locks dependencies with hash
  • Supports semantic version constraints
  • Automatically verifies that a lock file is up to date
  • Ability to upgrade a single dependency elegantly

Unfortunately, we knew Poetry wouldn’t let us override version constraints declaratively. However, since it met our other priority needs, we decided to move forward with the migration process.

What to consider when migrating to Poetry

There was a lot of trial and error when migrating from Pip to Poetry, and it took us two months to complete the process.

That’s why we want to share with you some key considerations to save you time and help you have a smoother experience.

The happy path

In the happiest scenario, we simply ran a script that transforms a requirements.txt file into a pyproject.toml file. Then we changed the Dockerfile to use Poetry instead of pip install and ran our tests in the built image to ensure nothing broke.

So on the happy path, you just copy all the dependencies over from requirements.txt into pyproject.toml. Then run poetry lock, and it just works. That’s because none of the constraint conflicts we described earlier are present.
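A conversion script for the happy path can be quite small. The following is a minimal sketch, not the script we actually ran: it assumes every line in requirements.txt is a plain name==version pin (no extras, markers, URLs, or dotted names), and the function name and sample packages are illustrative. It emits only the dependency lines, so the rest of pyproject.toml (project metadata and the Python version constraint) still has to be written by hand.

```python
import re

# Accepts only simple pins such as "flask==2.2.5"; anything else is rejected.
PIN = re.compile(r"([A-Za-z0-9_-]+)\s*==\s*([A-Za-z0-9.+!]+)")


def requirements_to_poetry(requirements_text: str) -> str:
    """Convert pinned requirements.txt lines into a Poetry dependencies fragment."""
    lines = ["[tool.poetry.dependencies]"]
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = PIN.fullmatch(line)
        if match is None:
            raise ValueError(f"unsupported requirement, convert by hand: {line!r}")
        name, version = match.groups()
        lines.append(f'{name} = "=={version}"')
    return "\n".join(lines)


sample = """\
# web framework
flask==2.2.5
requests==2.31.0  # HTTP client
"""

fragment = requirements_to_poetry(sample)
print(fragment)
```

Rejecting anything the script does not understand is a deliberate choice: a line that needs manual attention is better surfaced as an error than silently mistranslated.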

If there are conflicts, you can usually resolve them by upgrading one package or the other in pyproject.toml.

Unfortunately, not everything migrated over from Pip to Poetry that smoothly.

The unhappy path

In the unhappy Poetry migration path, we ended up in “dependency hell.” It was difficult to find a combination of versions of our direct dependencies that wouldn’t generate any conflicts amongst each other or in transitive dependencies.

In the past, Pip would go ahead and randomly install transitive dependencies that satisfied some constraints but not others. With Poetry, we’d instead get an error whenever we ran a build (which is a good thing) and had to actually fix the problem.

Part of our fork of snowflake-connector-python, the worst offender

In the worst-case scenario, we had to fork some packages. That way, we weren’t depending on the packages directly from upstream or the author anymore.

Instead, we’d make our own copy of that code, fix whatever the problem was for the constraints (e.g., transitive dependencies that weren’t actually used or overly restrictive constraints), and rename the package.

If the package was called “my-package,” for example, we would create a new package called “super-my-package,” and publish it into our internal artifact repository.

Then we would ensure our code used the new package instead, which had fixes for the dependency bugs and constraints. That way, we knew we’d have a lock file that would always work.

Other considerations

Make sure you run enough tests during the migration process and be careful and specific about how you install Poetry.

Use the standalone installer, and follow the official Poetry installation documentation to do it properly. Some of the unofficial migration blog posts we found online weren’t accurate.

In some cases, we ended up using a tilde (~) constraint. For example, if you type in ~1.2.3 as your version constraint for a given package, it means greater than or equal to 1.2.3 but less than 1.3.0.

This is a good practice that helps you easily upgrade packages to what should be a newer, safer, less buggy, but backwards compatible version.

However, when a constraint was too loose, locking with Poetry became extremely slow. So keep that in mind if you plan on doing the same.

Exact constraints, like 1.2.3, are fine to use as well. We left many in our packages as we migrated over. Remember, though, that you’re not going to automatically get the newest versions with a simple `poetry lock`.
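The tilde rule described above can be sketched as a small range calculation. This is an illustration of the semantics, assuming plain numeric versions with at most three parts; it is not Poetry's implementation, and the function names are hypothetical.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.2' or '1.2.3' into a comparable 3-part tuple such as (1, 2, 0)."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def tilde_range(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower bound, exclusive upper bound) for e.g. '~1.2.3'."""
    parts = [int(p) for p in constraint.lstrip("~").split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))
    if len(parts) == 1:
        upper = (parts[0] + 1, 0, 0)  # '~1' allows any 1.x.y
    else:
        upper = (parts[0], parts[1] + 1, 0)  # '~1.2' and '~1.2.3' allow only 1.2.x
    return lower, upper


def allowed(version: str, constraint: str) -> bool:
    """Check whether a version falls inside the tilde range."""
    lower, upper = tilde_range(constraint)
    return lower <= parse(version) < upper


for candidate in ["1.2.2", "1.2.3", "1.2.9", "1.3.0"]:
    print(candidate, allowed(candidate, "~1.2.3"))
```

The narrower the range, the fewer candidate versions the resolver has to consider, which is consistent with the slowdown we saw when constraints were loose.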

We rely instead on other tools like Dependabot to alert us if we’re missing a security fix.

Migrating from Pip to Poetry can be tricky, but it is still worth it

While using Poetry as a dependency management tool will make your life easier as a Python developer, there are still a few challenges.

For example, Poetry can be slower than Pip because it checks that all dependencies and constraints are satisfied. Updating a package can sometimes take minutes, when developers are used to it taking seconds.

However, with our builds no longer breaking in non-deterministic ways, we feel it’s been well worth the effort.

If you’re planning to embark on a Pip to Poetry migration, document the process and hold training sessions to teach others how to use it when they join your team.

Finally, if you’re new to the process or have any questions, please leave a comment.
