Managing Conflicts with Mamba
A key aspect of package management is finding a set of compatible versions of the required packages. Indeed, even in the case when few packages are explicitly required, second-order dependencies may impose constraints on their requirements, making the global resolution a complex satisfiability problem (SAT).
The exploding complexity of the satisfiability problems arising in package management prescribes the use of specialized SAT solvers.
A major pain point in the Conda ecosystem has been the troubleshooting of cases where the desired packages cannot be installed because no solution to the SAT problem exists. For example, if a user tries to install PyTorch with Python=2.7 (which is impossible because PyTorch is only available for Python 3), Conda will output a rather obscure error message. Today, we are proud to announce that mamba can produce in-depth diagnosis for version conflicts, enabling the user to understand such errors more easily. This new feature can be enabled by setting experimental_sat_error_message: true
in ~/.condarc
.
Understanding Conflicts
When we ask a package manager to install a package with loose requirements, the tool must decide which version to get. Each package installed also has some more or less loose dependency requirements leaving more decisions to be made by the tool. We could pick the most recent version every time, but when two packages (or more) depend on different versions of the same software, things get difficult to track. We can try other versions, but if that does not work, we have to try other versions of the packages that required them and so on. If it starts to feel like filling a Sudoku, it’s because it is. Solving this satisfiability problem is NP-complete.
The problem gets even bigger once we consider variants. For a given version, binary (i.e. compiled) software is built for different settings, such as OS and CPU architecture, but also features like GPU and other binary library versions (since they need a compatible binary interface). The first two are easy to filter because there is only one possibility for a given computer, but the others are part of the complexity associated with binary dependencies (the topic could be a series of articles on its own).
The new message is telling us that Python=2.7 and PyTorch cannot be installed together:
- Python=2.7 could be installed, and doing so would require one of
python_abi
with version 2.7.¹ - Following this, PyTorch cannot be installed and the message lists all versions/variants that were considered. The first five group versions were built with Python variants from 3.6 to 3.10, and therefore incompatible with
2.7
. The last alternative is a Cuda (GPU) variant that is not available on this machine.
From there, we can infer that a possible course of action is to upgrade our Python dependency to some version greater or equal to 3.6.
Implementation
Under the hood, Mamba ruses Libsolv for the aforementioned SAT solving of the environment dependencies. The algorithm can find a solution when one exists and otherwise proves that there is none. This proof takes the form of an unstructured dependency graph, tracing the packages from user-specified requirements, down to problematic packages (a subset of the packages involved in the solving process). The first step for us was to structure that information into a rich graph structure, with each node associated with a package and each edge associated with a dependency requirement.
For the relatively small example used here, the graph has about 370 nodes. This is because both Python and PyTorch have a large number of versions and variants. This is too much to realistically compose a meaningful error message. The next step was therefore to compress this graph by merging packages that play the same role in explaining the error. For example, python=2.7.0
and python=2.7.1
both require a python_abi
version 2.7
, which is the source of the conflict. Similarly, all cuda
builds of pytorch
are not installable for a shared reason. Compression is achieved by merging nodes with the same package name that have the same predecessors and the same successors in the graph, and are not in conflict. This is a criterion we can change to provide more or less detailed error messages.
Finally, the last step is to turn the graph into the previously presented message. This is based on a depth-first traversal of the graph, starting from the user requirements. During that traversal, we collect positional information needed to write proper sentences. When the search reaches a leaf, we check for installability status, such as a conflict or missing dependency, and propagate it to the parents to create the installable / not installable wording (and color) in the final message.
Caveats and Possible Improvements
The strength of our approach is that we rely on the input provided by libsolv
without infusing additional knowledge about the packages. This was an efficient way to get reliable results, without chasing all possible edge cases. The trade-off is that we lose control in our algorithm over the error proof given by libsolv
, should there be multiple. The proof depends on the SAT resolution process, where it is difficult to distinguish what error is most relevant. For instance, in this example, it turns out that even if cuda
was available on our machine, the pytorch
cuda
variants are still not built for python=2.7
. Getting a GPU and cuda
would then bring that next problem to the user. If the lack of python=2.7
variant had been selected by libsolv
, the problem would have an even simpler explanation.
The next difficulty is finding a good representation of the packages. Names and versions are easy to understand, but variants are handled differently in Conda. The information is sometimes put in the package dependencies (and hashed in the build string) and sometimes put directly in the build string. As a result, always displaying build strings leads to a lot of noise, while omitting it could remove critical information (e.g. in python_abi * *_cp27mu
). The convention for build strings can also change through time. All these factors make retrieving readable variant information difficult.
The last difficulty is to represent conflicts for the user. When a dependency is missing, as with __cuda
, it is an isolated problem that directly forbids installing this package. Conflicts are different. Each package could be installed individually, but only their joint installation causes a problem. In these new messages, we force a storyline, claiming that the first package of a conflict that is encountered (python=2.7
) can be installed, while the subsequent ones (python
3.6
to 3.10
) cannot. The reverse, that we can install pytorch
but not python=2.7
, is also a valid explanation not presented to the user. The alternative presented may lead the user in one direction or another.
About the Authors
Antoine Prouvost is a Scientific Software Engineer at QuantStack. Previously, Antoine was researching Machine Learning and Combinatorial Optimization at the DS4DM research chair and Mila.
Claudia Rogoz is Software Engineer at Palantir and a Mamba contributor interested in seamless user workflows. She likes exploring mountains, bookshelves and the world of tech.
Acknowledgments
Antoine’s work at QuantStack was made possible thanks to a grant from the Chan Zuckerberg Initiative awarded to the Conda-Forge project as part of the essential open-source software for science grant cycle 4 (EOSS-4). Claudia was able to contribute as a Palantir employee.
We are grateful to the libsolv
maintainers for providing valuable insights along our journey.
Some ideas in this post were inspired by a PubGrub publication by Natalie Weizenbaum. For comparison, here is our output on the toy example used there.
¹ The fact that there are two options here is due to a change in the way conda-forge names this package.