Python Package Management: A Guide to Avoid Dependency Conflicts
Oops, the latest pandas version requires ‘numpy>=1.22’ but tensorflow requires ‘numpy~=1.9’.
Table of contents
- Motivations
- What are conflicting dependencies?
- Understand the issue with a code example
- Design Patterns: Decoupling for Flexibility
- Optional Dependencies: Tailored for Users
- tox: Testing through the Maze of Dependencies
- Conditional Test Execution: Making Tests Smarter
- Wrap-up
- Conclusions
Motivations
As an open-source Python package maintainer and a Data Scientist, I’ve had the chance to witness the evolution of Melusine, a machine learning-powered email processing tool developed by MAIF.
Released in 2019, Melusine quickly became an integral part of our daily operations, accelerating our email processing workflows. But as the package matured and new features were added, managing dependencies became a real challenge.
The ever-changing Python ecosystem demanded continuous updates to Melusine’s dependencies. This manual process was time-consuming and error-prone: outdated dependencies could introduce security vulnerabilities, compatibility issues, and disruptions in our internal systems.
Rather than attempting to patch up the existing Melusine codebase, we opted for a complete rewrite. This approach allowed us to address the dependency conflict challenges head-on, adopting a more modern and streamlined strategy.
What are conflicting dependencies?
As a Python developer, you’ve probably experienced dependency hell at some point. It’s the frustration of trying to keep your packages up to date without breaking your code. One package requires a newer version of a dependency, but another package requires an older version. It’s a mess.
In this article, I’ll show you how we used design patterns, optional dependencies, and the tox package to keep the Melusine package clean and up to date.
Understand the issue with a code example
To illustrate the journey of rewriting Melusine, I’ll use a simple pseudo-code example: a class to make a machine learning prediction using different types of models.
# predictor.py
import sklearn.base
import tensorflow as tf


class Predictor:
    def __init__(self, model):
        self.model = model

    def predict(self, data):
        # Tensorflow model
        if isinstance(self.model, tf.keras.Model):
            result = self.model.predict(data)
        # Sklearn model
        elif isinstance(self.model, sklearn.base.TransformerMixin):
            result = self.model.transform(data)
        # Unsupported model
        else:
            raise TypeError(
                f"Object of type {type(self.model)} "
                "is not supported by the Predictor class"
            )
        return result
# run_prediction.py
from sklearn.ensemble import RandomForestClassifier

from my_package import Predictor

some_data = ...  # placeholder for input data
predictor = Predictor(model=RandomForestClassifier())
result = predictor.predict(data=some_data)
The class uses an ML model to make a prediction, but depending on the type of model, the method to run predictions is different (transform or predict). There are a few weaknesses in this code block that could make it hard to maintain over time:
- The design forces you to modify the code and create a new if condition for each new type of model. This is particularly problematic if you don’t have the rights to modify the source code (when using an open-source package, for example).
- Both sklearn and tensorflow are imported in the module. This means both need to be installed even if only one is used, which can lead to incompatibilities.
Design Patterns: Decoupling for Flexibility
The first step in our journey consisted in refactoring the code, leveraging the power of design patterns. Specifically, we adopted dependency injection, a technique that decouples our code from its dependencies, allowing us to swap out different dependencies without disrupting the overall system.
Let’s rewrite the code block using dependency injection. We start by defining an abstract class that fixes the signature for all predictor objects.
# base_predictor.py
from abc import ABC, abstractmethod


class BasePredictor(ABC):
    @abstractmethod
    def predict(self, data):
        """Execute a machine learning model prediction"""
        raise NotImplementedError()
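As a quick aside, Python enforces this contract at instantiation time: a class with an unimplemented abstract method cannot be instantiated, while any subclass that implements predict works right away. A minimal sketch (EchoPredictor is a made-up toy class, not part of Melusine):

```python
from abc import ABC, abstractmethod


class BasePredictor(ABC):
    @abstractmethod
    def predict(self, data):
        """Execute a machine learning model prediction"""
        raise NotImplementedError()


# Instantiating the abstract class directly fails...
try:
    BasePredictor()
except TypeError as exc:
    print(f"TypeError: {exc}")


# ...while any subclass implementing predict() works fine
class EchoPredictor(BasePredictor):
    def predict(self, data):
        return data


print(EchoPredictor().predict("some_data"))  # -> some_data
```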
Then we define a class inheriting from BasePredictor for each type of model.
# sklearn_predictor.py
from sklearn.ensemble import RandomForestClassifier

from base_predictor import BasePredictor


class SklearnPredictor(BasePredictor):
    def __init__(self):
        self.model = RandomForestClassifier()

    def predict(self, data):
        """Execute a machine learning model prediction"""
        return self.model.predict(data)
# tensorflow_predictor.py
from base_predictor import BasePredictor
from tensorflow import SomeTensorflowModel  # placeholder for a real Keras model


class TensorflowPredictor(BasePredictor):
    def __init__(self):
        self.model = SomeTensorflowModel()

    def predict(self, data):
        """Execute a machine learning model prediction"""
        return self.model.predict(data)
Finally, we instantiate a predictor object (it can be any class inheriting from BasePredictor) and use it to make a prediction.
# run_prediction.py
from base_predictor import BasePredictor
from my_package.sklearn_predictor import SklearnPredictor

some_data = ...  # placeholder for input data
predictor: BasePredictor = SklearnPredictor()
result = predictor.predict(data=some_data)
This refactored code greatly improves maintainability:
- New types of models can be added easily without impacting the existing code. Users can simply create a class inheriting from BasePredictor and use it right away.
- Dependencies are independent from each other. The tensorflow package doesn’t have to be installed when running an SklearnPredictor (the SklearnPredictor and TensorflowPredictor are defined in different modules).
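To make the first point concrete, here is a sketch of how a user could plug in their own model type without touching the package code. MedianPredictor is an illustrative toy class, not part of Melusine; note that it needs no heavy ML dependency at all:

```python
from abc import ABC, abstractmethod


class BasePredictor(ABC):
    """Same contract as the package's base class."""

    @abstractmethod
    def predict(self, data):
        raise NotImplementedError()


# A user-defined predictor: no change to the package source is needed.
class MedianPredictor(BasePredictor):
    """Toy model that always predicts the median of its training values."""

    def __init__(self, training_values):
        ordered = sorted(training_values)
        self.median = ordered[len(ordered) // 2]

    def predict(self, data):
        # One prediction per input row
        return [self.median for _ in data]


predictor: BasePredictor = MedianPredictor([3, 1, 2, 5, 4])
print(predictor.predict(["row1", "row2", "row3"]))  # -> [3, 3, 3]
```

Any code written against the BasePredictor interface accepts this class unchanged, which is exactly the decoupling dependency injection buys us.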
Optional Dependencies: Tailored for Users
Instead of forcing all users to install all dependencies, Melusine provides optional dependency installation options. This allows users to choose the dependencies they need based on their specific use cases, reducing the overall package size and simplifying the installation process.
In our example, we want to install only one of tensorflow or sklearn. Optional dependencies can be set up in the pyproject.toml file.
# pyproject.toml
[project]
name = "my_package"
dependencies = ["pandas==2.0.0"]

[project.optional-dependencies]
sklearn = ["scikit-learn==1.3.2"]
tensorflow = ["tensorflow==2.15.0"]
Pandas is set as a mandatory dependency: it will always be installed when running pip install my_package. In contrast, sklearn and tensorflow are optional dependencies, installed only when running pip install my_package[sklearn] or pip install my_package[tensorflow] respectively.
tox: Testing through the Maze of Dependencies
With the code refactored and dependency management streamlined, Melusine faced a new challenge: ensuring that the package works seamlessly with different dependency configurations. To address this challenge, the team turned to the tox testing framework.
Tox is a tool that can help you test your Python packages with different versions of dependencies. This can help you catch dependency conflicts before they cause problems for your users.
Once you have created and configured a tox.ini file, you can run tox to test your package against each of the specified dependency configurations. If there are any dependency conflicts, tox will report them.
# tox.ini
[tox]
requires = tox>=4
env_list = base, sklearn, tensorflow

[testenv]
description = run unit tests with the base dependencies
deps = pytest
commands = pytest

[testenv:sklearn]
description = run unit tests with the sklearn dependencies
extras = sklearn

[testenv:tensorflow]
description = run unit tests with the tensorflow dependencies
extras = tensorflow
This file creates three testing environments: base, sklearn, and tensorflow. The sklearn and tensorflow environments inherit the pytest dependency and test command from the base [testenv] section and additionally install the matching optional extra.
Conditional Test Execution: Making Tests Smarter
The last challenge we needed to tackle was to skip the tests requiring tensorflow when using the base and sklearn environments. This can be done simply with the pytest.importorskip function.
# test_tensorflow_predictor.py
import pytest

# Skip the whole module if tensorflow is not installed
tensorflow = pytest.importorskip("tensorflow")

from my_package.tensorflow_predictor import TensorflowPredictor


def test_tensorflow_predictor():
    # some_data and expected_result are defined elsewhere in the test suite
    predictor = TensorflowPredictor()
    assert predictor.predict(some_data) == expected_result
The pytest.importorskip function checks whether the tensorflow package is installed and, if not, skips the remaining tests in the module.
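Under the hood, this availability check amounts to asking importlib whether a module can be found. A small stdlib-only sketch of the same idea (no pytest required):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


print(has_module("json"))             # stdlib module: True
print(has_module("no_such_package"))  # not installed: False
```

pytest.importorskip goes one step further: it actually imports the module and, on failure, raises pytest’s internal skip exception, which pytest reports as a skipped test rather than an error.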
Wrap-up
The strategy we adopted to avoid dependency conflicts when rewriting Melusine was:
- Refactor the code to use dependency injection
- Set up optional dependencies in the package requirements
- Configure tox to use multiple testing environments
- Use pytest.importorskip to make test execution conditional on the installed packages
Conclusions
Dependency conflicts are a real problem for package maintainers. However, using design patterns, optional dependencies, and tox, can help keep your Python packages clean and up to date.
The adoption of these enhanced code design principles in Melusine v3 has significantly transformed its maintainability. Developers can now focus on their specific areas of expertise, working independently on modules and components without the risk of impacting each other. This specialization has accelerated development and improved code quality.
Melusine’s journey from a complex codebase to a well-maintained open-source package highlights the importance of effective code design and modularity. By adopting a more structured and component-based approach, we have made it significantly more robust and reliable for MAIF and the wider open-source community.
About the author
I grew up in the French Alps, studied Physics / Nuclear Engineering in Switzerland, Sweden and England, and I’ve been working as a Data Scientist since 2018 at Quantmetry and MAIF. I am also a big fan of wakeboarding and board games :)
Follow me on LinkedIn.
Leave a star for Melusine on GitHub!