Getting Started with Conda or Poetry for Data Science Projects
How to manage virtual environments, packages, and dependencies in Python and start your data science project with Conda or Poetry.
In this post, you will find:
- A brief introduction to virtual environments, packages, and dependencies in Python,
- Python management systems,
- A parallel between Conda and Poetry,
- Conda and Poetry dockerization, and
- Conclusions.
When we start a data science project, we need to understand the importance of maintaining and managing the dependencies and packages in each project. The question is always: which tools should we use to guarantee the stability and reproducibility of our work?
Currently, we have different tools that allow us to create isolated virtual environments with the aim of:
- Avoiding system-wide installation of libraries that only a single project uses,
- Resolving version conflicts,
- Managing multiple versions of the same package in different projects, and
- Having reproducible code on any computer and by anyone.
Next, we will present some of these tools and what their purposes are. Then we will draw a parallel between two of the tools most widely used by developers.
Python Management Systems
Below we list some of the technologies available for package, virtual environment, and dependency management in Python.
It is essential to understand that each technology has a different purpose in each project.
The following list is based on the official documentation for each technology:
Package manager for Python
- pip is the package installer for Python. You can use it to install packages from the Python Package Index and other indexes.
Environment manager for Python
- venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories.
- virtualenv is a tool to create isolated Python environments.
- pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world.
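As a minimal sketch of what an environment manager does, the standard-library venv module can also be driven from Python itself. The helper name create_env and the directory name .venv-demo are assumptions made for illustration, and with_pip=False is chosen only to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create an isolated virtual environment and return its config file path."""
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast;
    # pass with_pip=True for an environment you intend to install packages into.
    venv.create(env_dir, with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg marker file.
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / ".venv-demo")
        print(cfg.read_text())
```

In day-to-day work the same result is usually obtained from the command line with python -m venv followed by a directory name.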
Package and dependency manager for Python
- poetry is a tool for Python dependency management and packaging. Poetry comes with all the tools you might need to manage your projects in a deterministic way.
Package, dependency, and environment manager
- conda is a package, dependency, and environment manager for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more. It is distributed as part of Anaconda and Miniconda.
Conda and Poetry
Conda and Poetry stand out as two of the most complete and widely used tools among developers.
On the one hand, Poetry is a dependency management and packaging tool for Python. On the other hand, Conda is a package, dependency, and environment manager for any language.
Next, we will present a parallel between the two tools. Then we will show how to implement these environments in Docker, and finally, we will give some conclusions.
Conda and Poetry Setup
To begin, I will present an example of a configuration file for each tool.
With Conda, the developer needs to create a separate file for each set of dependencies, such as development and production.
- Example of a Conda dependencies file: environment.yml
- Example of a Conda dev dependencies file: environment-dev.yml
In Poetry, by contrast, the developer can consolidate all the dependencies in a single configuration file.
- Example of a Poetry dependencies file: pyproject.toml
⚠️ Pin each package to a specific version, and avoid > or ^ constraints, in either configuration file (Conda or Poetry).
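The pinning advice can be checked mechanically. The following is a rough sketch, not part of Conda or Poetry: the helpers is_pinned and unpinned and the sample dependency table are hypothetical. It assumes Poetry-style constraints, where a bare version or an == prefix names exactly one version, and it treats anything else (including pre-release tags) as unpinned.

```python
import re

# An exact pin is a bare version ("1.5.3") or an explicit equality ("==1.5.3").
# Ranges (">=1.1"), carets ("^1.23"), tildes, and wildcards do not match.
_EXACT = re.compile(r"^(==)?\d+(\.\d+)*$")


def is_pinned(constraint: str) -> bool:
    """Return True if the constraint names exactly one version."""
    return bool(_EXACT.match(constraint.strip()))


def unpinned(dependencies: dict[str, str]) -> list[str]:
    """List the package names whose constraint is not an exact pin."""
    return sorted(name for name, spec in dependencies.items() if not is_pinned(spec))


if __name__ == "__main__":
    # Hypothetical dependency table, as it might appear in pyproject.toml.
    deps = {
        "pandas": "1.5.3",
        "numpy": "^1.23",
        "scikit-learn": ">=1.1",
        "scipy": "==1.9.3",
    }
    print(unpinned(deps))
```

A check like this could run in continuous integration so that a loose constraint is caught before it changes the environment unexpectedly.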
Comparison between Conda and Poetry
The following table presents a parallel between the two tools, highlighting points such as installation, the most frequent commands, pros, and cons:
* You could use Cookiecutter to create a project structure.
** Poetry will read the poetry.lock file. If this file doesn’t exist, Poetry will check pyproject.toml and generate the poetry.lock file.
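The lock-file behavior described in this note can be sketched as a small decision function. This only mirrors the description above and is not Poetry's actual implementation; the function name install_source is hypothetical.

```python
from pathlib import Path


def install_source(project_dir: Path) -> str:
    """Name the file that would drive dependency installation in project_dir."""
    lock_file = project_dir / "poetry.lock"
    manifest = project_dir / "pyproject.toml"
    if lock_file.exists():
        # A lock file exists: the exact versions recorded there are installed.
        return lock_file.name
    if manifest.exists():
        # No lock file: dependencies are resolved from pyproject.toml,
        # and the result is then written to a new poetry.lock.
        return manifest.name
    raise FileNotFoundError(f"no pyproject.toml found in {project_dir}")
```

Committing poetry.lock to version control is what lets every machine take the first branch and install identical versions.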
Dockerizing Conda and Poetry — Best practices
- Build the Docker image in stages: a build stage and a runtime stage,
- The runtime container should be as clean and lightweight as possible, and
- The runtime container needs neither Poetry nor Conda.
Conda Dockerfile
Poetry Dockerfile
Conclusion
In this post, we talked about different tools for managing Python virtual environments, packages, and dependencies.
The choice among these tools depends largely on the type of project, since each tool has its advantages and disadvantages.
If your project contains only Python code, Poetry might be a good place to start. If your project also relies on other languages, Conda is likely the better choice.
made with 💙 by mafda.