Getting Started with Conda or Poetry for Data Science Projects

fernanda rodríguez
Published in Semantix
Nov 3, 2021 · 4 min read

How to manage virtual environments, packages, and dependencies in Python and start your data science project with Conda or Poetry.

In this post, you will find:

  • A brief introduction to virtual environments, packages, and dependencies in Python,
  • Python management systems,
  • A comparison of Conda and Poetry,
  • Conda and Poetry dockerization, and
  • Conclusions.

When we start a data science project, we need to maintain and manage the dependencies and packages it relies on. The recurring question is which tools to use to guarantee the stability and reproducibility of our work.

Several tools let us create isolated virtual environments in order to:

  • Avoid installing libraries system-wide that only a single project uses,
  • Resolve version conflicts,
  • Manage multiple versions of the same package across different projects, and
  • Keep the code reproducible on any computer and by anyone.

Next, we will present some of these tools and their purposes. Then we will compare two of the tools most widely used by developers.

Python Management Systems

Below is a list of some of the tools available for managing Python packages, virtual environments, and dependencies.

It is essential to understand that each tool serves a different purpose.

The following list is based on the official documentation for each technology:

Package manager for Python

  • pip is the package installer for Python. You can use it to install packages from the Python Package Index and other indexes.

Environment manager for Python

  • The venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories.
  • virtualenv is a tool to create isolated Python environments.
  • pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world.
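As a minimal sketch of what an environment manager does, the standard-library venv module can also be driven from Python itself. The helper name create_environment and the ".venv" directory name are assumptions chosen for illustration; in day-to-day use the same result usually comes from running python -m venv on the command line.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target: Path) -> Path:
    """Create an isolated virtual environment in `target`.

    Returns the path to the pyvenv.cfg file that marks the directory
    as a virtual environment.
    """
    # with_pip=False keeps creation fast; pass True to bootstrap pip as well.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    # Use a throwaway directory so the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / ".venv")
        print(cfg.read_text())
```

The printed pyvenv.cfg records which base interpreter the environment was created from, which is what keeps the environment's packages separate from the system ones.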

Package and dependency manager for Python

  • Poetry is a tool for Python dependency management and packaging. It comes with all the tools you might need to manage your projects in a deterministic way.

Package, dependency, and environment manager

  • conda is a package, dependency, and environment manager for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more. It ships with the Anaconda and Miniconda distributions.

Conda and Poetry

Among these tools, Conda and Poetry stand out as two of the most complete and most widely used.

On the one hand, Poetry is a dependency management and packaging tool for Python. On the other hand, Conda is a package, dependency, and environment manager for any language.

Next, we will compare the two tools, then show how to use these environments in Docker, and finally give some conclusions.

Conda and Poetry Setup

To begin, I will present an example of a configuration file for each virtual environment.

With Conda, the developer needs to create a separate file for each set of dependencies, such as development and production.

  • Example of a Conda dependencies environment.yml file:
  • Example of a Conda dev dependencies environment-dev.yml file:

With Poetry, by contrast, the developer can consolidate all the dependencies in a single configuration file.

  • Example of a Poetry dependencies pyproject.toml file:

⚠️ Pin each package to a specific version instead of using > or ^ to define the package version in either configuration file (Conda or Poetry).
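One way to enforce this advice is a small check that flags any dependency whose version is not an exact pin. The sketch below is a simplification under stated assumptions: the helper unpinned is hypothetical, it expects a mapping from package name to version string (the form Poetry uses), and it treats only a full three-part version such as 1.3.4, optionally prefixed with ==, as pinned.

```python
import re

# An exact pin: "1.3.4" or "==1.3.4". Anything with >, ^, ~, or * does not match.
EXACT_VERSION = re.compile(r"^(==)?\d+\.\d+\.\d+$")


def unpinned(dependencies: dict[str, str]) -> list[str]:
    """Return the names of packages whose version is not an exact pin."""
    return [
        name
        for name, spec in dependencies.items()
        if not EXACT_VERSION.match(spec.strip())
    ]


if __name__ == "__main__":
    # Package names and versions are illustrative only.
    deps = {"pandas": "1.3.4", "numpy": "^1.21", "scipy": ">1.7", "pytest": "==6.2.5"}
    print(unpinned(deps))
```

A check like this could run in a pre-commit hook or CI step so that a range specifier does not slip into the configuration unnoticed.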

Comparison between Conda and Poetry

The following table compares the two tools on points such as installation, most frequent commands, pros, and cons:

* You could use Cookiecutter to create a project structure.

** Poetry reads the poetry.lock file. If this file does not exist, Poetry resolves the dependencies from pyproject.toml and generates the poetry.lock file.
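The lock-file behaviour in the note above can be pictured as a simple decision. The sketch below does not call Poetry; resolution_source is a hypothetical helper that only models which file the installed versions would come from, assuming the project directory layout described in the note.

```python
import tempfile
from pathlib import Path


def resolution_source(project_dir: Path) -> str:
    """Model which file an install would take package versions from."""
    if (project_dir / "poetry.lock").exists():
        # Locked versions win, which is what makes installs reproducible.
        return "poetry.lock"
    if (project_dir / "pyproject.toml").exists():
        # No lock file yet: versions are resolved here and a lock file is generated.
        return "pyproject.toml"
    raise FileNotFoundError(f"no pyproject.toml found in {project_dir}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "pyproject.toml").write_text("[tool.poetry]\n")
        print(resolution_source(project))
        (project / "poetry.lock").write_text("")
        print(resolution_source(project))
```

This is why committing poetry.lock to version control matters: once it exists, everyone who installs the project gets the same resolved versions.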

Dockerizing Conda and Poetry — Best practices

  • Build the Docker image in stages: build and runtime,
  • The runtime container should be as clean and lightweight as possible, and
  • We don’t need Poetry or Conda in the runtime container.

Conda Dockerfile

Poetry Dockerfile

Conclusion

In this post, we talked about different tools for managing Python virtual environments, packages, and dependencies.

Which tool to choose depends largely on the type of project, since each tool has its advantages and disadvantages.

If your project uses only Python code, Poetry might be a good place to start. If, on the other hand, your project also relies on other languages, Conda is probably the better choice.

made with 💙 by mafda.
