A soup-to-nuts guide to setting up a conda environment
A comprehensive guide for conda, from choosing the installer to setting up environments, channels and installing packages
Motivation:
Hello! Conda is one of the most popular tools in the data science community, and yet it can be confusing to understand each setup step and what it costs you, as there is hardly a single place that explains it all. So I decided to write one up.
I will focus on three topics. The first is the conda installer options, Anaconda, miniconda and miniforge, and what you miss out on by not using each one. The second is setting up an environment that you can reliably use for multiple projects, and how to modify it when you need more configuration. The last is the relationship of channels with environments and packages, a topic that is often ignored but very important for showing good engineering skills if you want to productionize your work with minimum trouble.
PS: I am on macOS Catalina 10.15.7 with conda version 4.9.0. If you have questions about specific versions, please leave a comment.
TL;DR?
So, what am I promising? By the end of this article you should understand how to set up your conda environment, seen through the lens of the opportunity cost of choosing between:
- miniconda and miniforge installers
- unique and standard (shared) environment names
- adding channels globally and per environment
- installing packages from different channels
Hope you enjoy, and I’ll see you at the end!
Conda
conda is an open-source, cross-platform package, dependency and environment management tool for, in theory, any language (in practice it is mostly used with languages common in Data Science and Machine Learning, such as Python, R, Ruby, C/C++, FORTRAN, …). It was first developed by Anaconda, the company, and then open-sourced under the BSD license. It offers more functionality than pip and virtualenv put together: pip is a package manager for Python packages only, i.e. it can't install Python itself, and virtualenv is a simple environment manager that can't install packages at all… I guess you are convinced to use conda at this point, so we can continue!
The top three installers for conda are Anaconda, miniconda and miniforge. The first two are developed by Anaconda and available on their website, whereas miniforge was created more recently by the community because, at the time of writing, miniconda does not support the aarch64 architecture.
If you have specific requirements, like running your models on aarch64 (arm64) or ppc64le (POWER8/9) architectures, you should use the miniforge installer. It also has support for PyPy, an alternative implementation of Python. During the installation, it sets conda-forge as the default (and the only) channel, and does not include the defaults channel. We will talk about channels in more detail in a later part.
Another reason to use miniforge is the commercial-use restriction(*) on Anaconda/miniconda, which comes from a recent change to Anaconda's Terms of Service: it is now declared a violation to use the Repository for commercial activities, and that includes using packages installed from the defaults channel.
Also, a couple of open-source projects have already moved from miniconda to miniforge (this and this), which suggests the community supporting this repo will grow.
If you are happy with the ToS and don't need the extra architectures, we are left with 2 options: Anaconda or miniconda. If you want the full version with a one-off installation, and you have 5 GB of disk space, Anaconda will install Python + 250 packages for you. It may* be a good choice for newcomers, as it comes with commonly used packages ready to use, as well as applications such as Anaconda Navigator, from which you can launch JupyterLab or the Spyder IDE for your environments. Anaconda also comes in multiple editions, e.g. the Individual edition is the free version, and the Enterprise edition is for your team if you want to extend and manage the customisation of packages and channels privately.
Our last option, the minimal installer miniconda, will be a good choice, as it sets up the defaults channel. *It is better than the Anaconda installation because you learn more about which packages to download, and you don't have to free up 5 GB.
PS: If you think you are missing out on Anaconda Navigator or Spyder, you can install them with conda: the first one is available in the defaults channel, and Spyder is available on both defaults and conda-forge.
$ conda install anaconda-navigator
$ conda install spyder
🛑 There are also different methods to run our installers:
- Local: Depending on your environment both installers offer different releases: either as a package (i.e. exe for Windows platforms), as a script (for Linux platforms) or both (pkg** and sh for macOS platforms). ** miniconda only, miniforge does not support it (yet).
- Cloud VM: Only miniconda offers AMI images for AWS, if you want to isolate your environment and run it in the cloud.
- Container: Docker images are available for both miniconda and miniforge.
- CI/CD Pipeline: Both miniconda and miniforge have GitHub Actions available, which can take the hard work off your shoulders.
Environment Creation
A conda environment is an abstract way of organising multiple packages and their dependencies together. Every new environment we create has a directory where all its packages are downloaded to, and where any configuration and history related to the environment is stored. If you used Anaconda's installer, it creates an environment called base in the anaconda installation directory. You can check this by running the conda env list command; the asterisk (*) marks the currently active environment, which is base when we have not activated any other environment.
$ conda env list
# conda environments:
#
base * /Users/ebrucucen/opt/anaconda3
It’s all good, but we want our very own environment. There are 2 conventions on how to name your environments.
The first one is to give each environment a unique name. A common implementation creates an environment for each version of Python (as it is the main language we are all interested in, right?), such as conda-py37 or env-py3.8 for Python 3.7 and Python 3.8 respectively. This is great if you are new to environments, as you probably won't have many versions of Python, or many packages with complex dependency trees. You can activate your environment from any project folder, and follow the conda documentation for commands involving environments without any issues, as the environment is set up in the standard location (/user/.../envs/).
To create an environment, we use the conda create command, followed by the environment name and a list of package=version pairs, where the version is optional; the tradeoff of leaving it out is that conda installs the latest version.
$ conda create --name env-py3.8 python=3.8 numpy=1.19.5
The second option is to use a common name for all of your environments, such as conda-env, and create a new one inside each project folder. This means that for any project folder you are working on, you can reference the environment the same way and use it in any of your automation scripts in a consistent manner. Please note that the --prefix option is mutually exclusive with --name, so pick your poison carefully!
$ conda create --prefix /<possibly a long path>/conda-env python=3.7
# or if you are already on the same directory:
$ conda create --prefix ./conda-env python=3.7
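If you go with the per-project convention, you will probably end up creating the environment from a script sooner or later. Here is a minimal sketch of that idea, assuming a POSIX shell. The helper names env_exists and ensure_env are my own (hypothetical, not conda commands), and the check relies on conda keeping a conda-meta/history file inside every environment it creates.

```shell
#!/bin/sh
# Sketch only: create a per-project environment if it is not there yet.
# env_exists and ensure_env are hypothetical helper names, not conda commands.

# Succeeds when the directory looks like a conda environment,
# i.e. it contains the conda-meta/history file conda writes on creation.
env_exists() {
  [ -f "$1/conda-meta/history" ]
}

# Usage: ensure_env <prefix> <package specs...>
ensure_env() {
  ensure_prefix="$1"
  shift
  if env_exists "$ensure_prefix"; then
    echo "environment already present at $ensure_prefix"
  else
    # --yes skips the confirmation prompt so the script can run unattended.
    conda create --yes --prefix "$ensure_prefix" "$@"
  fi
}

# Example call (uncomment to run it for real):
# ensure_env ./conda-env python=3.7
```

Because every project uses the same ./conda-env path, the same call can be dropped into each project's setup script unchanged.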
Environment Activation
Whichever naming convention you have chosen, you now have an environment (either env-py3.8 or conda-env or both(!)), and the next thing we need to do is activate it, so we can start installing packages into it:
$ conda activate env-py3.8
which (by default) displays the environment name on the command line, ready to take the next set of instructions…
(env-py3.8) $
or
$ conda activate ./conda-env
which results in a not-so-pretty display, and definitely not a great use of your space:
(/<possibly a long path>/conda-env) $
To change this behaviour so that only the environment name is displayed, you can modify the .condarc file (by default in your home directory, ~/.condarc; if you are not sure, you can find it with conda config --show-sources):
conda config --set env_prompt '({name})'
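To see why this setting shortens the prompt, here is a small illustration of how the template variables get filled in. It is a sketch, not conda's code: render_prompt is a hypothetical helper, the paths are made up, and the "/envs/" check is my simplification of how conda tells a named environment from a prefix one. As I understand the conda docs, {name} is the last component of the environment path, while {default_env} falls back to the full path for environments created with --prefix.

```shell
#!/bin/sh
# Illustration only: mimic how conda fills in the env_prompt template.
# render_prompt is a hypothetical helper; real conda compares the prefix with
# its configured envs_dirs, the "/envs/" check below is a simplification.
render_prompt() {
  rp_template="$1"
  rp_prefix="$2"
  rp_name="$(basename "$rp_prefix")"        # {name}: last path component
  case "$rp_prefix" in
    */envs/*) rp_default="$rp_name" ;;      # named environment (created with --name)
    *)        rp_default="$rp_prefix" ;;    # prefix environment: full path
  esac
  # Assumes the path contains no "|" or "&" characters (they are special to sed).
  printf '%s\n' "$rp_template" | sed \
    -e "s|{prefix}|$rp_prefix|g" \
    -e "s|{name}|$rp_name|g" \
    -e "s|{default_env}|$rp_default|g"
}

render_prompt '({default_env})' /work/my-project/conda-env   # default-style template
render_prompt '({name})'        /work/my-project/conda-env   # after the config change
```

The first call prints the long (/work/my-project/conda-env) form, the second prints just (conda-env).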
Now, if you have managed to follow me to this point, we should have an activated environment, waiting for us to install packages.
If you want to take a breath and dive a bit deeper, check out the subdirectories of the environment folder, where the conda-meta folder contains a history file that tracks each action on this environment, and a JSON file for each package listing its build number, dependencies and file locations… If you found/know anything interesting, please do leave a comment; it will help us understand the environment mystery better. We need packages, and channels help us get them with the right version and dependencies, so let's move on!
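Before we do, here is a quick way to take that peek into conda-meta from the shell. This is a sketch under a few assumptions: show_env_meta is a hypothetical helper name, ./conda-env is only a fallback path for when no environment is active, and I am relying on the history file recording each command on a line starting with "# cmd:".

```shell
#!/bin/sh
# Sketch: peek into an environment's conda-meta folder.
# show_env_meta is a hypothetical helper name; pass it an environment directory.
show_env_meta() {
  meta_dir="$1/conda-meta"
  if [ ! -d "$meta_dir" ]; then
    echo "no conda-meta folder under $1"
    return 0
  fi
  echo "commands recorded in history:"
  grep '^# cmd:' "$meta_dir/history" || echo "(none recorded)"
  echo "package records:"
  ls "$meta_dir" | grep '\.json$' || echo "(no packages)"
}

# CONDA_PREFIX points at the active environment once you have run `conda activate`.
show_env_meta "${CONDA_PREFIX:-./conda-env}"
```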
Channels
Channels are repositories for our packages. Each channel is maintained separately, so it may have different versions of a package and different builds for each version, and the same version of a package may have different dependencies in each channel. Check out the Stack Overflow question for more discussion on this.
A good example is two of the most common packages, NumPy and TensorFlow, for which Anaconda's defaults channel and conda-forge have different versions available.
numpy 1.19.5 py39he588a01_1 conda-forge
numpy 1.19.2 py39he57783f_0 pkgs/main
tensorflow 2.0.0 mkl_py37hda344b4_0 pkgs/main
tensorflow 1.14.0 hcba10bf_0 conda-forge
To avoid one confusion: asking for tensorflow 2.0.0 while conda-forge is your top-priority channel does not mean the package itself will come from conda-forge. It means conda will reach out to the defaults channel for the tensorflow 2.0 packages that are only available there, and prioritise conda-forge for each dependency it can find there, so the plan you get back looks like this:
_tflow_select pkgs/main/osx-64::_tflow_select-2.3.0-mkl
absl-py conda-forge/osx-64::absl-py-0.11.0-py37hf985489_0
...
tensorboard pkgs/main/noarch::tensorboard-2.0.0-pyhb38c66f_1
tensorflow pkgs/main/osx-64::tensorflow-2.0.0-mkl_py37hda344b4_0
tensorflow-base pkgs/main/osx-64::tensorflow-base-2.0.0-mkl_py37h66b1bf0_0
tensorflow-estima~ pkgs/main/noarch::tensorflow-estimator-2.0.0-pyh2649769_0
termcolor conda-forge/noarch::termcolor-1.1.0-py_2
werkzeug conda-forge/noarch::werkzeug-1.0.1-pyh9f0ad1d_0
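If the plan is long, it can help to count how many packages each channel contributes. Below is a small sketch that does this with awk on plan lines like the ones above. count_by_channel is a hypothetical helper of mine, and it assumes the "channel/platform::package" layout shown in this output.

```shell
#!/bin/sh
# Sketch: count how many packages in an install plan come from each channel.
# count_by_channel is a hypothetical helper; it reads plan lines on stdin.
count_by_channel() {
  awk '$2 ~ /::/ {
         channel = $2
         sub("::.*", "", channel)      # drop "::package-version-build"
         sub("/[^/]*$", "", channel)   # drop the platform part (osx-64, noarch)
         count[channel]++
       }
       END { for (c in count) print c, count[c] }' | sort
}

# A few lines copied from the plan above.
count_by_channel <<'EOF'
absl-py            conda-forge/osx-64::absl-py-0.11.0-py37hf985489_0
tensorboard        pkgs/main/noarch::tensorboard-2.0.0-pyhb38c66f_1
tensorflow         pkgs/main/osx-64::tensorflow-2.0.0-mkl_py37hda344b4_0
termcolor          conda-forge/noarch::termcolor-1.1.0-py_2
EOF
```

For these four lines it reports 2 packages from conda-forge and 2 from pkgs/main.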
Add channels
As we mentioned above, the default channel for miniconda and Anaconda installations is the defaults channel, whereas for miniforge it is the conda-forge channel. We can display our channels by looking at the configuration (regardless of whether you are in an activated environment or not):
$ conda config --show-sources
which results in something similar to this (yours will be slightly different):
==> /Users/ebrucucen/.condarc <==
auto_update_conda: False
ssl_verify: True
channels:
- conda-forge
- defaults
You can add channels globally, or locally to your environment while it is activated. To add a channel globally, call conda config --add, here with conda-canary, a channel for testing packages that will be published to live within 24 hours:
$ conda config --add channels conda-canary
We can also add a channel that is specific to an environment. Let's say we want to install genomics-related packages into our env-py3.8 environment: we activate the environment, then pass the --env argument to conda config and add the bioconda channel:
$ conda activate env-py3.8
(env-py3.8) $ conda config --env --add channels bioconda
The result will be similar to this (you may/may not have defaults channels)
(env-py3.8) $ conda config --show-sources
==> /Users/ebrucucen/.condarc <==
auto_update_conda: False
ssl_verify: True
channels:
- conda-canary
- conda-forge
- defaults

==> /Users/ebrucucen/opt/anaconda3/envs/env-py3.8/.condarc <==
channels:
- bioconda
- defaults
Append channel
One thing you may have noticed is that the --add argument modifies the config file by putting the newly added channel at the top of the list. conda is opinionated about the order of the channels, as the list is in essence a priority list. If our new channel should go to the bottom rather than the top, we can use the --append argument (or modify the config file by hand, I hear you):
(env-py3.8) $ conda config --env --append channels test-channel
If we check the configuration file, we will see our channel as the last item (yes, you are right, no channel validation happens when running the add/append channel commands, but conda will error when you try to search for or install packages):
(env-py3.8) $ conda config --show-sources
==> /Users/ebrucucen/.condarc <==
auto_update_conda: False
ssl_verify: True
channels:
- conda-canary
- conda-forge
- defaults

==> /Users/ebrucucen/opt/anaconda3/envs/env-py3.8/.condarc <==
channels:
- bioconda
- defaults
- test-channel
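Since the order in the file is the priority order, it can be handy to print it with an explicit rank. Here is a rough sketch that reads one .condarc file. list_channels is a hypothetical helper name, and it only understands the simple "channels:" list layout shown above, not every form YAML allows.

```shell
#!/bin/sh
# Sketch: print the channels of one .condarc file in priority order.
# list_channels is a hypothetical helper and only handles the plain
# "channels:" followed by "- name" lines layout.
list_channels() {
  awk '
    /^channels:/ { in_list = 1; next }
    in_list && /^[ \t]*-[ \t]/ {
      sub(/^[ \t]*-[ \t]*/, "")   # strip the list marker
      rank++
      print rank ". " $0
      next
    }
    in_list { in_list = 0 }       # any other line ends the list
  ' "$1"
}

# Example with the environment-level file from the output above (the author's path):
# list_channels /Users/ebrucucen/opt/anaconda3/envs/env-py3.8/.condarc
```

For the environment file above this would list bioconda first, then defaults, then test-channel.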
Remove channels
If you want to remove a channel (either because you added it by mistake or because you don't need it any more), it is as simple as using the --remove argument. The same principle applies: you need to specify the --env flag to remove the channel from the activated environment, otherwise conda will give an error telling you it can't find the channel.
(env-py3.8) $ conda config --env --remove channels test-channel
Packages
And finally, we can install our packages. To find out what you need, I recommend Anaconda search, which shows the versions and download counts for each package, so you get a better understanding of your options; plus you can find good Jupyter notebooks for your packages.
Since each environment has a prioritised list of channels, any installation will check the channels one by one to see whether the version (if specified) is available.
$ conda search spyder
The top 6 ways I install packages are: into a specific environment, at a specific version, with a specific build (which can be gobbledygook), from a specific channel, and with or without dependencies. For the first four, we can use this format:
$ conda install -n <env-name> -c <channel> <package_name>=<version>=<build_string>
$ conda install -n env-py3.9 -c conda-forge numpy=1.19.5=py39he588a01_1
conda install installs the dependent packages by default. We have to tell it explicitly that we don't want the dependencies (danger, Will Robinson! I assume we know what we are doing at this point).
$ conda install -c conda-forge numpy --no-deps
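If you build these install commands in a script, the name=version=build format is easy to get wrong by hand. Here is a small sketch that assembles it and previews the install first. match_spec is a hypothetical helper of mine, not a conda command, and the commented-out line uses conda's --dry-run flag so nothing gets changed.

```shell
#!/bin/sh
# Sketch: build a name=version=build spec, with version and build optional.
# match_spec is a hypothetical helper name, not part of conda.
match_spec() {
  spec="$1"
  if [ -n "${2:-}" ]; then
    spec="$spec=$2"
    # A build string only makes sense together with a version.
    if [ -n "${3:-}" ]; then
      spec="$spec=$3"
    fi
  fi
  printf '%s\n' "$spec"
}

match_spec numpy 1.19.5 py39he588a01_1

# Preview what conda would do without changing the environment (uncomment to run):
# conda install --dry-run -n env-py3.9 -c conda-forge "$(match_spec numpy 1.19.5 py39he588a01_1)"
```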
Call To Arms
Thank you for your patience, and for reading through my post. As I promised, you now have conda set up with an environment and channels, ready to install any package (as well as pip and other tools) you need, along with the reasoning behind each choice. Hope you enjoyed it. It is a simple process, and you can customise it as much as you want, for example by creating your own channels and packages, with all the support available. Happy condaing!