What is Anaconda and Why should I bother about it?
In this article we will install Anaconda, manage Python packages, create individual conda environments, and share them via a conda YAML file. We will cover these topics in the following order:
- What is Anaconda & Why Should I bother about it?
- Installing Anaconda
- Creating Environments via Conda
- Managing Packages via Conda
- Saving & Loading Environments
- Listing, Sharing & Removing Environments
- Best Practices
First of All, What is Anaconda & Why Should I bother about it?
You probably already have Python installed and may be wondering why you need this at all. Firstly, since Anaconda comes with a bunch of data science packages, you’ll be all set to start working with data. Secondly, using conda to manage your packages and environments will reduce future issues with the various libraries you’ll be using.
Conda-based packages and environments are widely used in real-world data science projects, and I personally prefer installing and maintaining project packages with conda over installing and maintaining them directly with pip.
So, Why Anaconda?
Anaconda is a distribution of packages built for data science. It comes with conda, a package and environment manager. We usually use conda to create environments that isolate projects using different versions of Python and/or different versions of packages. We also use it to install, uninstall, and update packages in our project environments.
When you first download Anaconda, it comes with conda, Python, and over 150 scientific packages and their dependencies. Anaconda is a fairly large download (~500 MB) because it includes the most common data science packages in Python. For people who are conservative about disk space, there is also Miniconda, a smaller distribution that includes only conda and Python. With Miniconda you can still use conda to install any of the packages that come by default with the standard version.
Conda is a program we will be using exclusively from the command line, so if you aren’t comfortable with it, check out these learn-by-doing videos on Lynda.com: the command prompt tutorial for Windows, and Linux Command Line Basics for Mac OSX/Linux.
Installing Anaconda
Anaconda is available for Windows, Mac OS X, and Linux. You can find the installers and installation instructions at https://www.continuum.io/downloads. If you already have Python installed on your computer, this won’t break anything. Instead, the default Python used by your scripts and programs will be the one that comes with Anaconda. Choose the Python 3.5 version; you can install Python 2 versions later. Also, choose the 64-bit installer if you have a 64-bit operating system; otherwise go with the 32-bit installer. Install the appropriate version, then continue on.
After installation, you’re automatically in the default conda environment with all of the bundled packages installed. You can check your own install by entering conda list into your terminal.
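Besides conda list, you can confirm from inside Python which interpreter your scripts are picking up. The following is a minimal sketch, not part of Anaconda itself: the helper name describe_interpreter and the default package names are assumptions chosen for illustration, and the code uses only the standard library.

```python
import importlib.util
import sys


def describe_interpreter(packages=("numpy", "scipy", "pandas")):
    """Report which Python is running and which of the given packages it can import.

    The function name and the default package list are illustrative only.
    """
    return {
        # Path of the running interpreter; after installing Anaconda this
        # should normally point inside the Anaconda directory.
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # find_spec returns None when a top-level package cannot be found.
        "available": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("Python executable:", info["executable"])
    print("Python version:", info["version"])
    for name, found in info["available"].items():
        print(name, "is available" if found else "is NOT available")
```

If the executable path does not point at your Anaconda install, another Python is earlier on your PATH.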
Creating Environments via Conda
conda can be used to create environments to isolate your projects. To create an environment, use
conda create -n env_name list of packages
in your terminal. Here -n env_name sets the name of your environment (-n for the name) and list of packages is the list of packages you want to be installed in the environment. For example, to create an environment named my_env and install numpy in it, type
conda create -n my_env numpy
When creating an environment, you can specify which version of Python to install in the environment. This is useful when you’re working with code in both Python 2.x and Python 3.x. To create an environment with a specific Python version, do something like
conda create -n py3 python=3
conda create -n py2 python=2
To install a specific version, use
conda create -n py python=3.3
for Python 3.3.
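If you create environments from a script, the commands above can be assembled programmatically. This is a small sketch under stated assumptions: conda_create_command is a hypothetical helper, not a conda API, and it only builds the argument list; actually running it (for example with subprocess.run) requires conda to be on your PATH.

```python
def conda_create_command(env_name, packages=(), python=None):
    """Build the argument list for `conda create -n env_name ...`.

    Hypothetical helper for illustration; it does not call conda.
    """
    args = ["conda", "create", "-n", env_name]
    if python is not None:
        # A version pin is written as python=<version>, e.g. python=3.3
        args.append("python={}".format(python))
    args.extend(packages)
    return args


if __name__ == "__main__":
    # Mirrors the examples in the text.
    print(" ".join(conda_create_command("my_env", ["numpy"])))
    print(" ".join(conda_create_command("py", python="3.3")))
```

Keeping the arguments as a list rather than one string avoids quoting problems if you later pass them to subprocess.run.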
Once you have an environment created, use
source activate my_env
to enter it on OSX/Linux. On Windows, use
activate my_env
In conda 4.4 and later, conda activate my_env works on every platform.
When you’re in the environment, you’ll see the environment name in the terminal prompt. Something like
(my_env) ~ $
The environment has only a few packages installed by default, plus the ones you installed when creating it. You can check this with
conda list
Installing packages in the environment uses the usual command:
conda install package_name
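Only this time, the packages you install will only be available when you’re in the environment. To leave the environment, type
source deactivate
on OSX/Linux. On Windows, use deactivate. In conda 4.4 and later, conda deactivate works on every platform.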
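A script can also check which environment it is running in. Activating an environment sets the CONDA_DEFAULT_ENV environment variable to the environment’s name, so a minimal sketch might look like the following. The function name active_conda_env is an assumption made for this example.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None.

    Relies on the CONDA_DEFAULT_ENV variable that conda sets on activation.
    The `environ` parameter exists only to make the function easy to test.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    name = active_conda_env()
    if name is None:
        print("No conda environment appears to be active.")
    else:
        print("Active conda environment:", name)
    # sys.prefix is the root directory of the running Python installation.
    print("Interpreter prefix:", sys.prefix)
```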
Managing Packages via Conda
Once you have Anaconda installed, managing packages is fairly straightforward. To install a package, type
conda install package_name
in your terminal. For example, to install numpy, type
conda install numpy
You can install multiple packages at the same time. Something like
conda install numpy scipy pandas
will install all those packages simultaneously. It’s also possible to specify which version of a package you want by adding the version number such as
conda install numpy=1.10
Conda also automatically installs dependencies for you. For example, the scipy package depends on numpy. So if you install just scipy
conda install scipy
Conda will also install numpy if it isn’t already installed.
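To illustrate the idea of automatic dependency installation, here is a toy sketch. It is not how conda is implemented (real conda also resolves versions and conflicts), and the TOY_DEPENDENCIES map is a made-up, simplified dependency table used only for this example.

```python
# Made-up, simplified dependency table for illustration only.
TOY_DEPENDENCIES = {
    "numpy": [],
    "scipy": ["numpy"],
    "pandas": ["numpy"],
}


def packages_to_install(requested, installed=(), dependencies=TOY_DEPENDENCIES):
    """Return requested packages plus missing dependencies, dependencies first."""
    seen = set(installed)  # already-installed packages are skipped
    plan = []

    def visit(name):
        if name in seen:
            return
        seen.add(name)  # mark before recursing so cycles cannot loop forever
        for dep in dependencies.get(name, []):
            visit(dep)
        plan.append(name)

    for name in requested:
        visit(name)
    return plan


if __name__ == "__main__":
    print(packages_to_install(["scipy"]))                       # numpy is added
    print(packages_to_install(["scipy"], installed=["numpy"]))  # numpy is skipped
```

The first call plans numpy before scipy; the second plans only scipy because numpy is already present.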
Most of the commands are pretty intuitive. To uninstall, use
conda remove package_name
To update a package
conda update package_name
If you want to update all packages in an environment, which is often useful, use
conda update --all
And finally, to list installed packages, it’s again
conda list
If you don’t know the exact name of the package you’re looking for, you can try searching with
conda search search_term
For example, if you want to install a package that reads and writes Excel files but are not sure of the exact package name, you can try searching for the keyword excel:
conda search excel
It returns a list of the available Excel packages with their exact package names. I personally recommend XlsxWriter.
Saving, Listing, Sharing & Removing Environments
Saving and Sharing Environments:
A really useful feature is sharing environments so others can install all the packages used in your code, with the correct versions. You can save the packages to a YAML file with
conda env export > environment.yaml
The first part, conda env export, writes out all the packages in the environment, including the Python version. The exported text lists the name of the environment and all the dependencies (along with their versions).
The second part of the export command, > environment.yaml, writes the exported text to a YAML file named environment.yaml. This file can now be shared, and others will be able to create the same environment you used for the project.
To create an environment from an environment file use
conda env create -f environment.yaml
This will create a new environment with the same name listed in environment.yaml.
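To make the file format concrete, here is a minimal sketch that renders a small, hand-written environment file. The helper render_environment_file and the pinned versions are assumptions for illustration; a real conda env export usually contains more, such as channels, exact build strings, and a prefix line.

```python
def render_environment_file(name, dependencies):
    """Render a minimal environment file with a name and a dependency list.

    Hypothetical helper; output is a simplified version of what
    `conda env export` produces.
    """
    lines = ["name: {}".format(name), "dependencies:"]
    lines.extend("  - {}".format(spec) for spec in dependencies)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_environment_file("my_env", ["python=3.5", "numpy=1.10"])
    print(text, end="")
    # To save it next to your code (uncomment to write the file):
    # with open("environment.yaml", "w") as handle:
    #     handle.write(text)
```

A file of this shape, with a name and a dependencies list, is enough for conda env create -f environment.yaml to build the environment.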
If you forget what your environments are named (happens to me sometimes), use
conda env list
to list all the environments you’ve created. In the list, an asterisk marks the environment you’re currently in. The default environment, the one used when you aren’t in any other, is called root (newer conda versions call it base).
If there are environments you don’t use anymore,
conda env remove -n env_name
will remove the specified environment (here, named env_name).
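If you want to find the active environment from a script, the listing can be parsed. The sketch below works on a sample string rather than calling conda; SAMPLE_OUTPUT is an illustrative approximation of what conda env list prints (the paths are made up), and the parser assumes environment paths contain no spaces.

```python
# Illustrative approximation of `conda env list` output; paths are made up.
SAMPLE_OUTPUT = """\
# conda environments:
#
root                  *  /home/user/anaconda3
my_env                   /home/user/anaconda3/envs/my_env
"""


def parse_env_list(text):
    """Parse environment-listing text into a list of dicts.

    Assumes each non-comment line is `name [*] path` with no spaces in the path.
    """
    envs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split()
        active = "*" in parts  # the asterisk marks the current environment
        parts = [part for part in parts if part != "*"]
        envs.append({"name": parts[0], "path": parts[-1], "active": active})
    return envs


if __name__ == "__main__":
    for env in parse_env_list(SAMPLE_OUTPUT):
        marker = "(active)" if env["active"] else ""
        print(env["name"], env["path"], marker)
```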
Best Practices
First good practice:
When using Anaconda, keep two separate environments, one for Python 2 and one for Python 3. For example, you can use
conda create -n py2env python=2
conda create -n py3env python=3
to create two separate environments, py2env and py3env. Now you have a general-use environment for each Python version. In each of these two environments, you should install most of the standard data science packages: numpy, scipy, pandas, matplotlib, etc.
Second best practice:
When sharing your code on GitHub, make an environment file and include it in the repository. This makes it easier for people to install all the dependencies for your code. Ideally, you should also include a pip requirements.txt file for people who are not using conda, which you can generate with
pip freeze > requirements.txt
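For a sense of what a requirements.txt file contains, the following sketch lists the packages visible to the running interpreter in the name==version form. It is an approximation written with the standard library’s importlib.metadata (Python 3.8 or later), not pip’s own implementation, and the function name frozen_requirements is an assumption for this example.

```python
from importlib import metadata


def frozen_requirements():
    """Return sorted `name==version` lines for installed distributions.

    An approximation of a requirements listing, not pip's implementation.
    """
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing metadata
            lines.add("{}=={}".format(name, dist.version))
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    for line in frozen_requirements():
        print(line)
```

Running it inside an activated environment shows only the packages installed in that environment.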
That’s all for a quick start. The key is to keep practicing with the commands above.
Hopefully, this article will help you cut the time you spend on Python package management in half, and give you a jumpstart in using Anaconda environments and package management in your day-to-day Python projects.
Please let me know your thoughts and questions in the comments section. I would really appreciate feedback on this article and ideas to improve it.
In the meantime, Happy Thinking…