Pankaj Mathur

What is Anaconda and Why should I bother about it?

In this article we will install Anaconda, manage Python packages, create individual conda environments, and share them via a conda YAML file. We will cover these topics in the following order:

  • What is Anaconda & Why Should I bother about it?
  • Installing Anaconda
  • Creating Environments via Conda
  • Managing Packages via Conda
  • Saving & Loading Environments
  • Listing, Sharing & Removing Environments
  • Best Practices

First of all, what is Anaconda & why should I bother about it?
You probably already have Python installed and may be wondering why you need this at all. Firstly, since Anaconda comes with a bunch of data science packages, you’ll be all set to start working with data. Secondly, using conda to manage your packages and environments will reduce future issues dealing with the various libraries you’ll be using.
In most real-world data science projects, conda-based packages and environments are widely used, and I personally prefer installing and maintaining a project’s packages with conda rather than directly with pip.
So, why Anaconda?
Anaconda is a distribution of packages built for data science. It comes with conda, a package and environment manager. We usually use conda to create environments that isolate projects using different versions of Python and/or different versions of packages. We also use it to install, uninstall, and update packages in our project environments.
When you download Anaconda for the first time, it comes with conda, Python, and over 150 scientific packages and their dependencies. Anaconda is a fairly large download (~500 MB) because it includes the most common data science packages in Python. For people who are conservative about disk space, there is also Miniconda, a smaller distribution that includes only conda and Python. With Miniconda you can still use conda to install any of the packages that come by default with the standard version.
Conda is a program we will use exclusively from the command line, so if you aren’t comfortable with it, check out these learn-by-doing videos on Lynda.com: command prompt tutorial for Windows and Linux Command Line Basics for Mac OSX/Linux.

Installing Anaconda
Anaconda is available for Windows, Mac OS X, and Linux. You can find the installers and installation instructions at https://www.continuum.io/downloads
If you already have Python installed on your computer, this won’t break anything. Instead, the default Python used by your scripts and programs will be the one that comes with Anaconda. Choose the Python 3.5 version; you can install Python 2 versions later. Also, choose the 64-bit installer if you have a 64-bit operating system; otherwise go with the 32-bit installer. Go ahead and choose the appropriate version, install it, and continue on afterward!
After installation, you’re automatically in the default conda environment with all packages installed. You can check your own install by entering conda list into your terminal.
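One quick way to confirm which interpreter your scripts actually pick up is to ask Python itself. The snippet below is only a small sketch (the function name describe_python is made up for this article), and the path it prints depends on where Anaconda was installed on your machine.

```python
import sys


def describe_python():
    """Return the path and version of the interpreter running this script."""
    return {
        "executable": sys.executable,  # full path to the python binary in use
        "version": "%d.%d.%d" % sys.version_info[:3],
    }


if __name__ == "__main__":
    info = describe_python()
    print("Python executable:", info["executable"])
    print("Python version:", info["version"])
```

If the printed path points inside your Anaconda folder, the Anaconda Python is the default one.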
Creating Environments via Conda
conda can be used to create environments to isolate your projects. To create an environment, use

conda create -n env_name list of packages

in your terminal. Here -n env_name sets the name of your environment (-n for the name) and list of packages is the list of packages you want to be installed in the environment. For example, to create an environment named my_env and install numpy in it, type

conda create -n my_env numpy

When creating an environment, you can specify which version of Python to install in the environment. This is useful when you’re working with code in both Python 2.x and Python 3.x. To create an environment with a specific Python version, do something like

conda create -n py3 python=3

or

conda create -n py2 python=2

To install a specific version, use

conda create -n py python=3.3

for Python 3.3.
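If you ever create environments from a script rather than by hand, the same commands can be assembled programmatically. The helper below is a hypothetical sketch (conda_create_command is not part of conda); it only builds the argument list for the conda create forms shown above and does not run anything.

```python
def conda_create_command(env_name, packages=(), python=None):
    """Build the argument list for `conda create -n env_name ...`.

    python, if given, is a version string such as "3" or "3.3".
    """
    command = ["conda", "create", "-n", env_name]
    if python is not None:
        command.append("python=" + python)
    command.extend(packages)
    return command


if __name__ == "__main__":
    # Prints: conda create -n my_env numpy
    print(" ".join(conda_create_command("my_env", ["numpy"])))
    # Prints: conda create -n py python=3.3
    print(" ".join(conda_create_command("py", python="3.3")))
```

Passing the resulting list to subprocess.run would execute it, provided conda is on your PATH.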
Once you have an environment created, use

source activate my_env

to enter it on OSX/Linux. On Windows, use

activate my_env

Newer conda releases (4.4 and later) also accept conda activate my_env on every platform. When you’re in the environment, you’ll see the environment name in the terminal prompt, something like

(my_env) ~ $
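Besides the prompt, a script can also check which environment it is running in. The sketch below assumes conda's activation step has set the CONDA_DEFAULT_ENV environment variable; the function name active_conda_env is just an illustration.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is set.

    Assumes activation has exported CONDA_DEFAULT_ENV.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
```

The optional environ argument is only there so the function can be tried out with a plain dictionary instead of the real environment.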

The environment has only a few packages installed by default, plus the ones you installed when creating it. You can check this with the command

conda list

Installing packages in the environment uses the usual install command:

conda install package_name

Only this time, the packages you install will be available only when you’re in the environment. To leave the environment, use one of the following commands.
On OSX/Linux:

source deactivate

On Windows:

deactivate

Managing Packages via Conda
Once you have Anaconda installed, managing packages is fairly straightforward. To install a package, type

conda install package_name

in your terminal. For example, to install numpy, type

conda install numpy

You can install multiple packages at the same time. Something like

conda install numpy scipy pandas

will install all those packages simultaneously. It’s also possible to specify which version of a package you want by adding the version number such as

conda install numpy=1.10
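A single equals sign in a spec like numpy=1.10 is a loose match: any 1.10.x release qualifies, while 1.11 does not. The function below is a simplified model of that rule written for this article, not conda's actual matching code, and the name fuzzy_match is made up.

```python
def fuzzy_match(spec_version, candidate_version):
    """Rough model of a single-equals spec such as numpy=1.10.

    True when the candidate starts with the same dot-separated components.
    """
    spec_parts = spec_version.split(".")
    candidate_parts = candidate_version.split(".")
    return candidate_parts[: len(spec_parts)] == spec_parts


if __name__ == "__main__":
    for candidate in ["1.10", "1.10.4", "1.11.0", "1.1"]:
        print(candidate, fuzzy_match("1.10", candidate))
```

Comparing whole components rather than raw string prefixes keeps 1.10 from accidentally matching something like 1.100.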

Conda also automatically installs dependencies for you. For example, the scipy package depends on numpy, as it uses and requires numpy. So, if you install just scipy

conda install scipy

Conda will also install numpy if it isn’t already installed.
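To get a feel for what "installing dependencies for you" means, here is a toy model of the idea. It is not how conda's real solver works, and the dependency table is a small hand-written assumption for illustration; the real metadata comes from conda's package channels.

```python
# Hypothetical, hand-written dependency table used only for this illustration.
DEPENDENCIES = {
    "scipy": ["numpy"],
    "pandas": ["numpy", "python-dateutil"],
    "numpy": [],
    "python-dateutil": [],
}


def packages_to_install(requested, installed, dependencies=DEPENDENCIES):
    """Return the requested package plus missing dependencies, dependencies first."""
    order = []
    seen = set(installed)  # anything already installed is never added again

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        order.append(name)  # appended after its dependencies

    visit(requested)
    return order


if __name__ == "__main__":
    print(packages_to_install("scipy", installed=set()))
    print(packages_to_install("scipy", installed={"numpy"}))
```

With nothing installed, the result for scipy includes numpy first; once numpy is present, only scipy remains.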
Most of the commands are pretty intuitive. To uninstall, use

conda remove package_name

To update a package, use

conda update package_name

If you want to update all packages in an environment, which is often useful, use

conda update --all

And finally, to list installed packages, it’s again

conda list

If you don’t know the exact name of the package you’re looking for, you can try searching with

conda search search_term

For example, suppose you want to install a package that reads and writes Excel files, but you are not sure of its exact name. You can try searching for the excel keyword

conda search excel

It returns a list of the available Excel writer packages with their exact package names. Of these, I personally recommend XlsxWriter.

Saving, Listing, Sharing & Removing Environments

Saving and Sharing Environments:

A really useful feature is sharing environments so others can install all the packages used in your code, with the correct versions. You can save the packages to a YAML file with

conda env export > environment.yaml

The first part, conda env export, writes out all the packages in the environment, including the Python version. Its output lists the name of the environment and all the dependencies (along with their versions). The second part of the export command

> environment.yaml

writes the exported text to a YAML file environment.yaml. This file can now be shared and others will be able to create the same environment you used for the project.
To create an environment from an environment file, use

conda env create -f environment.yaml

This will create a new environment with the same name listed in environment.yaml.
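To make the file format more concrete, here is a sketch that reads the environment name and dependency list out of an exported file. The sample contents are illustrative assumptions (a real export typically also records build strings and other details), and summarize_environment is a made-up helper that handles only this flat layout; for real projects a proper YAML parser is the safer choice.

```python
# Illustrative sample only; a real export will list many more packages.
SAMPLE_ENVIRONMENT_YAML = """\
name: my_env
channels:
- defaults
dependencies:
- numpy=1.10.4
- python=3.5.1
"""


def summarize_environment(text):
    """Return (environment name, list of dependency specs) from an export."""
    name = None
    dependencies = []
    section = None  # name of the list section we are currently inside
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            if section == "dependencies":
                dependencies.append(line[2:].strip())
        elif line.endswith(":"):
            section = line[:-1]
        elif line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
            section = None
        else:
            section = None  # some other key we do not care about here
    return name, dependencies


if __name__ == "__main__":
    env_name, deps = summarize_environment(SAMPLE_ENVIRONMENT_YAML)
    print("Environment:", env_name)
    print("Dependencies:", deps)
```

The name at the top of the file is what conda env create -f uses for the new environment.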

Listing Environments:

If you forget what your environments are named (happens to me sometimes), use

conda env list

to list all the environments you’ve created. You should see a list of environments, with an asterisk next to the environment you’re currently in. The default environment, the one used when you aren’t in any other, is called root (base in conda 4.4 and later).
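If you want to pick out the active environment in a script, the listing is easy to parse. The sample output below is an assumption with made-up paths, and parse_env_list is a hypothetical helper that handles only named environments in this layout.

```python
# Assumed sample of `conda env list` output; the paths are made up.
SAMPLE_ENV_LIST = """\
# conda environments:
#
my_env                   /home/user/anaconda3/envs/my_env
py2env                   /home/user/anaconda3/envs/py2env
root                  *  /home/user/anaconda3
"""


def parse_env_list(output):
    """Return (all environment names, active environment name or None)."""
    names = []
    active = None
    for line in output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        names.append(fields[0])
        if "*" in fields[1:]:
            active = fields[0]  # the asterisk marks the current environment
    return names, active


if __name__ == "__main__":
    env_names, current = parse_env_list(SAMPLE_ENV_LIST)
    print("Environments:", env_names)
    print("Active:", current)
```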

Removing Environments:

If there are environments you don’t use anymore,

conda env remove -n env_name

will remove the specified environment (here, named env_name).
Best Practices

First good practice:
When using Anaconda, keep two separate environments, one for Python 2 and one for Python 3.
For example, you can use

conda create -n py2env python=2

and

conda create -n py3env python=3

to create two separate environments, py2env and py3env. Now you have a general-use environment for each Python version. In each of these two environments, you should install most of the standard data science packages: numpy, scipy, pandas, matplotlib, etc.

Second best practice:
When sharing your code on GitHub, make an environment file and include it in the repository. This will make it easier for people to install all the dependencies for your code. Ideally, you should also include a pip requirements.txt file for people who are not using conda, which you can generate with

pip freeze > requirements.txt
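The output of pip freeze is one name==version pin per line, which is what ends up in requirements.txt. As a rough sketch, the snippet below reads such pins into a dictionary; the sample text is a hypothetical example (your package versions will differ) and parse_pins is a made-up helper.

```python
# Hypothetical sample of pip freeze output; your versions will differ.
SAMPLE_FREEZE_OUTPUT = """\
numpy==1.10.4
pandas==0.18.0
XlsxWriter==0.8.4
"""


def parse_pins(text):
    """Map package name -> pinned version for every name==version line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip blanks, comments, and lines without a plain pin
        name, version = line.split("==", 1)
        pins[name] = version
    return pins


if __name__ == "__main__":
    for package, version in sorted(parse_pins(SAMPLE_FREEZE_OUTPUT).items()):
        print(package, version)
```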

That’s all for a quick start. The key is to keep practicing with the commands above.

Hopefully, this article will help you cut the time you spend on Python package management in half, and help you jumpstart using Anaconda environments and package management in your day-to-day Python projects.

Please let me know your thoughts and questions in the comments section. I would really appreciate feedback on this article and ideas to improve it.

In the meantime, Happy Thinking…
