Python Virtual Environment for Data Science

hiddenntreasure
Big0one
Published in
8 min read · Jun 19, 2020

This article describes how to set up a Python virtual environment for data science. By the end you will know about virtual environments, Anaconda, conda, pip, package managers, package installation, Jupyter Notebook and more, and how to handle different projects that need different package versions without any conflict.

1. What is a virtual environment for Python?

A virtual environment is a tool that lets you install the dependencies each project requires separately from every other project. It is like a container, or an isolated environment, for one specific project. It helps resolve dependency issues such as conflicting versions of Python packages.
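To make the idea of isolation concrete, here is a minimal sketch that reports which environment the running interpreter belongs to. The function name describe_environment is my own, and I am assuming conda's usual behaviour of setting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the environment this interpreter runs in."""
    return {
        # Path of the Python executable that is running this code.
        "executable": sys.executable,
        # Root directory of the environment that owns this interpreter.
        "prefix": sys.prefix,
        # Assumption: conda sets CONDA_DEFAULT_ENV on activation.
        # The value is None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the same script from two different environments should print two different prefixes, which is the whole point of keeping projects isolated.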

2. Why is it needed?

Suppose you are working on two different projects, and each one expects different dependencies and different versions of Python packages. Without virtual environments you would have to uninstall one version of a package and reinstall the version your current project requires. That is bad practice, because when you go back to the other project you may need the previous version again and have to repeat the whole process. Besides, you can't work on two or more projects at the same time.

For example: some function is deprecated in the latest version of Python (v3.8.3), but you need to use that function. You would have to uninstall the latest version of Python and reinstall an earlier version that still provides the function. You can avoid the uninstall and reinstall step by creating a separate Python virtual environment for each case.
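The conflict described above can be sketched in a few lines of Python. The two dictionaries are hypothetical version pins for two imaginary projects, and find_version_conflicts is an illustrative helper, not part of conda or pip.

```python
def find_version_conflicts(project_a, project_b):
    """Return packages that both projects pin, but to different versions."""
    conflicts = {}
    for package, version_a in project_a.items():
        version_b = project_b.get(package)
        if version_b is not None and version_b != version_a:
            conflicts[package] = (version_a, version_b)
    return conflicts


# Hypothetical pins, chosen only to illustrate the problem.
old_project = {"pandas": "0.24.1", "numpy": "1.16.2"}
new_project = {"pandas": "1.0.3", "numpy": "1.16.2", "matplotlib": "3.2.1"}

print(find_version_conflicts(old_project, new_project))
# → {'pandas': ('0.24.1', '1.0.3')}
```

Any package that shows up in the result cannot satisfy both projects from a single installation, so each project needs its own environment.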

3. Different ways to create a virtual environment

Python has three popular tools for creating virtual environments: virtualenv, pipenv and conda (which ships with Anaconda). Today we will learn about Anaconda.

4. What is Anaconda?

Anaconda is a free and open-source distribution of Python. It comes with 250 packages installed, and over 7,500 more can be installed using both pip and conda. You may ask: what is conda? You probably know that pip is a Python package manager. conda is both a package manager and an environment manager. It is language agnostic, which means conda supports other languages such as R and Ruby. The conda package and environment manager is included in all versions of Anaconda.

5. Why Anaconda?

If you are an engineer, a scientist or a data science enthusiast, you will need a lot of packages such as numpy, scipy, opencv, tensorflow and keras. Installing these packages with Python's default approach is hard and takes a lot of work. I personally experienced a lot of frustration installing the opencv package on Ubuntu. With conda I can easily install the version of opencv I need in a specific virtual environment, or even in the base environment. pip installs packages from PyPI, whereas with conda you can:

  • Install packages (written in any language) from repositories like the Anaconda Repository and Anaconda Cloud. Anaconda Cloud provides third-party repositories like conda-forge. If the default Anaconda repository can't provide a package, you can search Anaconda Cloud for a third-party repository.
  • Install packages from PyPI by using pip in an active conda environment.

6. How to install Anaconda?

First, go to the Anaconda download page (reference 1 at the end of this article).

You will find different editions such as Individual, Team, Enterprise and Professional. I chose the Individual Edition.

Anaconda for Windows, MacOS and Linux

Windows: Download the Anaconda .exe installer you need and run it. It is a graphical installer, so it is easy: just keep clicking 'Next'.

Linux: You can download the installer from the link above or from the terminal. In the terminal, change to the directory where you want to save the download, then use curl or wget followed by the Anaconda repository link to download the installer into that directory.

$ cd /tmp
$ curl -O https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh

Ensure the integrity of the installer with cryptographic hash verification through SHA-256 checksum:

$ sha256sum Anaconda3-2019.03-Linux-x86_64.sh

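This prints the hash followed by the file name. Compare the hash with the one Anaconda publishes for the same installer; the two should match.

45c851b7497cc14d5ca060064394569f724b67d9b5f98a926ed49b834a6bb73a  Anaconda3-2019.03-Linux-x86_64.sh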
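If sha256sum is not available on your machine, the same check can be done with Python's standard hashlib module. This is only a sketch: the two function names are my own, and the expected hash is whatever value Anaconda publishes for the installer you downloaded.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large installer never sits in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hash, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the download above you would call matches_published_hash("/tmp/Anaconda3-2019.03-Linux-x86_64.sh", published_hash) and only run the installer if it returns True.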

Now run the installer with bash. The command below assumes the installer is in ~/Downloads; replace the path and file name with those of the file you actually downloaded (in the curl example above, /tmp/Anaconda3-2019.03-Linux-x86_64.sh).

$ bash ~/Downloads/Anaconda3-2020.02-Linux-x86_64.sh

Note : Include the bash command regardless of whether or not you are using Bash shell.

The installer prompts “In order to continue the installation process, please review the license agreement.” Press Enter to view the license terms. Scroll to the bottom of the license terms and type “yes” to agree.

The installer prompts you to press Enter to accept the default install location, press CTRL-C to cancel the installation, or specify an alternate installation directory. If you accept the default install location, the installer displays “PREFIX=/home/<user>/anaconda<2 or 3>” and continues the installation. It may take a few minutes to complete.

Note : We recommend you accept the default install location. Do not choose the path as /usr for the Anaconda/Miniconda installation.

The installer then asks “Do you wish the installer to initialize Anaconda3 by running conda init?” We recommend “yes”.

Close and open your terminal window for the installation to take effect.

The Anaconda installation is complete. Remember that conda is included in Anaconda.

7. How to create a virtual environment using conda?

Create an environment with conda for Python development. In the terminal, run:

$ conda create --name env_name python

conda prints the location of the new environment and the list of packages that will be installed with it, then asks for your permission to continue.

Enter 'y' to proceed.

Because no version is given, conda installs the newest Python version available from its configured channels. To create an environment with a specific version of Python, use:

$ conda create -n env_name python=3.7

Creating the environment takes some time. When it is done, activate it:

$ conda activate env_name

To see the packages that were installed when the environment was created:

$ conda list

To leave the environment:

$ conda deactivate
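If you create environments often, the create step above can be scripted. Below is a rough sketch using Python's subprocess module; conda_create_command and create_environment are hypothetical helper names. It relies on conda's --yes flag, which answers the 'y' confirmation for you, and it does nothing when conda is not on your PATH.

```python
import shutil
import subprocess


def conda_create_command(env_name, python_version=None):
    """Build the argument list for `conda create`, optionally pinning Python."""
    python_spec = "python" if python_version is None else f"python={python_version}"
    # --yes answers the confirmation prompt automatically.
    return ["conda", "create", "--yes", "--name", env_name, python_spec]


def create_environment(env_name, python_version=None):
    """Run `conda create` if conda is on PATH; return True on success."""
    if shutil.which("conda") is None:
        return False
    completed = subprocess.run(conda_create_command(env_name, python_version))
    return completed.returncode == 0


if __name__ == "__main__":
    # Only prints the command; call create_environment() to actually run it.
    print(" ".join(conda_create_command("env_name", "3.7")))
```

Activation is not scripted here on purpose: conda activate changes the current shell session, so it has to be typed in the terminal as shown above.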

7. How to install the required packages for data science?

First, activate the environment you created for your project:

$ conda activate env_name

Suppose we want to install pandas, with or without a specified version:

# with a specified version
$ conda install pandas=0.24.1
# without
$ conda install pandas

Without a specified version, conda installs the latest version of the package that is compatible with the environment.

You can update a package using:

$ conda update pandas
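After an install or update, one way to confirm which version actually ended up in the active environment is to ask Python itself. This sketch uses the standard importlib.metadata module (Python 3.8 or newer); installed_version is my own helper name, and I am assuming the package was installed with the usual metadata that both conda and pip normally provide.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pandas", "numpy"):
        print(name, installed_version(name) or "not installed")
```

Run it with the environment activated, otherwise it reports on whichever Python happens to be first on your PATH.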

Note: These installs and updates come from Anaconda's default repository. You can also install packages from third-party channels hosted on Anaconda Cloud.

Suppose we want to install opencv. I already mentioned that I failed several times to install opencv in virtualenv. Most of the time the opencv files were broken or the previous version wasn't available, and there were a lot of steps to carry out.

With conda it is quite easy. You can install it from the default Anaconda repository:

$ conda install opencv

If opencv isn’t available from the default Anaconda Repository, you can try searching for it on Anaconda Cloud, which hosts Conda packages provided by third party repositories like Conda-Forge.

To install from a third party, you need to specify the channel name, like this:

$ conda install -c channel_name package_name

To install opencv from a third-party channel:

$ conda install -c conda-forge opencv

After the installation completes, let's look at the list of packages:

$ conda list

opencv takes a lot of time to install, so I installed pandas from the conda-forge channel instead, using:

$ conda install -c conda-forge pandas

After it completes, list the packages again:

$ conda list

The Channel column of the output shows conda-forge for pandas. Packages with an empty Channel column were installed from the default Anaconda repository.
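To see at a glance where everything came from, the text printed by conda list can be grouped by channel. The sketch below parses a made-up sample of that output (the versions and build strings are invented for illustration), treating a missing fourth column as the default repository. packages_by_channel is a hypothetical helper, not a conda function.

```python
from collections import defaultdict

# Illustrative sample only; real `conda list` output will differ.
SAMPLE_CONDA_LIST = """\
# packages in environment at /home/user/anaconda3/envs/env_name:
#
# Name                    Version                   Build  Channel
numpy                     1.18.1                   py37_0
pandas                    1.0.4                    py37_0    conda-forge
requests                  2.23.0                   pypi_0    pypi
"""


def packages_by_channel(conda_list_text, default_channel="defaults"):
    """Group package names by the Channel column of `conda list` output."""
    groups = defaultdict(list)
    for line in conda_list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        # Columns are name, version, build and, optionally, channel.
        channel = fields[3] if len(fields) > 3 else default_channel
        groups[channel].append(fields[0])
    return dict(groups)


print(packages_by_channel(SAMPLE_CONDA_LIST))
```

With your own environment you would paste or pipe in the real output of conda list in place of the sample text.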

You can also install packages using pip; for those packages the channel is shown as 'pypi'.

Required packages for data science and how to install them:

$ conda install -c anaconda keras
$ conda install jupyter
$ conda install matplotlib
$ conda install pandas
$ conda install numpy
$ conda install -c conda-forge opencv

8. How to add the created virtual environment to Jupyter Notebook?

Jupyter Notebook makes sure the IPython kernel is available, but you have to manually add a kernel for a different version of Python or for a virtual environment.

First, activate your virtual environment. Next, install ipykernel, which provides the IPython kernel for Jupyter, into that environment:

$ pip install ipykernel

Next, add the virtual environment to Jupyter Notebook:

$ python -m ipykernel install --user --name=env_name

This prints:

Installed kernelspec env_name in /home/dm20/.local/share/jupyter/kernels/env_name

The environment has now been added. When you open Jupyter Notebook, you can use that environment for a particular project's Python file.
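Each registered kernel is just a folder containing a kernel.json file, so you can check which Python interpreter a kernel will launch. Here is a small sketch; list_kernels is my own helper, and the default folder is an assumption based on the per-user Linux path printed above.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Map each kernel folder name to its display name and interpreter."""
    directory = Path(kernels_dir)
    if not directory.is_dir():
        return {}
    kernels = {}
    for spec_file in sorted(directory.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text(encoding="utf-8"))
        argv = spec.get("argv", [])
        kernels[spec_file.parent.name] = {
            "display_name": spec.get("display_name"),
            # The first argv entry is the Python executable the kernel starts.
            "interpreter": argv[0] if argv else None,
        }
    return kernels


if __name__ == "__main__":
    # Assumed per-user location on Linux; other systems use other folders.
    default_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    for name, info in list_kernels(default_dir).items():
        print(name, "->", info["interpreter"])
```

If the interpreter path points inside your environment's folder, the kernel really is using that environment.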

9. How to work with a particular virtual environment in Jupyter Notebook?

To install packages into a particular environment, you need to activate that environment first. Once all required packages are installed, you are ready to work.

But you don't need to activate a virtual environment before opening Jupyter Notebook. You can access any virtual environment you have added as a kernel from Jupyter Notebook, and use the package versions installed in that environment.

$ jupyter notebook

Then go to Kernel -> Change kernel:

I have four virtual environments.

You can have any number of virtual environments. Select the kernel (virtual environment) that was created for your current project.

Note: An interesting consequence is that you can try the same project file against different virtual environments by switching kernels.

10. How to share your virtual environment details with your teammate?

You may need to run the project on a teammate's laptop, so you want to share your virtual environment details with them so that they can create their own virtual environment for the project. There is an easy way to do this. Run the command below and share the resulting file with your teammate:

$ conda env export --file environment.yml

Note: Run this inside the active environment that you want to share. It creates the environment.yml file in the current directory.

Now your teammate can create a copy of your environment from the file:

$ conda env create -n env_name -f /path/to/environment.yml

Or your teammate can update an existing virtual environment to match yours:

$ conda env update -n existing_env -f /path/to/environment.yml

Now use:

$ conda list

to see the installed packages and their channels.
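The environment.yml file is plain text with three main parts: a name, a list of channels and a list of dependencies. To show the shape, here is a sketch that writes a minimal hand-made version. render_environment_yml is a hypothetical helper and the package pins are only examples; the file conda exports for you is more detailed (it includes build strings and a prefix line).

```python
def render_environment_yml(name, channels, dependencies):
    """Render a minimal environment.yml with name, channels and dependencies."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; list the channels and packages your project uses.
    print(render_environment_yml(
        "env_name",
        ["conda-forge", "defaults"],
        ["python=3.7", "pandas=0.24.1", "opencv"],
    ))
```

A short hand-written file like this is easier to read and review than a full export, at the cost of pinning fewer details.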

Conclusion

Now you are ready to enjoy the data science packages. You can install any package using conda or pip, and keep installing packages whenever you need them.

I tried to explain everything about how to prepare your laptop or PC for data science and other engineering work. If anything is missing, please let me know. This blog is written in a simple way so that inexperienced people can cope with the Python environment. Thank you.

Reference:

  1. https://www.anaconda.com/products/individual
  2. https://docs.anaconda.com/anaconda/install/linux/
  3. https://janakiev.com/blog/jupyter-virtual-envs/
  4. http://xperimentallearning.blogspot.com/2019/08/python-install-keras-on-anaconda-in.html
  5. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)
  6. https://realpython.com/python-virtual-environments-a-primer/
  7. https://medium.com/@krishnaregmi/pipenv-vs-virtualenv-vs-conda-environment-3dde3f6869ed
