Anaconda: Tool for Data Science

Let’s begin your data science journey.

sanchit sharma
TEK Society
5 min read · Oct 10, 2020


Data science is a broad field that relies on many tools and packages, and configuring all of them at once can be tedious and hard to manage.

This is where Anaconda comes in. Produced by Anaconda, Inc., it is a distribution designed for Python and R and aimed mainly at data scientists. Its conda tool manages (installs, upgrades, or uninstalls) packages and environments, most often for Python. With Anaconda it is simple to install packages and create virtual environments, so you can work on multiple projects conveniently.

Is it going to be difficult to use?

  • No, it’s easy to use, and it is popular because it is very effective for working with Python packages.

Why choose Anaconda?

Anaconda comes with a bunch of tools used in data science and machine learning, all installed with just a few clicks, so you’ll be set to start working with data right away. Using conda to manage your packages and environments reduces future issues with the various libraries you’ll be using, and it helps isolate projects that depend on different package versions. Anaconda is a heavy download (around 500 MB) because it ships with so many tools. It bundles many of the common libraries used in commercial and scientific Python work, such as NumPy and scikit-learn.

It gives you a head start on your data science journey, with all the configuration a beginner needs.

Features

  • Anaconda Navigator: A graphical user interface that helps you open any installed application, such as Jupyter Notebook or the VS Code editor.
  • Conda: A command-line utility for package and environment management. Mac/Linux users can use the Terminal, and Windows users can use the “Anaconda Prompt” to execute conda commands. Windows users may need to run the Anaconda Prompt as an Administrator if Anaconda was installed for all users.

You can check your current conda version by running conda --version.
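Here is a minimal sketch of that version check for a Mac/Linux terminal. The command -v guard is my own addition so the snippet does nothing harmful when conda is not yet on the PATH; in Anaconda Prompt on Windows you would type only the conda line.

```shell
# Print the installed conda version, but only if conda can be found
if command -v conda >/dev/null 2>&1; then
    conda --version
else
    echo "conda not found on PATH"
fi
```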

  • Python: A recent version of Python is installed as an individual package.
  • Anaconda Prompt (Windows only): A terminal where you can use the command-line interface to manage your environments and packages.
  • A bunch of applications, such as Spyder, an IDE geared toward scientific development. In total, over 160 scientific packages and their dependencies are also installed.

If you don’t need all the packages, or need to conserve bandwidth or storage space, there is an option for you: Miniconda.

Miniconda is a smaller distribution than Anaconda that includes only conda and Python. It can do everything Anaconda can, but it doesn't have the preinstalled packages. You can also upgrade from Miniconda to the full Anaconda package set at any time by running conda install anaconda.
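As a rough sketch, the Miniconda-to-Anaconda upgrade could look like this in a terminal. The METAPACKAGE variable is only a label I introduced for readability; expect a large download.

```shell
# "anaconda" is the metapackage that pulls in the full Anaconda package set
METAPACKAGE="anaconda"

if command -v conda >/dev/null 2>&1; then
    # conda shows the plan and asks for confirmation before installing
    conda install "$METAPACKAGE"
else
    echo "conda not found on PATH"
fi
```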

Download the installer from https://www.anaconda.com/download/. Choose a Python 3.7 or higher version and the installer that matches your system (64-bit or 32-bit).

After installation, you’re automatically in the default conda environment with all packages installed, which you can check by running conda list.
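A small sketch of that check is shown here. Using numpy as the package to look up is just my example; any installed package name would do.

```shell
PACKAGE="numpy"   # example package name, chosen for illustration

if command -v conda >/dev/null 2>&1; then
    # List every package in the currently active environment
    conda list
    # Narrow the listing to packages whose name matches PACKAGE
    conda list "$PACKAGE"
else
    echo "conda not found on PATH"
fi
```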

Commands:

  • It’s best to update all the packages in the default environment: conda update --all
  • To install a package: conda install package_name

1. You can also install multiple packages in one command by listing their names separated by spaces.

2. You can specify which version of a package you want by adding the version number after the name, in the form package_name=version (helpful for projects with different version dependencies).
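The update and install commands above can be sketched as follows. The package names and the pinned version 1.19 are illustrative choices of mine, not something the setup requires.

```shell
PINNED="numpy=1.19"   # illustrative pin; use whatever version your project needs

if command -v conda >/dev/null 2>&1; then
    # Update every package in the active environment
    conda update --all
    # Install a single package
    conda install numpy
    # Install several packages in one command
    conda install numpy pandas scikit-learn
    # Install one exact version of a package
    conda install "$PINNED"
else
    echo "conda not found on PATH"
fi
```

Each command prints its plan and asks for confirmation, so you can back out if conda proposes changes you did not expect.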

Some older versions, such as Python 2.x, are no longer developed, yet some projects still depend on them. Being able to pin versions per project makes this manageable, and it gives Anaconda an edge over other environment managers.

  • Remove a package: conda remove package_name
  • Update a package: conda update package_name
  • Search for a package: conda search search_term
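Here is a short sketch of those three commands. I am using numpy as a stand-in package, and I run the search first so the later commands still have something to act on; each line also works on its own.

```shell
PACKAGE="numpy"   # stand-in package name for illustration

if command -v conda >/dev/null 2>&1; then
    # Show the versions of the package available in your configured channels
    conda search "$PACKAGE"
    # Update the package to the newest compatible version
    conda update "$PACKAGE"
    # Remove the package from the active environment
    conda remove "$PACKAGE"
else
    echo "conda not found on PATH"
fi
```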

Conda can be used to create isolated environments for your projects. To create an environment, run conda create -n env_name in your terminal or Anaconda Prompt.
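A minimal sketch of creating an environment follows, assuming a hypothetical environment name my_env.

```shell
ENV_NAME="my_env"   # hypothetical environment name, pick your own

if command -v conda >/dev/null 2>&1; then
    # Create a new, empty environment; -n is short for --name
    conda create -n "$ENV_NAME"
else
    echo "conda not found on PATH"
fi
```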

To list the existing environments, run conda env list.
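For reference, the listing can be run like this. The second command is an equivalent I added for completeness; in the output the active environment is marked with an asterisk.

```shell
if command -v conda >/dev/null 2>&1; then
    # Show all environments and where they live on disk
    conda env list
    # Equivalent listing through the info subcommand
    conda info --envs
else
    echo "conda not found on PATH"
fi
```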

To create an environment with specific versions of Python and libraries:

conda create -n env_name [python=X.X] [LIST_OF_PACKAGES]
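To make the template concrete, here is one way it could be filled in. The environment name data_env, the Python version 3.8, and the package list are all assumptions of mine for illustration.

```shell
ENV_NAME="data_env"    # hypothetical environment name
PYTHON_VERSION="3.8"   # assumed version; use the one your project targets

if command -v conda >/dev/null 2>&1; then
    # Create the environment with a fixed Python version and a few packages
    conda create -n "$ENV_NAME" "python=$PYTHON_VERSION" numpy pandas
else
    echo "conda not found on PATH"
fi
```

Listing the packages at creation time lets conda resolve all their dependencies together, which tends to avoid version conflicts later.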

  • To activate or deactivate a virtual environment: conda activate env_name and conda deactivate
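A sketch of switching in and out of an environment is shown here, again with the hypothetical name my_env. It is meant for an interactive terminal that has been set up by conda init (or for Anaconda Prompt); inside a plain script, conda activate usually needs extra shell setup.

```shell
ENV_NAME="my_env"   # hypothetical environment name

if command -v conda >/dev/null 2>&1; then
    # Switch into the environment; your prompt should then show its name
    conda activate "$ENV_NAME"
    # Leave the environment and return to the previous one
    conda deactivate
else
    echo "conda not found on PATH"
fi
```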

Saving and loading environments

A very useful feature is environment sharing. When you share your code on GitHub, it is good practice to include an environment file in your repo so that others can easily install all the packages used in your code, with the correct versions.

Running conda env export prints the name of the environment and all of its dependencies, along with their versions.

To share the environment, export it to a YAML file, for example with conda env export > environment.yaml.
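The export step can be sketched like this. The file name environment.yaml is a common convention rather than a requirement.

```shell
ENV_FILE="environment.yaml"   # conventional file name, not mandatory

if command -v conda >/dev/null 2>&1; then
    # Print the active environment's name, channels, and dependencies
    conda env export
    # Write the same listing to a file you can commit or send to others
    conda env export > "$ENV_FILE"
else
    echo "conda not found on PATH"
fi
```

If the file needs to work across operating systems, conda env export --from-history may be worth trying, since it records only the packages you explicitly asked for.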

Now you can share this file with people who want the same environment as you.

  • To create an environment from an environment file, run conda env create -f environment.yaml.
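Here is a minimal sketch of recreating an environment from such a file, assuming it is called environment.yaml and sits in the current directory.

```shell
ENV_FILE="environment.yaml"   # assumed file name and location

if command -v conda >/dev/null 2>&1 && [ -f "$ENV_FILE" ]; then
    # Build a new environment using the name and packages listed in the file
    conda env create -f "$ENV_FILE"
else
    echo "conda or $ENV_FILE is missing"
fi
```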

Remove the Environment

If there are environments you don’t use anymore, you can remove them with conda env remove -n env_name. Note: deactivate the environment first if it is currently active.
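As a sketch, removing the hypothetical my_env environment could look like this. Deleting an environment removes all the packages installed in it, so double-check the name first.

```shell
ENV_NAME="my_env"   # hypothetical environment name

if command -v conda >/dev/null 2>&1; then
    # Run "conda deactivate" first if this environment is the active one
    conda env remove -n "$ENV_NAME"
else
    echo "conda not found on PATH"
fi
```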

Share the List of Dependencies

For users not using conda, you may want to share the list of packages installed in the current environment. You can use pip to generate such a list as a requirements.txt file by running pip freeze > requirements.txt.
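A small sketch of generating that file, assuming pip is available in the active environment:

```shell
REQ_FILE="requirements.txt"   # conventional file name for pip requirement lists

if command -v pip >/dev/null 2>&1; then
    # Write every installed package with its exact version, one per line
    pip freeze > "$REQ_FILE"
else
    echo "pip not found on PATH"
fi
```

Inside a conda environment, pip freeze may list some packages as local file paths; if that happens, pip list --format=freeze is a common workaround.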

  • Others can then install all the packages listed in the requirements.txt file by running pip install -r requirements.txt.
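On the receiving side, the install could be sketched like this, assuming the file sits in the current directory.

```shell
REQ_FILE="requirements.txt"   # assumed file name and location

if command -v pip >/dev/null 2>&1 && [ -f "$REQ_FILE" ]; then
    # Install every package listed in the file at the recorded version
    pip install -r "$REQ_FILE"
else
    echo "pip or $REQ_FILE is missing"
fi
```

It is usually safer to run this inside a fresh virtual environment so the pinned versions don't clash with packages you already have.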

Fact: Installing pandas by itself will also install numpy since numpy is a dependency of pandas. Conda makes sure to also install any packages that are required by the package you’re installing.

For more such posts, do follow our publication:
https://medium.com/tek-society

Also do clap! It encourages me to write better! And follow me for its next part.
Thank you!
