How to Create a Data Science Workspace in Linux with Python for Beginners
People are hooked on data science these days. To do it yourself, you need to learn the theory and get your hands dirty with coding. You will find plenty of resources on the internet for learning the theory. As a beginner, you also need a data science workspace where you can write code and quickly test the results while building a proof of concept (POC). In this tutorial, we will learn how to set up a Linux machine for that. All the commands in this tutorial apply to Ubuntu.
Python Package Installer:
You can easily install most Python packages on a Linux machine with pip or conda. Both are Python package installers. For pip installation see this link. You can install conda with either Anaconda or Miniconda. My personal choice is Miniconda, as it installs only the packages you ask for. Anaconda by default installs a lot of packages that you may not need, which makes the system heavy, and I personally don't like that. It's always better to keep your system as light as possible. To install Miniconda (64-bit), do the following:
wget -c http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
chmod +x Miniconda-latest-Linux-x86_64.sh
./Miniconda-latest-Linux-x86_64.sh
Then follow the prompts in the terminal. If you want to install Anaconda instead, check this link.
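After installing, it can help to confirm which installers are actually reachable from your shell. The snippet below is a minimal sketch, assuming Python 3.6 or newer; the function name find_installers is made up for this example and is not part of pip or conda. It only looks up the commands on your PATH using the standard library.

```python
import shutil


def find_installers(names=("pip", "pip3", "conda")):
    """Map each command name to its full path on PATH, or None if it is absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per installer so it is easy to see what is missing.
    for name, path in find_installers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If conda shows up as not found right after installing Miniconda, opening a new terminal usually helps, since the installer updates your shell start-up file.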
Python Package Installation in an Environment:
Once we have a package installer, we can install packages very easily with pip or conda. For example, to install numpy simply run pip install numpy or conda install numpy in the terminal. A better approach, however, is to create a Python environment and install all your packages in that environment. A Python environment is like a container in which all the related packages are stored together. It helps to manage package versions and avoid conflicts. For example, one of your data science projects may need python 3.5 and keras with the theano back end, whereas a different data science task may need python 2.7 with tensorflow (keras, theano, tensorflow, etc. are popular deep learning packages). In that case it's better to create separate environments and install the required packages in each one, because the required packages, and the syntax for using them in code, can vary with the Python version. For this you can use virtualenv (installation guide link1, link2). You can also create your environment using conda. Follow the steps from this link to set up your conda environment.
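To make the idea of an environment concrete, here is a minimal sketch that creates one from Python itself. Note that it uses the standard library venv module (available in Python 3.3 and newer) rather than virtualenv or conda, which are the tools this post recommends; the function name create_env and the directory name my-env are made up for this example.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return the path to its interpreter."""
    env_dir = Path(env_dir)
    # with_pip=True also bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=with_pip)
    # On Linux the environment's interpreter lives under bin/.
    return env_dir / "bin" / "python"
```

Calling create_env("my-env") would create a folder named my-env in the current directory. You can then activate it with source my-env/bin/activate, and any pip install you run afterwards goes into that environment only. On Ubuntu, with_pip=True may require the python3-venv system package.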
Jupyter Notebook (Optional):
In simple terms, it's a browser-based interactive Python editor that executes Python code snippets and displays the output in the browser. You can write your own comments in markdown (what is markdown? link1, link2). Here is a snapshot of the Jupyter Notebook (Image source).
For a beginner, it is sometimes good to do all your POC work in Jupyter Notebook and, once you are satisfied, write the full Python scripts to run on bigger datasets on a server with more RAM and GPU support. You can install Jupyter Notebook using conda.
conda install jupyter
Once the installation is done, type jupyter notebook in the terminal and hit enter. It will open a browser and the notebook will start. If you need to install any Python packages while writing scripts in the notebook, simply install them from the Linux terminal and they will become available in the notebook, as long as you install them into the same environment the notebook is running from.
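If a package you just installed does not show up in the notebook, the usual cause is that the notebook is running from a different environment than the one you installed into. The following is a minimal sketch you could paste into a notebook cell to check, assuming Python 3.6 or newer; the helper names describe_interpreter and is_installed are made up for this example.

```python
import importlib.util
import os
import sys


def describe_interpreter():
    """Report which Python is running and which environment it seems to belong to."""
    return {
        "executable": sys.executable,
        # True inside a venv/virtualenv-style environment.
        "in_venv": sys.prefix != sys.base_prefix,
        # conda sets this variable when an environment is activated; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def is_installed(package_name):
    """Return True if a top-level package can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print(describe_interpreter())
    print("numpy installed:", is_installed("numpy"))
```

If the executable path does not point inside the environment you expected, start jupyter notebook again from a terminal where that environment is activated.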
Happy coding :)
In the next post we will cover how to set up Jupyter Notebook on a server and access it through the browser on a local machine in 1-hop and 2-hop scenarios. Here is the link.