Getting started with Python (for Data Science & Machine Learning)

Prologue: If you have a basic understanding of programming and are eager to get into Artificial Intelligence, Machine Learning, and Data Science, among many other domains, but are baffled by where to start, then this article is absolutely for you.

Why use Python?

“Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale.”

The following guide is divided into 7 comprehensive steps to get you up and running with Python scripts on your machine and tackling real-world problems!

Step 1: Install Anaconda

But what is Anaconda? I am kinda afraid of snakes just so that you know…

“With over 6 million users, the open source Anaconda Distribution is the easiest way to do Python data science and machine learning. It includes hundreds of popular data science packages and the conda package and virtual environment manager for Windows, Linux, and MacOS. Conda makes it quick and easy to install, run, and upgrade complex data science and machine learning environments like Scikit-learn, TensorFlow, and SciPy. Anaconda Distribution is the foundation of millions of data science projects as well as Amazon Web Services’ Machine Learning AMIs and Anaconda for Microsoft on Azure and Windows.”

You can download Anaconda from here. (Download the Python 3.6 version)

Step 2: Setup PATH environment variable

What is this shit? Why do we need it?

“Environment variables are set to allow access to command line tools and to enable other tools to interact with SDKs more easily. PATH specifies the directories in which executable programs are located on the machine that can be started without knowing and typing the whole path to the file on the command line.”
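To make that a bit more concrete, here is a minimal sketch of what "looking something up on PATH" means, written in Python itself. The helper names `path_directories` and `find_program` are made up for this illustration; `os.environ`, `os.pathsep` and `shutil.which` come from the standard library.

```python
import os
import shutil


def path_directories():
    """Return the directories listed in the PATH environment variable."""
    raw = os.environ.get("PATH", "")
    # os.pathsep is ";" on Windows and ":" on Linux/macOS.
    return [d for d in raw.split(os.pathsep) if d]


def find_program(name):
    """Return the full path that would be run for `name`, or None if not found."""
    return shutil.which(name)


if __name__ == "__main__":
    for directory in path_directories():
        print(directory)
    print("python resolves to:", find_program("python"))
```

If `find_program("python")` comes back as `None`, the command line cannot find Python through PATH yet.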

Fair enough. How do I set it up?

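  1. Right-click on ‘My Computer’
  2. Click on Properties
  3. Click on Advanced system settings
  4. Click on Environment variables
  5. If a Path variable already exists, select it and click on Edit; click on New only if there is none
  6. Make sure the Variable name is Path
  7. Add the directory of the Scripts folder inside Anaconda to the Variable value, keeping any existing entries (entries are separated by semicolons)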

Here’s a picture guide showing what my setup looks like:

This is what you should see if you click on Environment variables
Set your own appropriate directory to Scripts

You can check whether Python has been installed properly on your machine by heading over to the Command Line and typing python. If the interpreter starts and shows you a prompt, you are good to go. The startup message also shows the version of Python running on your machine.
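Once the interpreter starts, you can also ask Python directly which version it is and where it lives. This is just a small sketch; the function name `describe_interpreter` is my own, while `sys.version_info` and `sys.executable` are part of the standard library.

```python
import sys


def describe_interpreter():
    """Collect the running interpreter's version and location."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"version": version, "executable": sys.executable}


info = describe_interpreter()
print("Python version:", info["version"])
print("Interpreter path:", info["executable"])
```

If the interpreter path points inside your Anaconda directory, the command line is picking up the installation from Step 1.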

Step 3: Setting up our Text Editor

My text editor of choice is Sublime Text 3. You can download it here. I would highly recommend watching this video, which will help you set up and beautify your ST3.

Note: You will need to install the SublimeREPL package to run your Python code, because the default ST3 console sometimes fails unexpectedly.

Step 4: Installing Dependencies

Welcome to the world of packages/libraries! Simply put, and to quote Siraj, “Dependencies are packages that our code depends on.” There are tons and tons of packages out there that will help you write your Python script. Each library serves a specific purpose.

There is only one rule, however: you need to install them before using them.

There are quite a few ways to install packages. I prefer pip install.

But what is pip install?

pip is a package management system used to install and manage software packages written in Python.
Cool Fact: pip is a recursive acronym that can stand for either “Pip Installs Packages” or “Pip Installs Python”.

Okay cool. But how do I pip install a library?

  1. Head over to your Scripts folder inside your Anaconda directory.
  2. Type cmd in the directory bar at the top and press Enter. This will open the command line in that directory.
  3. You can read the documentation/GitHub/Stack Overflow for a library to find out which command installs it.
    Most of the time it is “pip install package_name”.
  4. Once the package has installed, you can import it in ST3.
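If you are not sure whether a package is already installed, you can ask Python before importing it. The sketch below is only an illustration: `is_installed` and `pip_install_command` are names I made up, and the second one merely builds the command as a list without running it. It uses the `python -m pip` form, which is an alternative to calling pip from the Scripts folder and makes sure pip belongs to the interpreter you are actually running.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name):
    """Build (but do not run) the pip command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package_name]


print("json installed:", is_installed("json"))
print("numpy installed:", is_installed("numpy"))
print("To install numpy, run:", " ".join(pip_install_command("numpy")))
```

Keep in mind that the name you import is not always the name you install, so check the package's documentation when in doubt.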

Here’s a picture guide:

Head over to your Scripts folder
Type cmd in the directory bar
This should show the appropriate directory for your Scripts
This is an example of how you would install numpy package

Some libraries require an additional step before the pip install. You need to download their respective wheel (.whl) file from here, put the file in your Scripts folder, head over to the cmd and pip install the file. (Hack: just type a few letters of the file name and hit ‘Tab’ to auto-complete it.)

An example of pip installing a .whl file
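A wheel has to match your Python version and your operating system, and its file name tells you both. Wheel file names follow the pattern name-version-python tag-abi tag-platform tag.whl, with an optional build tag after the version. Here is a rough sketch that splits a file name into those parts; the function `parse_wheel_filename` and the example file name are hypothetical and only meant for illustration.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


# Hypothetical file name, used only as an example.
tags = parse_wheel_filename("example_pkg-1.0.0-cp36-cp36m-win_amd64.whl")
print(tags["python_tag"], tags["platform_tag"])
```

In this example, `cp36` means the wheel was built for CPython 3.6 and `win_amd64` means 64-bit Windows, so it would suit the Python 3.6 version of Anaconda on a 64-bit Windows machine.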

Step 5: Learning to read Documentation

You will need to google a lot. More than half of the time, developers are just looking for solutions to their problems on Stack Overflow, or reading a package's documentation or its GitHub README to learn the details of its different modules and how they can be used.

This is what the documentation for TensorFlow looks like
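You do not always have to leave your editor to read documentation, because most Python packages carry their docs with them as docstrings. The sketch below uses the standard library's `json` module as a stand-in for whatever package you are exploring; the helper `describe` is a name I invented, while `inspect.getdoc`, `inspect.signature` and `dir` are standard.

```python
import inspect
import json


def describe(obj):
    """Return a callable's signature and the first line of its docstring."""
    doc = inspect.getdoc(obj) or ""
    first_line = doc.splitlines()[0] if doc else "(no docstring)"
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some built-in callables do not expose a signature.
        signature = "(signature unavailable)"
    return signature, first_line


signature, first_line = describe(json.dumps)
print("json.dumps" + signature)
print(first_line)

# List the public names a module offers, as a starting point for reading.
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)
```

Typing `help(json.dumps)` in the interpreter shows the full docstring in one go.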

Step 6: Write your Python script

Figure out a problem that you would like to solve. Google to find out whether there are any Python libraries available for it. If nothing solves it directly, find out which combination of libraries can be used to achieve your goal. Install them on your machine. Read their documentation, and sample code if available, to understand how they are used.
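To give you an idea of the shape of a small script, here is a minimal sketch that summarizes a list of numbers. It deliberately sticks to the standard library's `statistics` module so that it runs without installing anything; the function names and the sample data are made up for this example.

```python
import statistics


def summarize(values):
    """Return the count, mean and sample standard deviation of the values."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def main():
    temperatures = [21.5, 22.0, 19.8, 23.1, 20.6]  # made-up sample data
    for key, value in summarize(temperatures).items():
        print(key, "=", round(value, 2))


if __name__ == "__main__":
    main()
```

Keeping the logic in a function like `summarize` and calling it from `main` makes it easier to reuse and test once your scripts grow.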

Step 7: Follow really smart people

It is essential to stay up to date with the collective knowledge of the worldwide developer community.

I follow these people on Twitter to get my daily dose of inspiration:

You can follow Siraj on YouTube. He has some amazing playlists on getting started with AI. Another of my favorites is a playlist on ML by Josh Gordon of Google.

“The woods are lovely, dark and deep. But I have promises to keep, and miles to go before I sleep.” — Robert Frost

Keep Calm and Keep Coding!