Intro to Pandas -1: An absolute beginner’s guide to Machine Learning and Data Science.

Rakshith Vasudev
Published in HackerNoon.com
4 min read · Oct 16, 2017

Pandas is hands down one of the best Python libraries. It supports reading and writing Excel spreadsheets and CSV files, plus a whole lot of data manipulation. It is practically a mandatory library if you’re dealing with datasets from Excel and CSV files, i.e. for machine learning and data science.

This is part one of the pandas tutorial. I’m not going to cover everything that is possible with pandas; I want to give you a taste of what it is and how you can get started with it. This tutorial is going to be super short, just introducing you to the Series object of pandas.

As with other libraries, you import pandas and reference it as pd.

import pandas as pd

We’re telling Python that pandas will henceforth be referred to as pd.

If you like trance music, I’m positive you’ve heard of the songs in this list.

# Let's create a list of songs.
songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
# Let's also create a list of the corresponding artists.
# FYI: 'MG' stands for Martin Garrix, 'TI' for Tiesto, 'DB' for Dash Berlin,
# and 'AV' for Armin Van Buuren.
artists = ['MG', 'TI', 'DB', 'AV']
# Likewise, let's create a dictionary that maps artists to songs.
song_arts = {'MG': 'In the name of love', 'TI': 'Scream', 'DB': 'Till the sky falls down', 'AV': 'In and out of Love'}

How do I create a table-like structure from these lists? With pd.Series().

pd.Series() is the constructor that creates a Series object from the data passed to it. The data is supplied through the data parameter.

# create a Series object whose data is coming from songs list.
ser_num = pd.Series(data=songs)
ser_num
====================================================================
0 In the name of love
1 Scream
2 Till the sky falls down
3 In and out of Love
dtype: object

So, what is a “Series” object in Pandas?

It is a data structure defined by pandas: a one-dimensional array of values in which every value has a label, its index. Printed, it looks like a small table with two columns.

0 In the name of love
1 Scream
2 Till the sky falls down
3 In and out of Love

Notice that the numbers in the first column were added automatically by pandas. They serve as the index.

The first column holds the indices of the series and the second column holds its values.
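To see those two columns separately, here is a minimal sketch that rebuilds the same series and pulls the index and the values apart. The names default_index, song_values and n_items are mine, just for illustration.

```python
import pandas as pd

songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
ser_num = pd.Series(data=songs)

# The automatically generated index counts up from 0, one label per value.
default_index = list(ser_num.index)

# The values are the songs, in the order they were passed in.
song_values = list(ser_num.values)

# A series has exactly as many index labels as values.
n_items = len(ser_num)

print(default_index)
print(song_values)
print(n_items)
```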

Suppose you want to access ‘In and out of Love’. How would you do that?

# get the element that corresponds to index 3.
ser_num[3]
====================================================================
'In and out of Love'
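What happens if you ask for an index that is not there? As far as I can tell, pandas raises a KeyError, much like a dictionary would. A small sketch, where the variable missing is only there to capture the error for illustration:

```python
import pandas as pd

songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
ser_num = pd.Series(data=songs)

# Index 3 exists, so this returns the last song.
last_song = ser_num[3]

# Index 10 does not exist in a series of four elements.
try:
    ser_num[10]
    missing = None
except KeyError as err:
    missing = err

print(last_song)
print(type(missing).__name__)
```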

What if you want the artist’s name to be the index of the song?

# make artists the index this time.
ser_art = pd.Series(data=songs,index=artists)
ser_art
====================================================================
MG In the name of love
TI Scream
DB Till the sky falls down
AV In and out of Love
dtype: object

This time, instead of numbers, the artist names serve as the index. How? Notice that this time we additionally passed artists as the index parameter to pd.Series().
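One thing to watch out for: the index needs one label per value. A minimal sketch of what happens when the lengths do not match; the two-label list and the variable mismatch_error are made up for this example.

```python
import pandas as pd

songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']

# Four songs and four labels: this works.
ser_art = pd.Series(data=songs, index=artists)

# Four songs but only two labels: pandas refuses to build the series.
try:
    pd.Series(data=songs, index=['MG', 'TI'])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(list(ser_art.index))
print(type(mismatch_error).__name__)
```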

How do you access elements through the custom index, i.e. how do you get a song by its artist’s name?

Just pass the name of the artist and you get their song.

ser_art['MG']
====================================================================
'In the name of love'
ser_art['AV']
====================================================================
'In and out of Love'
ser_art['DB']
====================================================================
'Till the sky falls down'

It is much like accessing elements of a dictionary. There you pass the ‘key’; here, with a series, you pass the ‘index’ to retrieve an element.
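The dictionary comparison goes a little further. Here is a short sketch of a few dictionary-style operations that a series also supports; the label 'XX' and the fallback text are invented for the example.

```python
import pandas as pd

songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']
ser_art = pd.Series(data=songs, index=artists)

# "in" checks the index labels, not the values (like dictionary keys).
has_mg = 'MG' in ser_art
has_xx = 'XX' in ser_art

# .get() returns a default instead of raising an error for a missing label.
fallback = ser_art.get('XX', 'unknown artist')

# .loc is the explicit way to say "look this up by label".
mg_song = ser_art.loc['MG']

print(has_mg, has_xx)
print(fallback)
print(mg_song)
```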

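Positions still work too; with a label index, go through .iloc. Older pandas versions also accepted plain ser_art[0] here, but that positional fallback is deprecated in recent releases.

ser_art.iloc[0]
====================================================================
'In the name of love'
ser_art.iloc[2]
====================================================================
'Till the sky falls down'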
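Position-based and label-based access can also be spelled out explicitly with .iloc and .loc, and both accept slices. A minimal sketch reusing the lists from above; the names first_two and mg_to_db are mine.

```python
import pandas as pd

songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']
ser_art = pd.Series(data=songs, index=artists)

# .iloc works on positions; like Python lists, the end of a slice is excluded.
first_two = ser_art.iloc[0:2]

# .loc works on labels; here the end label is included in the slice.
mg_to_db = ser_art.loc['MG':'DB']

# Negative positions count from the end.
last_song = ser_art.iloc[-1]

print(list(first_two.index))
print(list(mg_to_db.index))
print(last_song)
```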

Great! Seems interesting. But how do you create a Series object from a dictionary?

It’s as simple as passing the dictionary to pd.Series(), like so:

ser_dict = pd.Series(song_arts)
ser_dict
====================================================================
AV In and out of Love
DB Till the sky falls down
MG In the name of love
TI Scream
dtype: object

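pandas created the Series object by taking the dictionary’s keys as the indices and its values as the values. The output above comes from an older pandas version that sorted the keys; recent versions keep the dictionary’s insertion order (MG, TI, DB, AV).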
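The conversion also works in the other direction, and you can pass an index along with the dictionary to pick only some keys. A small sketch under my own naming (back_to_dict, subset); the label 'XX' is made up to show what a missing key looks like.

```python
import pandas as pd

song_arts = {'MG': 'In the name of love', 'TI': 'Scream',
             'DB': 'Till the sky falls down', 'AV': 'In and out of Love'}
ser_dict = pd.Series(song_arts)

# to_dict() turns the series back into a plain dictionary.
back_to_dict = ser_dict.to_dict()

# Passing index together with a dictionary selects those keys, in that order.
# A label with no matching key ('XX' here) gets a missing value (NaN).
subset = pd.Series(song_arts, index=['AV', 'MG', 'XX'])

print(back_to_dict == song_arts)
print(list(subset.index))
print(pd.isna(subset['XX']))
```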

Access still works just like before.

ser_dict['TI']
====================================================================
'Scream'
ser_dict['DB']
====================================================================
'Till the sky falls down'

What if I want to get all the indices and all the values of a Series object separately?

A Series object has index and values attributes that return only the indices and only the values of that series.

# get the indices only
ser_art.index
====================================================================
Index(['MG', 'TI', 'DB', 'AV'], dtype='object')
# get only values of the series
ser_art.values
====================================================================
array(['In the name of love', 'Scream', 'Till the sky falls down',
'In and out of Love'], dtype=object)
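If you would rather work with plain Python objects than with an Index or an array, a few conversions help. A minimal sketch; the variable names labels, values and pairs are just for illustration.

```python
import pandas as pd

songs = ['In the name of love', 'Scream', 'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']
ser_art = pd.Series(data=songs, index=artists)

# The index as a plain Python list.
labels = ser_art.index.tolist()

# The values as a NumPy array (to_numpy() is the method form of .values).
values = ser_art.to_numpy()

# items() walks through (index, value) pairs, like dict.items().
pairs = list(ser_art.items())

print(labels)
print(list(values))
print(pairs[0])
```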

This is just the tip of the iceberg of what can be done with a series. We’ll cover more of pandas in the next tutorial.

Here’s a video tutorial explaining everything that I did, if you prefer to learn by video.

Stay tuned. There’s going to be a follow-up tutorial with more content on pandas.

If you want to learn NumPy, I wrote an article titled “Introduction to Numpy -1 : An absolute beginners guide to Machine Learning and Data science.” Check it out.

If you liked this article, a clap/recommendation would be really appreciated. It helps me to write more such articles.
