Introduction to Pandas

R RAMYA
5 min read · Apr 13, 2022

In this blog, I cover the basics you need to know about Pandas: what Pandas is, why we use it, its applications, and how to get started with it.

What Is Pandas In Python?

Pandas is an open-source Python package that is widely used for data science, data analysis, and machine learning tasks. It is built on top of another package named NumPy, which provides support for multi-dimensional arrays.
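As a small illustration of that relationship, here is a minimal sketch showing that a DataFrame column can be handed back as a NumPy array. The column name and values are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"height_cm": [150.0, 160.0, 170.0]})

# to_numpy() returns the column's values as a NumPy array.
heights = df["height_cm"].to_numpy()

print(isinstance(heights, np.ndarray))  # True
print(heights.mean())                   # 160.0
```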

Why Do We Use Pandas?

Pandas has so many uses that it might make sense to list the things it can’t do instead of what it can do.

Applications of Pandas

https://data-flair.training/blogs/applications-of-pandas/

Getting started with Pandas

Install Pandas

pip install pandas

Pandas provides two main data structures for manipulating data:

  • DataFrame
  • Series

DATAFRAME:

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
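Here is a minimal sketch of how those three components can be inspected. The names, values, and row labels are made up for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Age": [28, 35]},
    index=["row1", "row2"],
)

print(df.index.tolist())    # ['row1', 'row2']  (the row labels)
print(df.columns.tolist())  # ['Name', 'Age']   (the column labels)
print(df.to_numpy())        # the data itself, as a 2-D array
print(df.shape)             # (2, 2)  (rows, columns)
```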

What Can You Do With Data Frames Using Pandas?

Pandas makes it simple to do many of the time-consuming, repetitive tasks associated with working with data, including:

  • Data cleansing
  • Data fill
  • Data normalization
  • Merges and joins
  • Data visualization
  • Statistical analysis
  • Data inspection
  • Loading and saving data
  • And much more
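To make a few of these tasks concrete, here is a minimal sketch covering data fill, normalization, a merge, and summary statistics. The tables, column names, and values are made up for illustration, and min-max scaling is only one of several ways to normalize data.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
scores = pd.DataFrame(
    {"student": ["Ann", "Bob", "Cy"], "score": [80.0, None, 100.0]}
)
clubs = pd.DataFrame(
    {"student": ["Ann", "Bob", "Cy"], "club": ["Chess", "Art", "Chess"]}
)

# Data fill: replace the missing score with the column mean (90.0).
scores["score"] = scores["score"].fillna(scores["score"].mean())

# Data normalization: min-max scale the scores into the range [0, 1].
lo, hi = scores["score"].min(), scores["score"].max()
scores["score_scaled"] = (scores["score"] - lo) / (hi - lo)

# Merge: combine the two tables on the shared "student" column.
merged = scores.merge(clubs, on="student")

# Statistical analysis: summary statistics for the score column.
print(merged["score"].describe())
print(merged)
```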

How To Slice A Data Frame In Pandas

A Data Frame in Pandas is a 2-dimensional, labeled data structure which is similar to a SQL Table or a spreadsheet with columns and rows. Each column of a DataFrame can contain different data types.

Slicing a Data Frame in Pandas includes the following steps:

  1. Ensure Python is installed
  2. Import a dataset
  3. Create a Data Frame
  4. Slice the Data Frame

#1 Checking the Version of Pandas

To see if Python and Pandas are installed correctly, open a Python interpreter and type the following:

>>> import pandas as pd
>>> pd.__version__

#2 Creating a Data Frame

Series are one-dimensional labeled Pandas arrays that can contain any kind of data, even NaNs (Not a Number), which are used to specify missing data. A Data Frame can be built from several Series, one per column.

Example:

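classes = pd.Series(["Mathematics", "Chemistry", "Physics", "History", "Geography", "German"])
# Only five grades for six classes: the missing grade for "German" becomes NaN.
grades = pd.Series([90, 54, 77, 22, 25])

# The two Series are aligned on their index and become the columns.
pd.DataFrame({"Classes": classes, "Grades": grades})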

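Output:

       Classes  Grades
0  Mathematics    90.0
1    Chemistry    54.0
2      Physics    77.0
3      History    22.0
4    Geography    25.0
5       German     NaN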

#3 Slicing a Data Frame

Pandas provides DataFrame slicing through the “loc” (label-based) and “iloc” (integer-position-based) indexers.

Example:

# Report_Card is assumed to be an existing DataFrame with these columns.
Grades = Report_Card.loc[
    Report_Card["Name"] == "Benjamin Duran",
    ["Lectures", "Grades", "Credits", "Retake"],
]
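The Report_Card dataset itself is not shown in this post, so here is a minimal runnable sketch with a made-up stand-in. The rows, the second student's name, and all values are assumptions for illustration. It slices the same rows and columns once with loc and once with iloc.

```python
import pandas as pd

# Hypothetical stand-in for Report_Card; the real dataset is not shown in this post.
Report_Card = pd.DataFrame({
    "Name": ["Benjamin Duran", "Benjamin Duran", "Cory Ward"],
    "Class": ["A", "A", "B"],
    "Lectures": ["Mathematics", "Chemistry", "Mathematics"],
    "Grades": [90, 54, 77],
    "Credits": [3, 2, 3],
    "Retake": ["No", "Yes", "No"],
})

# loc: a boolean mask selects the rows, a list of labels selects the columns.
by_label = Report_Card.loc[
    Report_Card["Name"] == "Benjamin Duran",
    ["Lectures", "Grades", "Credits", "Retake"],
]

# iloc: the same slice by integer position (rows 0-1, columns 2-5).
by_position = Report_Card.iloc[0:2, 2:6]

print(by_label)
print(by_position)
```

Note that loc slices by label and includes the end label, while iloc slices by position and excludes the end position, just like regular Python slicing.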

How To Access A Row In A DataFrame

You can use the loc and iloc indexers to access rows in a Pandas DataFrame.

Example:

Using the iloc indexer:

Benjamin_Math = Report_Card.iloc[0]

We can also use the loc indexer:

Benjamin_Math = Report_Card.loc[
    (Report_Card["Name"] == "Benjamin Duran")
    & (Report_Card["Lectures"] == "Mathematics")
]

To access Benjamin’s Mathematics grade and store it in a variable:

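# .iloc[0] takes the first matching row by position; ["Grades"][0] would look for the index label 0 instead.
grade = Benjamin_Math["Grades"].iloc[0]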
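Here is a minimal runnable sketch of both ways to access a row, using a made-up stand-in for Report_Card (the rows and values are assumptions for illustration). It also shows why taking the grade by position is safer than relying on the index label 0.

```python
import pandas as pd

# Hypothetical stand-in for Report_Card; the real dataset is not shown in this post.
Report_Card = pd.DataFrame({
    "Name": ["Cory Ward", "Benjamin Duran"],
    "Lectures": ["Mathematics", "Mathematics"],
    "Grades": [77, 90],
})

# iloc[1] returns the second row as a Series.
row = Report_Card.iloc[1]
print(row["Grades"])  # 90

# A boolean mask with loc returns a DataFrame that keeps the original index label (1).
Benjamin_Math = Report_Card.loc[
    (Report_Card["Name"] == "Benjamin Duran")
    & (Report_Card["Lectures"] == "Mathematics")
]

# .iloc[0] picks the first matching row by position, whatever its label is.
# Benjamin_Math["Grades"][0] would raise a KeyError here, because no row is labeled 0.
grade = Benjamin_Math["Grades"].iloc[0]
print(grade)  # 90
```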

How To Group Data In Python Using Pandas

We can use the Pandas groupby method to group the data.

Example:

Report_Card.groupby(["Lectures","Name"]).first()

grouped_obj = Report_Card.groupby(["Class"])
# Each iteration yields the group key and the rows that belong to that group.
for key, item in grouped_obj:
    print("Key is: " + str(key))
    print(str(item), "\n\n")

MEAN:

We can also call the mean method on the Grades column to calculate the average grade for each of the classes or lectures.

Report_Card.groupby(["Class"])["Grades"].mean()

Report_Card.groupby(["Lectures"])["Grades"].mean()
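To see what these group averages look like, here is a minimal runnable sketch with a made-up stand-in for Report_Card. The rows, the second student's name, and all grades are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Report_Card; the real dataset is not shown in this post.
Report_Card = pd.DataFrame({
    "Name": ["Benjamin Duran", "Benjamin Duran", "Cory Ward", "Cory Ward"],
    "Class": ["A", "A", "B", "B"],
    "Lectures": ["Mathematics", "Chemistry", "Mathematics", "Chemistry"],
    "Grades": [90, 54, 70, 66],
})

# Average grade per class: A -> (90 + 54) / 2, B -> (70 + 66) / 2.
by_class = Report_Card.groupby("Class")["Grades"].mean()
print(by_class["A"], by_class["B"])  # 72.0 68.0

# Average grade per lecture: Mathematics -> (90 + 70) / 2, Chemistry -> (54 + 66) / 2.
by_lecture = Report_Card.groupby("Lectures")["Grades"].mean()
print(by_lecture["Mathematics"], by_lecture["Chemistry"])  # 80.0 60.0
```

The result of each call is a Series indexed by the group key, so individual averages can be looked up by label.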

Note: For more information, refer to Creating a Pandas DataFrame

SERIES:

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).

A Pandas Series is comparable to a single column in an Excel sheet.

Creating a Series

A Pandas Series can be created from a list, a dictionary, a scalar value, and so on.

In the real world, a Pandas Series is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file.

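import pandas as pd
import numpy as np

# Creating an empty Series (an explicit dtype keeps the result the same across Pandas versions)
ser = pd.Series(dtype="float64")
print(ser)

# Creating a Series from a simple NumPy array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)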

Output:

Series([], dtype: float64)
0    g
1    e
2    e
3    k
4    s
dtype: object
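The other sources mentioned earlier (a list, a dictionary, and a scalar value) work in a similar way. Here is a minimal sketch with made-up values and variable names:

```python
import pandas as pd

# From a list: Pandas assigns the default integer index 0, 1, 2.
from_list = pd.Series([10, 20, 30])

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated for every label in the index.
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_list.tolist())    # [10, 20, 30]
print(from_dict["b"])        # 2
print(from_scalar.tolist())  # [5, 5, 5]
```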

Note: For more information, refer to Creating a Pandas Series

Thanks!

See you in the next blog…
