Pandas: Python Data Analysis Library

Khushijain
Published in Nerd For Tech
3 min read · Jun 10, 2021

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Install Pandas

pip install pandas

Import pandas

import pandas as pd

Creating a dataframe

Reading a .csv file

Here, we will use the hotel bookings dataset (hotel_bookings.csv) to explore various pandas functions.

data = pd.read_csv('hotel_bookings.csv')
data.head()

The dataset has 32 columns; the output of head() displays only a few of them.

Reading first 5 rows

data.head()

Reading last 5 rows

data.tail()

Shape of a dataframe

data.shape

Show information about the dataset

data.info()

Get column names

data.columns

Check number of NaN values

This gives the number of missing (NaN) values in each column of the dataset.

data.isnull().sum()
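
To see what the result looks like without loading the full dataset, here is a minimal sketch on a small hand-made frame. The column names and values are illustrative assumptions, not rows from hotel_bookings.csv.

```python
import pandas as pd

# Tiny stand-in frame; names and values are made up for illustration.
df = pd.DataFrame({
    "children": [0.0, None, 1.0],
    "country": ["PRT", None, None],
})

# isnull() marks each missing cell as True; sum() then counts the True values per column.
missing = df.isnull().sum()
print(missing)
```

The result is a Series indexed by column name, so a single count can be read with missing["country"].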

Display statistical information

It gives the count, mean, standard deviation, min, max, and the 25th, 50th, and 75th percentiles of each numeric column.

data.describe()

Indexing

.iloc[start_row:end_row, start_col:end_col] selects, by integer position, rows in the range [start_row, end_row) and columns in the range [start_col, end_col).
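
The half-open ranges are easiest to see on a small example. The sketch below uses a made-up frame with illustrative column names (a, b, c, d) rather than the hotel dataset.

```python
import pandas as pd

# Small stand-in frame; the column names are illustrative only.
df = pd.DataFrame({
    "a": [10, 11, 12, 13],
    "b": [20, 21, 22, 23],
    "c": [30, 31, 32, 33],
    "d": [40, 41, 42, 43],
})

# Rows at positions 1 and 2 (position 3 is excluded),
# columns at positions 0 and 1 (position 2 is excluded).
subset = df.iloc[1:3, 0:2]
print(subset)
```

The end of each range is excluded, just as in ordinary Python slicing.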

Drop columns
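
A minimal sketch of dropping columns with DataFrame.drop, assuming a small stand-in frame. The column names below (hotel, agent, company) are assumptions chosen for illustration.

```python
import pandas as pd

# Stand-in frame; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel"],
    "agent": [9.0, None],
    "company": [None, None],
})

# drop() returns a new frame by default; the original df is left unchanged.
trimmed = df.drop(columns=["agent", "company"])
print(trimmed.columns.tolist())
```

Assigning the result to a new name keeps the original frame available for comparison.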

Handling missing values

Fill missing numerical values with mean
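
One way to do this is with fillna and the column mean. The sketch below assumes a numeric column named children, which is a hypothetical name used only for illustration.

```python
import pandas as pd

# Stand-in frame with one missing numeric value; the column name is an assumption.
df = pd.DataFrame({"children": [0.0, 2.0, None, 1.0]})

# mean() skips NaN by default, so this is (0 + 2 + 1) / 3 = 1.0.
mean_value = df["children"].mean()

# Replace every NaN in the column with the mean.
df["children"] = df["children"].fillna(mean_value)
print(df)
```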

Missing categorical values are filled with mode value
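
The same fillna pattern works with the mode. This is a sketch on a made-up frame; the column name country and its values are assumptions for illustration.

```python
import pandas as pd

# Stand-in frame with one missing categorical value; names and values are assumptions.
df = pd.DataFrame({"country": ["PRT", "GBR", None, "PRT"]})

# mode() ignores missing values and returns a Series (there can be ties),
# so take the first entry.
most_common = df["country"].mode()[0]

df["country"] = df["country"].fillna(most_common)
print(df)
```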

Changing the datatype of a column using ‘astype()’
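
A minimal sketch of astype() on a single column, assuming a float column named children (a hypothetical name) that has no missing values left.

```python
import pandas as pd

# Stand-in frame; the column name is an assumption.
df = pd.DataFrame({"children": [0.0, 2.0, 1.0]})

# Convert float64 to int64. This raises an error if the column still
# contains NaN, so fill or drop missing values first.
df["children"] = df["children"].astype("int64")
print(df.dtypes)
```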

Another way:
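
One possible alternative, offered here as an assumption about what is meant, is to pass a dictionary to DataFrame.astype so that several columns are converted in one call. The column names are again hypothetical.

```python
import pandas as pd

# Stand-in frame; column names and values are assumptions.
df = pd.DataFrame({"children": [0.0, 2.0], "agent": [9.0, 240.0]})

# The dict maps each column name to its target dtype;
# columns not listed keep their current dtype.
df = df.astype({"children": "int64", "agent": "int64"})
print(df.dtypes)
```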

To learn more, check out the official pandas documentation.

Happy Learning!
