Pandas: Python Data Analysis Library
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Install Pandas
pip install pandas
Import pandas
import pandas as pd
Creating a dataframe
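Besides loading a file, a dataframe can be built directly from a Python dictionary. The snippet below is a minimal sketch; the column names and values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column
sample = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 45, 3],
})

print(sample)
```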
Reading a .csv file
Here, we will use the 'hotel_bookings' dataset (hotel_bookings.csv) to explore the various functions of pandas.
data = pd.read_csv('hotel_bookings.csv')
data.head()
The dataset has 32 columns, only a few of which appear in the output above.
Reading first 5 rows
data.head()
Reading last 5 rows
data.tail()
Shape of a dataframe
data.shape
Show information about the dataset
data.info()
Get column names
data.columns
Check number of NaN values
This gives the number of missing (NaN) values in each column of the dataset.
data.isnull().sum()
Display statistical information
It gives the count, mean, standard deviation, min, max and quartiles (25%, 50%, 75%) of each numeric column.
data.describe()
Indexing
.iloc[start_row:end_row, start_col:end_col] selects rows by position in the range [start_row, end_row) and columns in the range [start_col, end_col). The end positions are excluded.
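As a small illustration of position-based slicing, the sketch below uses a made-up dataframe (the columns a, b and c are hypothetical) so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical dataframe with 4 rows and 3 columns
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "c": [100, 200, 300, 400],
})

# Rows at positions 0 and 1, columns at positions 1 and 2
# (end positions 2 and 3 are excluded)
subset = df.iloc[0:2, 1:3]
print(subset)
```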
Drop columns
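One common way to drop columns is DataFrame.drop with the columns argument. The sketch below uses a small made-up dataframe; the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical dataframe; column names are illustrative only
df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel"],
    "agent": [9.0, 240.0],
    "company": [None, None],
})

# drop() returns a new dataframe and leaves df unchanged
trimmed = df.drop(columns=["agent", "company"])
print(trimmed.columns.tolist())
```

Assigning the result back (df = df.drop(...)) makes the change stick for later steps.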
Handling missing values
Fill missing numerical values with mean
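A minimal sketch of mean imputation, assuming a numeric column named children (a hypothetical name used here for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing value
df = pd.DataFrame({"children": [0.0, 2.0, np.nan, 1.0]})

# mean() skips NaN by default, so the mean is taken over 0, 2 and 1
column_mean = df["children"].mean()
df["children"] = df["children"].fillna(column_mean)

print(df)
```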
Fill missing categorical values with mode
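For a categorical column, the most frequent value can be used as the fill value. This is a sketch on made-up data; the column name country is an assumption.

```python
import pandas as pd

# Hypothetical categorical column with one missing value
df = pd.DataFrame({"country": ["PRT", "GBR", None, "PRT"]})

# mode() returns a Series (there can be ties), so take its first entry
most_common = df["country"].mode()[0]
df["country"] = df["country"].fillna(most_common)

print(df)
```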
Changing the datatype of a column using ‘astype()’
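A minimal sketch of astype() on a single column, again using a hypothetical children column stored as floats:

```python
import pandas as pd

# Hypothetical float column with no missing values
df = pd.DataFrame({"children": [0.0, 2.0, 1.0]})

# Convert the column from float64 to int64
df["children"] = df["children"].astype("int64")

print(df.dtypes)
```

Converting a float column to an integer type raises an error if the column still contains NaN, so it helps to fill missing values first.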
Another way:
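One alternative that pandas supports, which may or may not be the one intended here, is to call astype() on the whole dataframe with a dictionary that maps column names to types. The column names below are hypothetical.

```python
import pandas as pd

# Hypothetical dataframe; column names are illustrative only
df = pd.DataFrame({
    "children": [0.0, 2.0, 1.0],
    "is_canceled": [0, 1, 0],
})

# Convert several columns in one call by passing a {column: dtype} mapping
df = df.astype({"children": "int64", "is_canceled": "bool"})

print(df.dtypes)
```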
To learn more, check out the official pandas documentation.
Happy Learning!