Pandas - A Powerful Python Data Analysis Toolkit
In Python, a package is a directory that contains a group of modules and sub-packages. Pandas is one such package for the Python programming language.
PANDAS >> PAN[el] + DA[ta]: the name is derived from "panel data".
Pandas is an open-source data analysis tool that is easy to use and supports the following five processes:
- Analysis of data
- Preparation of data
- Data Manipulation
- Data Modeling
- Data Analysis
The package can be installed with the following command:
pip install pandas
The installed version can then be checked with:
import pandas as pd
print(pd.__version__)
The two terms one comes across most often in pandas are Series and DataFrame:
- SERIES: a one-dimensional labeled array, comparable to a single column of a table, holding values of one data type
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
- DATAFRAME: a two-dimensional, table-like dataset in pandas with labeled rows and columns
import pandas as pd
# Each dictionary key becomes a column name, each list a column of values
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
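The dataset can be inspected with methods such as .head(), .tail() and .info(). The first two return the first and last rows (five by default), while .info() prints a summary of the columns, their data types and their non-null counts.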
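As a minimal sketch of these viewing methods, the snippet below builds a small DataFrame and inspects it. The column names and values are invented purely for illustration.

```python
import pandas as pd

# Illustrative data only; the values are invented for this example
df = pd.DataFrame({
    "calories": [420, 380, 390, 410, 400, 395, 405],
    "duration": [50, 40, 45, 55, 60, 35, 30],
})

first_rows = df.head()    # first 5 rows by default
last_rows = df.tail(2)    # last 2 rows
print(first_rows)
print(last_rows)

df.info()                 # prints column names, non-null counts and dtypes
```

Passing a number, as in df.tail(2), changes how many rows are returned.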
Data cleaning involves removing outliers, that is, values that deviate strongly from the mean. Missing values can either be filled in with the mean, median or mode using fillna(), or the affected rows can be removed using dropna().
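The cleaning steps described above might look roughly like the following sketch. The data is invented, fillna() is the pandas method commonly used to fill gaps, and the outlier rule (1.5 standard deviations from the mean) is an assumption chosen for this tiny example rather than a general recommendation.

```python
import pandas as pd

# Illustrative data: one missing value and one obvious outlier (900)
df = pd.DataFrame({
    "calories": [420.0, 380.0, None, 390.0, 900.0],
    "duration": [50, 40, 45, 45, 60],
})

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: fill the missing value with the column median instead
filled = df.fillna({"calories": df["calories"].median()})

# Simple outlier rule (an assumption for this sketch): keep only rows whose
# calories lie within 1.5 standard deviations of the mean
mean = filled["calories"].mean()
std = filled["calories"].std()
cleaned = filled[(filled["calories"] - mean).abs() <= 1.5 * std]
print(cleaned)
```

Whether to drop or fill depends on the dataset: dropping loses rows, while filling keeps them at the cost of introducing estimated values.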
Pairwise correlations between the numeric columns can be computed using the corr() method.
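A short sketch of corr() on made-up data follows. The column names are hypothetical, and the calories column is deliberately constructed to rise in step with duration so that their correlation is 1.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    "duration": [30, 40, 50, 60],
    "calories": [300, 400, 500, 600],   # rises exactly in step with duration
    "resting_pulse": [70, 65, 72, 64],
})

corr_matrix = df.corr()   # Pearson correlation by default
print(corr_matrix)
```

The result is a square table with one row and one column per numeric column. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.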
Matplotlib is a Python package that is often used together with pandas for visualizing data.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Name': ['John', 'Sammy', 'Joe'], 'Age': [45, 38, 90]})
# Draw one bar per person: names on the x-axis, ages as bar heights
df.plot(x="Name", y="Age", kind="bar")
plt.show()
ADVANTAGES OF USING PANDAS
1) Pivoting datasets.
2) Reshaping datasets.
3) Label-oriented slicing.
4) Indexing and subsetting of large datasets.
5) Efficient, high-performance merging of datasets.
6) Time-series functionality.
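Several of the features listed above can be sketched on one small example. The sales and stores tables, their column names and all values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Illustrative data only
sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-18"]),
    "store": ["A", "B", "A", "B"],
    "amount": [100, 150, 120, 130],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Bergen"]})

# Pivot / reshape: one row per date, one column per store
pivoted = sales.pivot(index="date", columns="store", values="amount")

# Label-oriented slicing: select rows by index label with .loc
by_date = sales.set_index("date")
january = by_date.loc["2023-01"]

# Merge: attach the city of each store, similar to a SQL left join
merged = sales.merge(stores, on="store", how="left")

# Time series: total amount per calendar month
monthly = by_date["amount"].resample("MS").sum()

print(pivoted)
print(january)
print(merged)
print(monthly)
```

Setting the date column as the index is what makes both the label-based selection of January and the monthly resampling possible.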
Libraries closely associated with pandas include NumPy and Matplotlib. Because of its usefulness and versatility, pandas is used across a wide range of data science tasks.