Pandas in Python
Pandas is used for data manipulation, analysis and cleaning.
What are Data Frames and Series?
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.
It has rows and columns, and arithmetic operations can be applied along both. A DataFrame can hold multiple columns, each with its own data type.
A Series is a one-dimensional labeled array capable of holding data of any type: integers, floats, strings, Python objects and so on. A pandas Series is much like a single column in an Excel sheet. It holds one column of values with one data type.
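To make the difference concrete, here is a small sketch. The column names and values are made up for illustration. It shows that picking one column out of a data frame gives back a Series, and that arithmetic works element by element.

```python
import pandas as pd

# A small data frame with columns of different data types (made-up data).
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'count': [1, 2, 3],
    'price': [0.5, 1.5, 2.5],
})

# Selecting one column gives back a Series.
price = df['price']
print(type(price))

# Arithmetic is applied element by element across the two columns.
df['total'] = df['count'] * df['price']
print(df)
```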
A pandas data frame has 3 parts:
- Index
- Columns
- Records (rows)
When values of different types are mixed in one column, pandas falls back to the more general type, in the order int, float, string (object).
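A quick way to see this ordering is to build three small Series and check their dtype. This is only a sketch with made-up values.

```python
import pandas as pd

# Integers only: pandas keeps an integer dtype.
ints = pd.Series([1, 2, 3])

# Integers mixed with a float: everything is stored as float.
mixed_numeric = pd.Series([1, 2.5, 3])

# Numbers mixed with a string: pandas falls back to the generic object dtype.
mixed_object = pd.Series([1, 2.5, 'three'])

print(ints.dtype)
print(mixed_numeric.dtype)
print(mixed_object.dtype)
```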
How to create a dataframe and a series?
import numpy as np
import pandas as pd
s = pd.Series([1, 2, 3, 4, 56, np.nan, 7, 8, 90])
print(s)
How to create a dataframe by passing a numpy array?
d = pd.date_range('20200809', periods=15)
print(d)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])
print(df)
How to create a data frame by passing a dictionary of objects?
df1 = pd.DataFrame({'A': [1, 2, 3, 4],
'B': pd.Timestamp('20200809'),
'C': pd.Series(1, index=list(range(4)), dtype='float32'),
'D': np.array([5] * 4, dtype='int32'),
'E': 'Lolitha'})
print(df1)
How to find the data types of a data frame?
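One way to do this is the dtypes attribute. The sketch below rebuilds the dictionary-based data frame from above so that it runs on its own.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': pd.Timestamp('20200809'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([5] * 4, dtype='int32'),
    'E': 'Lolitha',
})

# dtypes returns a Series that maps each column name to its data type.
print(df1.dtypes)
```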
How to find the first five and last five rows in the data frame?
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])
print(df)
Use df.head() for the first five rows and df.tail() for the last five.
Finding index and columns
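The row labels and column labels are available as the index and columns attributes. A minimal sketch, using the same date-indexed data frame as before:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Row labels: a DatetimeIndex holding the 15 dates.
print(df.index)

# Column labels: A, B, C and D.
print(df.columns)
```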
Sorting a data frame by its index
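sort_index() can sort along either axis. The sketch below is one possible use. It sorts the rows from newest to oldest date, then sorts the column labels in reverse order.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Sort the rows by their date labels, newest first.
rows_desc = df.sort_index(ascending=False)
print(rows_desc)

# Sort the column labels instead (axis=1), giving D, C, B, A.
cols_desc = df.sort_index(axis=1, ascending=False)
print(cols_desc)
```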
Sorting data by values.
df.sort_values(by='D')
How to select single column in a DataFrame?
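A single column can be picked with square brackets, which gives back a Series. A short sketch on the same kind of data frame:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Square brackets with a column name return that column as a Series.
col_a = df['A']
print(col_a)

# Attribute access (df.A) gives the same column when the name is a valid identifier.
print(df.A.equals(col_a))
```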
How to select data using labels?
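Label-based selection goes through df.loc. As a sketch, passing one row label returns that row as a Series whose index is the column names.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Select the row whose label is the first date in the index.
first_row = df.loc[d[0]]
print(first_row)
```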
How to select on multiple axes using labels?
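df.loc also accepts a row selector and a column selector together. The sketch below assumes the same date-indexed data frame. Note that label slices include both end points.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# All rows, but only columns A and B.
all_rows_ab = df.loc[:, ['A', 'B']]
print(all_rows_ab)

# Rows from 10 Aug to 12 Aug (both included), columns A and B.
some_rows_ab = df.loc['20200810':'20200812', ['A', 'B']]
print(some_rows_ab)
```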
How to slice the rows?
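Rows can be sliced with square brackets, either by position or by label. A minimal sketch:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Slice by position: the first three rows (the end position is excluded).
first_three = df[0:3]
print(first_three)

# Slice by date label: both end dates are included.
by_date = df['20200810':'20200812']
print(by_date)
```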
How to get particular values in a data frame?
df.loc['20200821', ['D', 'C']]
D   -0.008524
C    0.479541
Name: 2020-08-21 00:00:00, dtype: float64
How to get a scalar value?
df.loc[d[0], ['D', 'A']]
D    0.861121
A   -0.063109
Name: 2020-08-09 00:00:00, dtype: float64
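The selection above asks for two columns, so it comes back as a Series. To get one plain number, pass a single row label and a single column label. The sketch below shows two ways to do that, with df.loc and with df.at.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# One row label and one column label give a single value.
value = df.loc[d[0], 'A']
print(value)

# df.at does the same lookup and is meant for single values.
same_value = df.at[d[0], 'A']
print(same_value)
```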
GitHub repository link (Pandas in Python.ipynb): https://github.com/lolithasherley7/lolitha.git
Hope this gives you a basic idea of how to work with pandas. Do try it out.