Pandas in Python

Lolithasherley
Published in Analytics Vidhya
3 min read, Sep 22, 2020

Pandas is a Python library used for data manipulation, analysis, and cleaning.

What are Data Frames and Series?

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.

It contains rows and columns, and arithmetic operations can be applied to both. A DataFrame can hold multiple columns, each with its own data type.

A Series is a one-dimensional labeled array capable of holding data of any type: integers, floats, strings, Python objects, and so on. A pandas Series is comparable to a single column in an Excel sheet. It generally holds one column of one data type.
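To make the difference concrete, here is a minimal sketch. The column names and values are made up for illustration and are not from the original notebook. It shows that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A small DataFrame with two columns of different types (illustrative data)
frame = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.5, 2.0, 3.5]})

# Selecting one column gives back a Series
column = frame["score"]

print(type(frame).__name__)   # DataFrame
print(type(column).__name__)  # Series
print(frame.shape)            # (3, 2): three rows, two columns
print(column.shape)           # (3,): one dimension
```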

A pandas DataFrame has three parts:

  1. Index
  2. Columns
  3. Records/Rows

When values of different types share one column, pandas picks the most general type that can hold them all, in the order int, float, string (object).
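A quick way to see this ordering is to build a few small Series and inspect their dtype. This is only a sketch with made-up values; the variable names are not from the original notebook.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])            # only integers
mixed_num = pd.Series([1, 2.5, 3])     # integers mixed with a float
mixed_obj = pd.Series([1, 2.5, "x"])   # numbers mixed with a string

# Each print shows the type pandas chose for the whole Series
print(ints.dtype)       # an integer dtype
print(mixed_num.dtype)  # a float dtype
print(mixed_obj.dtype)  # object
```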

How to create a DataFrame and a Series?

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 56, np.nan, 7, 8, 90])
print(s)

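How to create a DataFrame by passing a NumPy array?

# 15 consecutive dates starting on 2020-08-09, used as the row index
d = pd.date_range('20200809', periods=15)
print(d)

# 15 rows x 4 columns of random values, labeled A to D
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])
print(df)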

How to create a DataFrame by passing a dictionary of objects?

# Scalars (the Timestamp and the string) are repeated for every row
df1 = pd.DataFrame({'A': [1, 2, 3, 4],
                    'B': pd.Timestamp('20200809'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([5] * 4, dtype='int32'),
                    'E': "Lolitha"})
print(df1)

How to find the data types of a DataFrame?
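The original snippet for this step is not shown, so the following is a sketch that assumes the dictionary-based df1 from earlier in the article. The dtypes attribute lists the type of each column.

```python
import numpy as np
import pandas as pd

# Same dictionary-based DataFrame as in the article
df1 = pd.DataFrame({'A': [1, 2, 3, 4],
                    'B': pd.Timestamp('20200809'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([5] * 4, dtype='int32'),
                    'E': "Lolitha"})

# One entry per column: column name on the left, its dtype on the right
print(df1.dtypes)
```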

How to find the first five and last five rows of a DataFrame?

df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])
print(df)

Use df.head() for the first five rows and df.tail() for the last five.
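As a small sketch of both calls, using the same date-indexed DataFrame as above (the variable names first and last are only for illustration):

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

first = df.head()   # first five rows by default
last = df.tail(3)   # pass a number to change how many rows come back

print(first)
print(last)
```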

Finding index and columns
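The original snippet for this step is not shown. A minimal sketch, assuming the same date-indexed df as above, reads the index and columns attributes:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

print(df.index)    # the row labels (here, 15 dates)
print(df.columns)  # the column labels
```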

Sorting a DataFrame by its index
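The original snippet for this step is not shown, so this is one possible sketch using sort_index on the same df. The choice of descending order and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Sort rows by their index labels, latest date first
rows_desc = df.sort_index(ascending=False)

# axis=1 sorts by column labels instead: D, C, B, A
cols_desc = df.sort_index(axis=1, ascending=False)

print(rows_desc.head())
print(cols_desc.head())
```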

Sorting data by values

df.sort_values(by='D')

How to select a single column in a DataFrame?
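The original snippets for this step are not shown. A minimal sketch, assuming the same df as above, shows the two common spellings, bracket access and attribute access. Both return the column as a Series.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

col_brackets = df['A']   # works for any column name
col_attribute = df.A     # shorthand, only for names that are valid Python identifiers

print(col_brackets)
```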

How to select data using labels?
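The original snippet for this step is not shown. One way to sketch label-based selection, assuming the same df and date index d as above, is to pass a row label to df.loc:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Select the row whose label is the first date; the result is a Series
# whose own index is the column names A to D
row = df.loc[d[0]]
print(row)
```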

How to select on multiple axes using labels?
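The original snippet for this step is not shown. A sketch under the same assumptions (df with a date index and columns A to D) passes both a row selector and a column selector to df.loc:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# All rows, but only columns A and B
all_rows = df.loc[:, ['A', 'B']]

# A range of row labels and a list of column labels;
# label slices include both endpoints
some_rows = df.loc['20200810':'20200812', ['A', 'B']]

print(all_rows.shape)   # (15, 2)
print(some_rows.shape)  # (3, 2)
```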

How to slice the rows?
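The original snippet for this step is not shown. A minimal sketch, assuming the same df as above, slices rows either by position or by date label:

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

# Positional slice: rows 0, 1 and 2 (the end position is excluded)
first_three = df[0:3]

# Label slice on the date index: both end dates are included
by_date = df['20200810':'20200812']

print(first_three)
print(by_date)
```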

How to get particular values in a DataFrame?

df.loc['20200821', ['D', 'C']]

D -0.008524
C 0.479541
Name: 2020-08-21 00:00:00, dtype: float64
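
How to get a scalar value?

df.loc[d[0], ['D', 'A']]

D 0.861121
A -0.063109
Name: 2020-08-09 00:00:00, dtype: float64

Because a list of two column labels is passed here, the result is a two-element Series. Passing a single column label instead, as in df.loc[d[0], 'A'], returns one scalar.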
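For a single value, one row label and one column label can be combined. The sketch below assumes the same df and d as above; df.at is a pandas accessor meant for exactly this kind of single-cell lookup.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20200809', periods=15)
df = pd.DataFrame(np.random.randn(15, 4), index=d, columns=['A', 'B', 'C', 'D'])

value = df.loc[d[0], 'A']    # one row label + one column label -> one number
same_value = df.at[d[0], 'A']  # .at does the same lookup for a single cell

print(value)
print(value == same_value)   # True
```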

GitHub repository link (Pandas in Python.ipynb): https://github.com/lolithasherley7/lolitha.git

I hope this gives a basic idea of how to work with pandas. Do try it out.
