Published in The Startup

Data Manipulation Using Pandas

Image source: Human Resource

1. Load Dataset :

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("titanic")   # Titanic passenger data bundled with seaborn (downloaded on first use)

2. Read first/last five rows :
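The screenshot for this step did not survive. Here is a minimal sketch of the usual calls. It uses a small made-up DataFrame named `sample` (hypothetical values, not real Titanic rows) so it runs without downloading anything; on the real data you would call `df.head()` and `df.tail()`.

```python
import pandas as pd

# Hypothetical sample standing in for the Titanic data (values are illustrative only)
sample = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, None, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 8.46, 51.86, 21.08],
})

first_rows = sample.head()   # first five rows; pass n to change the count, e.g. sample.head(3)
last_rows = sample.tail()    # last five rows
print(first_rows)
print(last_rows)
```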

3. Description of each feature :
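The original output for this step is missing. A common way to describe each feature is `DataFrame.info()`, which prints every column's name, non-null count and dtype. The sketch below assumes a tiny hypothetical DataFrame `sample` with one numeric and one text column; on the real data the call is simply `df.info()`.

```python
import pandas as pd

# Hypothetical sample with a numeric column (one value missing) and a text column
sample = pd.DataFrame({
    "age": [22.0, None, 26.0],
    "sex": ["male", "female", "female"],
})

sample.info()               # prints column names, non-null counts and dtypes
dtypes = sample.dtypes      # the same dtype information as a Series
non_null = sample.count()   # number of non-null values per column
```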

4. Statistical description :
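For numeric columns, `DataFrame.describe()` returns the count, mean, standard deviation, minimum, quartiles and maximum. A minimal sketch on a hypothetical DataFrame `sample` (made-up values chosen so the statistics are easy to check by hand):

```python
import pandas as pd

# Hypothetical numeric sample; the missing age is excluded from its statistics
sample = pd.DataFrame({"age": [20.0, 30.0, 40.0, None], "fare": [10.0, 20.0, 30.0, 40.0]})

stats = sample.describe()   # rows: count, mean, std, min, 25%, 50%, 75%, max
print(stats)
```

Note that `count` for `age` is 3, not 4, because missing values are skipped.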

5. Description of Categorical data :

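a = (df.dtypes == "object")   # True for columns stored with the object (text) dtype
print(a)
b = df.loc[:, a]   # keep only the columns where "a" is True
b.describe()       # count, unique, top and freq for each text column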

6. Printing all columns :
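By default pandas hides the middle columns of a wide DataFrame behind "..." when printing. A minimal sketch, assuming a hypothetical 30-column DataFrame `wide`: `df.columns` always lists every name, and setting the `display.max_columns` option to `None` removes the printing limit.

```python
import pandas as pd

# Hypothetical wide DataFrame with 30 columns (illustrative only)
wide = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

print(list(wide.columns))                    # every column name, regardless of display settings
pd.set_option("display.max_columns", None)   # print all columns instead of truncating with "..."
print(wide)
max_cols = pd.get_option("display.max_columns")
```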

7. Null Value Operations :
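The typical null-value operations are counting missing values per column, finding rows that contain any missing value, and dropping those rows. A minimal sketch on a hypothetical DataFrame `sample` that borrows three Titanic column names but uses made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values in "age" and "deck"
sample = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": ["C", None, None, "E"],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

missing_per_column = sample.isnull().sum()                # NaN/None count in each column
rows_with_any_null = sample.isnull().any(axis=1).sum()    # rows with at least one missing value
complete_rows = sample.dropna()                           # drop every row that has a missing value
```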

8. Filling Null Values :
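`fillna()` replaces missing values. One common choice, shown here as an assumption rather than the author's exact method, is the column mean for numeric data and the most frequent value (mode) for text. The DataFrame `sample` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing age and one missing town
sample = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "embark_town": ["Southampton", None, "Southampton"],
})

filled = sample.copy()   # keep the original untouched
filled["age"] = filled["age"].fillna(filled["age"].mean())   # numeric: fill with the mean (30.0 here)
filled["embark_town"] = filled["embark_town"].fillna(filled["embark_town"].mode()[0])   # text: fill with the mode
```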

9. Returning a specific index :

import numpy as np
import pandas as pd
a = np.where(df["fare"] == df["fare"].max())   # 0-based positions of every row holding the maximum fare
print("Index of maximum fare:", a)
[out]>> Index of maximum fare: (array([136, 152], dtype=int64),)
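`np.where` returns positions, which equal the index labels only when the DataFrame still has its default 0..n-1 index. A minimal sketch contrasting positions with labels, assuming a hypothetical Series `fares` with a non-default index: boolean selection gives every matching label, while `idxmax()` gives only the first one.

```python
import numpy as np
import pandas as pd

# Hypothetical fares with a non-default index, so positions and labels differ
fares = pd.Series([7.25, 512.33, 8.05, 512.33], index=[10, 20, 30, 40])

is_max = fares == fares.max()
labels = fares[is_max].index.tolist()      # index labels of every row holding the maximum
positions = np.where(is_max)[0].tolist()   # 0-based positions, as in the np.where example
first_label = fares.idxmax()               # label of the first maximum only
```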

10. Selecting Specific columns :

a = df["age"]   # select the "age" column (a Series)
b = a < 40      # boolean Series: True where age is below 40 (missing ages compare as False)
b.sum()         # True counts as 1, so this is the number of people younger than 40
[out] >> 114

11. Creating a new column in a given DataFrame :

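embark_town = df["embark_town"].astype("str")    # missing towns become the string "nan"
df["new_column"] = [i[0] for i in embark_town]   # first letter of each town name
df.columns   # "new_column" now appears at the end of the column list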

12. Creating a DataFrame from a matrix :

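from numpy.random import randn as rn
np.random.seed(101)      # fix the seed so the random numbers are reproducible
matrix_data = rn(5, 4)   # 5 x 4 array of standard-normal values
row_label = ["A","B","C","D","E"]
column_label = ["P","Q","R","S"]
df1 = pd.DataFrame(matrix_data, index=row_label, columns=column_label)
df1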

13. Dropping rows and columns :

df2 = df.drop("adult_male", axis=1)   # axis=1 drops a column; axis=0 (the default) drops a row by index label
print(df2)

14. Index Set and Reset :

df1.reset_index()   # returns a copy with a default 0..n-1 index; the old index is kept as a new "index" column
df1.reset_index(drop=True, inplace=True)   # discard the old index and assign the default one in place
new_default = ["k","L","M","N","P"]
df1["new_default"] = new_default   # add the new labels as a column
df1.set_index("new_default")       # use that column as the index (returns a copy)

15. Multi-indexing :

from numpy.random import randn
np.random.seed(101)
matrix_data = randn(12, 6)
list1 = ["P","P","P","Q","Q","Q","R","R","R","S","S","S"]   # outer index level
list2 = [1,2,3,1,2,3,1,2,3,1,2,3]                           # inner index level
multi_index = tuple(zip(list1, list2))   # pairs such as ("P", 1), ("P", 2), ...
print("printing multi index value \n", "-"*120, "\n", multi_index)
Multi_index = pd.MultiIndex.from_tuples(multi_index)   # two-level index built from the pairs
df3 = pd.DataFrame(matrix_data, index=Multi_index, columns=["A","B","C","D","E","F"])
print(df3)
a = df3.loc["P"].loc[[3], ["A","C","E"]]   # outer label "P", then inner label 3 and columns A, C, E
print(a)
[out] >>
          A         C         E
3  0.188695 -0.933237  0.190794

16. Groupby Operation :

data={"Company":["GooGle","GooGle","Microsoft","Microsoft","FaceBook","FaceBook"], "Person":["Akhil","Anand","Kspa","Abhishek","shiv","Aashutosh"],"Sales":[100,200,300,400,500,600]}
df6=pd.DataFrame(data)
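df6.groupby("Company")   # on its own this only returns a DataFrameGroupBy object, not a table
# to get a result, follow groupby with an aggregation such as mean(), sum() or count()
print(df6.groupby("Company").mean(numeric_only=True))   # numeric_only skips the text column "Person"
[out]>>
           Sales
Company
FaceBook   550.0
GooGle     150.0
Microsoft  350.0
new_d = df6.groupby("Company").describe()   # count, mean, std, min, quartiles and max of Sales per company
new_d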

17. Accessing rows inside rows and columns inside columns (nested index levels) :

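new_d = df6.groupby("Company").describe()   # summary table: one row per company, ("Sales", statistic) columns
new_d["Sales"].loc[["GooGle"], ["std", "mean"]]   # row "GooGle", columns "std" and "mean" inside the "Sales" column group
df1 = new_d.T   # transposed: rows are ("Sales", statistic), columns are companies
df1.loc["Sales"].loc[["count", "mean"], ["FaceBook", "GooGle"]]
# here we access the "count" and "mean" rows, which sit inside the "Sales" row level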

18. Concatenation :

(Figure: horizontal concatenation)
(Figure: vertical concatenation)
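Both figures correspond to `pd.concat`: `axis=1` places DataFrames side by side (aligned on the index) and `axis=0` stacks them vertically. A minimal sketch with three small hypothetical DataFrames:

```python
import pandas as pd

# Hypothetical DataFrames for illustration
left = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"B": [3, 4]}, index=[0, 1])
more_rows = pd.DataFrame({"A": [5, 6]}, index=[2, 3])

horizontal = pd.concat([left, right], axis=1)     # side by side, rows aligned on the index
vertical = pd.concat([left, more_rows], axis=0)   # one below the other
```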

19. Merge Operation :
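`pd.merge` combines two DataFrames on a shared key column, much like a SQL join; the `how` parameter picks which keys survive. A minimal sketch assuming two hypothetical tables keyed on `Company` (the company names are reused from the groupby example; the cities and the extra "Amazon" row are made up):

```python
import pandas as pd

# Hypothetical tables sharing the "Company" key
sales = pd.DataFrame({"Company": ["GooGle", "Microsoft", "FaceBook"], "Sales": [150, 350, 550]})
hq = pd.DataFrame({"Company": ["GooGle", "Microsoft", "Amazon"],
                   "City": ["Mountain View", "Redmond", "Seattle"]})

inner = pd.merge(sales, hq, on="Company", how="inner")   # only companies present in both tables
outer = pd.merge(sales, hq, on="Company", how="outer")   # all companies; missing values become NaN
```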

20. Joins :
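`DataFrame.join` is a convenience for combining on the index rather than on a column; its default is a left join. A minimal sketch on two hypothetical DataFrames whose index labels only partly overlap:

```python
import pandas as pd

# Hypothetical DataFrames indexed by key labels
left = pd.DataFrame({"A": [1, 2, 3]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"B": [10, 20]}, index=["K0", "K2"])

left_join = left.join(right)                 # default how="left": keep every row of left
inner_join = left.join(right, how="inner")   # keep only labels present in both
```

Here `left_join` keeps `K1` with `NaN` in column `B`, while `inner_join` drops it.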

21. apply() :
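`apply()` calls a function on each element of a Series, or on each row or column of a DataFrame. A minimal sketch, assuming a hypothetical DataFrame `sample` and made-up derived columns `age_group` and `fare_per_year`:

```python
import pandas as pd

# Hypothetical sample (values are illustrative only)
sample = pd.DataFrame({"age": [22, 38, 4], "fare": [7.25, 71.28, 16.70]})

# Series.apply: call a function on every element of one column
sample["age_group"] = sample["age"].apply(lambda a: "child" if a < 18 else "adult")

# DataFrame.apply with axis=1: call a function on every row
sample["fare_per_year"] = sample.apply(lambda row: row["fare"] / row["age"], axis=1)
```

For simple arithmetic like the second line, a vectorized expression (`sample["fare"] / sample["age"]`) is faster; `apply` is most useful when the logic does not vectorize easily.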

22. iloc and loc :
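`loc` selects by index label and column name, while `iloc` selects by 0-based integer position. A minimal sketch on a hypothetical DataFrame `sample`, which also shows the slicing difference: label slices include the end label, position slices exclude the end position.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels
sample = pd.DataFrame({"P": [1, 2, 3], "Q": [4, 5, 6]}, index=["a", "b", "c"])

by_label = sample.loc["b", "Q"]     # row label "b", column "Q"
by_position = sample.iloc[1, 1]     # second row, second column (same cell)
label_slice = sample.loc["a":"b"]   # rows "a" and "b" (end label included)
position_slice = sample.iloc[0:1]   # row 0 only (end position excluded)
```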

Conclusion:
