Basic commands in Pandas for Data Science
import pandas as pd
import numpy as np
For practice we will be using DATASET on brazilian_houses_to_rent
Its link given below. Its open to public domain.
https://www.kaggle.com/rubenssjr/brasilian-houses-to-rent
# save file into varible df
df = pd.read_csv(‘C:\\Users\JAY\Desktop\python\datasets.csv’)
print(df)
Unnamed: 0 city area rooms bathroom parking spaces floor \
0 0 1 240 3 3 4 -
1 1 0 64 2 1 1 10
2 2 1 443 5 5 4 3
3 3 1 73 2 2 1 12
4 4 1 19 1 1 0 -
... ... ... ... ... ... ... ...
6075 6075 1 50 2 1 1 2
6076 6076 1 84 2 2 1 16
6077 6077 0 48 1 1 0 13
6078 6078 1 160 3 2 2 -
6079 6079 1 60 2 1 1 4
animal furniture hoa rent amount property tax \
0 acept furnished R$0 R$8,000 R$1,000
1 acept not furnished R$540 R$820 R$122
2 acept furnished R$4,172 R$7,000 R$1,417
3 acept not furnished R$700 R$1,250 R$150
4 not acept not furnished R$0 R$1,200 R$41
... ... ... ... ... ...
6075 acept not furnished R$420 R$1,150 R$0
6076 not acept furnished R$768 R$2,900 R$63
6077 acept not furnished R$250 R$950 R$42
6078 not acept not furnished R$0 R$3,500 R$250
6079 acept furnished R$489 R$1,900 R$0
fire insurance total
0 R$121 R$9,121
1 R$11 R$1,493
2 R$89 R$12,680
3 R$16 R$2,116
4 R$16 R$1,257
... ... ...
6075 R$15 R$1,585
6076 R$37 R$3,768
6077 R$13 R$1,255
6078 R$53 R$3,803
6079 R$25 R$2,414
[6080 rows x 14 columns]
# To get headers of columns in dataprint(df.columns)
Index(['Unnamed: 0', 'city', 'area', 'rooms', 'bathroom', 'parking spaces',
'floor', 'animal', 'furniture', 'hoa', 'rent amount', 'property tax',
'fire insurance', 'total'],
dtype='object')
# To read specific columnprint(df[‘area’])
0 240
1 64
2 443
3 73
4 19
...
6075 50
6076 84
6077 48
6078 160
6079 60
Name: area, Length: 6080, dtype: int64
# you can specify how many rows you want (slicing)print(df[‘area’][0:5])
0 240
1 64
2 443
3 73
4 19
Name: area, dtype: int64
# Get multiple columns (List of labels)print(df[[‘area’,’animal’]])
area animal
0 240 acept
1 64 acept
2 443 acept
3 73 acept
4 19 not acept
... ... ...
6075 50 acept
6076 84 not acept
6077 48 acept
6078 160 not acept
6079 60 acept
# print rows
# print(data.head)
# print 1st 4 rowsprint(df.head(4))
Unnamed: 0 city area rooms bathroom parking spaces floor animal \
0 0 1 240 3 3 4 - acept
1 1 0 64 2 1 1 10 acept
2 2 1 443 5 5 4 3 acept
3 3 1 73 2 2 1 12 acept
furniture hoa rent amount property tax fire insurance total
0 furnished R$0 R$8,000 R$1,000 R$121 R$9,121
1 not furnished R$540 R$820 R$122 R$11 R$1,493
2 furnished R$4,172 R$7,000 R$1,417 R$89 R$12,680
3 not furnished R$700 R$1,250 R$150 R$16 R$2,116
# To access each element
# suppose tax of house on 4th rowprint(df.iloc[4,11])
R$41
# Printing data for each row
for index, row in df.iterrows():
print(index , row)
# Get mathematical stats
The outputs for commands below are not possible to write in medium. You can jupyter notebook to view output
df.describe()
# Sorting by values
df.sort_values(‘city’)
# You can add new column as sum of other columndf[‘Total’] = df[‘hoa’] + df[‘rent amount’] +[‘property tax’] + [“fire insurance”]
# another way to do the samedf[‘Total’] = df.iloc[:, 9:12].sum(axis=1)
# Delete columndf = df.drop(columns=[‘Total’])
# To move a column
# First get columns as listcols = list(df.columns.values)
# Then reset df as you want (we will swap bathroom and room column)df = df[cols[:3]+[cols[4]]+[cols[3]]+cols[5:]]
# To save file
Into CSVdf.to_csv(‘modified_data_version_1.csv’)
Into EXCEL
df.to_excel(‘modified_data_version_1.xlsx’)
Into TXT
df.to_csv(‘modified_data_version_1.txt’, sep='\t')
you have to pass separator to save and visualize properly in txt format.