Basic commands in Pandas for Data Science

Jay Sardar
Analytics Vidhya
Published in
3 min readAug 24, 2020

import pandas as pd
import numpy as np

For practice we will be using DATASET on brazilian_houses_to_rent
Its link given below. Its open to public domain.

https://www.kaggle.com/rubenssjr/brasilian-houses-to-rent

# save file into varible df
df = pd.read_csv(‘C:\\Users\JAY\Desktop\python\datasets.csv’)

print(df)

Unnamed: 0  city  area  rooms  bathroom  parking spaces floor  \
0 0 1 240 3 3 4 -
1 1 0 64 2 1 1 10
2 2 1 443 5 5 4 3
3 3 1 73 2 2 1 12
4 4 1 19 1 1 0 -
... ... ... ... ... ... ... ...
6075 6075 1 50 2 1 1 2
6076 6076 1 84 2 2 1 16
6077 6077 0 48 1 1 0 13
6078 6078 1 160 3 2 2 -
6079 6079 1 60 2 1 1 4

animal furniture hoa rent amount property tax \
0 acept furnished R$0 R$8,000 R$1,000
1 acept not furnished R$540 R$820 R$122
2 acept furnished R$4,172 R$7,000 R$1,417
3 acept not furnished R$700 R$1,250 R$150
4 not acept not furnished R$0 R$1,200 R$41
... ... ... ... ... ...
6075 acept not furnished R$420 R$1,150 R$0
6076 not acept furnished R$768 R$2,900 R$63
6077 acept not furnished R$250 R$950 R$42
6078 not acept not furnished R$0 R$3,500 R$250
6079 acept furnished R$489 R$1,900 R$0

fire insurance total
0 R$121 R$9,121
1 R$11 R$1,493
2 R$89 R$12,680
3 R$16 R$2,116
4 R$16 R$1,257
... ... ...
6075 R$15 R$1,585
6076 R$37 R$3,768
6077 R$13 R$1,255
6078 R$53 R$3,803
6079 R$25 R$2,414

[6080 rows x 14 columns]

# To get headers of columns in data
print(df.columns)

Index(['Unnamed: 0', 'city', 'area', 'rooms', 'bathroom', 'parking spaces',
'floor', 'animal', 'furniture', 'hoa', 'rent amount', 'property tax',
'fire insurance', 'total'],
dtype='object')

# To read specific column
print(df[‘area’])

0       240
1 64
2 443
3 73
4 19
...
6075 50
6076 84
6077 48
6078 160
6079 60
Name: area, Length: 6080, dtype: int64

# you can specify how many rows you want (slicing)
print(df[‘area’][0:5])

0    240
1 64
2 443
3 73
4 19
Name: area, dtype: int64

# Get multiple columns (List of labels)
print(df[[‘area’,’animal’]])

area     animal
0 240 acept
1 64 acept
2 443 acept
3 73 acept
4 19 not acept
... ... ...
6075 50 acept
6076 84 not acept
6077 48 acept
6078 160 not acept
6079 60 acept

# print rows
# print(data.head)
# print 1st 4 rows
print(df.head(4))

Unnamed: 0  city  area  rooms  bathroom  parking spaces floor animal  \
0 0 1 240 3 3 4 - acept
1 1 0 64 2 1 1 10 acept
2 2 1 443 5 5 4 3 acept
3 3 1 73 2 2 1 12 acept

furniture hoa rent amount property tax fire insurance total
0 furnished R$0 R$8,000 R$1,000 R$121 R$9,121
1 not furnished R$540 R$820 R$122 R$11 R$1,493
2 furnished R$4,172 R$7,000 R$1,417 R$89 R$12,680
3 not furnished R$700 R$1,250 R$150 R$16 R$2,116

# To access each element
# suppose tax of house on 4th row
print(df.iloc[4,11])

R$41

# Printing data for each row
for index, row in df.iterrows():
print(index , row)

# Get mathematical stats

The outputs for commands below are not possible to write in medium. You can jupyter notebook to view output

df.describe()

# Sorting by values

df.sort_values(‘city’)

# You can add new column as sum of other column
df[‘Total’] = df[‘hoa’] + df[‘rent amount’] +[‘property tax’] + [“fire insurance”]

# another way to do the same
df[‘Total’] = df.iloc[:, 9:12].sum(axis=1)

# Delete column
df = df.drop(columns=[‘Total’])

# To move a column
# First get columns as list
cols = list(df.columns.values)

# Then reset df as you want (we will swap bathroom and room column)
df = df[cols[:3]+[cols[4]]+[cols[3]]+cols[5:]]

# To save file

Into CSV
df.to_csv(‘modified_data_version_1.csv’)

Into EXCEL

df.to_excel(‘modified_data_version_1.xlsx’)

Into TXT

df.to_csv(‘modified_data_version_1.txt’, sep='\t')

you have to pass separator to save and visualize properly in txt format.

--

--