Basic commands in Pandas for Data Science

Jay Sardar

Published in

Analytics Vidhya

3 min readAug 24, 2020

import pandas as pd import numpy as np

For practice we will be using DATASET on brazilian_houses_to_rent
Its link given below. Its open to public domain.

https://www.kaggle.com/rubenssjr/brasilian-houses-to-rent

# save file into varible df
df = pd.read_csv(‘C:\\Users\JAY\Desktop\python\datasets.csv’)

print(df)

Unnamed: 0  city  area  rooms  bathroom  parking spaces floor  \
0              0     1   240      3         3               4     -   
1              1     0    64      2         1               1    10   
2              2     1   443      5         5               4     3   
3              3     1    73      2         2               1    12   
4              4     1    19      1         1               0     -   
...          ...   ...   ...    ...       ...             ...   ...   
6075        6075     1    50      2         1               1     2   
6076        6076     1    84      2         2               1    16   
6077        6077     0    48      1         1               0    13   
6078        6078     1   160      3         2               2     -   
6079        6079     1    60      2         1               1     4   

         animal      furniture      hoa rent amount property tax  \
0         acept      furnished      R$0     R$8,000      R$1,000   
1         acept  not furnished    R$540       R$820        R$122   
2         acept      furnished  R$4,172     R$7,000      R$1,417   
3         acept  not furnished    R$700     R$1,250        R$150   
4     not acept  not furnished      R$0     R$1,200         R$41   
...         ...            ...      ...         ...          ...   
6075      acept  not furnished    R$420     R$1,150          R$0   
6076  not acept      furnished    R$768     R$2,900         R$63   
6077      acept  not furnished    R$250       R$950         R$42   
6078  not acept  not furnished      R$0     R$3,500        R$250   
6079      acept      furnished    R$489     R$1,900          R$0   

     fire insurance     total  
0             R$121   R$9,121  
1              R$11   R$1,493  
2              R$89  R$12,680  
3              R$16   R$2,116  
4              R$16   R$1,257  
...             ...       ...  
6075           R$15   R$1,585  
6076           R$37   R$3,768  
6077           R$13   R$1,255  
6078           R$53   R$3,803  
6079           R$25   R$2,414  

[6080 rows x 14 columns]

# To get headers of columns in data
print(df.columns)

Index(['Unnamed: 0', 'city', 'area', 'rooms', 'bathroom', 'parking spaces',
       'floor', 'animal', 'furniture', 'hoa', 'rent amount', 'property tax',
       'fire insurance', 'total'],
      dtype='object')

# To read specific column
print(df[‘area’])

0       240
1        64
2       443
3        73
4        19
       ... 
6075     50
6076     84
6077     48
6078    160
6079     60
Name: area, Length: 6080, dtype: int64

# you can specify how many rows you want (slicing)
print(df[‘area’][0:5])

0    240
1     64
2    443
3     73
4     19
Name: area, dtype: int64

# Get multiple columns (List of labels)
print(df[[‘area’,’animal’]])

area     animal
0      240      acept
1       64      acept
2      443      acept
3       73      acept
4       19  not acept
...    ...        ...
6075    50      acept
6076    84  not acept
6077    48      acept
6078   160  not acept
6079    60      acept

# print rows
# print(data.head)
# print 1st 4 rows
print(df.head(4))

Unnamed: 0  city  area  rooms  bathroom  parking spaces floor animal  \
0           0     1   240      3         3               4     -  acept   
1           1     0    64      2         1               1    10  acept   
2           2     1   443      5         5               4     3  acept   
3           3     1    73      2         2               1    12  acept   

       furniture      hoa rent amount property tax fire insurance     total  
0      furnished      R$0     R$8,000      R$1,000          R$121   R$9,121  
1  not furnished    R$540       R$820        R$122           R$11   R$1,493  
2      furnished  R$4,172     R$7,000      R$1,417           R$89  R$12,680  
3  not furnished    R$700     R$1,250        R$150           R$16   R$2,116

# To access each element
# suppose tax of house on 4th row
print(df.iloc[4,11])

R$41

# Printing data for each row
for index, row in df.iterrows():
print(index , row)

# Get mathematical stats

The outputs for commands below are not possible to write in medium. You can jupyter notebook to view output

df.describe()

# Sorting by values

df.sort_values(‘city’)

# You can add new column as sum of other column
df[‘Total’] = df[‘hoa’] + df[‘rent amount’] +[‘property tax’] + [“fire insurance”]

# another way to do the same
df[‘Total’] = df.iloc[:, 9:12].sum(axis=1)

# Delete column
df = df.drop(columns=[‘Total’])

# To move a column
# First get columns as list
cols = list(df.columns.values)

# Then reset df as you want (we will swap bathroom and room column)
df = df[cols[:3]+[cols[4]]+[cols[3]]+cols[5:]]

# To save file

Into CSV
df.to_csv(‘modified_data_version_1.csv’)

Into EXCEL

df.to_excel(‘modified_data_version_1.xlsx’)

Into TXT

df.to_csv(‘modified_data_version_1.txt’, sep='\t')

you have to pass separator to save and visualize properly in txt format.

Basic commands in Pandas for Data Science

Written by Jay Sardar