Pandas Tutorial: Modifying Dataframes
In this tutorial, we will learn how to modify Pandas dataframes. Three operations are discussed:
- deleting index and columns
- renaming index and columns
- reindexing
First, we need to import the required libraries.
# Importing NumPy module and aliasing as np
import numpy as np

# Importing Pandas module and aliasing as pd
import pandas as pd
1.0. Deleting index and columns
We can delete particular index labels (rows) or columns by calling the drop() function. The official documentation of the drop() function can be found in the pandas API reference. By default, drop() returns a new dataframe; we can pass inplace = True to delete the data in place. drop() has a parameter called axis, which determines whether labels are dropped from the index or from the columns:
- set axis to 0 or 'index' for deleting index labels (rows)
- set axis to 1 or 'columns' for deleting columns
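The two spellings of axis can be checked side by side. The following is a minimal sketch, assuming a small hypothetical dataframe named demo that is not part of this tutorial's data; it also uses the columns= and index= keywords of drop(), which are an alternative to passing axis.

```python
import pandas as pd

# Hypothetical small dataframe, used only for this illustration
demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])

# Dropping a column: axis=1, axis='columns', and the columns= keyword agree
by_number = demo.drop(['A'], axis=1)
by_name = demo.drop(['A'], axis='columns')
by_keyword = demo.drop(columns=['A'])

# Dropping a row: axis=0, axis='index', and the index= keyword agree
rows_by_number = demo.drop([0], axis=0)
rows_by_name = demo.drop([0], axis='index')
rows_by_keyword = demo.drop(index=[0])

print(by_number.columns.tolist())     # ['B', 'C']
print(rows_by_number.index.tolist())  # [1]
```

Because inplace is not set here, demo itself keeps all of its rows and columns.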
Suppose we have the df dataframe as specified below:
In [1]:
# Creating 'df' dataframe
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12],
                   [13, 14, 15, 16], [17, 18, 19, 20]],
                  columns = ['A', 'B', 'C', 'D'])

# Showing 'df'
df
The output:
Out [1]:
    A   B   C   D
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
4  17  18  19  20
Example 1: deleting columns
For instance, suppose we want to delete columns A and C from df. We can do that by calling:
In [2]:
# Deleting columns 'A' and 'C' from 'df'
df.drop(['A', 'C'], axis = 1, inplace = True)

# Showing 'df'
df
The output:
Out [2]:
    B   D
0   2   4
1   6   8
2  10  12
3  14  16
4  18  20
Example 2: deleting index
For instance, suppose we want to delete the rows with index labels 1 and 3 from df. We can do that by calling:
In [3]:
# Deleting index labels 1 and 3 from 'df'
df.drop([1, 3], axis = 0, inplace = True)

# Showing 'df'
df
The output:
Out [3]:
    B   D
0   2   4
2  10  12
4  18  20
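The examples above all pass inplace = True. As a hedged sketch of the other behaviours of drop(), the snippet below uses a hypothetical dataframe named demo (not the tutorial's df) to show that drop() returns a new dataframe when inplace is omitted, and that a label which does not exist raises KeyError unless errors='ignore' is passed.

```python
import pandas as pd

# Hypothetical dataframe, used only for this illustration
demo = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['B', 'D'])

# Without inplace=True, drop() returns a new dataframe and leaves 'demo' as is
trimmed = demo.drop([1], axis=0)

# A label that does not exist raises KeyError by default
try:
    demo.drop(['Z'], axis=1)
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

# errors='ignore' skips labels that are not present
unchanged = demo.drop(['Z'], axis=1, errors='ignore')

print(trimmed.index.tolist())  # [0, 2]
print(missing_label_raised)    # True
```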
2.0. Renaming index and columns
We can alter the index and column names by calling the rename() function. The official documentation of the rename() function can be found in the pandas API reference. We can pass inplace = True to rename the labels in place.
Suppose we have the df2 dataframe as specified below:
In [4]:
# Creating 'df2' dataframe
df2 = pd.DataFrame([[5, 6, 7, 8], [17, 18, 19, 20],
                    [13, 14, 15, 16], [1, 2, 3, 4], [9, 10, 11, 12]],
                   columns = ['A', 'B', 'C', 'D'])

# Showing 'df2'
df2
The output:
Out [4]:
    A   B   C   D
0   5   6   7   8
1  17  18  19  20
2  13  14  15  16
3   1   2   3   4
4   9  10  11  12
Example 1: renaming columns
Suppose we want to rename the columns of df2 to fruit names. We can call the following:
In [5]:
# Renaming the columns of 'df2' in place
df2.rename(columns = {'A': 'Apple', 'B': 'Banana', 'C': 'Cherry',
                      'D': 'Dragon fruit'}, inplace = True)

# Showing 'df2'
df2
The output:
Out [5]:
   Apple  Banana  Cherry  Dragon fruit
0      5       6       7             8
1     17      18      19            20
2     13      14      15            16
3      1       2       3             4
4      9      10      11            12
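The mapping passed to rename() does not have to cover every label. The sketch below, which assumes a hypothetical dataframe named demo, shows a partial mapping (labels missing from the mapping are left alone, and keys that match no label are ignored by default) and a function used as the mapper instead of a dictionary.

```python
import pandas as pd

# Hypothetical dataframe, used only for this illustration
demo = pd.DataFrame([[5, 6, 7]], columns=['A', 'B', 'C'])

# Only 'A' is renamed; 'B' and 'C' are untouched and 'Z' matches nothing
partial = demo.rename(columns={'A': 'Apple', 'Z': 'Zucchini'})

# A function is applied to every column label
lowered = demo.rename(columns=str.lower)

print(partial.columns.tolist())  # ['Apple', 'B', 'C']
print(lowered.columns.tolist())  # ['a', 'b', 'c']
```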
We can also rename the columns without calling the rename() function, by direct value setting. Suppose we want to restore the df2 column names to their initial values. We can call the following:
In [6]:
# Renaming the columns of 'df2' by direct value setting
df2.columns = ['A', 'B', 'C', 'D']

# Showing 'df2' column names
df2.columns
The output:
Out [6]:
Index(['A', 'B', 'C', 'D'], dtype='object')
Example 2: renaming index
Suppose we want to rename each index label of df2 to the spelled-out word for its current number. We can call the following:
In [7]:
# Renaming the index of 'df2' in place
df2.rename(index = {0: 'zero', 1: 'one', 2: 'two', 3: 'three',
                    4: 'four'}, inplace = True)

# Showing 'df2'
df2
The output (the columns are A to D again because they were reset in the previous step):
Out [7]:
        A   B   C   D
zero    5   6   7   8
one    17  18  19  20
two    13  14  15  16
three   1   2   3   4
four    9  10  11  12
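We can also rename the index labels without calling the rename() function, by direct value setting. Suppose we want to restore the df2 index labels to their initial values. Assigning a RangeIndex restores the default integer index (assigning np.arange(0, 5) would give the same labels, but stored as a plain integer Index rather than a RangeIndex). We can call the following:
In [8]:
# Resetting the index of 'df2' by direct value setting
df2.index = pd.RangeIndex(0, 5)

# Showing 'df2' index
df2.index
The output:
Out [8]:
RangeIndex(start=0, stop=5, step=1)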
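Direct value setting replaces every label at once, so the new list must have exactly as many entries as the axis. As a minimal sketch on a hypothetical dataframe named demo, assigning too few column names raises ValueError and leaves the existing names in place.

```python
import pandas as pd

# Hypothetical dataframe with four columns, used only for this illustration
demo = pd.DataFrame([[1, 2, 3, 4]], columns=['A', 'B', 'C', 'D'])

# Two names for four columns: pandas rejects the assignment
try:
    demo.columns = ['A', 'B']
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error)
print(demo.columns.tolist())  # ['A', 'B', 'C', 'D']
```

When only some labels need to change, rename() with a partial mapping is usually the more convenient choice.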
3.0. Reindexing
Reindexing changes the row labels and column labels of a dataframe. To reindex means to conform the data to match a given set of labels along a particular axis. We can accomplish two things:
- reindexing for reordering existing data
- reindexing to align with another object
To demonstrate reindexing, we use the df3 dataframe as specified below:
In [9]:
# Creating 'df3' dataframe
df3 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12],
                    [13, 14, 15, 16], [17, 18, 19, 20]],
                   columns = ['A', 'B', 'C', 'D'])

# Showing 'df3'
df3
The output:
Out [9]:
    A   B   C   D
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
4  17  18  19  20
3.1. Reindexing for reordering existing data
The reindex() function is used to reorder the existing data to match a new set of labels. The official documentation of the reindex() function can be found in the pandas API reference. Suppose we want to reorder the columns of df3. We can call the following:
In [10]:
# Reordering columns of 'df3'
df3.reindex(columns = ['B', 'D', 'C', 'A'])
The output:
Out [10]:
    B   D   C   A
0   2   4   3   1
1   6   8   7   5
2  10  12  11   9
3  14  16  15  13
4  18  20  19  17
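Unlike drop() and rename(), reindex() has no inplace parameter: it always returns a new object. The sketch below, on a hypothetical dataframe named demo, shows that the original keeps its column order until the result is assigned back.

```python
import pandas as pd

# Hypothetical dataframe, used only for this illustration
demo = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                    columns=['A', 'B', 'C', 'D'])

# reindex() returns a reordered copy
reordered = demo.reindex(columns=['B', 'D', 'C', 'A'])

print(demo.columns.tolist())       # ['A', 'B', 'C', 'D']
print(reordered.columns.tolist())  # ['B', 'D', 'C', 'A']

# Assign the result back to keep the new order
demo = demo.reindex(columns=['B', 'D', 'C', 'A'])
```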
We can also exclude particular columns by leaving them out of the list. Suppose we want to exclude columns A and D. We can call the following:
In [11]:
# Excluding columns 'A' and 'D'
df3.reindex(columns = ['B', 'C'])
The output:
Out [11]:
    B   C
0   2   3
1   6   7
2  10  11
3  14  15
4  18  19
The same operation can be done for reordering the index, by using the index parameter instead. Suppose we want to arrange the index of df3 in reverse order. We can call the following:
In [12]:
# Reordering index of 'df3'
df3.reindex(index = np.arange(4, -1, -1))
The output:
Out [12]:
    A   B   C   D
4  17  18  19  20
3  13  14  15  16
2   9  10  11  12
1   5   6   7   8
0   1   2   3   4
If a requested label is not among the existing index or column labels, the values for that entire row or column will be NaN. For instance:
In [13]:
df3.reindex(columns = ['A', 'C', 'B', 'F'])
The output:
Out [13]:
    A   C   B    F
0   1   3   2  NaN
1   5   7   6  NaN
2   9  11  10  NaN
3  13  15  14  NaN
4  17  19  18  NaN
Column F is not in df3, so it contains only NaN.
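The NaN default can be changed with the fill_value parameter of reindex(). The following is a minimal sketch on a hypothetical dataframe named demo; it also shows that adding a new row label puts NaN into existing integer columns, which are therefore converted to float64.

```python
import pandas as pd

# Hypothetical dataframe, used only for this illustration
demo = pd.DataFrame([[1, 2], [5, 6]], columns=['A', 'B'])

# A new column label is filled with NaN by default
with_nan = demo.reindex(columns=['A', 'B', 'F'])

# fill_value replaces NaN for labels that had no data
with_zero = demo.reindex(columns=['A', 'B', 'F'], fill_value=0)

# A new row label (5) gets NaN in every column,
# so the integer columns are upcast to float64
with_new_row = demo.reindex(index=[0, 1, 5])

print(with_zero['F'].tolist())    # [0, 0]
print(with_new_row['A'].dtype)    # float64
```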
3.2. Reindexing to align with another object
You may wish to take an object and reindex its axes to be labeled the same as another object. We can do this by calling the reindex_like() function. The official documentation of the reindex_like() function can be found in the pandas API reference.
Suppose we have the df4 dataframe as shown below:
In [14]:
# Creating 'df4' dataframe
df4 = pd.DataFrame([[5, 6, 7, 8], [1, 2, 3, 4], [13, 14, 15, 16],
                    [9, 10, 11, 12], [17, 18, 19, 20]],
                   index = np.arange(4, -1, -1),
                   columns = ['D', 'B', 'C', 'A'])

# Showing 'df4'
df4
The output:
Out [14]:
    D   B   C   A
4   5   6   7   8
3   1   2   3   4
2  13  14  15  16
1   9  10  11  12
0  17  18  19  20
If we want to reindex df3 like df4, we can call the following:
In [15]:
# Reindexing 'df3' like 'df4'
df3.reindex_like(df4)
The output:
Out [15]:
    D   B   C   A
4  20  18  19  17
3  16  14  15  13
2  12  10  11   9
1   8   6   7   5
0   4   2   3   1
As shown above, the result takes its index and column order from df4 while keeping the values of df3.
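One way to read reindex_like() is as shorthand for calling reindex() with the other object's index and columns. The sketch below checks this on two hypothetical dataframes named left and right; only the labels of right are used, never its values.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframes, used only for this illustration
left = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
right = pd.DataFrame([[0, 0], [0, 0], [0, 0]],
                     index=np.arange(2, -1, -1), columns=['B', 'A'])

# Same labels requested in two ways
via_like = left.reindex_like(right)
via_reindex = left.reindex(index=right.index, columns=right.columns)

print(via_like.equals(via_reindex))  # True
print(via_like.values.tolist())      # [[6, 5], [4, 3], [2, 1]]
```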
References
This tutorial was created by referring to the following: