Pandas Tutorial: Modifying Dataframes

Andika Rachman
6 min read · Jan 16, 2019


In this tutorial, we will learn how to modify Pandas dataframes. It covers three operations:

  • deleting rows (index labels) and columns
  • renaming index labels and columns
  • reindexing

First, we need to import the required libraries.

# Importing NumPy module and aliasing as np
import numpy as np
# Importing Pandas module and aliasing as pd
import pandas as pd

1.0. Deleting index and columns

We can delete particular rows (index labels) or columns by calling the drop() function; see the official pandas documentation of drop() for the full parameter list. By default, drop() returns a new dataframe and leaves the original unchanged; passing inplace=True modifies the dataframe in place instead. drop() has a parameter called axis, which determines whether labels are dropped from the index or from the columns.

  • set axis to 0 or 'index' to delete rows by index label (this is the default)
  • set axis to 1 or 'columns' to delete columns

Suppose we have the dataframe df defined below:

In [1]:
# Creating 'df' dataframe
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12],
                   [13, 14, 15, 16], [17, 18, 19, 20]],
                  columns=['A', 'B', 'C', 'D'])
# Showing 'df'
df

The output:

Out [1]:
    A   B   C   D
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
4  17  18  19  20

Example 1: deleting columns

For instance, to delete columns A and C from df, we can call:

In [2]:
# Deleting columns 'A' and 'C' from 'df'
df.drop(['A', 'C'], axis=1, inplace=True)
# Showing 'df'
df

The output:

Out [2]:
    B   D
0   2   4
1   6   8
2  10  12
3  14  16
4  18  20
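
The call above modifies df in place. As a minimal sketch of the non-mutating alternative, the example below rebuilds a smaller dataframe and uses the columns keyword of drop(), which is equivalent to passing axis=1. The name trimmed is only an illustrative choice, not something from the original example.

```python
import pandas as pd

# A smaller stand-in for the 'df' dataframe used above
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'C', 'D'])

# Without inplace=True, drop() returns a new dataframe.
# columns=[...] is equivalent to passing the labels with axis=1.
trimmed = df.drop(columns=['A', 'C'])

print(trimmed.columns.tolist())  # ['B', 'D']
print(df.columns.tolist())       # ['A', 'B', 'C', 'D'] (original is unchanged)
```

Keeping the original intact is often convenient when the same dataframe is reused later in a notebook.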

Example 2: deleting index

For instance, to delete the rows with index labels 1 and 3 from df, we can call:

In [3]:
# Deleting index labels 1 and 3 from 'df'
df.drop([1, 3], axis=0, inplace=True)
# Showing 'df'
df

The output:

Out [3]:
    B   D
0   2   4
2  10  12
4  18  20
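
One detail worth knowing: drop() raises a KeyError when a requested label does not exist. The sketch below, which assumes a freshly built dataframe with the same B and D columns, shows that behavior and the errors='ignore' option that skips missing labels. The names raised and safe are illustrative only.

```python
import pandas as pd

# Stand-in dataframe with the 'B' and 'D' columns and the default index 0..4
df = pd.DataFrame({'B': [2, 6, 10, 14, 18], 'D': [4, 8, 12, 16, 20]})

# Label 7 is not in the index, so a plain drop() raises KeyError
try:
    df.drop([1, 7], axis=0)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# errors='ignore' drops the labels that exist and skips the rest
safe = df.drop([1, 7], axis=0, errors='ignore')
print(safe.index.tolist())  # [0, 2, 3, 4]
```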

2.0. Renaming index and columns

We can change index labels and column names by calling the rename() function; see the official pandas documentation of rename() for details. Passing inplace=True renames the labels in place.

Suppose we have the dataframe df2 defined below:

In [4]:
# Creating 'df2' dataframe
df2 = pd.DataFrame([[5, 6, 7, 8], [17, 18, 19, 20],
                    [13, 14, 15, 16], [1, 2, 3, 4], [9, 10, 11, 12]],
                   columns=['A', 'B', 'C', 'D'])
# Showing 'df2'
df2

The output:

Out [4]:
    A   B   C   D
0   5   6   7   8
1  17  18  19  20
2  13  14  15  16
3   1   2   3   4
4   9  10  11  12

Example 1: renaming columns

Suppose we want to rename the columns of df2 to fruit names. We can call the following:

In [5]:
# Renaming the columns of 'df2' in place
df2.rename(columns={'A': 'Apple', 'B': 'Banana', 'C': 'Cherry',
                    'D': 'Dragon fruit'}, inplace=True)
# Showing 'df2'
df2

The output:

Out [5]:
   Apple  Banana  Cherry  Dragon fruit
0      5       6       7             8
1     17      18      19            20
2     13      14      15            16
3      1       2       3             4
4      9      10      11            12

We can also rename the columns without calling rename(), by assigning a new list of names directly to the columns attribute. Suppose we want to restore the original column names of df2. We can call the following:

In [6]:
# Renaming the columns of 'df2' by direct value setting
df2.columns = ['A', 'B', 'C', 'D']
# Showing 'df2' column names
df2.columns

The output:

Out [6]:
Index(['A', 'B', 'C', 'D'], dtype='object')
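
Besides a dictionary, rename() also accepts a function that is applied to every label, and a dictionary does not need to cover every column. The sketch below illustrates both on a shortened copy of df2. The names lowered and partial are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Shortened stand-in for 'df2'
df2 = pd.DataFrame([[5, 6, 7, 8], [17, 18, 19, 20]],
                   columns=['A', 'B', 'C', 'D'])

# A function mapper is applied to each column label
lowered = df2.rename(columns=str.lower)
print(lowered.columns.tolist())  # ['a', 'b', 'c', 'd']

# A dictionary may rename only some columns; the others keep their names
partial = df2.rename(columns={'A': 'Apple'})
print(partial.columns.tolist())  # ['Apple', 'B', 'C', 'D']
```

Unlike direct assignment to the columns attribute, a partial dictionary does not require listing every column in order.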

Example 2: renaming index

Suppose we want to rename each index label of df2 to its spelled-out English word. We can call the following:

In [7]:
# Renaming the index of 'df2' in place
df2.rename(index={0: 'zero', 1: 'one', 2: 'two', 3: 'three',
                  4: 'four'}, inplace=True)
# Showing 'df2'
df2

The output:

Out [7]:
        A   B   C   D
zero    5   6   7   8
one    17  18  19  20
two    13  14  15  16
three   1   2   3   4
four    9  10  11  12

We can also rename the index labels without calling rename(), by assigning directly to the index attribute. Suppose we want to restore the original index labels of df2. We can call the following:

In [8]:
# Renaming the index of 'df2' by direct value setting
df2.index = pd.RangeIndex(0, 5)
# Showing 'df2' index names
df2.index

The output:

Out [8]:
RangeIndex(start=0, stop=5, step=1)
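
Another way to get back to the default integer index, shown here only as a sketch, is reset_index(drop=True). It returns a new dataframe with a fresh 0..n-1 index and discards the old labels. The name restored is an illustrative choice.

```python
import pandas as pd

# Stand-in dataframe with word labels in the index
df2 = pd.DataFrame([[5, 6], [17, 18], [13, 14]],
                   columns=['A', 'B'],
                   index=['zero', 'one', 'two'])

# drop=True discards the old labels instead of turning them into a column
restored = df2.reset_index(drop=True)

print(restored.index.tolist())  # [0, 1, 2]
print(df2.index.tolist())       # ['zero', 'one', 'two'] (original is unchanged)
```

This avoids having to know the number of rows in advance, which direct assignment requires.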

3.0. Reindexing

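Reindexing conforms the data of a dataframe to a given set of labels along a particular axis: rows and columns are rearranged to match the new labels, and labels that do not exist in the original are filled with missing values. It returns a new dataframe and leaves the original unchanged. We can accomplish two things: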

  • reindexing for reordering existing data
  • reindexing to align with another object

To demonstrate reindexing, we use the dataframe df3 defined below:

In [9]:
# Creating 'df3' dataframe
df3 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12],
                    [13, 14, 15, 16], [17, 18, 19, 20]],
                   columns=['A', 'B', 'C', 'D'])
# Showing 'df3'
df3

The output:

Out [9]:
    A   B   C   D
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
4  17  18  19  20

3.1. Reindexing for reordering existing data

The reindex() function reorders the existing data to match a new set of labels; see the official pandas documentation of reindex() for details. Suppose we want to reorder the columns of df3. We can call the following:

In [10]:
# Reordering columns of 'df3'
df3.reindex(columns=['B', 'D', 'C', 'A'])

The output:

Out [10]:
    B   D   C   A
0   2   4   3   1
1   6   8   7   5
2  10  12  11   9
3  14  16  15  13
4  18  20  19  17

We can also exclude particular columns by leaving them out of the list. Suppose we want to exclude columns A and D. We can call the following:

In [11]:
# Excluding columns 'A' and 'D'
df3.reindex(columns=['B', 'C'])

The output:

Out [11]:
    B   C
0   2   3
1   6   7
2  10  11
3  14  15
4  18  19

The same operation works for reordering the rows, using the index parameter instead. Suppose we want to reverse the row order of df3. We can call the following:

In [12]:
# Reordering index of 'df3'
df3.reindex(index=np.arange(4, -1, -1))

The output:

Out [12]:
    A   B   C   D
4  17  18  19  20
3  13  14  15  16
2   9  10  11  12
1   5   6   7   8
0   1   2   3   4

If a requested label is not among the existing index or column labels, the corresponding row or column is filled with NaN. For instance:

In [13]:
df3.reindex(columns=['A', 'C', 'B', 'F'])

The output:

Out [13]:
    A   C   B   F
0   1   3   2 NaN
1   5   7   6 NaN
2   9  11  10 NaN
3  13  15  14 NaN
4  17  19  18 NaN

Column F is not in df3, so it contains only NaN.
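
If NaN is not the desired placeholder, reindex() takes a fill_value argument. The sketch below, on a shortened copy of df3, compares the default behavior with fill_value=0. The names with_nan and filled are illustrative only.

```python
import pandas as pd

# Shortened stand-in for 'df3'
df3 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                   columns=['A', 'B', 'C', 'D'])

# Default: the missing column 'F' is filled with NaN
with_nan = df3.reindex(columns=['A', 'C', 'B', 'F'])
print(with_nan['F'].isna().all())  # True

# fill_value replaces NaN as the placeholder for missing labels
filled = df3.reindex(columns=['A', 'C', 'B', 'F'], fill_value=0)
print(filled['F'].tolist())  # [0, 0]
```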

3.2. Reindexing to align with another object

You may wish to take an object and reindex its axes to be labeled the same as another object. We can do this by calling the reindex_like() function; see the official pandas documentation of reindex_like() for details.

Suppose we have the dataframe df4 shown below:

In [14]:
# Creating 'df4' dataframe
df4 = pd.DataFrame([[5, 6, 7, 8], [1, 2, 3, 4], [13, 14, 15, 16],
                    [9, 10, 11, 12], [17, 18, 19, 20]],
                   index=np.arange(4, -1, -1),
                   columns=['D', 'B', 'C', 'A'])
# Showing 'df4'
df4

The output:

Out [14]:
    D   B   C   A
4   5   6   7   8
3   1   2   3   4
2  13  14  15  16
1   9  10  11  12
0  17  18  19  20

If we want to reindex df3 like df4, we can call the following:

In [15]:
# Reindexing 'df3' like 'df4'
df3.reindex_like(df4)

The output:

Out [15]:
    D   B   C   A
4  20  18  19  17
3  16  14  15  13
2  12  10  11   9
1   8   6   7   5
0   4   2   3   1

As shown above, the index and column order of the result follows df4, while the values still come from df3.
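
To make the relationship to reindex() concrete, the sketch below checks on small stand-in dataframes that reindex_like(other) gives the same result as reindex() called with the other object's index and columns. Only the labels of df4 matter here, so it is filled with zeros. The names via_like and via_reindex are illustrative.

```python
import pandas as pd

# Shortened stand-in for 'df3'
df3 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                   columns=['A', 'B', 'C', 'D'])

# Only the labels of 'df4' are used, so its values are irrelevant
df4 = pd.DataFrame(0, index=[2, 1, 0], columns=['D', 'B', 'C', 'A'])

via_like = df3.reindex_like(df4)
via_reindex = df3.reindex(index=df4.index, columns=df4.columns)

print(via_like.equals(via_reindex))  # True
print(via_like.iloc[0].tolist())     # [12, 10, 11, 9]
```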

References

This tutorial was created with reference to the following:

  • Pandas official documentation
  • Tutorialspoint
