Replace values in Pandas DataFrame

Punyakeerthi BL
Jun 24, 2024

This article continues from the following one, so please read it first:

Handling Missing Data in Pandas Dataframes

Introduction

pandas, a powerful Python library for data analysis and manipulation, provides the replace() method for substituting values within a DataFrame or Series. It can match a single value, a list of values, a dictionary mapping, or a regular-expression pattern, and swap each match for a replacement you supply.

Definition and Syntax

 dataframe.replace(to_replace, value, inplace=False, limit=None, regex=False)
  • to_replace: This parameter specifies the value(s) or pattern(s) you want to replace. It can be a single value, a list of values, a dictionary mapping values to replacements, a Series, or a regular expression (a string or compiled pattern). It does not accept a callable.
  • value: This parameter defines the replacement to use. It can be a single value, a list paired by position with a to_replace list of the same length, or a dictionary mapping column names to replacement values. It is left out when to_replace is already a dictionary of old-to-new pairs. Like to_replace, it does not accept a callable.
  • inplace (default: False): This optional boolean flag determines whether to modify the original DataFrame (True) or return a new DataFrame with the replacements (False).
  • limit (default: None): This optional parameter does not cap the number of replacements. It sets the maximum gap for the forward/backward fill behavior used when no value is given, and it is deprecated in recent pandas releases (2.1 and later), so it is best left unset.
  • regex (default: False): This optional flag indicates whether to_replace (and value) should be interpreted as regular expressions. It can also hold the pattern(s) directly, in which case to_replace must be None.
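As a rough sketch of how to_replace and value pair up, the snippet below passes two lists of equal length, so each entry in the first list is replaced by the entry at the same position in the second. The DataFrame and its `qty` column are made up for illustration and are not part of the article's examples.

```python
import pandas as pd

# Hypothetical sample data, used only for this sketch
df = pd.DataFrame({"qty": [1, 2, 3, 2, 5]})

# Equal-length lists are paired by position: 1 -> 100, 2 -> 200
paired = df.replace(to_replace=[1, 2], value=[100, 200])

print(paired["qty"].tolist())
# inplace defaults to False, so the original DataFrame keeps its values
print(df["qty"].tolist())
```

Keeping the two lists free of overlapping values (no value that appears in both) avoids any ambiguity about which replacement applies.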

Understanding When and Why to Use replace()

Here are some common scenarios where replace() comes in handy:

  • Correcting typos or inconsistencies: Standardize data entries by replacing misspelled values or variations with a consistent format.
  • Cleaning missing data: Substitute missing values (e.g., NaN, None) with predetermined values.
  • Encoding categorical data: Convert string-based categories into numerical codes for further analysis.
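  • Conditional replacements: Restrict a replacement to particular columns or values by passing a dictionary. replace() does not accept a callable, so logic that depends on a function is better handled with mask(), where(), or map().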
  • Regular expression-based replacements: Perform advanced pattern matching and replacements using regular expressions.
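Two of the scenarios above, standardizing inconsistent spellings and filling in missing values, can be sketched as follows. The `city` and `score` columns and their contents are invented for the example, and the replacements are applied column by column so that each one only touches the data it is meant for.

```python
import numpy as np
import pandas as pd

# Hypothetical data with inconsistent spellings and missing scores
df = pd.DataFrame({
    "city": ["NYC", "New York", "new york", "Boston"],
    "score": [1.5, np.nan, 3.0, np.nan],
})

cleaned = df.copy()

# Standardize spelling variants with an old -> new dictionary
cleaned["city"] = cleaned["city"].replace(
    {"NYC": "New York", "new york": "New York"}
)

# Substitute missing values (NaN) with a predetermined value
cleaned["score"] = cleaned["score"].replace(np.nan, 0.0)

print(cleaned)
```

For missing data specifically, fillna() is the more common tool; replace() is convenient when NaN is just one of several values being cleaned up in the same pass.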

Examples with Explanations and Outputs

1. Replacing a Single Value:

import pandas as pd

data = {'Column1': [1, 2, 3, 2, 5]}
df = pd.DataFrame(data)

replaced_df = df.replace(2, 10) # Replace all occurrences of 2 with 10
print(replaced_df)

Output:

   Column1
0        1
1       10
2        3
3       10
4        5

2. Replacing Multiple Values:

replaced_df = df.replace([1, 2], 0)  # Replace both 1 and 2 with 0
print(replaced_df)

Output:

   Column1
0        0
1        0
2        3
3        0
4        5

3. Replacing Using a Dictionary:

replaced_df = df.replace({1: 20, 2: 30})  # Replace 1 with 20 and 2 with 30
print(replaced_df)

Output:

   Column1
0       20
1       30
2        3
3       30
4        5
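A dictionary can also limit a replacement to particular columns. The sketch below uses a made-up two-column DataFrame (`A` and `B` are illustrative names) to show two forms: a nested dictionary of the shape {column: {old: new}}, and a flat {column: value_to_find} dictionary combined with a single replacement value.

```python
import pandas as pd

# Hypothetical two-column data; both columns hold the same values on purpose
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]})

# Nested dict: replace 1 with 100, but only in column A
only_a = df.replace({"A": {1: 100}})

# Flat dict plus a scalar value: look for 1 in column A and 3 in column B,
# and replace whatever is found with 0
per_col = df.replace({"A": 1, "B": 3}, 0)

print(only_a)
print(per_col)
```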

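4. Replacing with a Function's Result:

replace() matches values and does not accept a callable; its method argument only takes fill strategies such as 'pad' or 'bfill'. To use a function, compute the replacement first and pass the result as value.

def square(x):
    return x * x

replaced_df = df.replace(to_replace=3, value=square(3))  # Replace 3 with its square
print(replaced_df)

Output:

   Column1
0        1
1        2
2        9
3        2
4        5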
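Because replace() works on values rather than callables, a replacement that has to be computed per element is usually written with other pandas methods. The following is a minimal sketch, assuming the same single-column data as above: mask() swaps in a computed value wherever a condition holds, and map() applies a function to every element.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2, 3, 2, 5]})

def square(x):
    return x * x

col = df["Column1"]

# mask() replaces entries where the condition is True with the
# corresponding entry of the second argument
squared_large = col.mask(col > 2, square(col))

# map() calls the function on each element
squared_all = col.map(square)

print(squared_large.tolist())
print(squared_all.tolist())
```

where() is the mirror image of mask(): it keeps entries where the condition is True and replaces the rest.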

5. Replacing Using Regular Expressions:

data = {'Column1': ['apple', 'banana', 'orange', 'grapes']}
df = pd.DataFrame(data)

replaced_df = df.replace(to_replace=r'a.ple$', value='fruit', regex=True) # Replace 'apple' with 'fruit' using regex
print(replaced_df)

Output:

   Column1
0    fruit
1   banana
2   orange
3   grapes
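One detail worth checking when switching on regex=True is that only the matched part of each string is replaced, whereas a plain (non-regex) replacement applies only when the whole cell equals the target. The sketch below contrasts the two on made-up data; the `fruit` column name and the placeholder `X` are illustrative.

```python
import pandas as pd

# Hypothetical data where one entry contains another as a substring
df = pd.DataFrame({"fruit": ["apple", "pineapple", "grapes"]})

# Without regex: only cells exactly equal to 'apple' are replaced
exact = df.replace("apple", "X")

# With regex: the matched substring is replaced wherever it occurs
partial = df.replace("apple", "X", regex=True)

# Anchoring the pattern restores whole-cell matching
anchored = df.replace(r"^apple$", "X", regex=True)

print(exact["fruit"].tolist())
print(partial["fruit"].tolist())
print(anchored["fruit"].tolist())
```

Anchors such as ^ and $ are a simple way to keep a pattern from matching inside longer strings.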

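6. In-place Modification:

df = pd.DataFrame({'Column1': [1, 2, 3, 2, 5]})  # Recreate the numeric DataFrame from example 1

df.replace(2, 10, inplace=True)  # Modifies the original DataFrame
print(df)

Output:

   Column1
0        1
1       10
2        3
3       10
4        5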
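The choice between inplace=True and the default can be sketched side by side. The variable names below (`original`, `updated`, `working`) are illustrative; the point is that both routes produce the same values, and only the in-place call changes the object it is called on.

```python
import pandas as pd

original = pd.DataFrame({"Column1": [1, 2, 3, 2, 5]})

# Option 1: leave the original alone and bind the result to a new name
updated = original.replace(2, 10)

# Option 2: mutate a DataFrame directly (done on a copy here so the
# original can still be inspected afterwards)
working = original.copy()
working.replace(2, 10, inplace=True)

print(original["Column1"].tolist())
print(updated["Column1"].tolist())
print(working["Column1"].tolist())
```

Assigning the result to a name keeps the original available for comparison and lets calls be chained, which is why many codebases prefer it over inplace=True.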
