Text-PreProcessing- Replacing Synonyms

TejasH MistrY
2 min read · Apr 7, 2024


In this article, we will learn how to replace words in a text with their synonyms, for example converting “bday” to “Birthday”.


Replacing synonyms is a common way to reduce the vocabulary of a text: different words with the same meaning are mapped to one common synonym.

By compressing the vocabulary without losing meaning, you can save memory in cases such as frequency analysis and text indexing.
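To make the memory argument concrete, here is a minimal sketch that counts tokens with `collections.Counter` before and after synonym replacement. The synonym map and the sentence are made up for illustration; they are not the article's data.

```python
from collections import Counter

# Hypothetical synonym map for illustration only (not the article's data).
synonyms = {"bday": "birthday", "b-day": "birthday", "pls": "please", "plz": "please"}

tokens = "happy bday pls come to my b-day party plz".split()

# Frequency table of the raw tokens.
raw_counts = Counter(tokens)

# Frequency table after mapping each token to its common synonym
# (tokens without an entry are kept as they are).
normalized_counts = Counter(synonyms.get(t, t) for t in tokens)

print(len(raw_counts), len(normalized_counts))  # 9 7
print(normalized_counts["birthday"], normalized_counts["please"])  # 2 2
```

Both spellings of “birthday” and of “please” end up under a single key, so the frequency table has fewer entries while no information about what was said is lost.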

You will need a mapping from each word to its synonym; this is a simple form of controlled vocabulary. We will start by hardcoding the synonyms in a Python dictionary, and then explore other ways to store synonym maps.

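```python
class WordReplacer:
    def __init__(self, word_map):
        # word_map: dict mapping a word to the synonym that replaces it
        self.word_map = word_map

    def replace(self, word):
        # Return the synonym if the word is in the map, else the word itself
        return self.word_map.get(word, word)

replacer = WordReplacer({"bday": "Birthday"})
replacer.replace("bday")
```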
Output:

'Birthday'

The WordReplacer class is simply a class wrapper around a Python dictionary. The replace() method looks up the given word in its word_map dictionary and returns the replacement synonym if it exists. Otherwise, the given word is returned as is.
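The sketch below repeats the WordReplacer class so that it runs on its own, shows both lookup paths, and applies replace() to every word of a sentence. The replace_all helper is hypothetical and not part of the article; it splits on whitespace only, so a token with attached punctuation such as “bday!” would not be matched.

```python
class WordReplacer:
    """Look up each word in word_map, falling back to the word itself."""

    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        return self.word_map.get(word, word)


replacer = WordReplacer({"bday": "Birthday"})

# Known word: the mapped synonym is returned.
print(replacer.replace("bday"))   # Birthday
# Unknown word: returned unchanged.
print(replacer.replace("party"))  # party


# Hypothetical helper (not from the article): apply replace() to each
# whitespace-separated token and join the results back together.
def replace_all(replacer, sentence):
    return " ".join(replacer.replace(token) for token in sentence.split())


print(replace_all(replacer, "happy bday to you"))  # happy Birthday to you
```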

If you were only using the word_map dictionary, you wouldn’t need the WordReplacer class and could call word_map.get() directly. However, WordReplacer can act as a base class for subclasses that build the word_map dictionary from various file formats.

Hardcoding synonyms in a Python dictionary is not a good long-term solution. Two better alternatives are to store the synonyms in a CSV file or in a YAML file. Choose whichever format is easiest for those who maintain your synonym vocabulary.

CSV synonym replacement

The CsvWordReplacer class extends the WordReplacer class to construct the word_map dictionary from a CSV file.

I have a small dataset in a CSV file for demonstration purposes. You can add more words and their synonyms based on your requirements.

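Your CSV file should start with a header row naming two columns, Word and Synonyms: the Word column holds the word to look up, and the Synonyms column holds the synonym that replaces it. csv.DictReader uses this header row as the keys of each row dictionary, so the column names must match exactly.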
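If you want to reproduce the example below, this minimal sketch writes a Synonyms.csv in the layout the code expects. Its three rows are inferred from the output shown later in the article; the author's actual demonstration file may contain more entries.

```python
import csv

# Rows inferred from the article's output; the real Synonyms.csv may have more.
rows = [
    {"Word": "bf", "Synonyms": "boyfriend"},
    {"Word": "bday", "Synonyms": "birthday"},
    {"Word": "pls", "Synonyms": "please"},
]

# newline="" is what the csv module documentation recommends for CSV files.
with open("Synonyms.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Word", "Synonyms"])
    writer.writeheader()    # header row: Word,Synonyms
    writer.writerows(rows)  # one line per word/synonym pair
```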

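```python
import csv

class CsvWordReplacer(WordReplacer):
    def __init__(self, fname):
        word_map = {}
        # newline='' lets the csv module handle line endings itself
        with open(fname, 'r', newline='', encoding='utf-8') as file:
            # DictReader uses the header row (Word, Synonyms) as keys
            reader = csv.DictReader(file)
            for row in reader:
                word_map[row['Word']] = row['Synonyms']
        # Hand the finished map to WordReplacer
        super().__init__(word_map)
```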

```python
replacer = CsvWordReplacer("Synonyms.csv")

text = replacer.replace("bf")
text1 = replacer.replace("bday")
text2 = replacer.replace("pls")

print(text)
print(text1)
print(text2)
```
Output:

boyfriend
birthday
please
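Lookups are exact matches, so replacer.replace("Bday") would return "Bday" unchanged. One possible adjustment, sketched here as a hypothetical subclass that is not part of the article, lowercases the keys when the map is built and lowercases each word before the lookup. WordReplacer is repeated so the snippet runs on its own.

```python
class WordReplacer:
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        return self.word_map.get(word, word)


class CaseInsensitiveWordReplacer(WordReplacer):
    """Hypothetical variant: keys are lowercased so lookups ignore case."""

    def __init__(self, word_map):
        # If two keys differ only in case, the later one wins.
        super().__init__({word.lower(): syn for word, syn in word_map.items()})

    def replace(self, word):
        # Look up the lowercased word, but return the original token
        # unchanged when no synonym exists for it.
        return self.word_map.get(word.lower(), word)


replacer = CaseInsensitiveWordReplacer({"bday": "birthday", "pls": "please"})
print(replacer.replace("BDAY"))   # birthday
print(replacer.replace("Pls"))    # please
print(replacer.replace("Hello"))  # Hello
```

Because CsvWordReplacer hands its finished map to super().__init__(), making it inherit from a class like this instead of WordReplacer would give CSV-loaded synonyms the same case-insensitive behavior.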
