The Dictionary Project Part 3: Adding the Misspelled Input Correction to the Search Function

Anna Chernysheva
4 min read · Nov 15, 2023


A search term that contains a typo is just as likely to have a matching entry in the dictionary as one that is spelled correctly. For instance, a user might type cbl when they mean cable. In the first two phases of the Dictionary Project, we compiled a dataset from the glossary and developed a search program that retrieves definitions based on user input. The objective of this phase is to add a correction function so that a definition is returned even when the input word is misspelled.

Table of Contents

1. Defining the Correct Input Function

2. Integrating the Correct Input Function into the Full Search Function: Iteration 1

3. Integrating the Correct Input Function into the Full Search Function: Iteration 2

To review the previous results, let’s recall what the search consisted of. The full search function specified the sequence of the nested exact search and the overall search functions and established the rule for displaying ‘No term found!’ if there is no corresponding term found for the input.

Below, you can find an example of how this function operates with different types of input words: those that have an exact match in the dictionary, those that don’t have an exact match but are partial matches, and those that are misspelled, resulting in the display of ‘No term found!’.

1. Defining the Correct Input Function

The objective of our correction process is to identify the existing term that has the highest similarity score with the actual input. The measure behind this process is the Levenshtein distance (you can find a detailed review of it here), which serves as the foundation for the process module in the fuzzywuzzy Python library.
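To make the idea of edit distance concrete, here is a minimal pure-Python sketch of the Levenshtein distance. The function name levenshtein is made up for this illustration and is not part of fuzzywuzzy; the library reports similarity as a score from 0 to 100 rather than as a raw edit count.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the already processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(levenshtein("cbl", "cable"))  # 2: insert "a" and "e"
```

The fewer edits two words need, the more similar they are, which is why cbl can still be traced back to cable. The correction function in this project relies on fuzzywuzzy rather than on a hand-written distance.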

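from fuzzywuzzy import process

def correct_input(dataframe, search_word):
    # Candidate terms to compare the input against
    terms = dataframe['term']
    # List holding the single best result; it starts with the term and its score
    closest_match = process.extract(search_word, terms, limit=1)
    # True only if the best score is higher than 60
    filtered_match = closest_match[0][1] > 60

    if filtered_match:
        # Accept the match only if it starts with the same letter as the input
        if closest_match[0][0][0].lower() == search_word[0].lower():
            return closest_match[0][0]
        else:
            return None
    else:
        return None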

Let’s break down the code into steps:

Step 1. Extract the term column from the dataframe.

Step 2. Use the fuzzywuzzy process.extract function to find the closest match for the input search word among all terms. Set the limit to 1 to return only one result.

Step 3. Keep the match only if its similarity score is higher than 60.

Step 4. Check if there is a filtered match. If there is, proceed to the next step. Otherwise, return None.

Step 5. Check if the first letters of the closest match term and the search word are identical. If they are, return the closest match term. Otherwise, return None. This step is necessary for the next steps of the search function.
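The steps above can be tried out without installing anything. The sketch below is a stand-in, not the project's function: it swaps fuzzywuzzy for difflib.SequenceMatcher from the standard library, takes a plain list instead of a dataframe, and uses made-up sample terms. The name correct_input_sketch and the threshold parameter are assumptions for illustration, and the scores will not be identical to the ones fuzzywuzzy produces.

```python
from difflib import SequenceMatcher


def correct_input_sketch(terms, search_word, threshold=60):
    """Return the closest term, or None if no candidate passes both checks."""
    if not terms or not search_word:
        return None

    def score(term):
        # Similarity on a 0-100 scale (stand-in for the fuzzywuzzy score)
        return 100 * SequenceMatcher(None, search_word.lower(), term.lower()).ratio()

    best_term = max(terms, key=score)  # Step 2: closest match
    if score(best_term) <= threshold:  # Steps 3-4: threshold filter
        return None
    if best_term[0].lower() != search_word[0].lower():  # Step 5: first letter
        return None
    return best_term


glossary_terms = ["cable", "router", "switch"]  # made-up sample terms
print(correct_input_sketch(glossary_terms, "cbl"))   # cable
print(correct_input_sketch(glossary_terms, "able"))  # None (first letters differ)
```

The second call shows what Step 5 does: able is very close to cable, yet it is rejected because the first letters do not match.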

2. Integrating the Correct Input Function into the Full Search Function: Iteration 1

Now that we have the correction function prepared, we can include it in the complete search process to retrieve the definitions for misspelled inputs with a similarity score of over 60% compared to the actual term.

During the initial attempt, we positioned the correction function before both the exact and overall search functions (for more information about the search architecture, please refer to the provided link). However, while testing different input variations, we discovered a problem: if the input's similarity score is 60 or lower, the correction function returns None, and the search steps that follow cannot work with that value, so the process breaks. Therefore, it was crucial to rework the function to address this issue.
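The original code of Iteration 1 is not shown here, so the following is only a guess at how the problem might surface. Both function names are hypothetical: correct_first mimics the "correct first, search later" ordering, and never_matches plays the role of a correction function that finds nothing above the threshold.

```python
def correct_first(search_word, correct):
    """Iteration 1 ordering: correct the input before any search runs."""
    corrected = correct(search_word)
    # If no term clears the threshold, corrected is None and this line fails
    return corrected.lower()


def never_matches(search_word):
    # Hypothetical stand-in for correct_input when the score is 60 or lower
    return None


try:
    correct_first("zzz", never_matches)
except AttributeError as error:
    print(f"Search aborted: {error}")
```

Any string operation applied to None raises an error like this one, which is why the correction step has to be guarded or moved later in the sequence.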

3. Integrating the Correct Input Function into the Full Search Function: Iteration 2

In the second attempt, we implemented the following architecture: After receiving the search term as input, the function first sends it to the exact search. If the exact search returns None, the function proceeds to the overall search. If the overall search also returns None, the search term undergoes the correction algorithm.

Ultimately, the function produces one of two outputs: the term along with its corresponding definition for successful results (including corrected inputs with a similarity score higher than 60), or 'No terms found!' in cases where there is no suitable match among the actual terms.

def full_search(dataframe, search_word):
    # Normalize the input before searching
    search_word = search_word.lower()
    search_word = remove_non_letters(search_word)

    # 1. Exact search
    exact_result = exact_search(dataframe, search_word)
    if exact_result:
        for term, definition in exact_result:
            print(f'\033[43m{term}:\033[0m {definition}')
        return  # Exit the function after exact search

    # 2. Overall search
    overall_result = overall_search(dataframe, search_word)
    if overall_result:
        for term, definition in overall_result:
            print(f'\033[43m{term}:\033[0m {definition}')
        return  # Exit the function after overall search

    # 3. Correction: rerun the search with the corrected word, if there is one
    corrected_word = correct_input(dataframe, search_word)
    if corrected_word:
        full_search(dataframe, corrected_word)
        return  # Exit the function after corrected search

    print('\033[38;5;124mNo terms found!\033[0m')
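Since exact_search, overall_search, and remove_non_letters come from the earlier parts of the project, the function above cannot be run on its own. The sketch below reproduces the same fallback order in a self-contained form so it can be tested. Everything in it is an assumption made for illustration: the name find_definitions, the list of (term, definition) pairs in place of a dataframe, the substring check standing in for the overall search, and the dictionary standing in for the correction function. It also returns results instead of printing them.

```python
def find_definitions(glossary, search_word, correct):
    """Return (term, definition) pairs via exact, partial, then corrected search."""
    # Lowercase the input and drop everything that is not a letter
    word = "".join(ch for ch in search_word.lower() if ch.isalpha())

    exact = [(t, d) for t, d in glossary if t.lower() == word]
    if exact:
        return exact

    # Substring matching is only a guess at what the overall search does
    partial = [(t, d) for t, d in glossary if word and word in t.lower()]
    if partial:
        return partial

    corrected = correct(word)  # last resort; may be None
    if corrected:
        return [(t, d) for t, d in glossary if t.lower() == corrected.lower()]
    return []


sample_glossary = [  # made-up entries
    ("cable", "A bundle of wires."),
    ("cable tray", "A support structure for cables."),
]
sample_corrections = {"cbl": "cable"}  # stand-in for the correction function

print(find_definitions(sample_glossary, "cbl", sample_corrections.get))
print(find_definitions(sample_glossary, "xyz", sample_corrections.get))
```

Looking up the corrected term directly, rather than calling the whole search again, is a design choice of this sketch: it guarantees that the correction step runs at most once per input.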

I hope you enjoyed this project! Thank you!

Anna Chernysheva

SEO Analyst and Data Scientist. Specializing in linguistic tasks.