The Dictionary Project Part 2: Creating Search Functions to Retrieve Definitions

Anna Chernysheva
Nov 10, 2023


In the first part, we converted a DOCX document into a Pandas dataframe and performed the necessary preparations for the subsequent search program. This involved splitting the terms and definitions, as well as adding an auxiliary feature called working_term. The working_term column was created by removing all non-letter characters from the original terms and converting the values to lowercase. The second part of the project is dedicated to creating an effective search program.

Table of contents

  1. Overall Search Function Algorithm
  2. Exact Search Function Algorithm
  3. Defining the Full Search Function

The basic concept of retrieving necessary parts of textual data can be demonstrated through these two pieces of code:

For a partial match (the overall search):

# Find all rows whose working_term contains a specific word or word part
dictionary[dictionary.working_term.str.contains('cable', na=False)]

To return only the rows that match the word exactly:

# Find all rows whose working_term equals the specific word
dictionary[dictionary.working_term == 'cable']
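
To see the difference between the two filters on concrete data, here is a minimal sketch on a toy dataframe. The three sample rows and their definitions are invented for illustration; only the column names term, definition and working_term come from the first part.

```python
import pandas as pd

# Toy stand-in for the dictionary dataframe (sample rows are invented)
dictionary = pd.DataFrame({
    'term': ['Cable', 'Cable tray', 'Fuse'],
    'definition': ['An insulated conductor.',
                   'A support for cables.',
                   'A protective device.'],
    # working_term: lowercase, non-letter characters (including spaces) removed
    'working_term': ['cable', 'cabletray', 'fuse'],
})

# Partial match: every row whose working_term contains 'cable'
partial = dictionary[dictionary.working_term.str.contains('cable', na=False)]

# Exact match: only the row whose working_term equals 'cable'
exact = dictionary[dictionary.working_term == 'cable']

print(partial['term'].to_list())  # ['Cable', 'Cable tray']
print(exact['term'].to_list())    # ['Cable']
```

The partial filter returns both cable-related rows, while the exact filter returns only the single term.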

In order to build a program that will perform these two search methods conditionally, we need to define a set of functions and unite them into one called full_search.

1. Overall Search Function Algorithm

Returning partial matches from the preprocessed dataframe involves the following steps:

Step 1. Capture the search word and make it lowercase.

Step 2. Create the search dataframe by finding all rows in the working_term column that contain the search word.

Step 3. Save the terms and definitions of the search result into lists.

Step 4. Print the search result with the terms highlighted in yellow.

Step 5. Print ‘No term found!’ in red if the search result is empty.

# Partial match
def overall_search(dataframe, search_word):
    search_word = search_word.lower()
    # na=False treats missing working_term values as non-matches
    search = dataframe[dataframe.working_term.str.contains(search_word, na=False)]
    terms = search['term'].to_list()
    definitions = search['definition'].to_list()

    # Print each term on a yellow background, followed by its definition
    for i in range(len(search)):
        print(f'\033[43m{terms[i]}:\033[0m{definitions[i]}')

    # Report in red when nothing matched
    if search.empty:
        print('\033[38;5;124mNo term found!\033[0m')
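
One thing to keep in mind with overall_search as written above: str.contains interprets its pattern as a regular expression by default, so raw input containing characters such as '.' or '(' can match more than intended or raise an error. The sketch below, on invented sample values, shows the difference and how passing regex=False forces a literal substring search. This only matters when the search word has not already been stripped of non-letter characters.

```python
import pandas as pd

# Invented sample values; working_term normally holds lowercase letters only
working_term = pd.Series(['cable', 'cabin', None])

# Default regex=True: '.' matches any character, so 'cab.e' matches 'cable'
as_regex = working_term.str.contains('cab.e', na=False)

# regex=False: the pattern is a literal substring, so nothing matches
as_literal = working_term.str.contains('cab.e', regex=False, na=False)

print(as_regex.to_list())    # [True, False, False]
print(as_literal.to_list())  # [False, False, False]
```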

2. Exact Search Function Algorithm

Here are the steps to find the exact matched definitions:

Step 1. Capture the search word and make it lowercase.

Step 2. Create the search dataframe by filtering for exact matches.

Step 3. Extract the terms and definitions from the search result and save them into lists.

Step 4. Print the search result with the terms highlighted in yellow.

Step 5. Print ‘No term found!’ in red if the search result is empty.

If you want to learn more about how to format printed output, check this article.

# Exact match
def exact_search(dataframe, search_word):
    search_word = search_word.lower()
    search = dataframe[dataframe.working_term == search_word]
    term = search['term'].to_list()
    definition = search['definition'].to_list()

    # Print each term on a yellow background, followed by its definition
    for i in range(len(search)):
        print(f'\033[43m{term[i]}:\033[0m{definition[i]}')

    # Report in red when nothing matched
    if search.empty:
        print('\033[38;5;124mNo term found!\033[0m')
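
The escape sequences used in the print calls can be wrapped in small helpers to keep them readable. This is only a sketch, and the helper names highlight and warn are hypothetical (they do not appear in the original code). Here \033[43m switches to a yellow background, \033[38;5;124m selects color 124 from the 256-color palette as the foreground, and \033[0m resets all formatting. The colors render only in terminals and notebooks that support ANSI escape codes.

```python
YELLOW_BG = '\033[43m'     # yellow background
RED_FG = '\033[38;5;124m'  # foreground color 124 from the 256-color palette
RESET = '\033[0m'          # reset all formatting

def highlight(text):
    """Wrap text so that it prints on a yellow background (hypothetical helper)."""
    return f'{YELLOW_BG}{text}{RESET}'

def warn(text):
    """Wrap text so that it prints in red (hypothetical helper)."""
    return f'{RED_FG}{text}{RESET}'

print(f"{highlight('Cable:')}An insulated conductor.")
print(warn('No term found!'))
```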

3. Defining the Full Search Function

The main challenge in this function is combining the two previous functions in a specific order. First, the exact search is performed. If it finds nothing, the partial (overall) search is applied. Finally, if both searches fail, the function prints the standard ‘No term found!’ response.

As we saw above, both the exact_search and overall_search functions end with a check that prints ‘No term found!’ when the search dataframe is empty.

However, when chaining the functions, this check has to move to the very end, so that it runs only after overall_search has also failed. That is why we remove it from exact_search. In addition, the output is now printed later, inside full_search, so each function no longer prints its result and instead returns it as a list of (term, definition) pairs built with zip.

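# Exact match
def exact_search(dataframe, search_word):
    search_word = search_word.lower()
    # Compare against working_term (lowercase, letters only), as in the first version
    search = dataframe[dataframe.working_term == search_word]
    term = search['term'].to_list()
    definition = search['definition'].to_list()

    # Return (term, definition) pairs, or None so the caller can fall back
    if not search.empty:
        return list(zip(term, definition))

    return None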

The same procedure is applied to the overall_search function.

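# Partial match
def overall_search(dataframe, search_word):
    search_word = search_word.lower()
    # Search working_term (lowercase, letters only); na=False skips missing values
    search = dataframe[dataframe.working_term.str.contains(search_word, na=False)]
    term = search['term'].to_list()
    definition = search['definition'].to_list()

    # An empty list means nothing matched
    return list(zip(term, definition))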

Finally, we construct the full_search algorithm:

Step 1. Convert the search word to lowercase.

Step 2. Remove non-letter characters from the search word. The function that does this, remove_non_letters, can be found in the first part.

Step 3. Perform the chain of search actions:

If an exact match is found (i.e., exact_result is not None):

  • Iterate over each term and definition in exact_result.
  • Print the term and definition, highlighting the term on a yellow background.

If no exact match is found (i.e., exact_result is None), perform an overall search:

  • Call overall_search(), passing the dataframe and the cleaned search word as arguments. This function looks for the search word within the terms in the dataframe.

If an overall match is found (i.e., overall_result is not empty):

  • Iterate over each term and definition in overall_result.
  • Print the term and definition, highlighting the term on a yellow background.

If no overall match is found (i.e., overall_result is empty):

  • Print a message in red indicating that no term was found.

# The full search
def full_search(dataframe, search_word):
    search_word = search_word.lower()
    # remove_non_letters is the cleaning function defined in the first part
    search_word = remove_non_letters(search_word)
    exact_result = exact_search(dataframe, search_word)
    if exact_result is None:
        # No exact match: fall back to the partial (overall) search
        overall_result = overall_search(dataframe, search_word)
        if overall_result:
            for term, definition in overall_result:
                print(f'\033[43m{term}:\033[0m{definition}')
        else:
            print('\033[38;5;124mNo term found!\033[0m')
    else:
        for term, definition in exact_result:
            print(f'\033[43m{term}:\033[0m{definition}')

To recap, this function performs an exact search first and then falls back to an overall search if no exact match is found. It prints the results in a formatted manner, employing color coding for visual distinction.
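
To check the fallback order end to end, here is a self-contained sketch. It rests on several assumptions: remove_non_letters is a stand-in written for this example (the real one is defined in the first part and may behave differently), the sample rows are invented, and find_matches is a hypothetical non-printing variant of full_search that returns the kind of match together with the matched pairs.

```python
import pandas as pd

def remove_non_letters(text):
    # Assumed stand-in for the part-one helper: keep letters only
    return ''.join(ch for ch in text if ch.isalpha())

def find_matches(dataframe, search_word):
    """Return (kind, pairs): exact matches first, then partial, else none."""
    word = remove_non_letters(search_word.lower())

    # Stage 1: exact match on the cleaned column
    exact = dataframe[dataframe.working_term == word]
    if not exact.empty:
        return 'exact', list(zip(exact['term'], exact['definition']))

    # Stage 2: partial match, tried only when stage 1 found nothing
    mask = dataframe.working_term.str.contains(word, regex=False, na=False)
    partial = dataframe[mask]
    if not partial.empty:
        return 'partial', list(zip(partial['term'], partial['definition']))

    # Stage 3: nothing found at all
    return 'none', []

# Toy stand-in for the dictionary dataframe (sample rows are invented)
dictionary = pd.DataFrame({
    'term': ['Cable', 'Cable tray', 'Fuse'],
    'definition': ['An insulated conductor.',
                   'A support for cables.',
                   'A protective device.'],
    'working_term': ['cable', 'cabletray', 'fuse'],
})

print(find_matches(dictionary, 'Cable!'))  # exact match on 'cable'
print(find_matches(dictionary, 'tray'))    # partial match inside 'cabletray'
print(find_matches(dictionary, 'xyz'))     # no match
```

Note that an input with no letters at all is cleaned to an empty string, which str.contains matches against every row, so a guard for that case may be worth adding.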

The third part of the project will describe how to set up rules for handling misspelled inputs using the fuzzywuzzy library.


Anna Chernysheva

SEO Analyst and Data Scientist. Specializing in linguistic tasks.