Python is Awesome

Anagram Finding app with Python Part 1: Algorithm Implementation

Narek Hovsepyan
3 min readFeb 1, 2018

--

What is anagram? An anagram is word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example the word boats can be rearranged into “boast”.

Sometimes it is easy to recover original word from one if its anagrams, but on most cases it will be almost impossible to recover the original one mentally. Just take a look to this example: the anagram “cedknotsu” is rearranged form of “duckstone”. Is there any algorithm for recovering original word from it’s anagram? In this two part tutorial we will build an application, which will take an anagram as an input and show all possible valid anagram words.

Let’s observe those two words “cedknotsu” and “duckstone”. What they have in common? Yes the first thing which is coming in our mind is their length. Anything else? Let’s sort the letters of word “cedknotsu” in alphabetical order. We will get “cdeknostu”. What if we will sort the letters of the word “duckstone”? We will get “cdeknostu” again. So it is trivial that if we will sort the letters of two anagram strings, we will get the same string. This is the key for solution.

The Algorithm

Let’s define the task clearly. We have a string a, we need to find a valid word b, which will be an anagram of the string a.
Let’s describe the algorithm. We have a list of words, let’s sort all of the letters of our words, and keep pairs of sorted letters string and original strings in a hash table. After this preprocessing, we can sort the letters of the string a, then look to our hash table and find it’s corresponding pair.

The Code

We can get the list of all english words from this repository https://github.com/dwyl/english-words. We will download words.txt file, then in the same directory we will create preprocess.py.

Let’s create an empty dictionary

dict = {}

Then we are opening the words.txt file

f = open("words.txt", "r")

We are passing “r” as a second argument, to indicate that it is for read only. Let’s go over all lines inside the file, and sort the letters of each word to store in dict. Note that one anagram can have multiple original valid words, so we will store our dict as a hash table from string to list of strings. We will also preprocess each word by making all the letters lowercase.

for line in f.readlines():
word = line.strip().lower()
sortedLetters = sorted(list(word))
sortedWord = "".join(sortedLetters)
if sortedWord in dict:
dict[sortedWord].append(word)
else:
dict[sortedWord] = [word]

We have finished the preprocessing, so let’s dump our dict to a file, so we can import it in other scripts. We will use python’s pickle module. We will open a file and use pickle.dump() to serialize our dict to a file.The second argument ”wb” to the open() method is statding for “write binary”

import pickle
with open('words.pickle', 'wb') as f:
pickle.dump(dict, f)

Now we are ready to create a script, which will recover the original word by it’s anagram. Let’s create a new fie called solver.py and add following code.

with open('words.pickle', 'rb') as f:
dict = pickle.load(f)

At this point we will have the dictionary ready and we can create the solver function

def sort_letters(a):
return "".join(sorted(list(a)))
def solver(a):
sorted_a = sort_letters(a)
if sorted_a in dict:
return dict[sorted_a]
else:
return []

list(a) will turn the string into list of letters, the sorted() method will return a new sorted list, while “”.join() method is a trick to get a new string from a list of letters. Now the solver(a) method will find the anagrams which are valid words. We can recover the original word of the anagram “cedknotsu”, just by typing this code

solver("cedknotsu")

It will return an array, with two words “duckstone” and “unstocked”

Wrap Up

Hope you enjoyed first part, In the second part we will add GUI
Github: https://github.com/promentol/anagram-pyhton-toga
Youtbe: Coming Soon

--

--