The Trie Data Structure (Prefix Tree)

Aurum Linh
We’ve moved to freeCodeCamp.org/news
5 min readNov 7, 2016

A Trie, (also known as a prefix tree) is a special type of tree used to store associative data structures

A trie (pronounced try) gets its name from retrieval — its structure makes it a stellar matching algorithm.

Context

Write your own shuffle method to randomly shuffle characters in a string. Use the words text file, located at /usr/share/dict/words, and your shuffle method to create an anagram generator that only produces real words.Given a string as a command line argument, print one of its anagrams. 

I was presented with this challenge this week at Make School’s Product Academy.

The words in the text file are separated by new lines. Its formatting makes it a lot easier to put the words into a data structure. For now, I’m storing them in a list — each element being a single word from the file.

One approach to this challenge is to:

  • randomly shuffle the characters in the string
  • then, check it against all words that were in /usr/share/dict/words to verify that it’s a real word.

However, this approach requires that I check that the randomly shuffled characters in the new string matches one of 235,887 words in that file — that means 235,887 operations for each string that I want to verify as a real word.

This was an unacceptable solution for me. I first looked up libraries that had already been implemented to check if words exist in a language, and found pyenchant. I first completed the challenge using the library, in a few lines of code.

def generateAnagram(string, language="en_US"):
languageDict = enchant.Dict(language)
numOfPossibleCombinationsForString = math.factorial(len(string))
for i in range(0, numOfPossibleCombinationsForString):
wordWithShuffledCharacters = shuffleCharactersOf(string)
if languageDict.check(wordWithShuffledCharacters):
return wordWithShuffledCharacters

return "There is no anagram in %s for %s." % (language, string)

Using a couple of library functions in my code was a quick and easy solution. However, I didn’t learn much by finding a library to solve the problem for me.

I was positive that the library wasn’t using the approach I mentioned earlier. I was curious and dug through the source code — I found a trie.

Trie

A trie stores data in “steps”. Each step is a node in the trie.

Storing words is a perfect use case for this kind of tree, since there are a finite amount of letters that can be put together to make a string.

Each step, or node, in a language trie will represent one letter of a word. The steps begin to branch off when the order of the letters diverge from the other words in the trie, or when a word ends.

I created a trie out of directories on my Desktop to visualize stepping down through nodes. This is a trie that contains two words: apple and app.

You can visualize stepping down through nodes in a trie as changing directories.

Note that the end of a word is denoted with a ‘$’. I’m using ‘$’ because it is a unique character that will not be present in any word in any language.

If I were to add the word ‘aperture’ to this trie, I would loop over the letters in the word ‘aperture’ while simultaneously stepping down the nodes in the trie. If the letter exists as a child of the current node, step down into it. If the letter does not exist as a child of the current node, create it and then step down into it. To visualize these steps using my directories:

While stepping down the trie, the first two letters of ‘aperture’ are already present in the trie, so I step down into those nodes.

The third letter, ‘e’, however, is not a child of the ‘p’ node. A new node is created to represent the letter ‘e’, branching off from the other words in the trie. New nodes for the letters that follow are created as well.

To generate a trie from a words file, this process will happen for each word, until all combinations for every word are stored.

You might be thinking: “Wait, won’t it take really long to generate the trie from that text file with 235,887 words in it? What’s the point of looping over every single character in every single word?”

Yes, iterating over each character of every word to generate a trie does take some time. However, the time taken to create the trie is well worth it — because to check if a word exists in the text file, it takes at most, as many operations as the length of the word itself. Much better than the 235,887 operations it was going to take before.

I wrote the simplest version of a trie, using nested dictionaries. This isn’t the most efficient way to implement one, but it is a good exercise to understand the logic behind a trie.

endOfWord = "$"def generateTrieFromWordsArray(words):
root = {}
for word in words:
currentDict = root
for letter in word:
currentDict = currentDict.setdefault(letter, {})
currentDict[endOfWord] = endOfWord
return root
def isWordPresentInTrie(trie, word):
currentDict = trie
for letter in word:
if letter in currentDict:
currentDict = currentDict[letter]
else:
return False
return endOfWord in currentDict

You can see my solution for the anagram generator on my Github. Since exploring this algorithm, I’ve decided to make this blog post one of many — each post covering one algorithm or data structure. The code is available on my Algorithms and Data Structures repo — star it to stay updated!

Next Steps

I suggest checking out Ray Wenderlich’s trie repo. Although written in Swift, it’s a valuable source for explanations of various algorithms.

Similar to the trie (but more memory efficient) is a suffix tree, or radix. In short, instead of storing single characters at every node, the end of a word, its suffix, is stored and the paths are created relatively.

However, a radix is more complicated to implement than a trie. I suggest taking a look at Ray Wenderlich’s radix repo if you’re interested.

This is the first post of my algorithm and data structures series. In each post, I’ll present a problem that can be better solved with an algorithm or data structure to illustrate the algorithm/data structure itself.

Star my algorithms repo on Github and follow me on Twitter if you’d like to follow along!

--

--

Aurum Linh
We’ve moved to freeCodeCamp.org/news

Creatrix of Atlas Lab • How machines make decisions is a human rights issue