A Dive Into Tries

Tim Olson
smucs
Published in
12 min readApr 7, 2021

A Trie, also known as a prefix tree, is a specific type of search tree optimized for searching for keys in an fast and efficient manner. Most often tries are used to store strings, and unlike other types of trees, the keys inserted into the trie are stored using nodes representing individual characters rather than storing each key in its own node.

Condensed visual representation of a trie

The Anatomy of a Trie

Tries are composed of individual nodes that are connected with pointers, together these components form a tree like structure. Each node consists of an array of pointers that point to its child nodes. In the most common example of storing strings in the trie, the size of the array corresponds to each letter in the alphabet and has a size of 26. The other aspect of a node is a Boolean variable indicating whether or not the node represents the end of a word.

The image below depicts a node in the trie, each element in the array represents a branch to its corresponding letter. For example, the first position in the array would point to a new node corresponding to an ‘a’ in the first position of a string being added or searched.

The node is comprised of pointers corresponding to letters and a flag that indicates the end of a word

An important first step in creating and implementing a trie is to initialize a root node at the “top” of the structure to act as a starting point. All of the pointers in the root node are initialized to null and this node is kept as a special global reference used as a starting position for functions such as insert or search.

The most beneficial aspect of using a trie in place of some other data structure, is the speed at which you can search and insert into the trie. Because of the way the trie is constructed, inserting new keys into the data structure can build off previous insertions, which leads to a much shallower overall structure. Let’s take a look at how insertion works below.

Insertion

The following pseudocode loosely follows the implementation of insertion in C++ and will be a basis for the explanation of the insertion function the follows:

pseudocode for insertion function

In this example we are going to insert the string “cat” into an empty trie. The first step in the process is to create a new node and set that new node equal to our starting position, the root. Next we can start adding each character of the string to the trie. The first character in cat is ‘c’, so we look at the root and check to see if the position in the pointer array corresponding to ‘c’ points to another node. If it is not null, we should traverse to that node, otherwise we need to create a new trie node and then traverse. If we are adding the character ‘c’ to an empty trie, the result would look like the diagram below.

Insertion of the letter c to an empty trie

Notice to the left that all null pointers are filled in with red, and the only non-null pointer for the letter c in the root node is colored green and points to a new node. Note also that the character c in the upper part of the new node is colored red indicating that it is not the end of a string.

Once the first character is inserted, we move on to the second character and repeat this process until the whole string has been added into the trie. In adding a new string to an empty trie will result in the creation of a new node at each new character. An insertion of more strings may not require the same amount of memory allocation if the new strings share the same beginning character(s) as strings that are already in the trie. Inserting the rest of the string “cat” would result in the following structure

Insertion of “cat” to an empty trie, notice the t in the last node is colored green to indicate is the end of the string.

The cool thing about tries is that every key inserted into the trie has a unique path. Unless an identical word is inserted, each string will have its own unique path to the end of the word. In comparison to a hash table, which is another commonly used data structure to store information like this, tries are more efficient in inserting a new element because there is no hash function involved and there are no collisions for when two strings map to the same hash value. In that sense tries are more desirable.

Additionally if we look at the trie compared to a binary search tree, as more and more strings are added to each data structure, the BST will get much deeper in its number of levels. The BST can on have two children, so contain the uniqueness of the value in the tree it will expand downwards quickly to maintain balance. Tries on the other hand stay relatively shallow as more words are added it. Because each node can have 26 different potential pointers to children nodes, the trie is constructed so that it is only as deep as the longest word in the trie. This makes the search function extremely easy and straightforward, let’s take a look at that next.

Searching

In the image below, the strings “car”, “cars”, “cat”, “she”, “shear”, and “sheet” have been added to a trie to use as an example in explaining the functionality of the searching algorithm. This image will be referenced in the next few paragraphs outlining different examples of searching and traversing through a trie.

Below the this diagram is the pseudocode use to implement and execute a search function on the trie. The first step in searching for a key value in the trie is determining the length of the key. This step is necessary for determining the exit condition in the for loop that proceeds, for now we can think about this as determining how many levels we need to traverse in order to determine whether or not the key exists in the trie. Next we create a new trie node that will be used to traverse the trie, just as we did in the insertion algorithm.

Pseudocode for the search algorithm

The next step is entering the for loop and starting to traverse the different paths in the trie. If we are using the example trie shown just above and we are searching for “she”, we would start in the root and determine whether the pointer that corresponds to ‘s’ is null or not. If it is null, then we know that the key does not exist within the trie that we are searching. However, if it is not null, we traverse to the new node in the path by altering the pointer associated with the traversal node that we have created. These steps, check for null pointer and then traverse forward if non-null, continues for the remaining characters in the string. If at any point the trie ends or the traversal_node cannot move onto the next letter, or we have reached the end of one of the possible branches. If we are able to go through every character in the string, then we know the string is present in the trie node represents the end of the word.

If we are looking at more specific examples, lets go back to the diagram above. If we finish the example we start of searching for “she” in the trie, we can look at the example diagram and see that by traversing the trie one character at a time, we can go from ‘s’ to ‘h’ to ‘e’. And once we get to the ‘e’ node in that branch, we know that “she” exists in the trie because the letter is green indicating that this is the end of a word. The nodes visited in searching for
“she” are shown highlighted in yellow in the image below:

If we try searching for the string “shel”, would would be able to traverse through the different levels of the tree without any problems. But when we check to see at the end if we have landed on the end of the word, we would find the letter s in red indicating that we have not found the end of the word and that “shel” is not in the trie.

If we try searching “shes” in the tree, we would traverse from ‘s’ to ‘h’ to ‘e’, but then we would have nowhere to go because there is not a string on this path the continues on to go to ‘s’. The word “shes” is not in the trie.

To see the code that I used to implement my own trie, or to experiment with adding new and different strings to the search or insertion functions, visit the repository I created to see my trie examples in the link at the end of the article.

Additional Applications of Tries

One of the most common applications of the trie is prefix searching. Because of the way the trie is built, it is possible to search for a prefix in the trie and return a list of words that begin with the prefix.

  1. The first step to perform the normal search of the prefix on the trie
  2. If the prefix cannot be reached within a normal call of the search function, then we know that the trie does not contain any words with the indicated prefix. We can end the function here and return
  3. If the prefixed is reached in the search function, determine if the prefix is a word on its own by checking the boolean on the current node
  4. If the prefix has been determined to be a word on its own, print the word
  5. Check to see if the current node has any children. If the node does not have any, then we can end the function and return
  6. If there are children still remaining, call a function to recursively go through and print each of the remaining words to exhaustively list all of the words beginning with the indicated prefix
Diagram outlining the initial steps of searching for words prefixed by “car-”

Using the steps above, we can take the example from earlier and try searching for a specific prefix. Say we wanted to search for all words the start with “car”. The first step is to search for “car”, if we traverse the trie letter by letter, the nodes shown in yellow show which nodes we come across in the initial step. We reach the last character in “car” which is ‘r’ and also the end of a word as indicated by the color green in the diagram. Because this is a word on its own, we can print “car” on its own to begin the list of words containing the prefix. Next we need to check to see if “car” has any further subtrees or children. As shown in the diagram, the trie does not stop at ‘r’ so we can call the second part of the prefix search: the recursive calls to find all of the rest of the words containing the queried prefix. In this case we can see that the only other word is “cars”.

To learn more about the code implementation of the prefix search and its accompanying functions, click on the link to the repository at the end of the article to explore the example code.

Another way we could implement a trie would be to include key-value pairs within the storing of data. In the implementation I have gone over in the examples above and in the test and examples in the linked repository, I have merely been adding keys to the trie so that they can be searched for. If we replace the idea of the end of word Boolean variable and switch it with a value associated with the key, we create a way to store values in a tree-structure that can be searched for and access very quickly by using the unique keys associated with the value, much like a hash table. And so the biggest change in an implementation like this would be a difference to check whether a node is at the end of the word. Nodes that end a word will have an associated value, for example a string key would have a path to a node with a specific accompanying ID. In this case you would check for the presence of a value in the node instead of checking the Boolean variable EndOfWord. The image below outlines how the nodes may look in this application.

Diagram representing the key-value pair implementation of a trie

So what can we use tries prefix searches for? The most common application of this idea of prefix searching is an idea that you probably are very familiar with. The idea of auto-completion used in google search or auto-recommendation used in writing text messages is a great example of prefix searching. In these situations, prefixes are searched for and the queries return words containing the prefix that may be pertinent in what you are trying to write. Another place which tries are commonly used is in navigating dictionaries. The layout of the data structure, with its fast searching properties, allows for searching words in a dictionary to be extremely fast in comparison to binary search trees and hash tables.

We can search tries in a worst case time complexity of O(M*N) where M is the length of the longest string in the trie and N is the number of words in the trie. Compared to BST, or even an optimized AVL Tree which has a time complexity of O(log n), the trie is consistently more efficient at searching for a key. When we take the case of hash tables into account, the search time complexity is nearly O(1), but this is only average case. When data sets get exponentially large, dealing with collisions in hash tables can slow down the search process, whereas tries keep their time complexity of O(M) as the data set gets larger and larger.

While tries have many advantages compared to other similar data structures, it also has a few factors that should definitely be taken into consideration in its implementation. One of its downfalls has to do with the amount of space it tries take up, for each and every node that is created in the trie, the new node must allocate 26 pointers for the 26 different letters of the alphabet. And while the trie is still relatively small, there are a lot of unused pointers and wasted space. In larger terms, as the data set gets exponentially large, more of the space is used up and memory is allocated more effectively, but tries will always more memory allocation that other data structures like it.

And at the end of the day, we must think about one question: is a data structure that takes up more space in memory worth it because of the faster speeds of insertion and searching? I hope that this article helped you understand how tries operate so that you may answer that question on your own. Click on the link below to go check out the GitHub repository with examples that mirror the diagrams I included above as well as addition tests and experimentation:

Further Readings

Applications of Tries with Radix Sort, Brilliant Learning — https://brilliant.org/wiki/radix-sort/

Advanced Applications of Tries, Open Genius — https://iq.opengenus.org/applications-of-trie/

Works Cited

Tries and Auto-completion, Geeks for Geeks — https://www.geeksforgeeks.org/auto-complete-feature-using-trie/

Advantages of Trie, Geeks for Geeks — https://www.geeksforgeeks.org/advantages-trie-data-structure/

Applications of Trie, Brilliant Learning — https://brilliant.org/wiki/tries/#applications

Trie | (Insert and Search), Geeks for Geeks — (Inhttps://www.geeksforgeeks.org/trie-insert-and-search/

Tries , Harvard CS50— https://www.youtube.com/watch?v=TRg9DQFu0kUhttps://

Tries Implementation, Techie Delight — www.techiedelight.com/memory-efficient-trie-implementation-using-map-insert-search-delete/

--

--