Trie Data Structure

Christopher Alvear
smucs
Published in
6 min readApr 5, 2021

Trie is a tree-based data structure used for efficiently retrieving a key from a large set of strings. Trie differs from similar tree-based structures by having the position of it’s nodes determine their associated key. The nodes of a Trie do not hold a key themselves.

Nodes: Each node of the tree contains an array of child pointers(will be named pointers[] for all examples). The size of the array or amount of pointers is determined by the unique number of characters present in the dataset you want to store (lowercase only English — 26 pointers). Each node also contains a flag that specifies if it is the end of a key or not.

Below is a basic Trie diagram with the following keys inserted: an, app, apps, red, read, run.

Diagram of a Trie: nodes containing # have their ending flags set as true

Insertion

Insertion begins by starting at the root node of the tree and walking through the Trie according the string that is being inserted. New nodes are created when a pointer for the next string’s character is null. When we reach the final character of a key being inserted, we create the node as usual but we also set the flag marking the node as the end of a key to true.

Lets walk through the process of adding the keys “app, apps”. Beginning with “app”, we start at the root of the tree that doesn’t represent any character.

We check root’s array of pointers to see if the pointer corresponding to app’s first character (a) points to a existing node.

We see that the node for a exists and was created from the previous insertion of key an.

Proceeding we go to app’s next letter p, we can see there is no node for p at our current node of a so we must create it.

We create the node for p and set the appropriate pointer from a’s array of pointers to point to this newly created node (since p is the 16th letter of the alphabet we would set pointers[15] to point to this new node).

Keep in mind the nodes do not actually contain their respective characters but instead a pointer’s position in the array implies its character (pointers[0] — a, pointers[1] — b,….,pointers[25] — z). We’ll see later how we can search for words in our Trie just by checking for the existence of certain nodes.

Proceeding with app’s third letter, another p, we create a new node and set the appropriate pointer from the previous p’s array of pointers to point to this new node

Since this is the last letter of app, we also set this node’s end flag as true. This completes the insertion of key app, we go through the same process to insert apps.

After both insertions are completed, we get our original Trie diagram.

Below is a snippet of code of what a Trie implementation of insertion would look like in C++.

Note that we subtract 97 from the characters so the ascii values of a-z match up to 0–26.

Searching

Searching for a key is very similar to inserting a key in a Trie. When searching, we traverse the Trie starting from the root and continue through the nodes according to the key’s characters. We can say the key does not exist in the Trie if while traversing through the tree with our key of choice we get a nullptr along our path or the last node we reach with our key does not have its end flag set as true. Likewise if the final node we reach exists and it is marked as an end node, we can say the key exists in the Trie.

Lets begin with searching for a key that doesn’t exist on the tree: rub

Looking at the diagram we can see at we start at the root of the Trie and traverse down according to the key’s characters. We go from root to node r then to node u. When we access the pointer corresponding with b at the current node of u, we see that it is a nullptr not pointing to anything. Since we’ve encountered a nullptr we can say the key does not exist in our Trie and return false for our search

Now lets search for a key that doesn’t lead to a nullptr but still returns false: rea

We can see that while traversing the tree we do not encounter a nullptr but when we reach the end of the key we stop at node that isn’t marked as an end node. This means that although the path exists, the key has not been added to the Trie and thus searching for it returns false.

We can run through the Trie one more time searching for a key that does exist: apps

We are able to traverse the Trie without encountering a nullptr and the last node we visit is marked as an end node. The key has been found and we return true.

Below is an implementation of the search function for the Trie class in the shown in the insertion code

Applications

  • One of the most common uses of a Trie is storing autocomplete dictionaries which are found in various applications such search engines, word processors, and mobile phones. The Trie is used in predicting and suggesting words that the user may be trying to type.
  • Along side predicting it is also handy for correcting. A word process or can use a Trie to quickly check if a user’s typed word exists in its dictionary or not and thus notify the user they may have misspelled a word.
  • Trie is also an efficient alternative to both Binary search trees and hash tables. It’s main advantage is that it’s insert and search time are both O(L) where L represent the length of a single word. This is much faster than BST’s and in the case of hashing, there is no collision handling required with Trie’s.
  • The main drawback of Trie is that it requires more space than BST or hash tables as they require each node to have their respective array of pointers, regardless if they use all the pointers or not.

Further Reading

http://theoryofprogramming.com/2016/11/15/compressed-trie-tree/

References

https://www.techiedelight.com/cpp-implementation-trie-data-structure/ — code in this article was adapted from here.

--

--