Trie Data Structure, friend of auto-correct

Conner Morton
smucs
Published in
4 min readApr 7, 2021

A trie(pronounced try) is a search-tree-like data structure where each of the nodes represent a single letter of a word. The root node is empty and the children of the root represent the first letter of each prefix, for example, the anti prefix meaning something against (antidote meaning medicine against bacteria, antisocial meaning shunning contact with others). The trie for the anti- prefix would look something like this:

An example of a trie structure for antidote, antisocial, and antithesis
An example Trie diagram for antidote, antisocial, and antithesis

The above image is a shortened version of a trie, called a prefix tree, where if a word is complete it combines all remaining letters into one node. The first steps would be to alphabetically place the children “anti,” this takes up a lot of space because each node has 26 pointer children, if we are using the English alphabet.

Origin

Trie memory for computer searching was first recommended by René de la Briandais. He pointed out that we can save memory space at the expense of running time if we use a linked list for each node vector, since most of the entries in the vectors tend to be empty.

Originally, the Trie data structure was created in order to find a compromise between memory usage and speed.

Trie compared to other data structures

An easy way to understand the origins of a Trie is to look at the differences between Hash Table, a more popular data structure, and A trie. A Hash table is boiled down to an array of linked lists, while a trie can be boiled down to an array of pointers. This unfortunately means that memory and space usage will increase due to null pointers. Compared to Hash Tables implementation, a trie has no use for a Hash function, because the keys are stored alphabetically. Long unique words without many prefixes force a trie to perform relativley poor, while short words with a very common prefix would be very beneficial because it would have less null pointers. As the size of the Trie increases and more prefixes are generated, there is less work to be done with both insertion and searching the tree because there is less node generation.

Big O Notation

Because the efficiency of the time to create a trie is dependent on the number of words or keys in the trie, n, and the longest possible word, l, that means that some combination of n and m must make up the worst-case runtime. These two values together comes as O(ln).

When inserting, searching, or deleting a word, the efficiency depends on the word length, k, that the user is searching for meaning that it would be O(an).

Comparing this to a popular rival, AVL Tree, a self-balancing Binary Search tree, with O(NlogN) where N is the number of elements, a trie clearly wins for storing a dictionary and searching for a word in that dictionary.

Another example to show how the efficiency increases as more words are added

Psuedocode Implementation:

Where can you see a trie in use today?

Tries are exceptional at creating a dictionary of words and then searching that dictionary. The most obvious solution would be an auto-complete engine.

Example of auto complete engine

While Google definitely uses more complex implementations of a trie, in this example, you can see how as I add more words to my search, current pointer of the trie is getting lower and lower down to the leafs as I become more specific with my search. If I just type “re” then I get reece’s buttercups which is definitely not related to react development, however the trie was still working as anticipated. Each time we type a letter, and each level we go down in the tree, we are cutting out a large portion of work that the search engine would need to do.

Other popular implementations besides a dictionary for things like auto correct, text search and prefix matching, include being the base for a radix tree and radix sort.

References

Stanford CS166

Space-Efficient Data Structures for Top-k Completion, Bo-June (Paul) Hsu

Further Reading

Fredkin, Edward. “Trie memory.” Communications of the ACM 3, no. 9 (1960): 490–499.

Check out my Github Example in C++!

Link

--

--