Understanding Trie Data Structure

Published in

Guidona

9 min readJan 25, 2021

Introduction

You must have used the dictionary to look up the meaning of a word. You may not know how to use a dictionary when you first saw it but then you realized its importance and you learnt how easy it is to search for a word among billions and trillions of words in this world in just minutes.

You must have played the game “word puzzle” in your childhood where you have to find meaningful words from a large set of alphabets that have been given to you. How you travelled all the way to board searching for the longest word to score more points.

So, to do this humans have set some of their strategies to make their life easy whether it is the case of finding a word from a dictionary or to win a word puzzle game.

To solve these types of real world problems there are many algorithms and data structures designed to make our computers more efficient and optimized in terms of space and time.

But different data structures serve different purposes.

As a programmer one of the most important topics is string handling and processing. Many real time applications are based on string processing for example: A keyboard, it uses data structure to show you word suggestions as you type along and to autocomplete the word you are typing thus saving time and spell-checking. This particular data structure is known as TRIE.

What is Trie?

There are data structures that are built on the basis of another data structure that we are already familiar with, so these types of data structures are known as Advanced Data Structure.

Trie is an advanced data structure. It is a tree-like data structure also known as digital tree or prefix tree. It is used to store a collection of strings. If we have thousands or hundreds of thousands of strings we can store it in trie data structure.

It consists of nodes which store the letters of a word.
Each node consists of a maximum of 26 children and edges connect each parent node to its children.
Every trie has an empty root node.
Strings are stored in a top to bottom manner on the basis of their prefix in a trie.
All prefixes of length 1 are stored at until level 1.
All prefixes of length 2 are stored at until level 2 and so on.

Working

We will see its implementation through an example:

If we look at our trie we can see that- We have an empty root node. We have 6 different words that we are representing in this trie i.e. Stand, Stroke, Strong, Strike, String, Strings. There are 6 different branches to this trie one for each word. We can also see that the words are sharing parent nodes for example in the all the six words we have a common node “S” and “T”. Similarly, the path to word strong and stroke share the nodes “S”, “T”, “R”, “O”. So if we want to add a new word for example STROLL to this trie:

(A) We will first search for the word using these few simple steps -

It will start with the root node and the prefix to search for.
It takes one character at a time from the prefix and searches through the children of the “current node” (at the beginning which points to the root node) to find a node containing that character.
If found, it reassigns that child node as the “current node” (which means in the next iteration step it will start from here)
If not found then it returns False to indicate that the prefix does not exist.
So in this example, it will start with the root node and the prefix to search for is “S”. It will search through the children of root node. So “S” is a child of root node so it is found, then “S” becomes the current node. This way “S”, “T”, “R”, “O” is found. Now our current node is “O” and we will search for the alphabet “L”. Since, “L” is not a child of “O” it will return string not found.

(B) Inserting the string.

Every character of input key is inserted as an individual Trie node.
If the input key is new or an extension of existing key, we need to construct non-existing nodes of the key, and mark end of word for last node. If the input key is prefix of existing key in Trie, we simply mark the last node of key as end of word.
The key length determines Trie depth.

So in the above example we can see that the alphabets “S”, “T”, “R”, “O” already exists in the trie so we will start with inserting “L”, “O” will be our current node and we will construct a new child of node “O” i.e., “L” similarly now our current node will be “L” and we will create a child of node “L” i.e. “L”. Since this is the last node we will mark the node as end of the word. This way our new word i.e. “Stroll” is added in the trie.

So, here we have discussed two operations on trie i.e., Searching for a word and insertion of new word in the trie. We can perform more operations on trie depending upon the need like deletion of a node.

Python Implementation of Trie

Let’s dig into the implementation part of the trie , after all what’s the importance of mugging the theory if we can’t implement it in real life.

Creating our TrieNode class

The first thing we need is a node, after all this is a tree implementation, to store each character that occurs in our word . So we will create our TrieNode class which will have 3 fields :

1. Character — to store corresponding characters of our word.

2. List of child nodes — pointer to store node of child characters.

3. Boolean — to mark that the current character marks the end of a searched word or not.

class TrieNode:#init function accepts one parameter, the character to insert
    def __init__(self, char):
 
        self.char= char
 
        self.is_end = False
 
        self.children = {}

Here, while initializing the TrieNode class we need to provide a char that we want to insert or have a search suggestion for. is_end is a Boolean variable to mark the end of the current word.

Initializing our Trie

Now the fun part of trie begins, here we implement the insert and search functions of trie.

Insert will add new nodes of characters of a fresh word and search will help to retrieve any word already inserted or show suggestions accordingly.

class Trie(object):
 
    def __init__(self):
 
        self.root = TrieNode("")

The above code will create a parent root node which will be empty containing no characters. This will be our starting point as it will start pointing to all 26 alphabets in English dictionary as soon as they occur.

Insertions in our Trie

To make an insertion we need to traverse the word to be inserted character by character.

Implementation goes like this, we traverse the word, and simultaneously we check that the children of current has that character or not, if present just move down to next node else we create a new TrieNode with the current character and add that to the list of child of the node we are currently present on.

We repeat the above procedure until we reach the end of the word we are inserting, then we make the is_end Boolean true marking the end of the word we were inserting.

Sounds simple, right? Below is the code implementation of the above.

def insert(self, word):
 
        node = self.root
 
#traverse the word character by character        for char in word:#check if the character is there in the list of children
            if char in node.children:
                node = node.children[char]            else:
                new_node = TrieNode(char)
 
# add the new node to the list of children
 
                node.children[char] = new_node
                node = new_node#after traversing the word set .is_end to true for the last #char
        node.is_end = True

Search in Trie

By searching we generally look for the exact match of the word , but here we are looking for a list of matching words that start with the word we searched for.

Say we type a string in search bar , this string will be passed to the search function as a parameter and the search function will return the list of matching characters, simple.

We will take help of DFS in doing so, so we will write a DFS function which goes like ,

def dfs(self, node, pre):
 
       if node.is_end:
           self.output.append((pre + node.char))
        
       for child in node.children.values():
           self.dfs(child, pre + node.char)

The function asks for 2 parameters, node prefix of the string searched so far. If the current node marks the end of a word, it will be appended to the output list, else the function will keep on traversing the corresponding children.

The search function goes like this,

def search(self, x):
        
        node = self.root
# traverse the search query and move down the trie     
        for char in x:
            if char in node.children:
                node = node.children[char]
            else:
              #if query doesn't match the nodes in trie
                return []
         
        self.output = []
#call DFS
        self.dfs(node, x[:-1])
 
        return self.output

First it will traverse down to the node if the last character of the searched string, if the string not present will simply return the empty list, else it will call the DFS function on the searched string to get all the words having prefix as the searched string.

Complete Code

The complete implementation of trie is given below:

class TrieNode:
 
    def __init__(self, char):
 
        self.char = char
 
        self.is_end = False
 
        self.children = {}
 
class Trie(object):
 
    def __init__(self):
 
        self.root = TrieNode("")
     
    def insert(self, word):
 
        node = self.root
 
        for char in word:
            if char in node.children:
                node = node.children[char]
            else:
 
                new_node = TrieNode(char)
                node.children[char] = new_node
                node = new_node
         
        node.is_end = True
         
    def dfs(self, node, pre):
 
        if node.is_end:
            self.output.append((pre + node.char))
         
        for child in node.children.values():
            self.dfs(child, pre + node.char)
         
    def search(self, x):
        
        node = self.root
         
        for char in x:
            if char in node.children:
                node = node.children[char]
            else:
                return []
         
        self.output = []
        self.dfs(node, x[:-1])
 
        return self.output

Let’s try inserting words in our tree and make a search query:

obj1 = Trie()
obj1.insert("strong")
obj1.insert("string")
obj1.insert("strike")
obj1.insert("stroke")
obj1.insert("strings")
obj1.insert("stand")

The above code will create a trie “obj1” and insert the words to it.

We can now make a search query using following code:

obj1.search("string")

OUTPUT :

['string', 'strings']

The trie for the above code will look like :

Time Complexity

The complexity to make a trie structure is O(n*m), where m is the length of the longest word (worst case complexity), and n is the number of words in the trie. This is how: Every time you traverse a string and add it to the existing structure, you perform a few operations like initializing. This is however, a constant time operation. It takes O(1) to create a new node. It is not necessarily O(1) but O(C) where C is a constant.
This is repeated in the worst case for all the strings with length m and hence the complexity of the trie is O(n*m).

Applications of trie

We’ve all used autocomplete on our cellphones. Well, that’s maybe the most common application of the trie. Once you’ve typed in a letter, the tree of potential words is greatly reduced, allowing the program to easily enumerate what kinds of strings are possible.

Google search uses the power of trie to give search suggestions to the user, you type in a letter or a word it will give suggestions of words or sentence respectively in accordance to the searched string. Now in real life , the search suggestions algorithm is much more complicated , uses external factors, more enhancements , suggestions based on recent searches, popularity, similar fields and much more, but the basic foundation used is trie.

Conclusion

This documentation on trie would help you understand the basics of trie data structure with proper illustrations for better understanding and it’s python implementation.