Trying out Tries

Published in

smucs

8 min readApr 5, 2021

What is a Trie?

A trie, named based on retrieval, is a search tree that is usually used with strings. Each string is used as a key and each node holds an character that makes up a string.

When Would I Use a Trie?

Tries are not always great for every problem that deals with strings. While having efficient insertion and look up times compared to a Hash Tables and AVL Trees, Tries use up much more memory to store each character and the associated data. So when are tries the most useful?

One senario where trie can be used would be in predictive spell check. A trie can be used to predict the likelyhood that given a set of characters, it will be a specific word. Similarly, tries could also be used for corrective spell check. Though other data structures may be able to store all the string data much more efficiently than tries, implementing either of these would be extremely difficult, if not impossible.

Also, tries are more memory efficient when your data follows a repetitive pattern. One example of this is if you’re storing all word forms of benefit. Because all inserted words (such as beneficiary, beneficial, beneficially, etc.) begin very similarly, there will be less unique nodes and therefore less space would be taken up.

Okay…But How Do They Work?

Say we create a Trie object called t. At this point, we will only have one root node. In the end, we want to store “hello”, “he”, “hi”, “app”, and “i”. To start, we will call t.Insert(“hello”). Then we will check if children[6] (because ‘h’ -‘a’ = 6) is null. In this case, it is, so we insert a new Node with character ‘h’ into the root’s children array. Our trie now will look like this:

We increase our current node to ‘h’, which allows us to look at look a h’s children. We continue this process until all characters in “hello” have been inserted and our trie looks like this:

From there, let’s insert “he” and “hi”. For he we would start at the root and would see that ‘h’ is a child, so ‘h’ would become our new current node. Then we would check if e is a child of our current node, which it is. Because all the letters in our current string are present in the current trie, we would just set node->isWord equal to true. Now for “hi”, we begin at root again as see that ‘h’ is a child of root. However, unlike the ‘e’ in “he”, ‘i’ is not a child of ‘h’. Therefore, we need to add an ‘i’ node to children. This node’s isWord will be set to true because ‘i’ is the last character in hi.

When inserting “app”, we see that root does not have an ‘a’ child node. So we add a new node that holds ‘a’ to root’s children. Then, the rest of app (the characters ‘p’ and ‘p’) will be added, all connected together. Then “i” will also be inserted as a new child node of root.

Now let’s try calling Search(“hi”). We will start by going through GetNode. Starting at ‘h’, we will check if root has an h child node. When we see it does, we will increase the character we are looking for to ‘i’ and increase the node we are looking at to the ‘h’ node. We check if ‘i’ is a child of ‘h’, and when we find it, we will return the ‘i’ node to our Search funtion. We see that our node is not null and isWord is set to true, and therefore will return true.

Let’s compare Search(“h”) and StartsWith(“h”). Both functions will begin the same, by calling GetNode. GetNode will return an ‘h’ node where isWord is set to false. Because StartsWith only checks to make sure the returned node is not null, StartsWith will return true. Search, however, will return false since there is an extra check of whether or not isWord is true.

Finally, let’s look at the most complicated operation, delete. We have to be aware of several different cases when deleting, so let’s look at each of them.

Deleting a word that was never inserted ( ex: t.Delete(“goodbye”))- This is an easy case to handle, but one that our code has to be able to handle. We perform a Search before we begin deleting and if word is not present, we do not have to do anything.
Delete a unique word (ex: t.Delete(“app”))- to delete a unique word means to delete a word that does not hold the last letter of another word in it. To do this we would just delete every node of that word.

3. Delete a word that is a prefix for another word (ex: t.Delete(“he”))- If you have to delete a word that is a prefix for another word, you have to change the last node of the word you want to delete to isWord = false.

4. Delete a word that shares a node with another child (ex: t.Delete(“hello”))- To delete a word that contains a node with multiple children, you delete until you reach that node with more than one children.

How Would I Structure My Trie Class?

Nodes

Each node in our trie needs to hold three pieces of data: the character it’s storing, whether it is the end of a word, and an array to hold the node’s children nodes. This array will at max hold 26 nodes for each letter in the alphabet and will be arranged in alphabetical order as words are. It is important to note that in our implementation, we are assuming that all characters in our string are lowercase.

The node struct also has a function hasChildren that returns true if the node has any children nodes in the children array and returns false if there are no children nodes.

Tries Class

Tries class only includes an empty node pointer root. This root will stay null. After a Trie object is created, it will look like this:

So How Do I Implement It?

For our trie structure, we will use six different functions to perform basic tasks: Insert, Search, StartsWith, Delete, Remove, and GetNode.

void Insert(string word)

The first step to insert a string into our trie is to set a current node equal to our root. Then we get the first character of our string. We can then use ASCII values to see if our character is in the current node’s children array. If the character has not been added, then we add if. After it is added, or if it was already added, we then set our current node equal to our current character in the child array.

char c = word.at(i);

//checks if node has been inserted into children array
//characters will be placed into array in alphabetical order, so can be found by subtracting c's value by a
if(current->children[c - 'a'] == nullptr) {
    current->children[c - 'a'] = new Node(c); //inserts new character into children
}
current = current->children[c - 'a']; //increases position in trie

We loop through this process until we reach the end of our string. The last step is to set the current node isWord equal to true. This allows us to signify the end of one word.

Node* GetNode(string word)

GetNode is called by Search and StartsWith to check if a string is present in the trie. GetNode follows a process very similar to insert, except instead of inserting a character if it is not found in the child array, GetNode returns a null pointer.

if(current->children[c - 'a'] == nullptr) {
    return nullptr;
}

If all characters in the passed-in string are present in the correct order in the trie, GetNode returns the last node.

bool StartsWith(string start)

In StartsWith, we are checking if any words in our trie start with a passed-in string. Our GetNode function does a majority of this work, we just have to check if the returned node is a null pointer or not. If it is not null, we return true, else we return false.

Node* node = GetNode(start);
if(node != nullptr) {
    return true;
}
return false;

bool Search(string word)

Our Search function is very similar to our StartsWith function. However, with Search, we want to find if the passed-in word is specifically stored in trie. For example, if “hello” was the only word inserted into our trie, calling StartsWith(“he”) would return true while Search(“he”) would return false. To achieve this, we just have to add an extra check to see if our node’s isWord is set to true.

Node* node = GetNode(word);
if(node != nullptr && node->isWord) {
    return true;
}
return false;

void Delete(string word)

Remove first calls Search to check if the word has been inserted into the trie. If it has not been inserted, a message is outputted to the user saying that word has not been inserted. If the word has been inserted, then calls Remove, starting at the root and at a depth of 0.

Node* Remove(Node* node, string word, int depth)

Remove is a recursive function that handles deleting words from the trie without losing any data. There are several different situations that we have to check for to ensure no data loss. The first step in our function is to check if we are on the last node of the word or not by checking if the depth is equal to the word length. If we are, we have to set the node’s isWord equal to false, because it is no longer a word. Now we have to check if these characters are present in any other word in our trie. To do this we call hasChildren. If true, we return our current node. If false, then we delete our current node

if(depth == word.length())  {
    node->isWord = false;

    if(!node->hasChildren()) {
        delete(node);
        node = nullptr;
    }
    return node;
}

If not at the end of the word, we continue recursively calling our Remove function.

node->children[index] = Remove(node->children[index], word, depth + 1);

Once the function begins to resolve itself after reaching the end of the word, we begin checking if we have to delete the rest of the characters in the word. to do this, we have to check if our node has children or is the end of a word, both which would indicate that this character is a part of another word. If neither of these is true, it is safe to delete the node and set it equal to nullptr.

if(!node->hasChildren() && !node->isWord) {
    delete node;
    node = nullptr;
}

Resources:

Geeks for Geeks on Tries

Interview Cake Overview on Tries

Video Overview of Trie Implementation by Michael Muinos

Video Trie Deletion Summary by Ayush Garg

Trie Wikipedia Article

Link to Example Code