The Trie Data Structure

Josiah Castillo
smucs
Published in
6 min readApr 7, 2021

Generally when we think of ways to handle large data sets things like Binary Trees and Hash Tables come to mind. While these are widely applicable and sufficient lots of the time, there are other data structures that can outperform them in niche cases. Tries are an example of this. Tries are trees used primarily to store strings. The special characteristic of this data structure is the insert and delete operations both function at O(L) time. L in this case being the length of the string. This a faster performance than both AVL Trees, which have an insertion and deletion time complexity of O(L * log(n)), and Hash Tables, who’s hash function alone has a time complexity of O(L) without diving into the problems of collision. The main reason for this difference in performance is the way Tries handle collision and branching.

Insertion and Deletion

In a Trie, insertion and deletion have an edge in performance because they do not have to make any computations, balances, or handle collisions. To understand why we’ll take a look at a small example.

Blue indicates ending of a complete word. Red is the opposite.

In the Trie to the left, the words “Hello”, “Helps”, and “Hand” have been inserted. Each character in a Trie has its own node, and all nodes have max number of children depending on the kind of input the Trie will be receiving. In this example, upper-case letters are the input so each node has a max of 26 children. If inputs were codes that included a wider range of alpha numeric values this would change. Tries place a heavy emphasis on the reuse of nodes to hold multiple entries (“H-E-L” both hold “HELLO” and “HELPS”). This basically removes the need for balancing and collision handling entirely. This also means that when inserting, it is important to be able to trace and reuse existing nodes while being able to branch off whenever the sequence of characters change, otherwise this will use a heavy amount of space in a program. Deletion on the other hand needs to be able to recognize when to remove a node in its entirety or to simply remove a child otherwise entire subsets of words could be unintentionally deleted. Lets take a look at what happens when we delete the word “HELPS” from the Trie and insert the word “He”.

Unshared nodes for HELPS have been deleted and the node for HE was changed.

When the delete operator is called on the Trie, the Trie is traversed until it reaches the end of the input it was given. At this point it will check to see if the node is considered a complete word. In the images used, blue indicates a node is flagged as a complete word, while red is the opposite. If it is a complete word, the program will iterate backwards and delete all unique nodes (nodes with no other children) until it reaches the root of the Trie. During the deletion of “HELPS”, only the unique nodes “P-S” were deleted from the Trie. If one were to search for the word afterwards, the search would end at the first “L” and return false. When the insert operator is called on a Trie, the program will search for every individual character in the word until the end of the word is reached. If the character being checked for doesn’t exist in the Trie, it will simply insert a new node wherever the branch is. At the end of this process the final node in the sequence will be flagged as a complete word. In this example, since “HE” already exists within the Trie, no new nodes were required and the final node in “HE” was simply changed to indicate it now exists within the Trie. For searching a Trie the program only needs to traverse the tree until the word is found and return the flag associated with the final node, however if the end of the word cant be reached, it doesn’t exist so search should naturally return false.

General Trie Structure

In order to achieve the functionality described above, most Trie nodes will at the very least have a structure like the following code block suggests.

struct TrieNode{
public:
//overloaded constructor for TrieNode
TrieNode(int keyInput);
//Used to retrieve the key for a child node
int getKey();
//Used to set the key for a child node
void setKey(int keyInput);
//used to insert a string into the trie
void insert(std::string input, int depth);
//used to check if a string exists in the trie
bool search(std::string input, int depth);
//used to remove a string from the trie
int remove(std::string input, int depth);
private:
//ascii value of the current character
int key;
//all child nodes
TrieNode* data[26];
//determines if the node represents a complete word
bool isWord = false;
};

Beyond this the structure of a Trie class is relatively similar to that of a Binary Tree and other structures of that nature.

class Trie{
public:
//used to insert a string into the trie
void insert(std::string input);

//used to determine if a string exists within the trie
bool search(std::string input);

//used to remove a string from a trie
void remove(std::string input);

private:
//points to the root node of the trie
TrieNode* root;
};

Even though there is nothing inherently wrong with this implementation functionality-wise, there is a glaring issue in the regards to the amount of memory each individual node uses. Instantiating an array of 26 node pointers constantly every time a node is created is a very inefficient way to use space in a program involving Tries, and can become costly as Tries grow in size. A workaround for this could be to use vectors instead of arrays to ensure there’s no excess memory allocation in Trie nodes. If this is done however, care needs to be taken to ensure the Trie remains operating at an O(L) time complexity. Iterating over a vector of an unknown size in order to check for nodes will certainly prove worse than simply checking the index of a predetermined array.

Small example

In the following code block there is a recreation of the Trie illustrations from the previous sections. The implementation can be found in the GitHub repo linked at the bottom of this page.

//creates a Trie    
JTrie myTrie;
//setting up strings to insert
std::string myStrings[3];
myStrings[0] = "hello";
myStrings[1] = "helps";
myStrings[2] = "hand";
//inserting strings
for(int i = 0; i < 3; i++){
myTrie.insert(myStrings[i]);
std::cout << myStrings[i] << " inserted." << std::endl;
}//confirming they exist
for(int i = 0; i < 3; i++){
if(myTrie.search(myStrings[i])){
std::cout << myStrings[i] << " exists." << std::endl;
} else {
std::cout << myStrings[i] << " doesn't exist." << std::endl;
}
}

//removing "helps"
myTrie.remove(myStrings[1]);
std::cout << myStrings[1] << " removed." << std::endl;
//confirming new trie
for(int i = 0; i < 3; i++){
if(myTrie.search(myStrings[i])){
std::cout << myStrings[i] << " exists." << std::endl;
} else {
std::cout << myStrings[i] << " doesn't exist." << std::endl;
}
}

When executed, the output it is as follows:

hello inserted.
helps inserted.
hand inserted.
hello exists.
helps exists.
hand exists.
helps removed.
hello exists.
helps doesn't exist.
hand exists.

Further reading

https://www.toptal.com/java/the-trie-a-neglected-data-structure

https://www.hackerearth.com/practice/data-structures/advanced-data-structures/trie-keyword-tree/tutorial/#:~:text=Tries%20are%20an%20extremely%20special,and%20thus%20the%20name%20Trie.&text=A%20Trie%20is%20a%20special,be%20visualized%20like%20a%20graph.

https://asha952.medium.com/real-life-applications-of-tries-96fc4c08e4b5#:~:text=Spell%20check%20is%20another%20common,spell%2Dcheck%20feature%20is%20enabled.

Works Cited

“Advantages of Trie Data Structure.” GeeksforGeeks, 22 Jan. 2021, www.geeksforgeeks.org/advantages-trie-data-structure/.

Bellini, Anna-Chiara. “The Trie Data Structure: A Neglected Gem.” Toptal Engineering Blog, Toptal, 31 Aug. 2013, www.toptal.com/java/the-trie-a-neglected-data-structure.

“Trie Data Structure: Interview Cake.” Interview Cake: Programming Interview Questions and Tips, www.interviewcake.com/concept/java/trie.

Example Repo

https://github.com/JosiahCastillo/JTrie

--

--