Exploring the Power of Tries

Advanced Data Structures for Efficient Search

Ali Masri
Towards Data Engineering
5 min readJun 1, 2024

--

Photo by visuals on Unsplash

In the realm of data structures, the Trie stands out as a powerful tool for efficient search operations. This article delves into the intricacies of Tries, showcasing their advantages and applications, and providing practical examples to illustrate their utility.

The Need for Efficient Search

Imagine you have a vast dataset of words, and you need to find a specific word or all words that start with a particular prefix. A simple approach might involve splitting a sentence into words and checking if the target word exists:

sentence = "…"
words = sentence.split()
target_word = "memories"
if target_word in words:
print("Found")
else:
print("Not Found")

This method has a time complexity of O(n) and works efficiently for small datasets. However, as the dataset grows, the runtime increases significantly. For instance, searching through a file with 466,000 English words might take around 1.11 milliseconds.

The Challenge of Prefix Search

Finding all words that start with a specific prefix poses an even greater challenge. A naive approach might look like this:

sentence = "…"
words = sentence.split()
prefix = "mem"
results = [word for word in words if word.startswith(prefix)]

This method has a time complexity of O(n²), making it impractical for large datasets.

Enter the Trie

A Trie, also known as a prefix tree, is a tree-like data structure that stores a set of strings. Each node represents a single character, and the path from the root to a node represents a word. Here are the key features of a Trie:

  • The root node is empty.
  • Each node has a list of children.
  • Each node has a Boolean flag to indicate the end of a word.
Source: Open Source For Geeks: Understanding Tries data structure

Performance Analysis

Finding an element in a Trie is very efficient, with a time complexity of O(m), where m is the length of the word. This efficiency arises because you simply follow the tree, looking up each character in your search string. Each node traversal effectively narrows down the search space, eliminating large sections of the tree. As a result, Tries are significantly faster for search operations compared to basic string matching techniques. Finding all words with a specific prefix involves simply traversing the Trie, which is efficient even for large datasets.

Types of Tries

There are many different types of Tries. For example, compressed Tries optimize the standard Trie by merging nodes with a single child, reducing the overall space complexity. You can read more about types of Tries from the following link: Types of Tries — GeeksforGeeks

Applications of Tries

Tries have a wide range of applications, including:

  • Auto-Complete: Tries can quickly suggest possible completions for a partially typed word by efficiently searching through the tree.
  • IP Routing: Tries can efficiently manage and search routing tables, helping to determine the best path for data packets in a network.
  • Network Traffic Analysis: Tries can be used to analyze and categorize network traffic patterns by matching prefixes of IP addresses.
  • Genome Assembly: Tries can help in assembling DNA sequences by storing and searching for overlapping sequences efficiently.

For more information, check out https://www.geeksforgeeks.org/applications-advantages-and-disadvantages-of-trie

Implementing Tries in Python

The pygtrie library provides a simple and efficient way to work with Tries in Python.

import pygtrie
trie = pygtrie.CharTrie()
for word in words:
trie[word] = True
trie.has_key("memories") # checks if an element exists in a trie

Sample Application: Bash Completion

Bash completion enables you to auto-complete commands, file names, and more. You can implement your own auto-completion feature using Tries! To start, simply retrieve a list of your command history using the history command.

history > shell-history.txt

Then you can load all the history into a Trie and implement an interface for a prefix search. Here’s a simple implementation:

Here’s the Python code with added documentation:

import pygtrie

def load_history(file_path):
"""
Load command history from a file.

Args:
file_path (str): The path to the file containing the command history.
Returns:
list: A list of commands from the history file, with the command number removed.
"""
with open(file_path, "r") as f:
return [" ".join(command.strip().split()[1:]) for command in f.readlines()]

def build_trie(history):
"""
Build a Trie for auto-suggestion from the command history.

Args:
history (list): A list of commands from the history.
Returns:
pygtrie.CharTrie: A Trie where each command is a key and the value is the frequency of the command.
"""
trie = pygtrie.CharTrie()
for command in history:
command = command.strip()
# If a command already exists, increment the counter
# This enables us to identify the commands that are used more often
if command in trie:
trie[command] += 1
else:
trie[command] = 1
return trie

def suggest(text, trie, max_suggestions=5):
"""
Suggest commands based on the given text prefix.

Args:
text (str): The prefix text to search for suggestions.
trie (pygtrie.CharTrie): The Trie containing command history.
max_suggestions (int): The maximum number of suggestions to return.
Returns:
list: A list of suggested commands, sorted by frequency of use.
"""
try:
suggestions = trie.keys(prefix=text)
# Return the top suggestions that are used more often
return sorted(suggestions, key=lambda x: trie[x], reverse=True)[:max_suggestions]
except KeyError:
return []

def main():
"""
Main function to run the auto-suggestion feature.
"""
history = load_history('./data/shell-history.txt')
trie = build_trie(history)
while True:
text = input("> ")
suggestions = suggest(text, trie)
print("Suggestions:\n")
for suggestion in suggestions:
print(f"\t{suggestion} ({trie[suggestion]})")

if __name__ == "__main__":
main()

Productionizing Prefix Search

If you are interested in deploying prefix search in production I suggest reading this amazing blog post featuring Prefixy: https://medium.com/@prefixyteam/how-we-built-prefixy-a-scalable-prefix-search-service-for-powering-autocomplete-c20f98e2eff1

Thank you for exploring the power of Tries with me. I hope this article has provided you with valuable insights into this advanced data structure and its applications. Feel free to reach out with any questions or comments!

--

--