Trie: The Secret to how Google Can Predict What You are Going to Search.

Jaival Patel
Nerd For Tech
Published in
6 min readJun 20, 2021

As programmers, we are constantly faced with data structures; rather it would be complex, linear, or even both. Data structures are important because they help organize data very efficiently, which can be used very effectively, for example, using algorithms to automate certain tasks. The most popular data structures we use today are Linked Lists, Stacks, Queues, HashMaps, Trees, Heaps, and Graphs, which form most of the primary and common algorithms we also use very frequently in our programs.

The cool thing about data structures is that they don’t have to be used the way they were originally taught. For example, if we learned that Trees can only have integer values as a node value, we don’t necessarily have to have integer values in our nodes in our tree, we can mix it up and add dictionaries, lists, linked lists, and etc. That’s what makes programming so dynamic and abstract; you can make anything and change anything you want!

Over the past couple of days, I’ve been researching about possible projects, and I came across a really interesting data structure called a trie. A trie is essentially a tree, but, instead of having nodes with values of actual strings or integer values, it only has a single letter. As you combine these nodes with single letters into a tree, there will be paths that create multiple words. From the basic definition I have just given, you can probably already tell that a trie is most often used to store words and characters, or more specifically, strings.

Trie Data Structure Diagram Example

Although tries are an uncommon data structure as not many computer science students or scientists come across this data structure as much as others, tries are very useful in many cases (including interviews!). Common applications of tries range from storing text or autocomplete a dictionary, implementing approximate matching algorithms (spell checking, hyphenation, and etc.), text searching, prefix matching, and much more! One of the best examples of a trie implementation would be the Google search engine.

We all know how Google’s search engine is one of the most genius things we, as humans have ever seen. When we type only one letter, it outputs approximately 10 search topics within less than a second! What’s even more mind boggling is how at times, we type two letters, and Google already outputs the topic or question we want to search. Tries make it very easy to search a subset of elements, as it’s very similar to binary trees. Each time we traverse down a branch of a tree, we are cutting out the number of extra nodes. That’s how Google’s search engine works! Although more complex, they use multiple tries, which will return certain or possible elements based on popularity and additional logic to determine the weight associated with certain terms in their trie structures. That’s why when you type two letters in the search engine, for example, you will already have 10 possible topics outputted for you from Google within less than a second.

In other words, since Google wants users to get the best and the most relevant information, they have their tries not just return possible search topics with the same letters as the prompt made from the user, but also, their tries return the most relevant information based on the number of users that have gone to the website and the output of their popular Pagerank algorithm (check out my article on PageRank if you want to learn more about how Google returns the best websites based on a search topic all the time). That’s the power of tries!

In terms of implementing the trie data structure yourself, there are many ways to do it. To make it much easier for myself, I made the data structure with two functions, .implementation() and .traverse(), which replicate the two primary functions of a trie; adding a node and traversing. Moreover, the trie in this example uses an array as an underlying data structure to reference the order of the trie with the order of the letters in a certain word (typically, most programmers would use key, value pairs with the addition of an array as their underlying data structure to reference their letters at constant time).

.implementation() function to add letters as nodes to the Trie

The first function I had made was .implementation() , which is used to add certain nodes to the trie. The function will take a list of words, and will traverse each word, letter by letter and add it to the trie if it meets one of the four cases.
1) If the letter is a new letter that is not in the array
2) If the letter is the same as the last letter in the array
3) If the letter is the first letter in the trie
*Note that the underlying array in our example is called
letters_in_tree .*

Notice how the three cases refer right back to the basic definition of the trie. If a letter does not meet any of these cases or does not meet case 2 and is the same letter as the current node value, the letter will not be added to the trie and will be accepted as the same letter as the current node, which will be skipped. This process keeps running until the program has completely traversed the whole list of words.

The next function is the .traverse() function, and by the name of it, you can probably already tell what that function is about. Yes, it’s a basic traversal algorithm that takes in a user-input, and will traverse the tree and keep track of the path it’s traversing until it reaches a leaf node. After it reaches a leaf node, it will compare the path’s nodes with the letters of the words that user-prompts, and it if matches completely, the program will return True else, False otherwise.

.traverse() function for Trie

To make sure there are no bugs by the way the user types a word out, I have made sure that the program has made all words lower case when both adding it to the trie and when comparing. This way, the program can compare and add new letters very smoothly. In terms of comparison, the way this algorithm compares nodes is very similar to linear search. If the current node’s value is equivalent to the letter we are currently on, we will add that to returned path, and we will move on to the next letter and node. If they don’t match, we don’t take any action, except to move on. However, if a parent node has two child nodes, we will compare the current letter with both children. If the first child is the same, we will take the left path, else, we will take the right path.

Furthermore, when it comes to the complexity of both functions, they both have a time complexity of O(an) since the functions of the trie depend on the length of the word a that we are adding or comparing (in our case), and the number of words in the list, n . Obviously, the shorter the list, the easier and faster the search becomes as it approaches constant time, and the bigger the list, the more words the algorithm has to go through, hence, increasing the time.

Here is the link to my Github if you would like to check my implementation of the Trie data structure: https://github.com/GEEGABYTE1/Trie

My version of the trie had only two functions, but you can add multiple functions, such as a function that deletes a certain letter, or word, sorts the words based on number of vowels, and much more. There are a lot of ways to play with tries!

Tries may not seem as useful in everyday problems as their purpose is very specific; retrieving elements. However, when working with strings, such as in a technical interview when they ask about searching for a certain word in a sentence or retrieving someone’s contact in a contact book, using a trie would be a great tool to use as they have the ability to search for elements in constant time compared to other algorithms such as Binary Search, Naive Pattern Search, and etc!

--

--

Nerd For Tech
Nerd For Tech

Published in Nerd For Tech

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.

Jaival Patel
Jaival Patel

Written by Jaival Patel

16y/o Computer Scientist x Mathematics Enthusiast. I love to share my research and interest in these two topics so you will see a lot of my blogs about my work!

No responses yet