Suppose you would like to code a game of Scrabble, or Boggle, or some other word game which will require searching through a long list of words. When I say “long”, I mean “about 100,000 words” long. Suppose you would like to see if some word is in our long list of words; out dictionary. The naive approach would be to iterate every single word and get a match by the end of the iterations. This is a very poor solution since it runs in linear time, it would take a long time to see if a single word is in the dictionary! We can do better than that! Let’s also mention that while searching is linear time regardless if it’s in alphabetical order or not, we can just say it is not in alphabetical order and thus inserting a word in our dictionary is now in constant time.
Let’s propose a better idea. We’ll maintain our list in alphabetical order and rewrite the search method to use a sort of binary search. If done appropriately, the runtime to search for a word can be in logarithmic time! However, because we always need to make sure our list is in alphabetical order, we will also have to utilize binary search to insert new words to our dictionary. We save time on searching for words, but we also sacrifice time for inserting words. We can surely do better than that! This is where the trie comes in.
The intuition behind the trie comes from the idea of traversing a dictionary letter-by-letter so that the runtime is extremely fast for both searching and inserting, and indeed, we can materialize this idea by considering an n-ary dict tree! The idea is the following: a node in a tree is a dict with letters as its keys, and other nodes as its values. The root represents the first letter, and a child of the root represents the second letter, and so on. For example, if I were to insert “cat”, “car”, and “cup” in a trie, the root would only have the key “c”, and its child node would have keys “a” and “u”, and the child node of the “a” key would have “t” and “r”. We actually save a lot of time and redundancy in the long-run by using a trie!
How would you insert a word into a tree? Let’s start at the node and some word. We can actually do it recursively! Let’s pop the first letter of the word, if that letter is a key of the root, then traverse into that child node and recursive insert that word (minus that first letter) starting from that child node! However, if that letter is not a key of the root, then it’s no problem, we can just create a new empty node with that letter as its key! With a new empty node initialized, you can traverse into that node and do the same thing again! Just keep popping the first letter and traversing down the trie until your word is just an empty string!
How would you check if a trie contains some word? We can actually use some of the intuition of inserting a word in a trie and apply it here! We can keep popping the first letter and traversing the trie recursively until we hit one of the two cases. If you’ve traversed down a tree and a letter which you’ve popped is not a key of a node, then your journey ends there because it means the word you are searching for is not in your trie! If you do manage to pop all of the letters out of the word you are searching for, then it means you have found the word if and only if the node you end up in is empty! So far, so good! However, there is a slight issue which we need to promptly fix.
Imagine we add the word “sky” to our trie. Done? Now go ahead and insert “skyfall”. Do you see the issue yet? With our current methods of inserting and searching, those following actions would no longer distinguish “sky” as an inserted word, and based on our previous methods, searching for “sky” would return false! That’s not good, is there still hope? Thankfully yes, we just need to think outside the box for a moment!
Let’s agree that words don’t have spaces in them, life is better that way. In that case, we can add a special key for a trie node to signify the end of a valid word: the space. Modifying our insert method, after you pop all the letters out of your word and you are in a resulting node after traversing down a trie, the real final step would be to add a “ ” key to the node and its value can really be anything as long as it exists. Now we can modify our search method. The first case is completely fine, if you’ve hit a roadblock, then it’s really a roadblock. The second case is where is needs to be fixed, instead of concluding the word is in the trie if and only if the resulting node is empty, it’s sufficient to say that the word is actually in the trie if and if the resulting node has a “ ” key. This actually fixes our compound word issue!
Here is an example: if you insert “to”, “too”, and “two” into a trie, then the root would have just the key “t”, and its child node would have keys “o” and “w”, and the child node of key “o” would have the keys “o” and “ ”! The words “to” and “too” can now be distinguished in our trie!
As you start inserting more and more words into a trie, you start to notice that you’re saving a lot of time and space storing the words in a trie! If you restrict your letter count to 26, then you are guaranteed to have a prefix overlap (or even just one letter overlap) by the 27th word insertion by the pigeonhole principle! We can now officially discuss the runtime of the methods. The insert methods runs in linear time, but not in respect to the amount of words stored in the trie, but the amount of letters the word you’re inserting has, this is very small in the long-run. The search method has also the exact same runtime as the insert method! It’s not too hard to see that you get the most mileage out of a tree when you store a lot of words!
In conclusion, if you want to create a word game and need to create a data type that stores thousands of words and very quickly insert and search for words, consider a trie. It will save you a lot of headaches in the future! That’s it for this week, see you next week!