Daily Algorithms: #1 (Trie)

Sandesh Bhusal
AlgoPods
Published in
3 min readAug 24, 2020

Question Statement:

Implement an autocomplete system. That is, given a query string sand a set of all possible query strings, return all strings in the set that have s as a prefix.

For example, given the query string de and the set of strings [dog, deer, deal], return [deer, deal].

Hint: Try preprocessing the dictionary into a more efficient data structure to speed up queries.

This was an interview question asked by twitter. In the title of this blog post, I have mentioned a data structure called trie, which is a powerful way of storing and looking up strings in a dictionary. Let’s first analyze the given problem in layman terms:

Bruteforce approach:

To solve this problem, one can do the following:

1. Generate an array containing all words
2. For any given word to be checked, check the provided word against every word in the array.
3. For each word whose prefix matches for given word, place it into a list
4. Return the list.

A sample implementation look like the following code:

def get_suggestions(wordlist, prefix):
return [word for word in worlist if word[:len(prefix)] == prefix ]

I have used list comprehension to greatly reduce the LOC for the code.

As one can probably guess, this approach takes considerable amount of time, if there are many words in the dictionary, as each word has to be tested. Similarly, a further optimizations are required, as the prefix of the prefix might match with some prefix of the word, e.g. api matches with the prefix ap for the words appples and append but none of them are the required suggestions.

Some other problems with this approach are:

  1. Expensive storage required, as words which have similar prefixes are stored irrespective of common prefixes.
  2. Traversal in the list everytime a new letter appears will be computationally expensive.

Trie:

In computer science, a trie, also called digital tree or prefix tree, is a kind of search tree — an ordered tree data structureused to store a dynamic set or associative array where the keys are usually strings. (Wikipedia)

Tries are associated with autocomplete systems. The structure of a trie can be seen in the image below:

Structure of a trie (Credits: Wikimedia)

In the naiive bruteforce implementation, we would have to check all the words against each keystroke; to remedy the situation, we could maintain a list of candidate words, which will be checked on further keystrokes. The working principle of a trie is similar. For each letter, it will traverse below and reach another node; if the node is a leaf node, then a complete word has been found; else, a list of candidates can be generated by traversing downwards from the given node.

The trie denotes a compact representation of the dictionary, which is the second part asked in the question, as it consumes constant memory for common prefixes of words. For an instance, a single storage of te points to words tea ted and ten in the dictionary. Hence, the space requirements are greatly reduced.

Let’s implement an autocomplete system!

If you run the above python script, you can see two words in the output, namely, ‘ant’ and ‘annoyance’ as they start with ‘an’.

The code is pretty much self-explanatory. We have created a class called trie. The class implements 3 methods, and 1 dunder function. We can insert word, search for a word and retrieve a list of words that start with given prefix.

Happy coding!

--

--