This is part of the series Learning System Design
Credits for original implementation go to Pedro Lopes
What is an Autocomplete System?
It is a feature that phones and browsers have. When a user starts typing a sentence, they are shown a short list of recommended phrases. While this feature looks simple, or even uninteresting, making it scale to millions of users with minimal latency is a real challenge.
After looking over the source code of the project, I told myself it was a lot to digest. I just wasn’t able to follow all the connections. I could follow some of the links from the Gateway to the Collector Backend and to the Assembler Frontend, but I was lost when it came to understanding how the phrases ended up in the Trie Builder, or how some of the shell scripts were interconnected.
However, I was still intrigued by how somebody could code and glue together such a system. Without planning too much, I decided to take the following steps.
One important thing to mention is that I didn’t want to reproduce the original implementation exactly, nor to improve on it. As a consequence, some of the choices in the steps below may look superficial, but that is only because I wanted to learn how to glue every component together.
Step 1. Remove everything and start from scratch.
I decided to start simple. What can be removed from the original diagram?
Well, almost everything can be removed. So this is the result and the code is on GitHub.
The system is now a simple monolith. It is inefficient and unreliable.
Every time the top phrases are requested, the app reads the trie from disk and computes the results for a prefix.
Every time a search is submitted, the app constructs a new trie and saves it to disk.
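The naive flow can be sketched roughly like this; the trie structure and the function names are my own for illustration, not the original project’s:

```python
# Minimal sketch of the monolith: rebuild the trie on every search,
# reload and walk it on every top-phrases request.
import heapq
import pickle

class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # how many times this exact phrase was searched

def build_trie(phrases):
    """Construct a fresh trie from the full list of phrases (done per search)."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1
    return root

def save_trie(root, path):
    with open(path, "wb") as f:
        pickle.dump(root, f)

def load_trie(path):
    """Done per top-phrases request in this naive version."""
    with open(path, "rb") as f:
        return pickle.load(f)

def top_phrases(root, prefix, k=3):
    """Walk to the prefix node, then keep the k most frequent completions."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    results = []
    def walk(n, suffix):
        if n.count:
            results.append((n.count, prefix + suffix))
        for ch, child in n.children.items():
            walk(child, suffix + ch)
    walk(node, "")
    return [phrase for _, phrase in heapq.nlargest(k, results)]
```

The inefficiency is plain to see here: both the full rebuild and the full reload happen on the request path.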
Step 2. Break the monolith into smaller pieces
What can be done better? Instead of a monolith, the system can be composed of three services, as in the next diagram. Here is the code for this version.
Nothing is better in terms of efficiency. Data is still saved on the local file system, and each service shares a local volume. With every /search request the app still constructs a new trie, and with every /top-phrases request it still reads the trie from disk and computes the results for a prefix.
The services are now grouped using Docker, which will make it easier to add new components.
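A docker-compose sketch of this grouping might look like the following; the service names echo the diagram, while the build paths, port, and volume name are assumptions of mine:

```yaml
# Hypothetical compose file: three services sharing one local volume.
version: "3"
services:
  gateway:
    build: ./gateway
    ports: ["8080:8080"]
    volumes: ["shared-data:/data"]
  collector-backend:
    build: ./collector
    volumes: ["shared-data:/data"]
  distributor-backend:
    build: ./distributor
    volumes: ["shared-data:/data"]
volumes:
  shared-data:
```

The shared volume is what lets the services keep using the local file system exactly as the monolith did.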
Step 3. Add a Trie Builder service
In the versions above, one very expensive and redundant operation happens in the Assembler Collector: it constructs a new trie with every /search request. Instead, a Trie Builder component can be introduced to build the trie at fixed intervals.
The component can be a service that is called explicitly, or a simple script that checks the file system for new phrases at regular intervals. I decided to make it a service, because that follows the nice story towards the original implementation.
With this architecture the problem of building the trie every time a new phrase is submitted is solved.
The Collector Backend is still responsible for deciding whether a new trie should be constructed. It does that by listing the file system and comparing the current phrase’s timestamp with the timestamp of the last file. Each file contains the phrases for a 30-minute (or 30-second) interval. When the Collector Backend detects that the current phrase belongs to a new sliding window, it sends a signal to the Trie Builder to build the trie from the available files.
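The window check the Collector Backend performs can be sketched as a few pure functions; the file-naming scheme and the window length here are illustrative, not the original code’s:

```python
# Sketch of the Collector Backend's decision: phrases are appended to a
# file named after the window they fall into, and a build is triggered
# whenever a phrase lands in a newer window than the last file seen.
WINDOW_SECONDS = 30  # the post mentions 30-minute (or, for testing, 30-second) windows

def window_start(timestamp):
    """Start of the fixed window a Unix timestamp belongs to."""
    return timestamp - (timestamp % WINDOW_SECONDS)

def phrase_file_name(timestamp):
    """Hypothetical naming convention for the per-window phrase files."""
    return f"phrases_{window_start(timestamp)}.txt"

def should_trigger_build(current_ts, last_file_ts):
    """True when the current phrase falls in a newer window than the last file."""
    return window_start(current_ts) > window_start(last_file_ts)
```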
One major problem that remains in this implementation is that the Distributor Backend still loads the trie with every request.
Here is the code for the new version.
Step 4. Quick solution for Distributor Backend
So far, the system has been saving each phrase into timestamp-named files, and every new trie was saved by overwriting the previous one. Instead of overwriting the previous trie file, every new trie file now carries the name of the last phrase file, just for the purpose of telling the versions apart.
Now, the Distributor Backend can list the available trie files and load a trie from disk only if a new one is available. Here is the code for this version.
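A rough sketch of that lazy reload, assuming the trie files share a sortable `trie_` prefix (a naming convention I made up for illustration):

```python
# The Distributor keeps the current trie in memory and only calls the
# loader again when the newest file name differs from the one loaded.
import os

class TrieCache:
    def __init__(self, trie_dir, loader):
        self.trie_dir = trie_dir
        self.loader = loader          # function: path -> trie object
        self.current_name = None
        self.trie = None

    def latest_trie_file(self):
        names = sorted(n for n in os.listdir(self.trie_dir) if n.startswith("trie_"))
        return names[-1] if names else None

    def get_trie(self):
        newest = self.latest_trie_file()
        if newest and newest != self.current_name:
            self.trie = self.loader(os.path.join(self.trie_dir, newest))
            self.current_name = newest
        return self.trie
```

Listing the directory on every request is still wasteful, which is exactly what the signaling in the next step removes.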
Step 5. Add signaling between Trie Builder and Distributor
Instead of listing the available trie files, the Distributor Backend can reload the trie when it receives a signal from the Trie Builder, sent once the builder has finished and saved the new trie. Here is the code.
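One way to sketch such a signal with only the standard library is a small HTTP endpoint on the Distributor that the Trie Builder POSTs to once the new trie is saved; the endpoint path and the port handling are my assumptions, not the original code:

```python
# The Trie Builder would POST /reload/<trie-file-name> after saving a trie;
# the Distributor records which file to load on the next request.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DistributorHandler(BaseHTTPRequestHandler):
    latest_trie = None  # name of the trie file to (re)load

    def do_POST(self):
        if self.path.startswith("/reload/"):
            DistributorHandler.latest_trie = self.path.split("/reload/", 1)[1]
            self.send_response(200)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass

def start_distributor(port=0):
    """Start the signal endpoint in a background thread; port 0 = ephemeral."""
    server = HTTPServer(("127.0.0.1", port), DistributorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```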
This is fine, but one problem that can arise from this way of communicating is that resources and events are not separated. This can lead to mixing responsibilities and can slow down development.
Step 6. Add Zookeeper
To solve the communication problem described above, Zookeeper was introduced into the system. With Zookeeper in place, the Collector Backend no longer makes any call to the Trie Builder; it simply dumps the phrases into files on the file system.
To trigger the Trie Builder, a new Tasks component has been added. It is implemented as a simple script that selects a set of phrase files from disk and creates a so-called “target” that the Trie Builder should build. The Tasks component notifies the Trie Builder via Zookeeper.
Once the Trie Builder finishes building, it notifies the Distributor Backend via Zookeeper to load a particular trie file.
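The two notifications could be sketched with the kazoo client as below. The znode paths and the JSON target format are purely illustrative, and the Zookeeper calls sit under a main guard since they need a running ensemble:

```python
# Tasks -> Trie Builder and Trie Builder -> Distributor signaling via
# Zookeeper znodes (paths and payload format are assumptions of mine).
import json

TARGETS_PATH = "/autocomplete/targets"      # written by Tasks, watched by Trie Builder
NEXT_TRIE_PATH = "/autocomplete/next_trie"  # written by Trie Builder, watched by Distributor

def make_target(phrase_files):
    """Describe which phrase files the Trie Builder should consume."""
    return json.dumps({"phrase_files": sorted(phrase_files)}).encode()

def parse_target(data):
    return json.loads(data.decode())["phrase_files"]

if __name__ == "__main__":
    # Requires a live Zookeeper; guarded so the pure helpers above stay importable.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zookeeper:2181")
    zk.start()
    zk.ensure_path(TARGETS_PATH)

    # Tasks component: publish a new target.
    zk.set(TARGETS_PATH, make_target(["phrases_0.txt", "phrases_30.txt"]))

    # Trie Builder: react whenever the target znode changes.
    @zk.DataWatch(TARGETS_PATH)
    def on_new_target(data, stat):
        if data:
            files = parse_target(data)
            # ... build the trie from `files`, save it, then notify the Distributor:
            zk.ensure_path(NEXT_TRIE_PATH)
            zk.set(NEXT_TRIE_PATH, b"trie_0030.pkl")
```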
To set up Zookeeper properly, the “setup” target in the Makefile must be executed, and the “do_tasks” target must be executed to trigger the Trie Builder.
Code is here
Step 7. Add HDFS and Message Broker
Until now, data was saved to and loaded from the local disk using shared volumes between Docker containers. A better way is to introduce HDFS as storage. At the same time, a message broker is introduced. The message broker receives phrases from the Collector and passes them to Kafka Connect, which in turn saves them to HDFS.
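The Collector-to-broker leg might be sketched with the kafka-python client as follows. The topic name, message format, and broker address are assumptions, and the producer code sits under a main guard because it needs a live broker; Kafka Connect’s HDFS sink then drains the topic into files on HDFS:

```python
# Collector side: serialize each searched phrase and publish it to a topic.
import json
import time

TOPIC = "phrases"  # hypothetical topic that the HDFS sink connector reads

def encode_phrase(phrase, ts=None):
    """Serialize one searched phrase as the record stored downstream."""
    return json.dumps({"phrase": phrase, "ts": ts or int(time.time())}).encode()

if __name__ == "__main__":
    # Requires a running broker; guarded so the helper stays importable.
    from kafka import KafkaProducer  # kafka-python

    producer = KafkaProducer(bootstrap_servers="broker:9092")
    producer.send(TOPIC, encode_phrase("system design"))
    producer.flush()
```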
After HDFS was set up, the following snippets represent the information flow from the moment a phrase is received in the Collector until it is dumped on HDFS.
Instead of interacting with the file system, do_tasks.sh now interacts with HDFS. After do_tasks.sh has finished creating a target, the Trie Builder is triggered and reads the files for the specified target from HDFS.
It took me about three days to understand all of this. I will probably stop here for now. Compared to the original implementation, the components missing from the diagram are Load Balancers, Caches, Partitioning, and MapReduce. To me, these were less important, as I was more interested in the core architecture. Maybe I will allocate some time later to look over the Partitioning and MapReduce parts.
Part two, where I add partitioning, load balancers, and a cache, is now available.