How does predictive text work in smartphones?

Saidel López
6 min read · Sep 23, 2019

There are small things in our daily lives that go unnoticed until someone asks, “how does it work?” In this post, let me share a widely used algorithm that smartphones rely on to predict your words while you’re texting.

The context

On your smartphone, when you’re texting in any app or website, you may notice that a list of suggested words appears to complete your sentences. Sometimes the words are completely unexpected, but other times they actually match our expectations. Used wisely, this feature can help you text faster and with fewer typos. On the other hand, some people disable it because they find it annoying.

What is the problem?

While the user is texting, as developers we want to assist them by showing a list of possible words that might match their idea, based on the context the text is leading to. So, in order to show the best list of words, we take two or three characters as input and look into a dictionary for the best possible candidates (sometimes two or three words are enough).

Some leads

I mentioned a couple of basic concepts we can build on. First, the input is two or three characters provided by the user. Second is the concept of a dictionary: the set of all possible words that exist for a given language. For this algorithm, we need to define how “close” a word in the dictionary is to the word the user is looking for. Information theory already gives us metrics for this, such as the Levenshtein distance, which counts the minimum number of single-character edits (insertions, deletions, and substitutions) needed to turn one word into another. We can then take the three words whose Levenshtein distance is lowest.
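To make this concrete, here is a minimal sketch of the Levenshtein distance using the classic dynamic-programming formulation (the function name and structure are my own, not taken from any particular phone keyboard):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn string a into string b."""
    # prev[j] holds the distance between the processed part of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("warn", "warm"))     # 1 (substitute n -> m)
print(levenshtein("warn", "warning"))  # 3 (append i, n, g)
```

A distance of 0 means the words are identical, so “take the three words with the lowest distance” is just a matter of sorting candidates by this number.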

Data Structure to consider: Patricia Trie

AKA Radix Tree, the Patricia Trie is a structure that stores the fragments of words (not syllables) that a set of words has in common. For instance, for the words “war, warning, warhog, warlock, warm, warming”, this is the Patricia Trie:

Patricia Trie for a small set of words

As you can see, this Trie encodes every word in the list and preserves alphabetical order. To reconstruct the list of words, walk from the root to each leaf, concatenating the word fragments along the way.

If you search for a word that doesn’t exist, the Patricia Trie still leads you to the closest words stored as leaves or nodes. In the example above, if we look for the word warn, even though it doesn’t appear in the Trie, we find that warm, warming and warning are the closest words to it.

Finally, this structure saves a lot of memory, since you don’t need to store every dictionary word in full. You only store word fragments, and languages such as English and Spanish have many words that share the same characters.
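Here is a small sketch of how such a Trie could be built. This is a simplified radix tree in plain Python (class and function names are my own), just enough to show how shared fragments get split and how the alphabetical order falls out:

```python
class RadixNode:
    def __init__(self):
        self.children = {}    # edge label (word fragment) -> RadixNode
        self.is_word = False  # True if the path to this node spells a word

def insert(node, word):
    """Insert a word, splitting edges so common fragments are shared."""
    while True:
        for label in list(node.children):
            # length of the common prefix of the edge label and the word
            i = 0
            while i < min(len(label), len(word)) and label[i] == word[i]:
                i += 1
            if i == 0:
                continue                      # no overlap, try next edge
            if i < len(label):
                # partial overlap: split the edge at the shared fragment
                child = node.children.pop(label)
                mid = RadixNode()
                mid.children[label[i:]] = child
                node.children[label[:i]] = mid
                node = mid
            else:
                node = node.children[label]   # edge fully consumed
            word = word[i:]
            if not word:
                node.is_word = True
                return
            break
        else:
            # no edge shares a prefix: store the rest as a new leaf
            leaf = RadixNode()
            leaf.is_word = True
            node.children[word] = leaf
            return

def all_words(node, prefix=""):
    """Walk root-to-leaf, concatenating fragments, in alphabetical order."""
    found = [prefix] if node.is_word else []
    for label in sorted(node.children):
        found.extend(all_words(node.children[label], prefix + label))
    return found

root = RadixNode()
for w in ["war", "warning", "warhog", "warm", "warming"]:
    insert(root, w)
print(all_words(root))  # ['war', 'warhog', 'warm', 'warming', 'warning']
```

Note that after all five insertions the root has a single edge labeled “war”: the shared fragment is stored exactly once, which is where the memory savings come from.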

The algorithm

We now know that a Patricia Trie is a great structure for holding a set of words, and that we can tell whether one word is “close” to another using the Levenshtein distance as a heuristic. Before running the algorithm, we need this structure up and functional:

The dictionary: take all the words of a given language and store them in a Patricia Trie. Most English dictionaries include around 25,000 words, so the Trie can live in memory without major restrictions on any system.

So, here’s the algorithm.

  1. Create a search word from the first three characters the user types.
  2. Search the dictionary for this search word. If it exactly matches a word in the dictionary, return and show nothing to the user.
  3. If the matching node has no sibling nodes, return the words stored under that node, down to the first leaf.
  4. If there is no exact match in the dictionary, move to the parent node, which narrows the search to a list of words very close to the one we are looking for.
  5. Take the sibling nodes at this level and measure their Levenshtein distance to the search word. Once you have the three words with minimum distance, return those words.
  6. Whenever the user adds a new character to the search word, start over from step 1.
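The steps above can be approximated in a few lines. This sketch is my own simplification, not production keyboard code: it replaces the trie walk (steps 3–4) with a linear scan over the whole dictionary, truncates each word to the length of the typed prefix, and ranks by Levenshtein distance with alphabetical tie-breaking, so ties may come out in a different order than in the walkthrough that follows.

```python
def levenshtein(a, b):
    # minimum number of single-character edits turning a into b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def suggest(prefix, word_list, k=3):
    """Steps 1-5, simplified: rank every word by the edit distance
    between the typed prefix and the word truncated to that length."""
    ranked = sorted(word_list,
                    key=lambda w: (levenshtein(prefix, w[:len(prefix)]), w))
    return ranked[:k]

word_list = ["war", "warhog", "warlock", "warm", "warming", "warning"]
print(suggest("war", word_list))   # ['war', 'warhog', 'warlock']
print(suggest("warp", word_list))
```

Step 6 then amounts to calling `suggest` again with the longer prefix every time the user types another character.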

Examples:

Using the same mini Trie built above for the small list of words, let’s search for a few words and see how the algorithm behaves.

  1. Searching for the word “warning”. This case is very simple: since the word actually exists and is a leaf, the algorithm returns nothing. In the following examples, I’ll elaborate on what happens for each character the user adds to the search. As you may notice, in this example I skipped some intermediate steps to get to the final result.
  2. Searching for the word “war”
  • war is the root node (with several child nodes) and also matches a word in the dictionary. In this case, the algorithm should return the word “war” along with the closest child nodes: “warhog” and “warlock”.

3. Searching for the word “warp”

  • Let’s go sequentially. The algorithm starts once the user types the first three characters. When the user types “war”, it suggests the same words as the example above: “war”, “warhog” and “warlock”. Once the user adds the final p, the search word becomes “warp” and the proposed words are “warlock”, “warm” and “warming”.

4. Searching for the word “politics”

  • After the first three characters, “pol”, there is no match anywhere in the Trie, so it falls back to the closest words it has: “war”, “warhog” and “warlock”. After each subsequent character it computes the same set of words. In fact, since the whole word never appears in the dictionary, it proposes the same list over and over again.

Why does it work?

A Patricia Trie might hold every possible word, which would make for incredibly long and wide trees in memory. The trick is that the words we add to the tree are limited to a given language. In English, for instance, dictionaries hold around 25,000 words, but an average person only uses about 600 words plus verb conjugations on a daily basis. The numbers for Spanish are remarkably similar, although its pool of possible words is larger than 25,000. Some smartphones keep one Patricia Trie per language and, based on the first words of the user’s text, favor one Trie over the others.

The bottom line is that this technique works best in languages written with the Latin alphabet (such as English, Spanish, Italian, French, or German), but it tends to fail with syllabic writing systems like Japanese, scripts usually written without vowels like Hebrew, or very small alphabets like Hawaiian’s. Some dialects are incredibly difficult to convert to written words at all.

Certain improvements

  • Some word predictors keep more than a simple dictionary; they have at least two: one to match words from the beginning and another to match them from the end, backwards. This helps find words even when the user makes typos.
  • Other word predictors learn from the user. In the example above, if the user writes the word “politics” and it doesn’t appear in the dictionary, it gets added to a “personal” dictionary for that user. The next time the user types “pol”, it will likely suggest the word “politics”.
  • Another improvement is to have one Patricia Trie dedicated to verbs and their conjugations, another for nouns, another for adjectives, and so on. Based on the context of the sentence, the system can predict whether the next word is a verb and search the verb dictionary, or an adjective and search the adjective dictionary, and so on.
  • Finally, clean and sanitize the dictionary to keep the Trie as small as possible and speed up searches. This can be done by defining families of words to narrow down the number of entries stored in the dictionary.
