Here’s what we will be doing:
- Create a simple web page with an input box where the user can enter text.
- Add some CSS to style the web page.
Setting the stage
Here is the HTML code for the web page we will be working with:
Finally, we add a div tag to show what the user has typed.
Next, let’s style it up:
Here’s what the web page looks like:
So what does all the code above do?
Now let’s make our page a bit smarter.
Currently, the submit() function just copies what the user types into the output div. We want it to also check whether the user has made a spelling error.
We don’t want to attempt a correction every time the user presses a key. Instead, we wait until they press the space bar, which signals that they have finished a word, and only then check for typos.
We have added a new function called autoCorrect(), which takes a word, logs it to the console and returns it unchanged. We have also added code to the submit() function that checks whether the last character of the user’s text is a whitespace character. If it is, the last word in the text is passed to autoCorrect(), and the result replaces that word in the text.
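As a rough illustration of that logic, here is a minimal sketch with the DOM wiring left out. The helper name correctLastWord() and its second parameter are assumptions made for this example; in the page itself, this logic would sit inside submit(), which reads the input box and writes to the output div.

```javascript
// Placeholder version described above: log the word and return it unchanged.
function autoCorrect(word) {
  console.log(word);
  return word;
}

// Hypothetical pure helper holding the check that submit() performs.
// `correct` defaults to autoCorrect so a different corrector can be swapped in.
function correctLastWord(text, correct = autoCorrect) {
  // Only act when the user has just finished a word with a space.
  if (!text.endsWith(" ")) return text;

  const words = text.trimEnd().split(" ");
  const lastWord = words.pop();
  if (lastWord === "") return text; // nothing but spaces so far

  // Replace the last word with whatever the corrector returns.
  words.push(correct(lastWord));
  return words.join(" ") + " ";
}

// The word is only handed to autoCorrect() once a space has been typed.
correctLastWord("I like gogl");   // returned as-is, nothing logged
correctLastWord("I like gogle "); // logs "gogle"
```

Keeping the text handling in a pure function like this makes it easy to try out in the console before attaching it to the input box.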
Later on, autoCorrect() will return the correct spelling of the word it is given. Because each word is corrected as soon as it is completed, the user ends up with a correctly spelled sentence.
Up next, we want to make the autoCorrect() function check the spelling of the word passed to it, and if it is incorrect, return the correct spelling.
First, we define an array of all the words we expect. Its contents depend on what the user is expected to type. For instance, if the user is expected to enter an English sentence, the array might contain every word in the English language.
In our case, to keep things simple, we will ask the user to enter the names of some popular tech companies (Google, Apple, etc.).
Let’s update the placeholder for the input box and the output:
Now, let’s create our array of expected words:
Our universe of discourse (or array of expected words)
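For illustration, such an array might look like the sketch below. The name expectedWords is an assumption, and the list contains only the companies mentioned in this article; the original list was presumably longer.

```javascript
// Illustrative list only: the four companies named in this article.
const expectedWords = ["Google", "Apple", "Alphabet", "Facebook"];

console.log(expectedWords.length); // 4
```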
Next, we will be working with bigrams. A bigram is a pair of adjacent characters, and in this article “the bigram of a word” means the list of all such pairs in the word, in order. For example, the bigram of “bigram” is [“bi”, “ig”, “gr”, “ra”, “am”]. Read more about bigrams here: https://en.wikipedia.org/wiki/Bigram.
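A minimal sketch of how the bigram of a word could be computed is shown here. The function name matches the getBigram() used in this article, but the body is one possible implementation, not necessarily the original.

```javascript
// Return every pair of adjacent characters in the word, in order.
function getBigram(word) {
  const pairs = [];
  for (let i = 0; i < word.length - 1; i++) {
    pairs.push(word.slice(i, i + 2));
  }
  return pairs;
}

console.log(getBigram("bigram")); // [ 'bi', 'ig', 'gr', 'ra', 'am' ]
```

Note that a word with fewer than two characters has no pairs, so it yields an empty array.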
We will compare the bigram of the last word entered by the user with the bigram of each word in our expected-words array and see which one has the highest similarity, then replace the user’s word with the closest matching word in the array.
We have added two new functions:
- getBigram(), which returns the bigram of a word as an array;
- getSimilarity(), which returns the bigram similarity of two words. It uses getBigram() to get the bigram of each word, counts how many pairs in the first bigram also appear in the second, and returns the ratio of this count to the length of the longer bigram array.
For example, if the two words are “hello” and “helo”, the steps to be taken are:
- Get the bigrams: [“he”, “el”, “ll”, “lo”] and [“he”, “el”, “lo”].
- Count how many pairs of adjacent characters in the first bigram are also in the second. Since “he”, “el” and “lo” are in both bigrams, we have three similar pairs.
- Finally, find the ratio of this count to the length of the longer bigram. This gives 3/4 or 0.75 as the bigram similarity of the two words.
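The steps above can be sketched as follows. Two details are assumptions made for this example and are not stated in the article: words are lowercased before comparison, and the similarity is taken to be 0 when neither word has any bigrams (which avoids dividing by zero).

```javascript
// Return every pair of adjacent characters in the word, in order.
function getBigram(word) {
  const pairs = [];
  for (let i = 0; i < word.length - 1; i++) {
    pairs.push(word.slice(i, i + 2));
  }
  return pairs;
}

// Ratio of shared pairs to the length of the longer bigram array.
function getSimilarity(word1, word2) {
  // Assumption: compare case-insensitively.
  const bigram1 = getBigram(word1.toLowerCase());
  const bigram2 = getBigram(word2.toLowerCase());

  const longer = Math.max(bigram1.length, bigram2.length);
  if (longer === 0) return 0; // assumption: no pairs means no similarity

  // Count the pairs in the first bigram that also appear in the second.
  const shared = bigram1.filter((pair) => bigram2.includes(pair)).length;
  return shared / longer;
}

console.log(getSimilarity("hello", "helo")); // 0.75
```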
Now, let’s add this to the autoCorrect() function:
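One way this step could look is sketched below. The expectedWords array holds only the companies named in this article, the helpers follow the descriptions given earlier, and the choice to return the word unchanged when nothing in the array shares a pair with it is an assumption.

```javascript
// Illustrative list only: the four companies named in this article.
const expectedWords = ["Google", "Apple", "Alphabet", "Facebook"];

// Helpers as described in the text.
function getBigram(word) {
  const pairs = [];
  for (let i = 0; i < word.length - 1; i++) pairs.push(word.slice(i, i + 2));
  return pairs;
}

function getSimilarity(word1, word2) {
  const bigram1 = getBigram(word1.toLowerCase()); // assumption: ignore case
  const bigram2 = getBigram(word2.toLowerCase());
  const longer = Math.max(bigram1.length, bigram2.length);
  if (longer === 0) return 0;
  return bigram1.filter((pair) => bigram2.includes(pair)).length / longer;
}

// Return the expected word with the highest bigram similarity to `word`.
function autoCorrect(word) {
  let bestWord = word; // assumption: keep the word if nothing matches at all
  let bestScore = 0;
  for (const candidate of expectedWords) {
    const score = getSimilarity(word, candidate);
    if (score > bestScore) {
      bestScore = score;
      bestWord = candidate;
    }
  }
  return bestWord;
}

console.log(autoCorrect("gogle")); // Google
```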
Our code can now check words the user has written and replace them with the closest matching word from the expected-words array.
There is a problem, though. If the user wants to type a tech company that is not in our list but is spelled somewhat like one that is, our tool replaces the user’s company with the listed one. For instance, “Alibaba” becomes “Alphabet”.
To fix that, we can set a threshold value below which no correction should be made:
With the new code for the autoCorrect() function, the word is corrected only if the bigram similarity between the word the user entered and the best match from our array is greater than 0.5. The threshold value is up to you and depends on how closely you want the user’s input to match the correct spelling.
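The thresholded check might be sketched like this. The constant name THRESHOLD, the short expectedWords list and the case-insensitive comparison are assumptions made for this example; the 0.5 value comes from the text above.

```javascript
// Illustrative list only: the four companies named in this article.
const expectedWords = ["Google", "Apple", "Alphabet", "Facebook"];
const THRESHOLD = 0.5; // corrections need a similarity above this value

// Helpers as described in the text.
function getBigram(word) {
  const pairs = [];
  for (let i = 0; i < word.length - 1; i++) pairs.push(word.slice(i, i + 2));
  return pairs;
}

function getSimilarity(word1, word2) {
  const bigram1 = getBigram(word1.toLowerCase()); // assumption: ignore case
  const bigram2 = getBigram(word2.toLowerCase());
  const longer = Math.max(bigram1.length, bigram2.length);
  if (longer === 0) return 0;
  return bigram1.filter((pair) => bigram2.includes(pair)).length / longer;
}

// Correct the word only when the best match is similar enough.
function autoCorrect(word) {
  let bestWord = word;
  let bestScore = 0;
  for (const candidate of expectedWords) {
    const score = getSimilarity(word, candidate);
    if (score > bestScore) {
      bestScore = score;
      bestWord = candidate;
    }
  }
  return bestScore > THRESHOLD ? bestWord : word;
}

console.log(autoCorrect("gogle"));   // Google
console.log(autoCorrect("Alibaba")); // Alibaba
```

With this list, “Alibaba” shares only two pairs (“al” and “ab”) with the seven pairs of “Alphabet”, a similarity of about 0.29, so it is left alone.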
And with that, we have concluded our work. Below are a few screenshots of the tool in action:
In the latter example, the spelling the user entered is too far from the correct spelling of “Facebook” (a similarity of 0.29), so the threshold we have set prevents auto-correction.
We have built a very simple auto-correction tool that can be used in different application areas, such as product search on an e-commerce website or spell checking in a text editor. The tool works as it stands, but it can be improved as required.