How to build a simple auto-correction tool

Adeyemo Simisoluwa
Jan 14 · 7 min read

This article describes how to build a simple tool that automatically replaces a user’s misspelled word with the correct one. Although we will be working with JavaScript here, you can build a tool like this in other languages.

Here’s what we will be doing:

  • Create a simple web page with an input box where the user can enter text.
  • Add some CSS to style the web page.
  • Add some JavaScript to check what the user types for spelling errors and to correct them in real time.

I will assume that you already have a little experience with HTML, CSS and JavaScript.


Setting up the stage

Here is the HTML code for the web page we will be working with:

HTML code for the web page

We start by linking our CSS (main.css) and JS (main.js) files. Of course, you can name them whatever you like. Then, we add a text input box to the page where the user will enter some text. Notice that its id is “text” and that it has an onkeyup event handler, so whenever a key is pressed and released, the browser calls the submit() JavaScript function (see below).

Finally, we add a div tag to show what the user has typed.

Next, let’s style it up:

main.css-CSS code for the web page

and now some preliminary JavaScript:

main.js-Initial JavaScript code

Here’s what the web page looks like:

The web page at first glance
The web page after typing

So what does all the code above do?

The user is shown a text box where they can enter text, and if they do, the submit() JavaScript function is called. The function copies what the user has typed into the output div tag, and clears the div tag if the text box is empty.
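As a rough sketch of the behaviour just described, submit() could look something like the code below. The “text” id comes from the article; the “output” id for the div is an assumption made for this example, so adjust it to match your own markup.

```javascript
// Copy whatever is in the text box into the output div.
// "text" is the input's id from the article; "output" is an assumed id.
function submit() {
  const input = document.getElementById("text");
  const output = document.getElementById("output");

  // An empty text box yields an empty string, which also clears the div.
  output.textContent = input.value;
}
```

Using textContent rather than innerHTML means the user’s text is shown as plain text and never interpreted as markup.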

Now let’s make our page a bit smarter.


Auto-correction

Currently, the submit() function just copies what the user types into the output div. We want it to also check whether the user has made a spelling error.

Since we don’t want to attempt to correct the user every time they press a key, we have to wait until they have pressed the space bar, indicating that they have completed the word, before checking for typos.

First, let’s add a JavaScript function that is called only when the user has completed a word:

Updated form of main.js — added the autoCorrect() function

We have added a new function called autoCorrect(), which takes in a word, logs it to the console and returns it. We have also added some code to the submit() function that checks whether the last character in the user’s text is a white space. If it is, the last word in the text is passed to the autoCorrect() function, and the result is used to replace the last word in the text.
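One way this step might be written is sketched below. The helper name correctLastWord() and the “output” id are assumptions made for this example, and writing the corrected text back into the input box is just one reading of “replace the last word in the text”.

```javascript
// Placeholder for now: log the completed word and hand it back unchanged.
function autoCorrect(word) {
  console.log(word);
  return word;
}

// Hypothetical helper: if the text ends with a space, run the last word
// through autoCorrect() and put the result back in its place.
function correctLastWord(text) {
  if (!text.endsWith(" ")) {
    return text; // the user is still in the middle of a word
  }
  const words = text.trimEnd().split(" ");
  const lastWord = words.pop();
  words.push(autoCorrect(lastWord));
  return words.join(" ") + " "; // keep one trailing space for the next word
}

function submit() {
  const input = document.getElementById("text");
  const output = document.getElementById("output"); // assumed id

  input.value = correctLastWord(input.value);
  output.textContent = input.value;
}
```

Keeping the string handling in its own function makes it easy to try out in the console without touching the page.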

Later on, we will see that the autoCorrect() function will return the correct spelling of the last word, and by correcting the spelling of the words one at a time, the user ends up having a correctly spelled sentence.

Up next, we want to make the autoCorrect() function check the spelling of the word passed to it, and if it is incorrect, return the correct spelling.

First, we will define an array of all the words that are expected. Its contents depend on what the user is expected to type. For instance, if the user is expected to enter an English sentence, the array might contain all the words in the English language.

In our case, to keep things simple, we will ask the user to enter some popular tech companies (Google, Apple, etc.).

Let’s update the placeholder for the input box and the output:

The new placeholder
The new output

Now, let’s create our array of expected words:

Our universe of discourse (or array of expected words)

Next, we will be working with bigrams. The bigram of a word is a collection of pairs of adjacent characters in the word. For example, the bigram of “bigram” is [“bi”, “ig”, “gr”, “ra”, “am”]. Read more about bigrams here: https://en.wikipedia.org/wiki/Bigram.

We will compare the bigram of the last word entered by the user with the bigram of each word in our expected-words array and see which one has the highest similarity, then replace the user’s word with the closest matching word in the array.

Getting the bigram similarity of two words

We have added two new functions:

  • getBigram(), which returns the bigram of a word as an array;
  • getSimilarity(), which returns the bigram similarity of two words. It does this by using the getBigram() function to get the bigram of each word, then it counts the number of pairs of adjacent characters in the first bigram that are also in the second bigram, and returns the ratio of this count to the length of the longer bigram array.

For example, if the two words are “hello” and “helo”, the steps to be taken are:

  • Get the bigrams: [“he”, “el”, “ll”, “lo”] and [“he”, “el”, “lo”].
  • Count how many pairs of adjacent characters in the first bigram are also in the second. Since “he”, “el” and “lo” are in both bigrams, we have three similar pairs.
  • Finally, find the ratio of this count to the length of the longer bigram. This gives 3/4 or 0.75 as the bigram similarity of the two words.
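The steps above can be sketched as follows. This is a minimal version based on the description in this article, not necessarily identical to the original listing. The guard for words shorter than two characters is an addition of this sketch, and the comparison is case-sensitive.

```javascript
// Return every pair of adjacent characters in the word, in order.
function getBigram(word) {
  const pairs = [];
  for (let i = 0; i < word.length - 1; i++) {
    pairs.push(word.slice(i, i + 2));
  }
  return pairs;
}

// Ratio of pairs in the first bigram that also occur in the second
// to the length of the longer bigram.
function getSimilarity(word1, word2) {
  const bigram1 = getBigram(word1);
  const bigram2 = getBigram(word2);
  const longest = Math.max(bigram1.length, bigram2.length);

  // Added guard: words with fewer than two characters have no pairs,
  // so avoid dividing by zero.
  if (longest === 0) {
    return 0;
  }

  const shared = bigram1.filter((pair) => bigram2.includes(pair)).length;
  return shared / longest;
}

console.log(getBigram("bigram"));            // ["bi", "ig", "gr", "ra", "am"]
console.log(getSimilarity("hello", "helo")); // 0.75
```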

Now, let’s add this to the autoCorrect() function:

Modified autoCorrect() function

Our code can now check words the user has written and replace them with the closest matching word from the expected-words array.

There is a problem, though. If the user wants to type a tech company that is not in our list but is spelled somewhat like one that is, our tool replaces the user’s company with the similar one. For instance, “Alibaba” becomes “Alphabet”.

To fix that, we can set a threshold value below which no correction should be made:

Adding a similarity threshold

With the new code for the autoCorrect() function, if the bigram similarity between the word the user entered and the closest match in our array is 0.5 or less, the word will not be corrected. The threshold value is up to you and depends on how closely you want the user’s input to match the correct spelling.
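Putting the pieces together, a thresholded autoCorrect() might look like the sketch below. The expectedWords list is illustrative (only these four names are mentioned in this article), and making the threshold an optional parameter is a choice made here so that the unthresholded behaviour can be shown too. The two helpers are repeated in compact form so the snippet runs on its own.

```javascript
// Illustrative list; the real array can hold any words you expect.
const expectedWords = ["Google", "Apple", "Alphabet", "Facebook"];

function getBigram(word) {
  const pairs = [];
  for (let i = 0; i < word.length - 1; i++) {
    pairs.push(word.slice(i, i + 2));
  }
  return pairs;
}

function getSimilarity(word1, word2) {
  const bigram1 = getBigram(word1);
  const bigram2 = getBigram(word2);
  const longest = Math.max(bigram1.length, bigram2.length);
  if (longest === 0) {
    return 0; // added guard for words with no pairs
  }
  return bigram1.filter((pair) => bigram2.includes(pair)).length / longest;
}

// Return the closest expected word, but only if its similarity is
// greater than the threshold; otherwise leave the word as typed.
function autoCorrect(word, threshold = 0.5) {
  let bestWord = word;
  let bestScore = 0;
  for (const candidate of expectedWords) {
    const score = getSimilarity(word, candidate);
    if (score > bestScore) {
      bestScore = score;
      bestWord = candidate;
    }
  }
  return bestScore > threshold ? bestWord : word;
}

console.log(autoCorrect("Alphabt"));    // "Alphabet" (similarity 5/7)
console.log(autoCorrect("Alibaba", 0)); // "Alphabet" (no effective threshold)
console.log(autoCorrect("Alibaba"));    // "Alibaba"  (2/7 is below 0.5)
```

With a threshold of 0, “Alibaba” is pulled towards “Alphabet” because the two share the pairs “Al” and “ab”. At 0.5 that weak match is ignored.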

And with that, we have concluded our work. Below are a few shots to see the tool in action:

Testing the tool with a wrong spelling of “Alphabet”
Auto-correction result
Testing with a wrong spelling of “Facebook”

In the latter example, the spelling the user entered is too far from the correct spelling of “Facebook” (a bigram similarity of 0.29), so the threshold value we have set prevents auto-correction.


Wrapping up

We have built a very simple auto-correction tool that can be used in different application areas, such as product search on an e-commerce website or spell checking in a text editor. The tool works fine, but can be improved as required.
