How to write your own spellchecker and autocorrect algorithm in under 80 lines of code

Will Sentance
Dec 30, 2014 · 3 min read

Spellcheckers and autocorrect can feel like magic. Even advanced engineers confess to not having a fully intuitive grasp of what goes on underneath the hood. Yet they’re at the core of everyday applications — our phones, office software, google search (you type in ‘incorect’ and google returns ‘Did you mean incorrect).

So how does this magic happen? And how can we build our own?

First let’s consider what a spellchecker does. If it’s useful, it takes in a misspelled word — for example ‘teh’ — and returns the best guess at the correct spelling — ‘the’

There are two key components to this

First, a body of language as it’s commonly used — literature but also contemporary text with common current words (like iPhone). From this we can produce a word count

{the: 85039, it: 68904, a: 68598 ... }

Second, we can usefully conceptualize every error as one of four types. So if the user types ‘teh’ they could have meant to:

  • add a letter: ‘teh’ -> ‘ateh, bteh, cteh, dteh,….,tehy, tehz’
  • remove a letter: ‘teh’ -> ‘eh, te, th’
  • substitute a letter: ‘teh’ -> ‘aeh, beh, ceh,…,tey, tez’
  • switch two adjacent letters: ‘teh’ -> ‘hte, the’

All the above words are said to be of error distance 1. A word of error distance 2 from ‘teh’ would be ‘too’ — ‘teh’ -> ‘toh’ -> ‘too’


There are then three steps to our particular implementation of a spellchecker. First, check whether the word is in our dictionary (our wordcount dictionary). If it is not, generate all the possible words that are of error distance 1 and 2 *and* that are also real words in our dictionary.

Finally we must consider the word count of the possible words to decide which word is the most likely correction. Here things get especially interesting.

Suppose we have the word ‘thene’ that is error distance 1 from ‘then’ but error distance 2 from ‘the’. It would seem that even though the error distance 2 word (‘the’) is far more common, the correction is more likely to be ‘then’.

So we must decide on a ratio at which we should accept an error distance 2 word (I’ve found somewhere between 10 and 100x works well). It’s also useful into account word length — shorter words are less likely to be an error distance of 2 from the intended word.

You can find the full code (in javascript) here


Will Sentance is CEO at Codesmith. He teaches core software engineering, computer science, backend engineering with Node, and prepares students for hiring.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store