Deep Learning for Spell Checking

I use spell checking every day, it is built into word processors, browsers, smartphone keyboards. It helps to create better documents and make communication clear. More sophisticated spell checker can find grammatical and stylistic errors.

How to add spell checking to your application? A very simple implementation by Peter Norvig is just 22 lines of Python code.

In “Deep Spelling” article, Tal Weiss wrote that he tried to use this code and found that it is slow. The code is slow because it is brute forcing all possible combinations of edits on the original text.

An interesting approach is to use deep learning. Just create artificial dataset by adding spelling errors into correct English texts. And you better have lots of text! The author has been using one billion words dataset released by Google. Then train character-level sequence-to-sequence model with LSTM layers to convert a text with spelling errors to a correct text. Tal got very interesting results. Read the article for details.

Good quality spell checkers can be very useful for chatbots. Most of the chatbots rely on simple NLP techniques, and typical NLP pipeline includes syntax analysis and part of speech tagging, which can be easily broken if the input message is not grammatically correct or has spelling errors. Perhaps fixing spelling errors earlier in the NLP pipeline can improve the accuracy of natural language understanding.

It can be good to try train such spell checker model on another dataset, more conversational.

Have you tried to use any models like this in your apps?

The article was originally published on http://pavel.surmenok.com/2016/11/08/deep-learning-for-spell-checking/