LanguageTool: Grammar and Spell Checker in Python

A gentle introduction of how you can check your Grammar and Spelling in Python

George Pipis
Sep 14, 2020 · 6 min read
Image for post
Image for post
Screenshot from https://languagetool.org/

LanguageTool

LanguageTool is an open-source grammar tool, also known as the spellchecker for OpenOffice. This library allows you to detect grammar errors and spelling mistakes through a Python script or through a command-line interface. We will work with the language_tool_pyton python package which can be installed with the pip install language-tool-python command. By default, language_tool_python will download a LanguageTool server .jar and run that in the background to detect grammar errors locally. However, LanguageTool also offers a Public HTTP Proofreading API that is supported as well but there is a restriction in the number of calls.

LanguageTool in Python

We will provide a practical example of how you can detect your grammar mistakes and also correct them. We will work with the following text:

LanguageTool offers spell and grammar checking. Just paste your text here and click the ‘Check Text’ button. Click the colored phrases for details on potential errors. or use this text too see an few of of the problems that LanguageTool can detecd. What do you thinks of grammar checkers? Please not that they are not perfect. Style issues get a blue marker: It’s 5 P.M. in the afternoon. The weather was nice on Thursday, 27 June 2017 “.

I made bold the grammar issues. Let’s see how we can detect them with Python:

import language_tool_pythontool = language_tool_python.LanguageTool('en-US')text = """LanguageTool offers spell and grammar checking. Just paste your text here and click the 'Check Text' button. Click the colored phrases for details on potential errors. or use this text too see an few of of the problems that LanguageTool can detecd. What do you thinks of grammar checkers? Please not that they are not perfect. Style issues get a blue marker: It's 5 P.M. in the afternoon. The weather was nice on Thursday, 27 June 2017"""# get the matchesmatches = tool.check(text)matches

And we get:

[Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter', 'replacements': ['Or'], 'context': '...hrases for details on potential errors. or use this text too see an few of of the ...', 'offset': 168, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical'}), Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'context': '...s on potential errors. or use this text too see an few of of the problems that Language...', 'offset': 185, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling'}), Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of \'an\' if the following word doesn\'t start with a vowel sound, e.g. \'a sentence\', \'a university\'', 'replacements': ['a'], 'context': '...ential errors. or use this text too see an few of of the problems that LanguageToo...', 'offset': 193, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling'}), Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['of'], 'context': '...errors. or use this text too see an few of of the problems that LanguageTool can dete...', 'offset': 200, 'errorLength': 5, 'category': 'MISC', 'ruleIssueType': 'duplication'}), Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect'], 'context': '...f of the problems that LanguageTool can detecd. What do you thinks of grammar checkers...', 'offset': 241, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'}), Match({'ruleId': 'DO_VBZ', 'message': 'After the auxiliary verb \'do\', use the base form of the main verb. Did you mean "think"?', 'replacements': ['think'], 'context': '...at LanguageTool can detecd. What do you thinks of grammar checkers? Please not that th...', 'offset': 261, 'errorLength': 6, 'category': 'GRAMMAR', 'ruleIssueType': 'grammar'}), Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'context': '... you thinks of grammar checkers? Please not that they are not perfect. Style issues...', 'offset': 296, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'}), Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'context': "... Style issues get a blue marker: It's 5 P.M. in the afternoon. The weather was nice on Thursday, 27 Ju...", 'offset': 366, 'errorLength': 22, 'category': 'REDUNDANCY', 'ruleIssueType': 'style'}), Match({'ruleId': 'DATE_WEEKDAY', 'message': 'The date 27 June 2017 is not a Thursday, but a Tuesday.', 'replacements': [], 'context': '... the afternoon. The weather was nice on Thursday, 27 June 2017', 'offset': 413, 'errorLength': 22, 'category': 'SEMANTICS', 'ruleIssueType': 'inconsistency'})]

As we can see we get a detailed dictionary that shows the ruleId, the message etc. You can find a detailed explanation of every rule id in the LanguageTool Community. It is interesting to see the error that it captured about the date where it returns a message that: The date 27 June 2017 is not a Thursday, but a Tuesday. However, for this case, it does not have a correction because it cannot guess what did the author mean by entering this date 🙂

Since we detect the mistakes now we can correct them.

my_mistakes = []
my_corrections = []
start_positions = []
end_positions = []
for rules in matches:
if len(rules.replacements)>0:
start_positions.append(rules.offset)
end_positions.append(rules.errorLength+rules.offset)
my_mistakes.append(
text[rules.offset:rules.errorLength+rules.offset])
my_corrections.append(rules.replacements[0])
my_new_text = list(text)for m in range(len(start_positions)):
for i in range(len(text)):
my_new_text[start_positions[m]] = my_corrections[m]
if (i>start_positions[m] and i<end_positions[m]):
my_new_text[i]=""
my_new_text = "".join(my_new_text)my_new_text

And we get (In bold you can see the corrections):

LanguageTool offers spell and grammar checking. Just paste your text here and click the ‘Check Text’ button. Click the colored phrases for details on potential errors. Or use this text to see a few of the problems that LanguageTool can detect. What do you think of grammar checkers? Please note that they are not perfect. Style issues get a blue marker: It’s 5 P.M. The weather was nice on Thursday, 27 June 2017

Spelling and Grammar Mistakes

Let’s see the mistakes that we captured and their corresponding corrections.

list(zip(my_mistakes,my_corrections))[('or', 'Or'),
('too see', 'to see'),
('an', 'a'),
('of of', 'of'),
('detecd', 'detect'),
('thinks', 'think'),
('not', 'note'),
('P.M. in the afternoon.', 'P.M.')]

Detailed Example

We will provide a detailed example by taking into consideration a simple example of just one sentence and we will have a look at the output we get from the LanguageTool. Our sentence:

Your the best but their are allso good!

text = "Your the best but their are allso  good !"
matches = tool.check(text)
len(matches)
# 4

The LanguageTool detected 4 issues. We can focus on each issue. Let’s have a look at the first one.

matches[0]

And we get:

Match({'ruleId': 'YOUR_YOU_RE', 'message': 'Did you mean "You\'re"?', 'replacements': ["You're"], 'context': 'Your the best but their are allso  good !', 'offset': 0, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})

As we can see it mentions the ruleId, a message to the end your which is “Did you mean “You’re“, the suggested replacements, the context which is the input, the offset which is the position of the start of the issue, the errorLength which is the number of characters of the issue, in our case 4 characters, the category of the mistake which is “TYPOS” in our case and the releIssueType which is “misspelling”.

We can show how we can call each element of the language_tool_python.match.Match type with the name followed by a period. Let’s say that we want to call the replacements.

matches[0].replacements# ["You're"]

Let’s have a look at the other issues detected by LanguageTool. The second detected issue was the “their” which is corrected to there

matches[1]

And we get:

Match({'ruleId': 'THEIR_IS', 'message': 'Did you mean "there"?', 'replacements': ['there'], 'context': 'Your the best but their are allso  good !', 'offset': 18, 'errorLength': 5, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling'})

The third detected issue was the “allso” which is corrected to also

matches[2]

And we get:

Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['also', 'all so'], 'context': 'Your the best but their are allso  good !', 'offset': 28, 'errorLength': 5, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})

Finally, the last detected issue was the double spaces which is corrected to a single space.

matches[3]

And we get:

Match({'ruleId': 'WHITESPACE_RULE', 'message': 'Possible typo: you repeated a whitespace', 'replacements': [' '], 'context': 'Your the best but their are allso  good!', 'offset': 33, 'errorLength': 2, 'category': 'TYPOGRAPHY', 'ruleIssueType': 'whitespace'})

Discussion

If we want a free tool in Python that does similar work with Grammarly and supports more than 20 languages, then the LanguageTool is a good option. Of course, none tool is perfect and we cannot rely solely on grammar and spell checkers but for sure is something that we can use mainly in NLP tasks and projects.

Originally published at https://predictivehacks.com.

The Startup

Medium's largest active publication, followed by +773K people. Follow to join our community.

George Pipis

Written by

Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/

The Startup

Medium's largest active publication, followed by +773K people. Follow to join our community.

George Pipis

Written by

Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/

The Startup

Medium's largest active publication, followed by +773K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store