A word cloud centered on the word “corpora”.

Just what is a corpus and how can it help me?

Vojtech Janda
Engramo English Blog
2 min read · May 18, 2020

Computers are everywhere these days. They assist us in almost every aspect of our lives, including language tasks such as translation. But there is much more you can do with language on a computer than just translate! The discipline called corpus linguistics started developing long before computers were really a thing, but it moved to the digital world quite early on, in the 1960s. A corpus (plural: corpora or corpuses) is a large collection of texts originating in real-world use, be it books, news articles, websites, or recordings of speech. Using the computational power of modern machines and purpose-built search algorithms, it is possible to scour these collections for particular words, phrases, and grammatical patterns.

Researchers have been using corpora to study the properties of natural languages for decades. Likewise, grammarians have been using them to learn the rules of actual language use (as opposed to the “you shouldn’t say this because I said so” approach of many teachers and textbooks, especially in the past). Lexicographers use corpora to compile dictionaries and translators use them to look up obscure idioms. Now learners can exploit these tools as well: a corpus can serve as a translation aid, as a dictionary, or as a reference when you’re not sure whether you’re using a piece of grammar correctly.

Take a look at SkeLL, for example. It’s a tool that lets you browse examples of a word or word combination in use, look up the syntactic functions it serves, see which words it tends to co-occur with, and find similar words, all through a simple interface.
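To give a feel for what such a lookup involves, here is a minimal Python sketch of two classic corpus queries: a keyword-in-context (concordance) listing and a count of co-occurring words. It is not how SkeLL works internally. The four-sentence corpus is made up for illustration, and the names `tokenize`, `concordance`, and `collocates` are hypothetical helpers, not part of any real tool.

```python
import re
from collections import Counter

# Toy corpus: made-up sentences for illustration only (an assumption,
# not data from any real corpus).
CORPUS = [
    "She made a strong cup of tea before work.",
    "He prefers strong coffee in the morning.",
    "A strong wind blew across the field.",
    "They drank tea and talked for hours.",
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(corpus, word, width=2):
    """Return keyword-in-context lines with up to `width` tokens on each side."""
    lines = []
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok == word:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(corpus, word, width=2):
    """Count the tokens that appear within `width` positions of each hit."""
    counts = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok == word:
                window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
                counts.update(window)
    return counts


if __name__ == "__main__":
    for line in concordance(CORPUS, "strong"):
        print(line)
    print(collocates(CORPUS, "strong").most_common(3))
```

Real corpus tools run the same kind of query over millions of sentences and add grammatical annotation and statistics on top, which is why their results are far more reliable than anything a toy example can show.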

Then there’s the InterCorp project, which, among other things, allows you to compare a piece of text in several languages. As of now, there are 39 languages to choose from! Or you can look up translation equivalents for words and phrases via the TreQ interface, although there are restrictions on language combinations.

In short, corpora are tools for exploring natural, real-world language use and learning from it, whether you’re a scholar or a learner. And because we at Engramo know how useful real-life examples are, we made sure to use corpora as the source for our exercises. With us, you won’t be learning from hand-picked or artificially made-up sentences.

Try it and see for yourself!

Vojtech Janda

Linguist specializing in usage-based, corpus linguistics & sociolinguistics, English-Czech translator, hobby programmer