The Future of Language Learning Technology

Humans generate a cornucopia of language content. Can machines help make it useful to language learners?

Mark Sanford
Aug 18, 2020 · 11 min read
Image by Jill Wellington from Pixabay

Making quality language learning content is hard. Duolingo, which is perhaps the most prolific creator of language learning content, says that just creating a quality curriculum with 2000 sentences can take a team up to nine months. And that’s just to cover half of what they consider to be a Common European Framework of Reference for Languages A1 level course, which essentially means “rank beginner”. (1) At Verbal Earth, when creating our flagship Mandarin Travel product, we found that developing the content was considerably more challenging than developing the software systems to deliver it.

Why is creating language learning content so hard? A good course needs to teach the vocabulary and sentence structures a learner is most likely to need first. It needs to introduce vocabulary gradually, both to reinforce earlier vocabulary through reuse and to avoid the “cognitive overload” that comes from introducing too much unfamiliar vocabulary at once. Course developers need to imagine scenarios in which a learner will be using the language, and invent sentences that give a learner the language micro-skills to use in those scenarios, all the while hoping they are not missing some key concepts.

“Good language classes will give the beginner comprehensible input that the outside world will supply only very reluctantly.” — Stephen Krashen

Software can assist with this challenging process. It can give feedback on how quickly a curriculum introduces new vocabulary, and how well it reinforces previously introduced vocabulary under a spaced-repetition-inspired metric or other criteria. At Verbal Earth, all of our lesson sequences were continually evaluated on many metrics during development, allowing them to be refined before being put into production. Duolingo uses tools to make sure not too much vocabulary is introduced in a lesson, and that it’s used in a sufficient number of sentences. These tools can make for a better product, but it is still a very laborious process for content creators.

As I led the curriculum development effort for Mandarin Travel, a nagging question kept popping into my head: there is already so much language content being organically generated every day, so why don’t we use it for language learning? If a curriculum developer is creating content to help a learner develop language skills useful in a restaurant, why should she need to think up scenarios, enumerate key vocabulary and language patterns, and shape thousands of invented sentences into a coherent course, when real customers interact with restaurant staff millions of times a day, “producing” content that is by definition ideal for a restaurant setting? And there are terabytes, if not petabytes, of existing written, audio, and video content on the web, covering every imaginable subject. Can this be harnessed in a way that makes it useful to language learners?

Machine curation and the cornucopia of content

In principle, the internet already contains enough content to meet all the needs of a learner of most major languages. The central challenge is that almost none of it is comprehensible input for language learners. If a machine is going to curate these terabytes of data into a curriculum that is useful to a language student, it will need to introduce the most important vocabulary and grammar first, and build gradually over time with sufficient repetition of previous vocabulary, much as human curriculum creators do today.

One way of approaching machine curation of content for language learning is as an optimization problem. In principle a machine could generate an “optimal” curriculum for a general audience, based on criteria such as a prioritized list of vocabulary to be covered, and a desired degree of spaced repetition and pace of introduction of new vocabulary. For a particular domain of language, such as train travel or hotel accommodations, a prioritized list of vocabulary is in principle readily attainable simply by processing the transcripts of real conversations in those contexts and noting the frequencies of word usage.
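To make this concrete, here is a minimal Python sketch of how a prioritized vocabulary list could be derived from domain transcripts simply by counting word frequencies. The toy transcripts and the regex tokenizer are illustrative assumptions, not production choices; a language like Mandarin would need a proper word segmenter.

```python
from collections import Counter
import re

def prioritized_vocabulary(transcripts):
    """Rank words by how often they occur in a set of domain transcripts.

    `transcripts` is assumed to be an iterable of plain-text conversations.
    """
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(words)
    # Most frequent words first: these become the highest-priority vocabulary.
    return [word for word, _ in counts.most_common()]

# Example: toy "hotel check-in" transcripts.
hotel_transcripts = [
    "I have a reservation under the name Lee.",
    "Could I have a room with a view, please?",
    "What time is check-out?",
]
print(prioritized_vocabulary(hotel_transcripts)[:10])
```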

Instead of software providing feedback to human content creators who then invent new sentences, as is done currently, the software can itself choose an appropriate sequence of sentences from its vast library of content. A machine curator might look at the billions of sentences available to it and choose “I am” as the first sentence in the curriculum because it uses the highest-priority vocabulary. It might choose “I am fine” as a follow-on, since it introduces a single new word while giving practice with previous vocabulary. Candidate sentences are initially very few, but once a hundred or more core vocabulary words have been introduced, the options for the curator open up exponentially.

Building a curriculum then becomes a task of ideally introducing one new word at a time, while optimally reusing vocabulary that has already been introduced. Instead of a human content creator inventing new sentences, limited by their creativity and biased by their personal memories and blind spots, the spontaneous creative language of thousands of humans is harnessed to provide optimal content.

With billions of sentences to choose from, the machine curator can also have the student work with new sentences that introduce no new words at all, but instead practice existing vocabulary in new contexts. This opens up the possibility of review lessons that are entirely new content, yet limited to vocabulary the learner is already familiar with, giving learners the positive reward of demonstrating they can understand the words in new contexts.
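A minimal sketch of this greedy selection, assuming candidate sentences have already been tokenized into word sets; the scoring weights are deliberately simplistic illustrations. Calling it with max_new_words=0 is one way to assemble the review lessons described above.

```python
def next_sentence(candidates, known_words, max_new_words=1):
    """Pick the candidate that introduces at most `max_new_words` unfamiliar
    words while reusing as many already-known words as possible.

    `candidates` is assumed to be a list of (sentence_text, word_set) pairs
    drawn from the curator's library, excluding sentences already used."""
    best, best_score = None, None
    for sentence, words in candidates:
        new_words = words - known_words
        if len(new_words) > max_new_words:
            continue
        # Reward reuse of known vocabulary, lightly penalize new words.
        score = len(words & known_words) - 0.5 * len(new_words)
        if best_score is None or score > best_score:
            best, best_score = sentence, score
    return best

candidates = [
    ("I am fine", {"i", "am", "fine"}),
    ("I am a doctor", {"i", "am", "a", "doctor"}),
    ("How are you today", {"how", "are", "you", "today"}),
]
print(next_sentence(candidates, known_words={"i", "am"}))  # -> "I am fine"
```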

The work of this machine curator would no doubt be an extraordinary computational challenge, but with modern computing resources it is not insurmountable. And with the steady march of machine learning techniques and other data science, the possibilities expand continuously.

Capturing Practical Language

One big problem with most of the content on the web is that it’s not very suitable for beginning language learners. It is both quite “advanced”, using a lot of specialized vocabulary, and not very useful in daily life. A beginning language learner typically needs to learn the parts of the language that let them function well in common situations such as getting around, getting fed, taking care of household needs, and basic social interactions. Not much content dealing with these subjects is published on the web, since they are not of much interest to native speakers. Conversations with taxi drivers, ticket agents, and clothing vendors may be very useful material to a beginning language learner seeking to build practical language skills, but few such conversations are published on the web, except perhaps in fictional content that briefly touches on these situations.

However, there is another cornucopia waiting to be captured: the millions of conversations that go on between ordinary people in ordinary situations every day. In principle, microphones can be placed at grocery checkouts, hotel counters, train stations, and the like to capture these “mundane” conversations. Speech-to-text has become very accurate, and machine learning techniques are quite good at separating different voices when multiple microphones are available (2). By capturing these everyday conversations and adding them to our machine curator’s library of language learning material, a huge amount of material highly useful to beginning learners can be made available.
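As a rough illustration of the transcription step, here is a minimal Python sketch assuming the open-source Whisper speech-recognition model is used; any sufficiently accurate speech-to-text system would serve the same role, and speaker separation (diarization) would be a separate step not shown here. The audio filename is a placeholder.

```python
# pip install openai-whisper  (also requires ffmpeg)
import whisper

# Load a general-purpose speech-to-text model. "base" trades accuracy for
# speed; larger models transcribe more accurately.
model = whisper.load_model("base")

# Transcribe one recorded conversation. Keeping only the text, and discarding
# the audio, is what makes the privacy approach described below workable.
result = model.transcribe("checkout_counter_recording.wav")
print(result["text"])
```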

Clearly, privacy and even legal concerns need to be considered. Laws regarding recording conversations vary around the world, though recording is generally allowed where there is no expectation of privacy, which is the case in most of the practical everyday situations mentioned above. Privacy concerns can be essentially eliminated by capturing only the transcripts of the conversations, and rendering them with a text-to-speech system when audio learning content is required. Names and other identifying personal information in transcripts can also be automatically obfuscated (“de-identification”) by trained machine learning models (3).
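Here is one rough sketch of transcript de-identification, using spaCy’s general-purpose named-entity recognizer to mask names, places, and dates. A production system would use a model trained specifically for de-identification, as in (3); this only shows the shape of the idea, and the entity labels chosen are an assumption.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# Entity types that tend to identify people; a dedicated de-identification
# model would be trained for this instead of reusing a general NER model.
SENSITIVE_LABELS = {"PERSON", "ORG", "GPE", "DATE"}

def de_identify(transcript):
    doc = nlp(transcript)
    redacted = transcript
    # Replace sensitive spans from the end so earlier character offsets stay valid.
    for ent in reversed(doc.ents):
        if ent.label_ in SENSITIVE_LABELS:
            redacted = redacted[:ent.start_char] + f"[{ent.label_}]" + redacted[ent.end_char:]
    return redacted

print(de_identify("Hi, I'm Maria Chen, checking in for two nights starting March 3rd."))
```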

Another challenge is that some spontaneous speech may not be suitable for language learning. Written prose usually consists of well-formed sentences, whereas speakers tend to stop sentences midway through and then express the idea in a better way, or make glaring mistakes that leave a sentence grammatically incorrect even if it’s still comprehensible to the listener. These conversations are useful for listening practice in their original form, but are not suitable in many language learning situations. Humans can easily decide what is appropriate and curate this material ‘by hand’, but machine learning models could potentially be created to do this work automatically on a massive scale.
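While a trained model would do this far better, a much cruder heuristic gives a feel for the task: keep only utterances whose dependency parse has a verbal root with an explicit subject. A sketch using spaCy, offered only as a baseline assumption rather than the approach described above:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def looks_well_formed(utterance):
    """Very crude completeness check: accept utterances whose parse contains
    a verbal root with an explicit subject. A classifier trained on labeled
    speech would do far better; this only screens out obvious fragments."""
    doc = nlp(utterance)
    for sent in doc.sents:
        root = sent.root
        has_subject = any(tok.dep_ in ("nsubj", "nsubjpass") for tok in root.children)
        if root.pos_ in ("VERB", "AUX") and has_subject:
            return True
    return False

print(looks_well_formed("The one on the, uh, the left side"))  # likely False
print(looks_well_formed("She bought the red one yesterday."))  # likely True
```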

Verbal Earth has been experimenting with using recordings of real conversations for use in its curriculum. We think that this is a very promising domain of exploration. With some effort, vast amounts of organically generated real world conversations can potentially be collected for language learning purposes.

Overview of the architecture for an individualized machine curation system

Finding ‘just the right sentence’

One challenge for our machine curator is finding just the right sentence or phrase. When introducing a new word, there may be no available sentence that does not also use other new words, even with billions of sentences to choose from. Or there may be no good choices for reinforcing previous vocabulary. This is less of a problem in natural spontaneous speech, which frequently consists of short, incomplete sentences. Written prose, however, tends to be composed of long, complex sentences with multiple clauses, which complicates its usefulness in a language learning curriculum.

Take the last sentence of the previous paragraph as an example. It contains quite a few words that are important in the domain of discussing language learning: “sentences”, “clauses”, “learning”, “curriculum”, etc. For somebody learning English in this domain who is already familiar with all of this vocabulary, the sentence may be useful practice, but for a learner just getting started it may overwhelm them with unfamiliar vocabulary.

However, this does not make the sentence useless to a machine curator, whose objective is to slowly introduce vocabulary. Long sentences are always composed of comprehensible chunks, which are not fully formed “complete” sentences but nonetheless convey meaningful units of language. These are typically found in clauses, noun phrases, or other grammatical structures. For example, “language learning curriculum” is a noun phrase that conveys a coherent unit of meaning. This phrase is useful to a machine curator when a learner is familiar with the words “language” and “learning” but not “curriculum”, as it allows a new word to be introduced in an appropriate context, cognitively tying its meaning to words the learner already knows. Other examples of comprehensible chunks from this sentence are “complex sentences with multiple clauses” and “written prose tends to be composed of long, complex sentences.”
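One rough way to harvest candidate chunks is to pull noun phrases and clause subtrees out of a dependency parse, as in the Python sketch below using spaCy; the dependency labels used are an assumed starting point. These are only candidates, and deciding which ones actually read as coherent units is the harder problem addressed next.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_chunks(sentence, min_words=2):
    """Extract noun phrases and clause subtrees as candidate comprehensible
    chunks. A later step (human review or a trained model) decides which
    candidates actually convey a coherent unit of meaning."""
    doc = nlp(sentence)
    chunks = set()
    for np in doc.noun_chunks:            # e.g. "a language learning curriculum"
        if len(np) >= min_words:
            chunks.add(np.text)
    for tok in doc:                       # clausal dependents, e.g. relative clauses
        if tok.dep_ in ("advcl", "relcl", "ccomp", "xcomp"):
            span = doc[tok.left_edge.i : tok.right_edge.i + 1]
            if len(span) >= min_words:
                chunks.add(span.text)
    return chunks

print(candidate_chunks(
    "However written prose tends to be composed of long, complex sentences "
    "with multiple clauses, which complicates its usefulness in a language "
    "learning curriculum."
))
```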

By identifying the comprehensible chunks of all the billions of sentences in its library, the machine curator has much more flexibility in its choices for composing a curriculum. Ten billion options become fifty billion options, making it easier for the machine curator to find ‘just the right sentence’ that introduces a new word while optimally reinforcing familiar vocabulary, or new sentences that use existing vocabulary in new ways.

“Comprehensible chunk” is a somewhat vague concept, but machine learning techniques are good at building models that can work accurately with vague concepts, so the task of finding comprehensible chunks can be done by machine. Sentiment analysis, which attempts to classify the sentiment behind a written product review or other text (e.g. “good”, “neutral”, “bad”), is one of the oldest use cases of Natural Language Processing. It is an inherently vague concept, yet with modern deep learning models machines can achieve results comparable to humans.

At Verbal Earth, we are researching machine learning based on Google’s BERT natural language processing model to perform this task. The model takes an English sentence as input and outputs a set of useful comprehensible chunks. Currently it has a 92% precision rate, meaning 92% of its outputs are considered comprehensible by a human, rather than incomprehensible gibberish such as “a time, passed the place where”. We anticipate even better results in the future as our research, and Natural Language Processing models in general, progress.
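At inference time, a model of this kind can be applied by scoring candidate spans with a fine-tuned BERT classifier. The sketch below uses the Hugging Face transformers library; the model path is a placeholder for a hypothetical fine-tuned checkpoint, not Verbal Earth’s actual model.

```python
# pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path: assumes a BERT model fine-tuned to label a span as a
# comprehensible chunk (label 1) or an incoherent fragment (label 0).
MODEL_PATH = "path/to/fine-tuned-chunk-classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

def chunk_probability(span_text):
    """Probability that `span_text` reads as a coherent, comprehensible chunk."""
    inputs = tokenizer(span_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(chunk_probability("language learning curriculum"))    # expected: high
print(chunk_probability("a time, passed the place where"))  # expected: low
```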

Machine as personal tutor

Having a generalized curriculum created by a machine curator would be a fantastic achievement, but what if we could go even further, and enable the machine to curate language learning content to meet specific individualized needs? Two main things distinguish learners as individuals: their learning objectives and their current abilities. By taking into account a learner’s current interests and current language skills, a much more personalized curriculum can be created on the fly, turning the machine curator into a personal tutor.

One learner may be interested in building language skills to help them travel in a foreign country, while another may be interested in discussing cooking with a friend in their native tongue. When presenting new content to the first learner, the machine curator can prioritize vocabulary useful in transportation or shopping; for the second, vocabulary about ingredients and kitchen utensils can be introduced. Most language learners find that some units in a curriculum made for a general audience do not interest them, while other units that do interest them fall short of meeting their needs.

If a machine curator also knows what vocabulary a learner is already comfortable with, and to what degree, it can provide a highly useful customized curriculum that reinforces already-known words that need practice and introduces the new words the learner needs to achieve their goals. Rather than choosing content from its library based on a general curriculum for all learners, it can choose specifically to meet a learner’s unique needs and interests.

Duolingo currently provides some individualization in its lessons and practice sessions by evaluating how a learner is doing with the current material. More complete sentences and new vocabulary can be introduced more quickly if the learner seems to find the material easy, and material the learner is having difficulty with can be repeated, or similar examples provided. (4) LingQ, which does not have a set curriculum, recommends new content by evaluating how much a learner already knows.

Probably the easiest way for a machine curator to keep track of a learner’s language skills is to track all the vocabulary used in the generated curriculum thus far, and give learners a way to provide direct feedback to the system about their familiarity with that vocabulary. Familiar words can get a lower priority when the machine curator selects new sentences, but should still be prioritized from time to time to keep them “fresh” in the learner’s memory. The machine curator can make an extra effort to find usage examples of a word the learner is having difficulty with, by finding sentences in the cornucopia that use that word alongside words the learner is already comfortable with.
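A minimal sketch of this kind of bookkeeping, assuming learners give simple knew-it / didn’t-know-it feedback; the scoring weights here are arbitrary illustrations rather than a tested algorithm.

```python
import time

class LearnerModel:
    """Tracks one learner's familiarity with each word (0.0 = unknown,
    1.0 = fully comfortable) and when it was last seen, so the curator can
    favor shaky words and occasionally refresh well-known ones."""

    def __init__(self):
        self.familiarity = {}  # word -> score in [0, 1]
        self.last_seen = {}    # word -> timestamp

    def record_feedback(self, word, knew_it):
        # Nudge the score up or down based on the learner's own feedback.
        score = self.familiarity.get(word, 0.0)
        self.familiarity[word] = min(1.0, score + 0.2) if knew_it else max(0.0, score - 0.3)
        self.last_seen[word] = time.time()

    def sentence_score(self, words, refresh_after_days=7):
        """Higher scores mean the sentence is more useful to show right now."""
        now = time.time()
        score = 0.0
        for w in words:
            fam = self.familiarity.get(w, 0.0)
            days_idle = (now - self.last_seen.get(w, now)) / 86400
            score += 1.0 - fam                    # shaky words need more exposure
            if fam > 0.5 and days_idle > refresh_after_days:
                score += 0.5                      # well known, but going stale
        return score
```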

Conclusion

Making quality content for language learning is hard. Fortunately, an organic cornucopia of quality language content is constantly being generated by native speakers and writers every day. Using machines to curate this vast amount of content into material suitable for language learners, not just for a general audience but to meet the specific needs of individual learners, is quickly becoming possible. What will enable it is the continued growth of computing resources and the capability of machine learning systems to perform mundane tasks that previously only humans could do. Rather than relying on the creative efforts of a few curriculum developers, the spontaneous language of millions of native speakers will be made accessible to language learners.


Mark Sanford

Mark is a software developer in San Francisco, helping realize the potential of technology for lifelong learning.