What machine learning teaches us about learning a second language

Charlie Gaynor
AppLearn Engineering
7 min read · Jul 20, 2022
“Artificial Intelligence reading a book”, generated by DALL-E mini, enlarged with Bigjpg

Nearly all of us have some experience of learning a second language, usually in school or on Duolingo. I am willing to bet most people fail to reach a native-like level, and that loss of motivation (from boredom or lack of progress) is the usual culprit.

This raises the question: is learning a second language as an adult really much trickier (and more tedious) than it is for a child, or are the methods we use to learn and teach languages just garbage?

Well, I believe recent developments in machine learning suggest we’re simply trying to learn in the wrong way.

Note: Throughout this article I’ve played a little fast and loose with terms like ‘think’ and ‘know’ when talking about how a machine learning model operates. The model isn’t really ‘thinking’ and has no stored ‘knowledge’; it’s just doing lots of computations when asked.

Neural networks & our brain

Neural networks (the structure of computers’ ‘brains’) are loosely inspired by the structure of the human brain, and how we train them often takes cues from how our brains learn best.

For our purposes, the primary mechanism by which the brain functions involves two main components: neurons and neurotransmitters (chemicals that ‘carry’ signals between the neurons). When a neuron receives a sufficiently strong input signal from other neurons, it fires a signal of its own.

In deep learning, a branch of machine learning, we aim to replicate this structure using artificial neural networks. These networks consist of layers of ‘neurons’, each of which receives input from all the neurons in the previous layer (via some ‘connections’), does some maths, and fires a signal on to the next layer.

Machine learning, in this context, is essentially just tuning the strengths of the connections between the neurons for our task (plus some more complicated maths, like the activation function).
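As a sketch, a single artificial ‘neuron’ fits in a few lines of Python. The inputs, weights, and bias below are made-up numbers, and sigmoid is just one possible activation function:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial 'neuron': weight each incoming signal,
    sum them, and pass the total through an activation function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation: squashes the sum into (0, 1), loosely
    # analogous to a neuron 'firing' strongly or weakly.
    return 1 / (1 + math.exp(-total))

# Signals from three upstream neurons, with learned connection strengths.
signal = neuron([0.5, 0.1, 0.9], weights=[0.4, -0.2, 0.7], bias=0.1)
print(round(signal, 3))  # → 0.713
```

Training adjusts those `weights` values; everything else stays the same.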

This is very similar to our brains, so perhaps the way we learn is similar?

The traditional approach to studying a language

Traditionally, a few different methods are used when learning a language:

  • Learning grammar rules
  • Learning translations for words
  • Practicing what we’ve learnt (‘outputting’) — writing, speaking, tests, spelling etc.

If you have used Duolingo before, this might sound familiar. You start with learning some basic vocab, usually with translations (man, woman, apple, bread etc.), and then start to study some grammar in the subsequent levels. The whole time you are continuously tested and made to output, by translating some sentences into your native language for example.

Our Machine Learning model does not learn like this at all — it laughs in the face of these extremely boring methods.

(Note: the ‘models’ I’m referring to in this article are BERT models, but there’s no need to worry about that right now.)

ML models don’t know what grammar is

The state-of-the-art models for language processing today don’t really even know what a word is; they only think in numbers. Where we see the sentence “Please do not use Duolingo”, the model might see a list of integers, e.g. [198, 1233, 12, 3, 8]. That’s it: no letters.
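A toy sketch of that mapping, using a hypothetical five-word vocabulary (real models use subword vocabularies tens of thousands of entries long, but the principle is the same):

```python
# Hypothetical toy vocabulary; the integer IDs are made up
# to match the example in the text.
vocab = {"please": 198, "do": 1233, "not": 12, "use": 3, "duolingo": 8}

def tokenize(sentence):
    """Turn a sentence into the list of integers the model actually sees."""
    return [vocab[word] for word in sentence.lower().split()]

print(tokenize("Please do not use Duolingo"))  # → [198, 1233, 12, 3, 8]
# The model never sees letters, only these IDs.
```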

The model has no way of using even basic grammar rules (such as adjectives before nouns) as it has no complete list of every adjective and noun. It only has some intuition or feeling for what is correct (sort of like, you know, everyone with their native language).

The only thing our model can do is form associations between certain ‘words’ (numbers), and it does this by seeing certain words together (sentences) over and over again (lots of input). I’ll touch more on exactly how we train the model later.

Why memorising translations is pointless

Our model is based on a concept known as ‘attention’ (introduced to the wider world by the famous paper “Attention Is All You Need”).

Let’s look at a classic example. Imagine we want our model to translate the sentence:

"The animal didn't cross the street because it was too tired"

How does our model know what ‘it’ is referring to? Is it masculine or feminine? Is it a real thing or a concept?

If you only learn translations for single words, you very well may use the wrong translation of ‘it’ here.

This is where the attention mechanism comes in — our model pays ‘attention’ to multiple different words in a sentence when making a translation.
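As a toy sketch of what ‘paying attention’ might compute, here is a softmax over relevance scores. The scores below are hand-picked for illustration; in a real Transformer they come from learned query and key vectors:

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance of each word to the word 'it'. In a real model
# these scores are computed, not hand-picked.
words = ["the", "animal", "street", "tired"]
scores = [0.1, 3.0, 1.0, 2.0]

attention = dict(zip(words, softmax(scores)))
# 'animal' ends up with by far the largest weight, so the model resolves
# 'it' to the animal and can pick the matching (e.g. gendered) translation.
print({w: round(a, 2) for w, a in attention.items()})
# → {'the': 0.04, 'animal': 0.64, 'street': 0.09, 'tired': 0.24}
```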

We CANNOT know the ‘translation’ for a word without knowing the full context of the sentence (at a minimum; sometimes we need context from other paragraphs too). Please don’t memorise one-to-one translations: they don’t exist.

Why outputting doesn’t help us learn new things

Making a model ‘output’ (running inference) does nothing to help the model learn new things. The model receives absolutely no new information; it’s just telling you what it already knows.

Similarly, if we were to write a book in the language we’re learning, we can learn 0 new grammar rules and 0 new words. We can only reinforce what we already know. This is helpful to humans in some cases because we have only a limited memory (we can’t just dump it all on a hard drive).

But what if we have a misunderstanding of how a word is used, or how the grammar rules work? We will be further cementing that in our memory, making it harder to undo later.

A much better idea is to get more correct input and let that refresh your memory instead. You’ll also learn much more new stuff while you’re at it.

How do we train the model?

BERT (just the model’s name) is trained primarily with one simple ‘self-supervised’ method. We take a bunch of text (e.g. the whole of English Wikipedia) and split it into chunks. We then delete some words within the text and ask the model to predict what the missing words are, one by one. For each missing word it gives a bunch of probabilities for different possibilities.

We then tell the model what the actual word was, and it updates its neural network structure (changes the strength of certain connections, for example) to increase the probability of predicting that particular word, and decrease the probabilities for the rest. Rinse and repeat for tons of text, and that’s it!
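The loop above can be sketched with a toy ‘fill in the blank’ learner. A real BERT adjusts millions of connection strengths rather than keeping counts, but the training signal is the same: predict the missing word, then update towards the true answer. All names and sentences here are made up:

```python
from collections import Counter, defaultdict

# For each context (the sentence with one word masked out), count which
# word actually appeared in the gap -- a crude stand-in for 'increasing
# the probability' of the right answer.
counts = defaultdict(Counter)

def train(sentence):
    words = sentence.lower().split()
    for i, answer in enumerate(words):
        context = tuple(words[:i] + ["[MASK]"] + words[i + 1:])
        counts[context][answer] += 1  # nudge this word's 'probability' up

def predict(masked_sentence):
    context = tuple(w if w == "[MASK]" else w.lower()
                    for w in masked_sentence.split())
    guesses = counts[context]
    return guesses.most_common(1)[0][0] if guesses else None

for _ in range(3):  # rinse and repeat over lots of text
    train("the cat sat on the mat")

print(predict("the [MASK] sat on the mat"))  # → cat
```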

This should sound awfully familiar though — because the model is just reading! It’s learning how all these words fit together and associate with each other. There’s no list of rules here, the model is just learning to copy what it reads (swap model for kid & reads for hears, and that’s how we all learnt languages).

So how should humans study languages?

I propose we copy our younger selves, add in the latest advances, and learn using comprehensible input.

We should just get as much input as possible, where we understand the context.

By this I mean we know what the overarching ‘story’ is to a block of text (or speech), and we know most of the words within it, so our brains can just fill in the gaps (EXACTLY like I described above). This allows us to get an intuitive feeling for what a word means, using the whole context of text/speech, which is the only way to become native-like in a language.

I believe learning grammar rules or translations serves no purpose (it’s actually counterproductive, in my opinion, since you want your brain to focus only on the new language in front of you); we should simply read and listen. That’s it. Just watch TV (well, Netflix and the like) and read some books in your target language.

The input should be comprehensible because, unlike a model, we don’t have the luxury of time to go through the whole of Wikipedia many times over. But we do have the luxury of a wonderful brain with language-independent thoughts and feelings. We should use these to our advantage, and listen to or read material where we understand what’s happening on a conceptual level, allowing us to ‘fill in the gaps’ much more effectively.

Who has this method worked for?

Every single person who has a native language :)

Final thoughts

  • If you followed the majority of this blog, you have successfully understood the structure of one of the hottest machine learning models out there today — congrats!
  • I hope this shows that all we need to do to learn is be exposed to stuff that we can comprehend, and lots of it :)
  • I didn’t even touch on one of the biggest benefits of purely listening to comprehensible input, which is massively improved accents — but maybe that’s for another blog on speech-to-text models.
