Harry Potter and the Spanish student

Can you increase vocabulary purely through reading?

Nick Verlinde
Jul 15, 2017

From my own experience, reading a foreign language can be frustrating, particularly when you need to look up every 8th word. It also doesn't generally feel like much progress is being made. The problem is common for a lot of people, so it raises the question: how useful is reading for the intermediate student?

Study: The impact of reading on spelling, meaning and collocation

This study, performed on 240 Macedonian school children learning English, looked at how passive exposure (reading, listening) to words affected 3 variables: spelling, meaning, and collocation.

Collocation describes how a word sits among other words: which words it often appears next to, and in what context you might find it.

In the study, students were tested on all 3 variables once before and once after the reading. In between, they went through a guided reading of the entire "A Little Princess" by Frances Hodgson Burnett, a relatively long text of about 65000 words.

The students as a group were also tested on a random page to ensure that they already knew at least 95% of the words, this being the magic number for context learning to take place.
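As a rough illustration of that 95% check, here is a minimal sketch that measures what share of the words on a page a reader already knows. The function name, the sample sentence, and the known-word set are all made up for this example; the study does not describe how its own check was carried out.

```python
import re

def known_word_coverage(page_text, known_words):
    """Return the fraction of word tokens on the page found in known_words."""
    # Split into lowercase word tokens; [^\W\d_]+ matches runs of letters,
    # including accented ones.
    tokens = re.findall(r"[^\W\d_]+", page_text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical page and vocabulary, for illustration only.
page = "The little girl sat by the window and looked at the fog"
known = {"the", "little", "girl", "sat", "by", "window", "and", "looked", "at"}

coverage = known_word_coverage(page, known)
print(f"{coverage:.0%} of tokens known; 95% threshold met: {coverage >= 0.95}")
```

Note that the share is counted over every word on the page, repeats included, which is why a handful of very common words goes a long way.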

The study made the following key observations.

The Pearson Product Moment Correlation Coefficient for the correlation between the word frequency in the text and the relative gains was 0.30 for spelling, 0.45 for meaning and 0.41 for collocation, which confirms the importance of the number of encounters of the word for their acquisition.

In other words, passive exposure to the words in context does lead to learning. And the more frequently the word comes up, the higher the chance of learning it.
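To make the correlation figure a little more concrete, here is a small sketch of how a Pearson coefficient between word frequency and learning gain can be computed. The numbers below are invented purely for illustration and are not data from the study; the function and variable names are my own as well.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values, NOT taken from the study:
# how often each test word occurs in the text, and the relative gain per word.
occurrences = [1, 2, 4, 8, 16, 32]
relative_gain = [0.05, 0.10, 0.08, 0.20, 0.18, 0.30]

r = pearson_r(occurrences, relative_gain)
print(f"r = {r:.2f}")
```

A value near 0 would mean frequency tells you nothing about gains, and a value near 1 would mean the two rise together almost in lockstep. The study's values of 0.30 to 0.45 sit in between.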

Table 3: Results by word knowledge aspect

The questionnaire that was completed by the experimental group at the end of the study revealed that for some participants the novel was a little bit difficult for reading and that some of them used a dictionary while reading the novel. Most of the participants paid more attention to the meaning of the words, less attention to the written form of the words and very little attention to the surrounding words. This is interesting because the results of the study show that the greatest learning gains were in the knowledge of collocations.

Seeing words in context seems to be the key factor here, and looking up every word in the dictionary may be suboptimal if it slows you down too much, especially if fewer interruptions mean higher reading speed and ultimately more words seen in context. Nothing in the study indicates that using a dictionary is useful or needed.

Back to Harry Potter

How can we use this information when tackling a longer and more complex book like Harry Potter y la piedra filosofal? Two questions matter:

  1. Do we roughly hit the 95% understanding mark?
  2. How many words do we stand to learn?

total words: the total number of words in the book, repeats included
unique words: the number of distinct word forms that appear
unique stems: the number of distinct stems once inflected forms are grouped together. The word for magic wand, varita, and its plural, varitas, both share the same stem varita. Similarly for verbs, como, comí, comiste, and comía all share the same stem comer (to eat).

This is a bit of a stretch for verbs, where an irregular form can be quite different from its stem. But for the purpose of this analysis we will consider hubo, había, and habrás to all be forms of the same stem haber (to have).
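The three counts can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the `LEMMAS` table is a tiny hand-written stand-in for whatever stemming or dictionary lookup a real analysis would use, and the sample sentence is invented.

```python
import re
from collections import Counter

# Hypothetical, hand-written lookup from word form to stem, for illustration
# only. A real analysis needs a full Spanish dictionary or lemmatizer, and has
# to deal with ambiguous forms (como can also mean "like" or "as").
LEMMAS = {
    "varitas": "varita",
    "como": "comer", "comí": "comer", "comiste": "comer", "comía": "comer",
    "hubo": "haber", "había": "haber", "habrás": "haber",
}

def count_words(text):
    """Return (total words, unique words, Counter of stem frequencies)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Forms missing from the lookup table are treated as their own stem.
    stem_counts = Counter(LEMMAS.get(token, token) for token in tokens)
    return len(tokens), len(set(tokens)), stem_counts

sample = "Había una varita. Hubo dos varitas. Comí y comía."
total, unique, stem_counts = count_words(sample)
print(total, unique, len(stem_counts))
```

In this sample all nine words are distinct forms, yet they collapse into six stems, because había and hubo, varita and varitas, and comí and comía each share one.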

Any names, places and lore (e.g. quidditch) are also removed from the count, which leaves us with our final number of 4409 stems.

Removed Words

Removed Names, Places, Lore

A lot of these words should be familiar to readers of the series.

4409 Words to read the first Harry Potter

The top 25 stems are not too surprising. They are words like

de, que, la, y, a, el, en, no, se, ser, un, los, decir, con, haber, estar, lo, una, su, por, tener, las, para, pero, ir

each showing up on average 1000+ times. In fact, just these 25 stems together make up 26.3% of the entire text. Worth knowing indeed!

As we go down the list the frequency drops relatively quickly, with 1675 of the 4409 words showing up just once.
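Both of these figures, the share of the text covered by the top stems and the number of stems that occur only once, fall straight out of a frequency table. Here is a minimal sketch, using a toy table of made-up counts rather than the real book data; the function names are my own.

```python
from collections import Counter

def top_n_share(stem_counts, n):
    """Fraction of all word occurrences accounted for by the n most frequent stems."""
    total = sum(stem_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in stem_counts.most_common(n))
    return top / total

def singletons(stem_counts):
    """Number of stems that appear exactly once."""
    return sum(1 for count in stem_counts.values() if count == 1)

# Toy frequency table with invented counts, for illustration only.
toy_counts = Counter({"de": 5, "que": 4, "la": 3, "varita": 2, "lechuza": 1, "caldero": 1})

print(f"top 2 stems cover {top_n_share(toy_counts, 2):.1%} of the text")
print(f"{singletons(toy_counts)} stems appear only once")
```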

If we plot the percentage of the total book covered (Y) as we go along our 4409 stems (X), we can see that the first few stems very quickly reach about 60% coverage, after which the gain per additional stem drops off drastically.

The red line represents the 95% understanding mark, which in this case is covered by 1857 stems. This is a bigger number than anticipated, and means the intermediate student wanting to read Harry Potter should have a relatively large vocabulary to begin with.

Certainly more than the ~750 words I had anticipated initially.

If, on the other hand, you can take a little pain and are content recognising only 90% of the book, then 1050 stems are enough.
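The number of stems needed for a given coverage can be found by walking down the frequency list, most common stem first, and adding up occurrences until the target share of the text is reached. A minimal sketch, again on a toy table of invented counts and with a function name of my own choosing:

```python
from collections import Counter

def stems_needed(stem_counts, target):
    """Smallest number of top-frequency stems whose occurrences cover `target` of the text."""
    total = sum(stem_counts.values())
    covered = 0
    # most_common() yields stems from most to least frequent.
    for rank, (_, count) in enumerate(stem_counts.most_common(), start=1):
        covered += count
        if covered >= target * total:
            return rank
    return len(stem_counts)

# Toy frequency table with invented counts, for illustration only.
toy_counts = Counter({"de": 5, "que": 4, "la": 3, "varita": 2, "lechuza": 1, "caldero": 1})

for target in (0.50, 0.90, 0.95):
    print(f"{target:.0%} coverage needs {stems_needed(toy_counts, target)} stems")
```

Even in this tiny example the pattern from the book shows up: the first half of the text is covered by a couple of stems, while the last few percent require nearly all of them.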

Let's take a look at the last 100 words that make up our required 1857 stems.

abandonado, chaqueta, anaranjado, principal, país, latón, montar, beicon, exacto, ejercicio, corte, contado, tendré, filmadora, cuidar, hamburguesa, loca, repollo, planear, aguantar, estropear, venga, sollozo, ocurrido, aparte, marrón, guante, amenazadora, trepar, indebido, idear, reptil, tronco, lata, haz, nudillo, aburrido, torcer, disculpar, trasero, calvo, roto, esperanza, julio, reírse, buzón, felpudo, correspondencia, postal, factura, vibrar, equivocación, amarillento, comprobar, amarillo, desdoblar, mía, grisáceo, alcance, agitado, esforzar, visa, rifle, tocado, amargura, plan, arreglado, latir, atravesar, asegurarse, pedo, velozmente, lloriquear, hotel, gastado, tostar, tomate, dueño, pego, lunes, peñasco, privado, crujido, escapo, agazapado, bah, frotar, cazuela, tetera, calentar, lamento, aterrado, crujir, entrenar, hechicero, placer, necesario, rollo, lengua, explosión

This should give you some idea of whether you are ready to tackle Harry Potter.

Next, I would like to do a similar analysis of Paulo Coelho's El Alquimista or some of the Agatha Christies. My expectation is that these are better starting books and will require a smaller vocabulary to get started.

But I'm open to suggestions.
