A million words?

English is adding new vocabulary at an astonishing rate.


The English language already has the largest lexicon (number of words) of any European language. Research by Harvard linguists Dr Jean-Baptiste Michel and Dr Erez Lieberman Aiden suggests that this comparative advantage will not change any time soon.

In 2010 they published the most extensive analysis of the English language ever conducted. It looked at 5,195,769 digitised books published between 1800 and 2000. This amounted to 4% of all the English-language publications of the period.

They found that the English language doubled in size over the twentieth century and was continuing to expand at an accelerating rate:

The language has grown by more than 70 per cent since 1950…. The previous half century it only grew by 10 per cent…. [It is currently increasing] by 8,500 words a year.

The Oxford English Dictionary

The twenty volumes of The Oxford English Dictionary

The English language has no equivalent of the Académie française. The closest equivalent to official authorisation of a word’s validity is inclusion in one or more of the key dictionaries: the Oxford English Dictionary (OED) for British English and the Merriam-Webster for American English. In the word game Scrabble, for example, you can only use words that are authorised by the relevant dictionary.

Entry into these dictionaries is based entirely on usage, rather than adherence to abstract criteria.

The Second Edition of the 20-volume Complete Oxford English Dictionary contains full entries for 171,476 words in current use. This figure, the OED concedes, may be misleading:

there are, at the very least, a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED, or words not yet added to the published dictionary, of which perhaps 20 per cent are no longer in current use. If distinct senses were counted, the total would probably approach three quarters of a million.

Dark Matter

Interestingly, much of the new vocabulary discovered by the Harvard project has not made it into the OED or the Merriam-Webster. These words have been dubbed lexical “dark matter”: slang, invented jargon and other vocabulary that does not form part of what linguists call standard register.

This is surprising because the leading dictionaries pride themselves on being broad churches. The OED includes 41,700 words that it deems ‘obsolete’ and that are unlikely to be used in contemporary speech or writing. There is even space for 240 ‘ghost words’. A ghost word is one that has never existed outside dictionaries.

Where they do draw the line is with vocabulary deemed to be transitory or ephemeral. This can be difficult to identify. During the 2010 World Cup the English-speaking world was introduced to the vuvuzela, a plastic horn popular in South African football stadiums. Four years later it is already fading from the collective memory.

Word hoarding

Even the comically misnamed Shorter Oxford English Dictionary, weighing in at two very fat volumes, has more words than even the keenest Scrabble player could employ. This would suggest that there might be a natural limit to the collective word bank.

Technology has, however, come to the rescue of obscure and obsolete vocabulary. Online editions of the behemoths of the lexical canon will ensure their survival on (digital) paper at least. These will continue to gobble up cyberspace into the foreseeable future. But how much of this vast word mountain do we actively use?

Steven Pinker suggests that the average English speaker has learned 60,000 words ‘by high school’, but Richard Lederer estimates that only around 10,000 of these are used actively. Others argue that as few as 3,000 would cover most situations outside of a cryptic crossword.

Yet according to the Harvard project our word bank now stands at 1,022,000 words. We have a million words to choose from, even if many are packed away in an online loft, unused and long forgotten.



