The Entropy of Language

John Lo
Published in Language Insights
Jul 13, 2018

Entropy is a measure of uncertainty.

The concept originated in physics, where it is used as a measure of disorder. It underlies the statement that an isolated system becomes less organized with time; in other words, its entropy increases.
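Before looking at language, it helps to pin down the measure itself. The sketch below computes Shannon entropy over a probability distribution; the coin-flip distributions are just illustrative examples, not anything from the article.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain for two outcomes: 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so it carries less uncertainty.
print(entropy([0.9, 0.1]))   # ≈ 0.469
```

The more evenly spread the possibilities, the higher the entropy; a certain outcome has entropy zero.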

Language, as a system, also shows increasing entropy with time if isolated. So a language stays vital only while it is used for communication; otherwise it becomes a dead language.

However, the entropy of language itself is much more interesting.

Entropy of meaning

The uncertainty of the meaning of words exists in two forms: types of meaning and coverage of meaning.

Types of meaning

To enhance the efficiency of vocabulary usage, some words are assigned multiple meanings, but this also means a word has several possible interpretations. So there is a tradeoff between vocabulary usage efficiency and vocabulary interpretation entropy, and different languages have applied different strategies.
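The interpretation entropy of a single word can be sketched as the entropy of its sense distribution. The sense probabilities below are hypothetical, purely for illustration.

```python
from math import log2

def interpretation_entropy(sense_probs):
    """Uncertainty (in bits) a listener faces about which sense is meant."""
    return -sum(p * log2(p) for p in sense_probs if p > 0)

# Hypothetical sense distributions (illustrative numbers, not corpus data):
monosemous = [1.0]             # one meaning: no ambiguity
polysemous = [0.7, 0.2, 0.1]   # three meanings, one dominant
print(interpretation_entropy(monosemous))   # 0.0
print(interpretation_entropy(polysemous))   # ≈ 1.16
```

Reusing one word for three meanings saves vocabulary, but the listener now has about 1.16 bits of ambiguity to resolve from context.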

Words are made of vocal units, and frequent words are assigned shorter combinations of vocal units, so that conversation is more economical, using the lowest number of vocal units.

A higher vocabulary usage efficiency means that the number of vocal units in a conversation is lower, due to the reduced number of words. However, since there are multiple possible interpretations of the conversation, words are pronounced more slowly to give listeners time to interpret them.

A lower vocabulary interpretation entropy means less room for ambiguity in conversation, so words can be pronounced faster. However, to limit multiple meanings, unique words must be coined for different meanings; more words are required, some built from more vocal units, so the conversation contains a higher number of vocal units.

Comparing the two cases, the transmission speed of meaning is actually about the same: one strategy represents words with fewer vocal units but slower pronunciation, the other with more vocal units but faster pronunciation.

The two strategies are said to have a similar information rate, which is information density times syllable rate.
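The arithmetic is simple enough to show directly. The density and syllable-rate figures below are hypothetical, chosen only to illustrate how a dense-but-slow language and a sparse-but-fast one can land on nearly the same information rate.

```python
# Hypothetical figures (illustrative only, not measured data):
# (information density per syllable, syllables per second)
languages = {
    "dense-but-slow":  (0.94, 6.2),
    "sparse-but-fast": (0.63, 9.1),
}

# Information rate = information density * syllable rate
rates = {name: density * syl_rate
         for name, (density, syl_rate) in languages.items()}
print(rates)  # both come out close to ~5.8 units per second
```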

Coverage of meaning

Another case of word entropy lies in the coverage of meaning. The vagueness of a word is determined by the precision of the meaning it conveys; in this way, vagueness is another form of entropy. A good example is how we describe time with words of different vagueness.

To convey a precise meaning, more words are required to cover the same meaning in its different cases. In this way, the vagueness of words is decreased by an increased number of words.

To reduce the number of words used, several meanings may be merged into a single word, at the cost of increased vagueness of that word's meaning.

So there is also a tradeoff between vocabulary meaning entropy and the number of words, which usually depends on the demanded precision of meaning.
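The time example from above can be made concrete with a toy model: suppose each time expression picks out a window of the day, uniformly, so the uncertainty remaining after hearing it is the log of the window size. The phrases and window sizes below are hypothetical.

```python
from math import log2

# Toy model: each phrase narrows the time of day to a window;
# remaining uncertainty = log2(window size in minutes).
# Window sizes are hypothetical, for illustration only.
window_minutes = {
    "sometime today": 24 * 60,
    "this afternoon": 6 * 60,
    "around 3 pm": 30,
    "at 3:00 pm": 1,
}
for phrase, minutes in window_minutes.items():
    print(f"{phrase}: {log2(minutes):.2f} bits of remaining uncertainty")
```

Vaguer phrases leave more bits unresolved; the fully precise "at 3:00 pm" leaves none, but a language needs many such precise expressions to cover the whole day.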

Entropy of conversation

The entropy of conversation refers to the uncertainty about which words will be pronounced next. This is important because research has found that when a word is heard, the corresponding brain area is activated, so the same word is recognized faster when heard again. If several words are expected to be heard, the corresponding brain areas can be pre-activated to speed up our cognition.

The entropy of the next word to be pronounced is determined by the number of possible words. A smaller number means we can pre-activate fewer brain areas, saving energy.

The above entropy can be reduced in several ways, such as word order and grammatical gender.

Word order

Word order restricts the number of possible words following the previous ones by limiting the next word to one or a few parts of speech. For example, in English, after a subject we expect a verb, not another noun, by obeying English word order.
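With a uniform toy lexicon, the effect of that restriction on next-word entropy is easy to quantify. The lexicon below is made up for illustration.

```python
from math import log2

# Toy lexicon split by part of speech (illustrative words only):
nouns = ["dog", "cat", "idea", "time"]
verbs = ["runs", "sleeps", "grows"]
vocabulary = nouns + verbs

# With no constraint, assume every word is equally likely:
unconstrained = log2(len(vocabulary))   # ≈ 2.81 bits
# After a subject, English word order admits only the verbs:
after_subject = log2(len(verbs))        # ≈ 1.58 bits
print(unconstrained, after_subject)
```

Ruling out whole parts of speech shrinks the candidate set, so the listener has fewer brain areas to pre-activate.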

Gender

Grammatical gender, occurring in some languages, divides nouns into several groups called genders, and a different form of 'the' is used for each gender. Thus, when a gendered article occurs, we can expect only nouns of that gender, reducing the entropy of the words that will be pronounced. Depending on the number of genders, the set of candidate nouns is cut by half (two genders) or by two thirds (three genders).
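Under a uniform toy model (an assumption; real noun classes are not evenly populated), cutting the candidate set by a factor of g saves log2(g) bits:

```python
from math import log2

def bits_saved(n_genders):
    """Toy model: N nouns split evenly into n_genders classes.
    Hearing the gendered article cuts candidates from N to N / n_genders,
    saving log2(n_genders) bits of uncertainty (uniform assumption)."""
    return log2(n_genders)

print(bits_saved(2))  # 1.0 bit   (candidates halved)
print(bits_saved(3))  # ≈ 1.58 bits (candidates cut to a third)
```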

This large reduction of entropy gives languages with gender the possibility of flexible word order, as they can afford the associated increase in entropy.

Conclusion

Entropy, as a measure of uncertainty, finds its applications in describing both life and language.
