Zipf’s Law and Angelic Languages

Marco Ponzi
Published in ViridisGreen
Jan 16, 2020 · 8 min read

In his essay “Angelic language or mortal folly?” (in “The Complete Enochian Dictionary”, 1978) linguist Donald C. Laycock discusses the two so-called Enochian or Angelic languages documented by John Dee and Edward Kelley in the 1580s. The two languages appear in manuscripts that record the supposed conversations between the two men and angels.

Liber Loagaeth

The first language appears in a text known as Liber Loagaeth or Liber Logaeth. Each word occupies a cell in one of two large 49x49 tables.

My analysis is based on the edition by Joseph H. Peterson. The text contains a few irregularities (in particular, the last 9 lines of the second table), so the total number of words is 4439, instead of the expected 4802 (two tables of 49 x 49 cells).

For instance, this fragment of the First Language from ms Sloane 3189 f8r reads:
lax or setquah vah lox remah nol sadma vort / famfa le gem nah or sepah vartef a geh oha / lon gaza onsa ges adrux vombalzah ah vaxtal

Laycock describes the First Language as similar to glossolalia: “some of the texts run so fluently, with so much repetition, rhyme, alliteration, and other types of phonetic patterning, that we are almost forced to conclude that, in the later texts at least, Kelley was speaking aloud, and probably at normal speech speed. […] Statistical studies in linguistics show that patterning of this nature is rare in normal language – though it is found in poetry and magical charms. It is also characteristically found in certain types of meaningless language (such as glossolalia), which is often produced under conditions similar to trance”.

The Enochian Calls

The Second Language, or “the true Enochian Language”, comes with an English translation, also provided by the “angels”. According to Laycock, it is not possible to establish a grammar for this language and the translation appears to be largely arbitrary. Yet, compared with the First Language, it could be somewhat closer to an artificial language.
The corpus of the language consists of 19 invocations. Here I use the transcription of Dee’s preliminary version as published by Philip Neal. The text includes 1075 words.

As an example of the Second Language, this passage (Sloane 3191) was transcribed as:
adgt v’_pa_ah zongom fa_a_ip sald / vi_i_v l sobam j_al_prg j_za_zaz / pi_adph cas_arma abaramg ta talho / paracleda q_ta lors_l_q turbs

The translation provided by Dee and Kelley: “Can the wings of the winds understand your voices of wonder, / o you the second of the first. Whom the burning flames have framed / within the depth of my jaws; Whom I have prepared as cups / for a wedding, or as the flowers in their beauty.”

A particularly suspect trait is that the translation invariably is much longer than the original text. Laycock writes: “If the phonology of Enochian is thoroughly English, the grammar is no less so. But here we are faced with one difficulty: the nature of the translation. The English rendering of the Enochian calls is very free, often using five or six words where the Enochian has one; thus, the word for ‘man’ (or ‘reasonable creature’) is glossed as ‘the reasonable creatures of Earth, or Man’. Proper names, such as ldoigo (one of the ‘Names of God’) are given translations: ‘of Him that sitteth on the holy Throne’. Particles, prepositions, and pronouns are filled in where the sense requires them, but we do not know exactly what they are supposed to represent in Enochian; ‘moooah’, for example, is glossed as ‘it repenteth me’ — but it could just as easily be an active verb (‘I regret’)”.

Zipf’s Law

In this post I intend to analyse the two Angelic Languages by Kelley and Dee in the light of Zipf’s law, an observation made popular in the 1930s by George Kingsley Zipf: in a natural language corpus, when words are sorted by decreasing frequency and assigned a rank (r), the frequency of each word is inversely proportional to a power of its rank, i.e. f(r) = C / r^alpha, with an exponent (alpha) close to 1. This distribution is known as a power law.

In particular, in the case of English, the constant C appears to be close to 0.1. The value of C is the estimated frequency of the rank-1 word, i.e. the most common word (typically “the” in an English text).
When words are plotted on a chart with the logarithm of rank on the X axis and the logarithm of frequency on the Y axis, the observed distribution appears to be close to a line.
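The ranking step described above can be sketched in a few lines of Python. This is only an illustration: the function name rank_frequency, the lower-casing and whitespace tokenisation, and the toy sentence are assumptions of mine, not the preprocessing used for the plots in this post.

```python
from collections import Counter
import math

def rank_frequency(text):
    """Return (rank, relative frequency) pairs, most frequent word first.

    Assumption: words are lower-cased and split on whitespace only.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # Sort the counts in decreasing order; the position in this list is the rank.
    ranked = sorted(counts.values(), reverse=True)
    return [(rank, count / total) for rank, count in enumerate(ranked, start=1)]

# Toy input, invented for the example.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequency(sample)

# The log-log chart plots log(rank) on X and log(frequency) on Y.
for rank, freq in table:
    print(rank, round(freq, 3), round(math.log10(rank), 3), round(math.log10(freq), 3))
```

On a real corpus, a distribution that follows Zipf’s law would place the printed log values roughly along a straight line with slope close to -alpha.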

The following graphs are based on the first 4500 words of Genesis in the English King James translation. I limited the texts I examined to 4500 words, so that text length is comparable with that of the most extensive Angelic Language (Liber Loagaeth). The blue line corresponds to the best fit for the values of C and alpha (as computed by the scipy.optimize.curve_fit Python function). The two plots represent the same data: the one on the left uses a linear scale, the one on the right a logarithmic scale.
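A fit of this kind might look like the sketch below. It uses scipy.optimize.curve_fit, as mentioned above, but everything else is an assumption: the model function name zipf, the initial guess p0, fitting on the linear (not logarithmic) scale, and the synthetic frequencies that stand in for real word counts.

```python
import numpy as np
from scipy.optimize import curve_fit

def zipf(rank, C, alpha):
    """Power-law model: frequency = C / rank**alpha."""
    return C / rank ** alpha

# Synthetic data (an assumption for illustration): an ideal Zipf curve
# with C = 0.1 and alpha = 1, perturbed by up to 2% multiplicative noise.
ranks = np.arange(1, 201, dtype=float)
rng = np.random.default_rng(0)
freqs = zipf(ranks, 0.1, 1.0) * rng.uniform(0.98, 1.02, size=ranks.size)

# curve_fit returns the optimal parameters and their covariance matrix.
(C_fit, alpha_fit), _ = curve_fit(zipf, ranks, freqs, p0=(0.1, 1.0))
print("C =", C_fit, "alpha =", alpha_fit)
```

Because the residuals are measured on the linear scale, the few highest-frequency words dominate the fit; fitting log-frequency against log-rank instead would weight the long tail more heavily and can give different values of C and alpha.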

King James Genesis. Left, linear scale. Right, logarithmic scale.

These two graphs are based on Virgil’s Latin poem the Aeneid.

The Aeneid by Virgil. Left, linear scale. Right, logarithmic scale.

The value at the top right of each graph is the Normalized Root Mean Square Deviation (NRMSD), where deviation from the best-fitting curve (the blue line) has been normalized according to the difference between the maximum and minimum frequencies observed for each data-set. Since the minimum always is close to zero, the normalization factor is almost identical to the maximum frequency. The result is that “flatter” curves, where all words have similar frequencies, can be compared with distributions with more varied frequencies. The Latin plot is considerably “flatter” than the English plot, with the most frequent word (“et”) only occurring 145 times (3.2%); in the English Genesis, “and” occurs 482 times (10.7%, close to the C value observed by Zipf). The second most frequent word is “the” (437 occurrences, 9.7%). King James Genesis is slightly anomalous both because the most frequent English word typically is “the” rather than “and”, and because the frequencies of the first- and second-ranked words are so close to each other: in log scale, the second dot in the plot is far above the blue line.
The two NRMSD values are both low, showing little deviation from an ideal power function. The Latin text fits better than the English text (0.008 Deviation vs 0.017). Of course, these are just examples and might not be representative of the general behaviour of Latin and English texts. In particular, the English Genesis is a particularly repetitive text, while classical Latin poetry like the Aeneid has a more varied vocabulary than other Latin works.
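For readers who want to reproduce the measure, here is a minimal sketch of NRMSD as described above: the root mean square deviation between observed and fitted frequencies, divided by the range (maximum minus minimum) of the observed frequencies. The function name nrmsd and the toy numbers are assumptions for illustration, not values from the plots.

```python
import math

def nrmsd(observed, predicted):
    """Root mean square deviation, normalized by the observed range."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Mean of the squared differences between data and fitted curve.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    # Normalization factor: difference between max and min observed frequency.
    spread = max(observed) - min(observed)
    return math.sqrt(mse) / spread

# Toy frequencies (invented) compared with an ideal curve 0.1 / rank.
observed = [0.10, 0.06, 0.03, 0.02]
predicted = [0.10 / rank for rank in range(1, 5)]
print(nrmsd(observed, predicted))
```

A value of 0 means the data sit exactly on the fitted curve; dividing by the range is what lets a “flat” distribution be compared with a steep one.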

The following plots illustrate power-law fitting for the two Angelic Languages.

The two Angelic Languages. Both plots in logarithmic scale.

As can be seen, the Normalized Deviation values are within the range defined by the previous two natural language texts. The Second Language (Enochian) appears to be closer to an actual language: its deviation is lower and the frequency of the most common word is closer to what one can see in an actual language. The most frequent word is “od” (meaning “and”), which occurs 96 times in this corpus of only 1075 words; the resulting frequency (8.9%) is close to the frequency of the English conjunction “and” and to the C value originally observed by Zipf.

Liber Loagaeth, though closer to glossolalia, still fits a power-law distribution better than the English Genesis (NRMSD: 0.014 vs 0.017). What hints at its non-linguistic nature is not a non-power-law distribution, but the fact that its most frequent word (“a”) only occurs 63 times (1.4%): this value is considerably lower than expected.

It should be noted that the fact that the Angelic Languages conform to Zipf’s Law does not have great significance. In 1992, Wentian Li pointed out that random texts also conform to Zipf’s Law. The following graph corresponds to 4500 words generated with Li’s simple method.

Random text generated with Wentian Li’s method

Symbols from an alphabet of four characters plus space are randomly selected. Words that are longer than 4 characters are discarded. The symbols in the alphabet are selected with different probabilities. The resulting text looks like this:

bbaa bc ac caaa a bc a aaa cb aa abba aac daa aa dddc abba abba b aa ab dcba baad aad aa bba ada ad db aa ba caa d aa baa a a b a a ca baa ada abcb caaa daab bab c caa dbd bc aa a bac caab a a aaa a aaab acca ab ac baac baad aaa aa a a ac abc a a aa abb aac aaba c ad…
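A generator along these lines can be sketched as follows. The symbol probabilities are not stated in this post, so the weights below are invented for illustration, as are the function name random_words and the fixed seed; only the overall procedure (four letters plus space, unequal probabilities, words longer than 4 characters discarded) follows the description above.

```python
import random
from collections import Counter

def random_words(n_words, weights, max_len=4, seed=0):
    """Draw symbols at random; a space ends the current word.

    Words longer than max_len (and empty words from repeated spaces)
    are discarded.
    """
    rng = random.Random(seed)
    symbols = list(weights)
    probs = list(weights.values())
    words = []
    current = ""
    while len(words) < n_words:
        ch = rng.choices(symbols, weights=probs)[0]
        if ch != " ":
            current += ch
            continue
        if 0 < len(current) <= max_len:
            words.append(current)
        current = ""
    return words

# Assumed probabilities: the actual values used for the plot are not given.
weights = {"a": 0.4, "b": 0.15, "c": 0.1, "d": 0.1, " ": 0.25}
words = random_words(4500, weights)
print(" ".join(words[:20]))
print(Counter(words).most_common(5))
```

With unequal symbol probabilities, short words made of the likeliest letters dominate the top ranks, which is what produces a smoothly decreasing rank-frequency curve without any linguistic content.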

Li’s random text results in a better fit (i.e. lower NRMSD value) than the other data-sets discussed here.

As an example of data that do not conform to Zipf’s law, one can consider the distribution of individual characters in the English Genesis text that we discussed above. Both the high NRMSD value and the visual distance of the dots from the blue line are quite clear. In this case, the linear-scale plot on the left shows that the frequency of characters tends to decrease more or less quadratically, rather than according to a power function.

Characters in King James Genesis. Left, linear scale. Right, logarithmic scale.

Overall, I find the conclusion of Li’s 1992 paper a good summary of the situation: “Zipf’s law is not a deep law in natural language as one might first have thought. It is very much related to the particular representation one chooses, i.e., rank as the independent variable.” The fact that a text follows Zipf’s law is not a strong indicator that the text is linguistic. It largely is a consequence of how data are plotted: rank on the X axis forces frequency to follow a decreasing trend. The specific shape of the distribution can be observed in data sets from very different domains, including simple random data: it depends on something much more basic than linguistics.
