Revelations of Medieval Art with Image Processing and Machine Learning

Mustafa Qamaruddin
Jan 17, 2019

Revelations of Medieval Art

Once we start believing that we understand our history, a new breakthrough reveals unexpected secrets. Taking historical facts for granted is no longer our only choice. Scientists have invented new algorithms to decipher the secrets concealed in medieval manuscripts. This shall expand our horizon for understanding the history of humanity from a completely new perspective.

The Evolution of Bookbindings

Illuminated medieval manuscripts, decorated with images and colored pigments, fascinate tourists as well as historians. These documents portray a wide range of European life, including bygone monarchies, religion, warfare, patrimony, and a variety of daily routines that help us visualize how Europeans lived at the time. In other words, they are messages from our ancestors describing our communal past.

A bound book is a collection of sheets stacked together and joined by sewing thread, usually wrapped in an attractive leather cover for preservation. It is worth mentioning that distinctive bookbinding techniques were in use in the Orient and the Middle East at the same time as their counterparts in Western Europe. Between the 15th and 18th centuries, bookbinders recycled medieval parchments into binding material for printed books. While scholars have long been aware that books from this period often contain hidden fragments of earlier manuscripts, they never had the means to read them.

The Challenge

The problem we shall explore together is twofold. On the one hand, reading the hidden text in a bookbinding is far from trivial. On the other hand, understanding the language and deciphering the message is not always straightforward. The first problem lies in the domain of Image Processing, while the second overlaps the domains of Cryptography and Natural Language Processing. Unsurprisingly, most of the work in this research area is done by interdisciplinary teams of researchers.

There are numerous bookbindings in museums, libraries, and cathedrals throughout Europe, but the two studies we take an interest in were conducted separately, on two different documents, in different research labs. The first document is a copy of Works and Days by the Greek poet Hesiod, printed in 1537; the poem itself was composed around 700 BC. The research on this bookbinding was done at Northwestern University in the United States. The second is the Voynich manuscript, which has been carbon-dated to the early 15th century. What makes Voynich even more interesting is that it is written in an unknown writing system that cryptographers have so far failed to identify, which has made it one of the most famous unsolved mysteries in the history of secrecy. The paper announcing a breakthrough on the Voynich puzzle was published by researchers at the University of Alberta in Canada.

Works and Days

The binding had two faded columns of writing on the front and back covers. This sparked an initiative that would not have been possible without the latest advances in digital image processing. Apparently, the parchment had been washed or scraped by the bookbinder before being reused. Although librarians could see the old ink marks of the removed text, it was practically impossible to know what was actually written without advanced technological aid.

The researchers first applied hyperspectral imaging to the binding, but this apparently led them nowhere. Hyperspectral imaging is an extension of spectral imaging. Spectral imaging builds on the fact that the human eye perceives visible light in three main wavebands: long wavelengths are perceived as red, medium wavelengths as green, and short wavelengths as blue. Hyperspectral imaging records many more, narrower bands and constructs information about a subject from electromagnetic radiation at wavelengths beyond human perception of visible light. In principle, that makes it a natural choice for reading hidden text. It has been around for quite a while and is no longer a novel technique on its own.
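To make the idea of "more bands than the eye can see" concrete, here is a minimal sketch in plain Python. It is only an illustration, not the researchers' pipeline: the band centres, the reflectance values, and the helper names `band_contrast` and `best_band` are all made-up assumptions. Each pixel stores one value per waveband, and we look for the band in which faded ink differs most from bare parchment.

```python
# Toy hyperspectral pixels: one reflectance value per waveband instead of
# just three (red, green, blue). All numbers below are invented.
BANDS_NM = [450, 550, 650, 850, 1000]  # hypothetical band centres (nanometres)

parchment = [0.62, 0.68, 0.71, 0.80, 0.83]  # hypothetical bare parchment
faded_ink = [0.60, 0.66, 0.69, 0.50, 0.58]  # hypothetical washed-off ink


def band_contrast(a, b):
    """Absolute reflectance difference between two pixels, band by band."""
    return [abs(x - y) for x, y in zip(a, b)]


def best_band(a, b, bands=BANDS_NM):
    """Return the wavelength at which the two pixels differ the most."""
    contrast = band_contrast(a, b)
    index = max(range(len(contrast)), key=contrast.__getitem__)
    return bands[index]


contrast = band_contrast(parchment, faded_ink)
best = best_band(parchment, faded_ink)
print("contrast per band:", [round(c, 2) for c in contrast])
print("most revealing band (nm):", best)
```

In this toy data the three visible bands barely separate ink from parchment, while the invented 850 nm band does, which is the kind of effect hyperspectral imaging hopes to exploit.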

When the researchers hit a dead end with hyperspectral imaging, they moved on to x-ray fluorescence (XRF). Fluorescence is the emission of light by a material that has absorbed electromagnetic radiation; in this case, the radiation is x-rays. The fluorescence revealed some information about the composition of the ink, but the results were still poor, especially in terms of spatial resolution.

At this point, they realized that off-the-shelf solutions would not yield the desired results. Here one of the pioneers of digital image processing came to the rescue. Professor Aggelos Katsaggelos brought to the table a technique borrowed from machine learning. His innovation was to combine the results of hyperspectral imaging and x-ray fluorescence in a bimodal model that describes the statistical contribution of each towards the best result, thus taking advantage of both the high spatial resolution offered by the former and the high intensity resolution offered by the latter. The fact that Professor Katsaggelos is of Greek origin makes matters more interesting, as if it were a date with destiny.
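The intuition behind combining the two modalities can be sketched with a deliberately naive example. This is not Professor Katsaggelos's method, whose details are not given here; it is a simple weighted average under stated assumptions. The grids, the fixed 50/50 weights, and the function names `normalise`, `upsample`, and `fuse` are all hypothetical. A sharp but low-contrast hyperspectral map is blended with a coarse but strong XRF map.

```python
def normalise(grid):
    """Rescale a 2-D grid of numbers to the range [0, 1]."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat grid
    return [[(v - lo) / span for v in row] for row in grid]


def upsample(grid, factor):
    """Nearest-neighbour upsampling: repeat each cell `factor` times each way."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(hsi, xrf, w_hsi=0.5, w_xrf=0.5):
    """Blend a high-resolution map with a low-resolution one, pixel by pixel."""
    factor = len(hsi) // len(xrf)
    hsi_n = normalise(hsi)
    xrf_up = upsample(normalise(xrf), factor)
    return [
        [w_hsi * h + w_xrf * x for h, x in zip(h_row, x_row)]
        for h_row, x_row in zip(hsi_n, xrf_up)
    ]


# Invented data: ink strokes sit in the top-left corner of a 4x4 patch.
hsi = [[10, 12, 10, 10],
       [12, 12, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]   # fine detail, weak signal
xrf = [[40, 5],
       [5, 5]]             # strong signal, one value per 2x2 block

fused = fuse(hsi, xrf)
for row in fused:
    print(row)
```

In the fused map, pixels that both modalities flag as ink score highest, pixels flagged only by the coarse XRF block score in the middle, and background scores zero. A real bimodal model would learn the contribution of each modality from data rather than fixing the weights by hand.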


Voynich Manuscript

Various theories have been put forward to explain the origins of the Voynich manuscript. To this day, we don't actually know who wrote it. Is it a treasure map? Is it a fraud? Nothing is certain. We don't even know what language it was written in. It is, by all means, a cryptic mystery. During WWI and WWII, the science of cryptography flourished like a green bay tree. However, the codebreakers of the time were never able to solve the Voynich puzzle.

A consensus has developed, though, that it is a substitution cipher in which the letters of the real alphabet are replaced with made-up ones. The problem would be easier if the underlying language were known. Back in 2016, researchers at the University of Alberta had developed an algorithm to find the source language of enciphered text using Natural Language Processing, a branch of artificial intelligence concerned with helping computers understand human language.
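A substitution cipher is easy to sketch, and the sketch also hints at why the source language can leave a fingerprint even when the text is unreadable. The example below is an illustration only, not the Alberta algorithm: the key (a rotated alphabet standing in for made-up symbols) and the function names are assumptions. It shows that replacing letters one-for-one leaves the sorted letter-frequency profile of the text unchanged.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase
# Hypothetical key: the alphabet rotated by 7 places plays the made-up symbols.
KEY = ALPHABET[7:] + ALPHABET[:7]
ENCIPHER = str.maketrans(ALPHABET, KEY)
DECIPHER = str.maketrans(KEY, ALPHABET)


def encipher(text):
    """Replace every letter with its counterpart in the key."""
    return text.lower().translate(ENCIPHER)


def decipher(text):
    """Undo the substitution using the inverse mapping."""
    return text.translate(DECIPHER)


def frequency_profile(text):
    """Letter counts from most to least frequent, ignoring letter identity."""
    counts = Counter(ch for ch in text if ch.isalpha())
    return sorted(counts.values(), reverse=True)


plain = "all human beings are born free and equal in dignity and rights"
secret = encipher(plain)

print(secret)
print(frequency_profile(plain) == frequency_profile(secret))
```

Because the profile survives the substitution, it can in principle be compared against profiles of known languages, which is one simple way a program might guess the language without knowing the key.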

The algorithm was trained on ciphers based on the Universal Declaration of Human Rights, which is available in 380 languages. That is, the researchers enciphered the text corpus using state-of-the-art encryption algorithms before feeding it into the language prediction model. Another assumption they made, based on one of the historical theories, is that the cipher also used anagrams, a method that takes the letters of the original word and shuffles them into a new word. For example, the word “binary” might become “brainy” when processed by such a method.
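The anagram assumption can be illustrated with a small sketch. It is a toy, not the researchers' decipherment procedure: the vocabulary and the helper names `signature`, `build_index`, and `candidates` are hypothetical. Sorting the letters of a word yields a signature shared by all of its anagrams, so a scrambled word can be looked up against a dictionary of known words.

```python
def signature(word):
    """Letters of a word in alphabetical order; equal for all its anagrams."""
    return "".join(sorted(word.lower()))


def build_index(vocabulary):
    """Group known words by their anagram signature."""
    index = {}
    for word in vocabulary:
        index.setdefault(signature(word), []).append(word)
    return index


def candidates(scrambled, index):
    """Return every known word that could have been shuffled into `scrambled`."""
    return index.get(signature(scrambled), [])


# Toy vocabulary, invented for this example.
VOCAB = ["binary", "brainy", "priest", "ripest", "house"]
INDEX = build_index(VOCAB)

print(candidates("ybrain", INDEX))  # ['binary', 'brainy']
print(candidates("tsepri", INDEX))  # ['priest', 'ripest']
```

Note that each lookup returns more than one candidate. That ambiguity is inherent to anagrams and is one reason a decipherment based on them still needs human judgment to pick the reading that makes sense in context.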

When the algorithm was applied to the Voynich manuscript, it inferred that the original language was Hebrew. The researchers themselves had been betting on Arabic. As evidence of their success, they took the words deciphered by their algorithm and translated them using Google Translate and, to one’s amazement, the result was quite meaningful English. For example, the first sentence begins with “She made recommendations to the priest …”. It is a rather strange opening, but remember that since Wilfrid Voynich acquired the manuscript in 1912, nobody had gotten that far.

The finding provoked a lot of criticism. Have you spotted the problem yourself? The manuscript might be written in a language other than the 380 languages the algorithm was trained on. If that was your guess, you’re quite right: an algorithm trained to identify modern languages can’t reliably be used to identify the language of a document that has been carbon-dated to the 15th century.

Another criticism was that the algorithm has only helped identify the source language, and there is still a long way to go before the text can be correctly deciphered into a meaningful interpretation. That will require the dedication of linguists and historians as well as cryptographers.

Future Work

The researchers worked on these techniques separately: the first for revealing hidden texts and the second for deciphering them. There has been no public announcement yet of whether the two teams intend to join efforts and close the loop, that is, to provide a software package that both reads and deciphers hidden medieval texts in one go. But as the world has become a small village, it is reasonable to expect that the complete pipeline will soon be applied to other bookbindings at scale. This is also a green field for entrepreneurs to introduce creative solutions for academics.

Moreover, making such tools available on consumer devices will allow more researchers from non-technical disciplines to conduct their own experiments on a wider range of documents, and open the door to endless innovation in deciphering the history of humanity.

