Manuscripts on Wikidata: the state of the art?

Martin L Poulter
Oct 14, 2021

Thanks to the generosity and cooperation of the Khalili Collections, I have recently been sharing images and catalogue data for a beautiful set of Islamic manuscripts in my role as the Collections’ Wikimedian In Residence.

Quran fragment QUR 413 from the Khalili Collection of Islamic Art

Manuscripts are particularly exciting things to catalogue because they are at once literary, artistic, historical, religious, and even political. This is particularly true of the Khalili Collections’ manuscripts, many of which have unique, exceptional calligraphy and illustrations. Some of them are known to have been commissioned by, or written by, notable historic figures: Quran QUR 614 was a retirement project for ‘Abd al-Haqq Amanat Khan Shirazi who had calligraphed inscriptions on the Taj Mahal. As well as many manuscript Qurans, there are literary works including the national epic of Iran, the Shahnameh; the world history Jami’ al-tawarikh; the Hajj pilgrims’ guide Anis Al-Hujjaj; and the collection of prayers Dala’il al-Khayrat.

In my previous post at Oxford University I was sharing data about very similar objects. The Bodleian Library holds the Shahnameh of Shah Tamasp, and I wrote about how easy it was to represent this in Wikidata. The Ashmolean Museum has paintings from another exemplar of the Shahnameh. So how can these different pieces of information join together, and link up with work done in other institutions and other countries? How can someone interested in the Shahnameh, or in a specific character in the Shahnameh, find the relevant manuscripts, or even individual paintings, and learn where those objects and further information can be found?

Use cases

Describing the Khalili manuscripts in Wikidata has allowed them to appear in Wikidata-driven applications. Here is an interactive timeline of manuscript Qurans in HistropediaJS. If I look up what Reasonator knows about the Anis Al-Hujjaj, under “exemplar of” it points to the manuscript in the Khalili Collection. So there are benefits to the individual collection in terms of interactivity and discoverability, but more exciting is the prospect of linking multiple collections.

If we can represent manuscripts from different collections in one database, we can ask it all sorts of questions that a single institution’s catalogue could not answer.

With the technology we have, we should be able to get answers in one place, without consulting dozens of different institutional catalogues. We should be able to share the results of any of these queries just by posting a web address. One barrier is a lack of consensus about how Wikidata should represent the properties of manuscripts. This article is an attempt to capture current good practice, and is a companion to the on-wiki documentation that I have contributed to. I’m grateful to fellow Wikidatans Paula Marmor and Nicolas Vigneron for helpful discussions.
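
To illustrate the point about sharing results by posting a web address, here is a minimal sketch in Python. It assumes the Wikidata Query Service convention of carrying the URL-encoded SPARQL query after the `#` in the address. The example query (counting manuscripts per collection, using P31 instance of and P195 collection, both discussed below) and the function names are illustrative and not taken from any existing tool.

```python
from urllib.parse import quote, unquote

# Illustrative query: count items that are instances of manuscript (Q87167)
# per holding collection (P195). Other manuscript classes are ignored here.
QUERY = """
SELECT ?collection ?collectionLabel (COUNT(?ms) AS ?count) WHERE {
  ?ms wdt:P31 wd:Q87167 ;
      wdt:P195 ?collection .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?collection ?collectionLabel
ORDER BY DESC(?count)
""".strip()


def shareable_link(sparql: str) -> str:
    """Return a query.wikidata.org address with the query in the URL fragment."""
    return "https://query.wikidata.org/#" + quote(sparql, safe="")


def query_from_link(link: str) -> str:
    """Recover the SPARQL text from a link built by shareable_link()."""
    return unquote(link.split("#", 1)[1])


link = shareable_link(QUERY)
print(link)
```

Anyone opening the printed address should see the same query ready to run, which is what makes a result shareable without sending the data itself.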

Note: Wikidata uses language-independent “P numbers” to identify properties. Much of the time, we don’t need to know these numbers. When adding these properties manually, the interface auto-suggests the property. P numbers are mentioned in this article to avoid ambiguity and to point to alternative language labels.

What we’re describing

  • There is the literary work, for example when we are considering the Shahnameh but not any particular exemplar or edition.
  • A manuscript exemplifies the literary work, or multiple works. We use P1574 exemplar of to link the physical manuscript to the literary work.

The manuscript might exist in multiple places. Parts might be dispersed across different collections. If so, we need an extra layer:

  • The catalogued object might be a section of the manuscript, an individual folio or bifolio. We have P361 part of and P527 has part to link these dispersed parts to the representation of the original manuscript.
  • The manuscript or section can include paintings which are significant pieces of art in themselves. They might depict real or fictional entities. Some of the most interesting facts about the Khalili Collections manuscripts concern their depictions, including a painting in the Shahnameh which depicts Alexander the Great praying at the Kaaba. Again we can use P361 part of and P527 has part to connect the painting to the manuscript it is from.

Alexander the Great depicted in the Shahnameh as praying in front of the Kaaba, surrounded by other pilgrims. Attribution: Khalili Collections
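
The layers above (work, manuscript, section, painting) can be walked in a single query. The following is a sketch only: it builds a SPARQL string that starts from a literary work, finds manuscripts linked to it with P1574 exemplar of, and then follows P361 part of any number of steps to reach sections, folios or paintings. The function name is hypothetical, and the item ID used in the usage line is a placeholder, not the real ID of any work mentioned here.

```python
import re


def parts_of_exemplars_query(work_qid: str) -> str:
    """Build SPARQL listing manuscripts of a work and anything that is part of them."""
    if not re.fullmatch(r"Q[1-9]\d*", work_qid):
        raise ValueError(f"not a Wikidata item ID: {work_qid!r}")
    return f"""
SELECT ?manuscript ?part WHERE {{
  # The manuscript is an exemplar of the chosen literary work.
  ?manuscript wdt:P1574 wd:{work_qid} .
  # Optionally, anything that is (transitively) part of that manuscript.
  OPTIONAL {{ ?part wdt:P361+ ?manuscript . }}
}}
""".strip()


# "Q12345" is a placeholder; substitute the item ID of the work of interest.
print(parts_of_exemplars_query("Q12345"))
```

The `+` after P361 asks for one or more "part of" steps, so a painting on a folio within a dispersed section is still found from the original manuscript.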

The literary work

The work is an abstract object which does not need many properties:

  • P31 Instance of: literary work
  • P50 Author which could be unknown or anonymous
  • P571 Inception, the point in time at which a full version of the text first existed
  • P407 Language of work or name meaning the original language in which it was written
  • P1476 Title in the original language
  • Wikidata doesn’t presently have an inverse property of P1574 exemplar of. It’s not really needed, but where we want to point from a work to its exemplars, people seem to be using P747 has edition or translation.
  • There are properties to describe the content of a literary work, such as P921 Main subject, P136 Genre, P840 Narrative location, P941 inspired by. See the guidance at WikiProject Books.

An example item for a work: the Anis Al-Hujjaj.

The manuscript

P31 instance of should be one of manuscript (Q87167), illuminated manuscript (Q48498), codex (Q213924), manuscript codex (Q2217259), palimpsest (Q274076), or manuscript fragment (Q30103158). The distinguishing feature of “manuscript codex” rather than “codex” is that the former combines multiple works.

If the manuscript is of an identifiable literary work, or combination of literary works, they can be linked with P1574 exemplar of. In the case of the Quran, we can specify which parts of the Quran are represented because Wikidata represents each surah (chapter). If, on the other hand, the text is of a type such as Evangeliary (Q1754581), Gospel Book (Q690851), lectionary (Q284465), book of hours (Q727715), or diwan (Q1991869) we have P136 genre. There is no implication that the manuscript is complete, so if it contains some but not all of a text it is still an exemplar of that text.

The number of folios should be put into P1104 number of pages, even though a folio does not correspond to the modern notion of a page. An individual folio or bifolio should be P31 instance of manuscript (Q87167) or illuminated manuscript (Q48498) with one or two pages.

Calligraphic work showing six script styles of Arabic and Persian text. Attribution: Khalili Collections

To describe where and how it was made, we have P571 inception, P1071 location of creation, P186 made from material, and P88 commissioned by. P495 country of origin should refer to the historical country, not present-day country as used by some catalogues. So this 14th century single-volume Quran has Mamluk Sultanate rather than Egypt.

The scribe is recorded with P6819 calligrapher. (Note that manuscripts don’t usually have a P50 author; the author is the person who created the work of literature while the calligrapher is the scribe who wrote it on this particular surface.) If the illustrations are by a different person, they are the P110 illustrator.

We have P407 language of work or name and P9302 script style to say, for example, that a manuscript is in Persian written in Nasta’liq script. Some Arabic manuscripts combine multiple script styles; it is possible to include them all but I think we should just aim to specify the style of the main body text.

To say where the manuscript is now, we have P195 collection, P217 inventory number (qualified with the relevant collection), and P276 location. With suitable qualifiers, these can also describe the ownership history.
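
As a rough sketch of what "qualified with the relevant collection" means, the snippet below models a statement as a main value plus qualifiers and looks up the inventory number that belongs to a given collection. The `Statement` class and `inventory_number` function are hypothetical simplifications, not Wikidata's actual data format, and the collection names and numbers are placeholders rather than real catalogue data.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str    # property ID, e.g. "P217"
    value: str   # main value of the statement
    qualifiers: dict = field(default_factory=dict)  # property ID -> value


def inventory_number(statements, collection):
    """Return the P217 value whose P195 qualifier matches `collection`, if any."""
    for s in statements:
        if s.prop == "P217" and s.qualifiers.get("P195") == collection:
            return s.value
    return None


# Placeholder values for an object that has belonged to two collections.
item = [
    Statement("P195", "Collection A"),
    Statement("P195", "Collection B"),
    Statement("P217", "A-001", {"P195": "Collection A"}),
    Statement("P217", "B/42", {"P195": "Collection B"}),
]

print(inventory_number(item, "Collection B"))  # B/42
```

Without the qualifier there would be no way to tell which number belongs to which institution once an object has passed through more than one collection.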

For further information from an authoritative source, there is P973 described at URL. For manuscripts that are in a catalogue such as the Catalogue of English Literary Manuscripts 1450–1700, there is P528 catalog code. There are some dedicated identifiers for manuscript catalogues, such as P3702 Catalogue of Illuminated Manuscripts ID for the British Library and P1577 Gregory-Aland-Number for New Testament manuscripts.

Wikidata-generated infobox for the Khalili section of the Jami’ al-Tawarikh

If the discovery of the manuscript is a significant event, we have P575 time of discovery or invention and P189 location of discovery. In some cases, manuscript codices are named after a discoverer or owner, and for that we have P138 named after.

If a complete scan of the manuscript is available online, there is P953 full work available at URL. Wikidata does not have separate properties for image scans versus textual transcription, so this property seems to be used for both. It would be nice to have separate properties for “full scan” and “full text”. If the digitisation is in the IIIF format, the link can be made with P6108 IIIF manifest.

Hopefully there is at least one image of the manuscript on Wikimedia Commons. This can be put in P18 image, which enables its use as a thumbnail in tools like the Histropedia timeline. If there are lots of images on Commons, they might have their own category, which can be pointed to with P373 Commons category. A lot of Bodleian Library manuscripts have dedicated categories on Commons, each category with an “infobox” of facts and figures provided by Wikidata.

The catalogued object

As indicated above, we only need this layer if the manuscript now exists in more than one collection. It has the same set of properties as described above for manuscripts, but with P361 part of to point to the original manuscript.

The painting

P31 Instance of should be miniature (Q8362) or painting (Q3305213). It is P361 part of the manuscript or manuscript section. P217 inventory number can be used to specify the folio number, e.g. this painting from the Shahnameh of Shah Tamasp has inventory number MSS 1030, folio 38. We need to specify P170 creator (which can be anonymous or unknown) even if that person was already linked as the illustrator of the manuscript. Hopefully there is an image of the painting on Commons which we can point to with P18 image.

If the painting is a specific genre like a portrait, that can be indicated with P136 genre and if it depicts a real or fictional event, that can be the P921 main subject. For physical dimensions, we have P2048 height and P2049 width. More general properties that can apply to a painting are listed on this Talk page.

An exciting thing about describing paintings is using P180 depicts. This can be qualified with P6022 expression, gesture or body pose so we can find, for example, paintings where Alexander the Great is shown kneeling rather than standing.
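
A query along these lines could look like the sketch below. It assumes the standard Wikidata Query Service prefixes for reaching qualifiers (`p:` for the statement, `ps:` for its main value, `pq:` for a qualifier). Q8409 is the item for Alexander the Great; the function name is hypothetical, and the pose is left as a variable by default because the item ID for a particular pose would need to be looked up.

```python
def depicts_query(subject_qid, pose_qid=None):
    """Build SPARQL for paintings whose P180 depicts statement has a P6022 qualifier."""
    pose = f"wd:{pose_qid}" if pose_qid else "?pose"
    select = "?painting" if pose_qid else "?painting ?pose"
    return "\n".join([
        f"SELECT {select} WHERE {{",
        "  ?painting p:P180 ?statement .",            # the depicts statement itself
        f"  ?statement ps:P180 wd:{subject_qid} ;",   # who or what is depicted
        f"             pq:P6022 {pose} .",            # expression, gesture or body pose
        "}",
    ])


# List every recorded pose for depictions of Alexander the Great (Q8409).
print(depicts_query("Q8409"))
```

Passing a second item ID restricts the results to one pose, which is how "kneeling rather than standing" would be expressed once the relevant item is known.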

Conclusion

We have the potential to express in open data such chunks of knowledge as:

Alexander the Great is shown kneeling at the Kaaba in a 16th-century painting on a folio from the Shahnameh, created in Shiraz by an unknown artist, now in the Khalili Collection of Hajj and the Arts of Pilgrimage, which is one of the Khalili Collections.

We’ve seen that Wikidata can be very expressive when describing manuscripts, but almost all of the properties discussed here are optional. At its core, what’s useful to know is:

  1. What is it? P31 Instance of
  2. How did it come about? P571 inception, P495 country of origin, the usually anonymous P6819 calligrapher
  3. Where is it now? P195 collection, P217 inventory number
  4. Where is reliable information about it? P973 described at URL
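
The four questions above can be turned into a simple completeness check. This is only a sketch: it assumes an item's statements have already been loaded into a plain mapping from property ID to a list of values, and the function name and placeholder values are invented for illustration.

```python
CORE_PROPERTIES = {
    "What is it?": ["P31"],
    "How did it come about?": ["P571", "P495", "P6819"],
    "Where is it now?": ["P195", "P217"],
    "Where is reliable information about it?": ["P973"],
}


def missing_core(claims: dict) -> dict:
    """Map each core question to the listed properties that `claims` lacks."""
    report = {}
    for question, props in CORE_PROPERTIES.items():
        absent = [p for p in props if not claims.get(p)]
        if absent:
            report[question] = absent
    return report


# Placeholder item: classified and located, but with no origin or source yet.
example = {
    "P31": ["Q87167"],
    "P195": ["(collection item)"],
    "P217": ["(inventory number)"],
}

for question, props in missing_core(example).items():
    print(question, props)
# How did it come about? ['P571', 'P495', 'P6819']
# Where is reliable information about it? ['P973']
```

Since almost every property is optional, a report like this is a prompt for where to look next rather than a judgement that the item is wrong.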

Imagine the connections we can discover, or the fun interactive things we can make, once more medieval and ancient manuscript collections are represented on Wikidata.

Originally published at https://medium.com on October 14, 2021.

Addendum: This blog post is cited in Toby Burrows, “Linked Open Data and Medieval Studies: Some Lessons from the Mapping Manuscript Migrations Project”, International Journal of Humanities and Arts Computing, Volume 16, Issue 1, pages 64–77, ISSN 1753-8548, available online March 2022.

Martin L Poulter

Wikimedian In Residence at the Khalili Foundation; Former Wikimedian In Residence at the University of Oxford, exploring open data and open content