An introduction to the Madoc Platform

By Stephen. Published in digirati-ch, Jun 24, 2019 (7 min read)

Madoc is an Omeka S based platform for the display, enrichment, and curation of IIIF-based digital objects. The platform itself is a combination of open source services and technology, bound together to provide a single management interface for collections.

Image credit: National Library of Scotland, License: CC BY 4.0

Open standards

In addition to being built upon open source projects, Madoc uses well-defined open web standards. These standards ensure that content created inside Madoc remains accessible, interoperable, and ultimately more useful in the future.

International Image Interoperability Framework (IIIF)

The main open standard that Madoc uses is IIIF, a format for describing real world objects in a uniform but expressive way. A real world object is described in a Manifest which is in turn made up of Canvases where content is displayed to represent the object. A book may be a manifest, and a page of that book may be a canvas.
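To make the book-and-page analogy concrete, here is a minimal sketch of a manifest with two canvases, built as plain Python dictionaries. It assumes the IIIF Presentation API 2 JSON shape (canvases held in a sequence); the `example.org` URLs and the helper names `make_canvas` and `make_manifest` are placeholders invented for illustration, not part of Madoc.

```python
import json


def make_canvas(canvas_id, label, width, height):
    """Build a bare IIIF Presentation 2 canvas (one page of the book)."""
    return {
        "@id": canvas_id,
        "@type": "sc:Canvas",
        "label": label,
        "width": width,
        "height": height,
        "images": [],  # image annotations would be listed here
    }


def make_manifest(manifest_id, label, canvases):
    """Build a bare IIIF Presentation 2 manifest (the book itself)."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


# Placeholder identifiers: a real manifest would use resolvable URLs.
pages = [
    make_canvas(f"https://example.org/iiif/book1/canvas/p{n}", f"Page {n}", 2000, 3000)
    for n in (1, 2)
]
book = make_manifest("https://example.org/iiif/book1/manifest", "An example book", pages)

print(json.dumps(book, indent=2))
```

A real manifest carries more than this (image services, metadata, attribution), but the nesting of manifest, sequence, and canvas is the part the paragraph above describes.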

In Madoc, you can bring any IIIF content into the system from one or many sources, or you can create your own manifests and canvases.

Web Annotations

The second pillar of Madoc after IIIF content is annotations. Annotations bring new dimensions to content, enabling digital surrogates to better represent their real-world counterparts. They range from transcribing text in a document, to linking real-world locations to photographs, to capturing real-world stories from real people. Every annotation counts.

Madoc stores annotations in a W3C annotation server, uses Elasticsearch to discover them, and creates them with an annotation creation tool that outputs semantic, structured data.
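As an illustration of what such structured data can look like, the sketch below builds a W3C Web Annotation that transcribes a rectangular region of a canvas. The overall shape (`body`, `target`, `FragmentSelector`) follows the W3C Web Annotation Data Model; the function name, the choice of the `describing` motivation, and the canvas URL are assumptions made for this example rather than Madoc's actual output.

```python
import json


def transcription_annotation(canvas_id, region, text):
    """Return a Web Annotation attaching `text` to a region of a canvas.

    `region` is an (x, y, width, height) tuple in canvas pixel coordinates.
    """
    x, y, w, h = region
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "describing",  # assumed; pick whichever motivation fits
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
        },
        "target": {
            "source": canvas_id,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


anno = transcription_annotation(
    "https://example.org/iiif/book1/canvas/p1",  # placeholder canvas URL
    (100, 200, 300, 50),
    "Printed in Edinburgh",
)
print(json.dumps(anno, indent=2))
```

Because the target is just a canvas identifier plus a selector, the same annotation stays meaningful in any viewer that understands IIIF and Web Annotations.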

Omeka + Linked data

The IIIF and Web Annotation specifications are built upon linked data, so we needed a content management system that spoke their language. Omeka S is built from the ground up on linked data and provides management interfaces for IIIF content, custom sites and our annotation models.

Import

The first step in any platform is getting some content in. You can grab IIIF from your own archives if you have any, or grab a few IIIF manifests from the community. You can import IIIF collections and manifests into the system, allowing them to be presented, curated and enriched through Madoc.

Before you can display anything in Madoc, you need a site. In Omeka, you can create as many sites as you want inside a single platform, and each can be set up with different content and a different theme or set of user permissions. You can create a site, customise it and add your IIIF content to it. If you’ve just imported manifests, you can create a brand-new IIIF collection to group your content and then add your collection to the site.

So in summary, you import IIIF and assign it to a new site.

Image credit: National Library of Scotland, License: CC BY 4.0

Display

A site is made up of a few things. The first building blocks for Omeka sites are pages. Pages can have anything on them, from simple HTML content to embedded IIIF widgets to show off your collection.

As part of Madoc we include a bunch of widgets to put on pages. You can add a crowd-sourcing banner to your homepage, with a call to action to get people to start looking at or contributing to your content, or you can show off a leaderboard of contributors based on the images they’ve annotated. You can also show a stream of the latest annotated images to keep the content on your site fresh.

Aside from pages, there are other sections of the site that are pre-built based on the IIIF content you’ve added to your site. You have a list of all of the collections, a manifest view and a canvas view. This isn’t a traditional IIIF viewer experience; it is more of an exploded IIIF viewer.

Curate

Now that we’ve got our content in and our pages set up, it’s time to manage our content. We’ve imported shallow copies of the IIIF resources. What this means in practical terms is that we can add labels, descriptions and metadata to our Collections, Manifests and Canvases. These labels will show up on the site and in the IIIF resources served from the platform without changing the original IIIF content.
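One way to picture the shallow-copy idea is as a layer of local edits applied on top of the imported resource when it is served. The sketch below is only an illustration of that principle: the function `with_local_overrides` and the way edits are stored are hypothetical, and Madoc's real storage model may work quite differently.

```python
import copy


def with_local_overrides(original, overrides):
    """Return a copy of an imported IIIF resource with local edits applied.

    The imported resource is never modified, so the source content stays intact.
    """
    enriched = copy.deepcopy(original)
    enriched.update(overrides)
    return enriched


# An imported canvas, as published by the source institution (placeholder data).
remote_canvas = {
    "@id": "https://example.org/iiif/book1/canvas/p1",
    "@type": "sc:Canvas",
    "label": "p. 1",
}

# Labels and descriptions added by a curator on the platform (placeholder data).
local_edits = {"label": "Title page", "description": "Hand-coloured title page"}

served = with_local_overrides(remote_canvas, local_edits)
print(served["label"])         # the curated label
print(remote_canvas["label"])  # the original label, unchanged
```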

The model we’ve chosen allows IIIF content to also be completely created on the platform. For example, you can create a new IIIF collection and add manifests to it straight from the UI. This will be a hosted IIIF resource and can be dropped into viewers like any other IIIF resource. You can also create new manifests pulling in existing canvases.

If you want to pull a couple of manifests from different institutions’ IIIF repositories into your Madoc instance and then create a brand-new manifest referencing their canvases, you can. You can build a site around your new content, and even start annotating it. It’s a completely new way to compose IIIF resources into user experiences.
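In Madoc this composition happens through the UI, but the underlying idea can be sketched in a few lines. The example below picks canvases out of two source manifests and references them from a new one. It assumes the IIIF Presentation API 2 layout (canvases inside `sequences`); the function names and all URLs are hypothetical.

```python
def canvases_of(manifest):
    """List every canvas in a Presentation 2 manifest, across all sequences."""
    return [c for seq in manifest.get("sequences", []) for c in seq.get("canvases", [])]


def compose_manifest(new_id, label, sources, wanted_ids):
    """Build a new manifest that references chosen canvases from other manifests.

    Raises KeyError if a requested canvas is not present in any source.
    """
    by_id = {c["@id"]: c for m in sources for c in canvases_of(m)}
    picked = [by_id[canvas_id] for canvas_id in wanted_ids]
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": new_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": picked}],
    }


def tiny_manifest(base, count):
    """Make a placeholder source manifest with `count` canvases."""
    canvases = [
        {"@id": f"{base}/canvas/{n}", "@type": "sc:Canvas", "label": f"{n}"}
        for n in range(1, count + 1)
    ]
    return {
        "@id": f"{base}/manifest",
        "@type": "sc:Manifest",
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


library_a = tiny_manifest("https://a.example.org/iiif/atlas", 3)
library_b = tiny_manifest("https://b.example.org/iiif/letters", 2)

exhibit = compose_manifest(
    "https://madoc.example.org/iiif/exhibit/manifest",
    "Maps and letters",
    [library_a, library_b],
    ["https://b.example.org/iiif/letters/canvas/2", "https://a.example.org/iiif/atlas/canvas/1"],
)
print([c["@id"] for c in canvases_of(exhibit)])
```

The new manifest keeps the original canvas identifiers, so annotations made against those canvases still point at the same resources.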

Image credit: National Library of Scotland, License: CC BY 4.0

Enrich

We’ve got content, it’s all described, and it’s displayed on our site with custom pages. Now is the time to get the crowd-sourcing started. Madoc uses the Annotation Studio to create annotations. Annotation Studio needs three things:

  • An annotation server
  • A IIIF Canvas
  • A Capture Model

The annotation server used in Madoc is called Elucidate. Elucidate is a W3C annotation-compliant server built by Digirati and was one of the first full implementations of the W3C specification. Madoc adds a layer of authentication to the annotation server, allowing only logged-in Omeka users to add annotations. This is something that’s not currently part of the W3C Annotation specification.

Capture Models are the core of the crowd-sourcing platform. A Capture Model is a JSON document describing both the structure of the annotation that should be created and the UI that should be displayed to fill that structure with data. For example, if you wanted to crowd-source people’s names, you might create a Capture Model stating that you want the output to have an RDF property of foaf:firstName, that the user should be presented with a text box, and that it should have a label of “Enter the first name of this person”. Capture Models can have many fields, offering a full range of input options for users while annotating content.
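The first-name example above can be sketched as follows. Note that the field names used here (`fields`, `property`, `input`, `label`) are a hypothetical schema invented for this illustration; the real Capture Model JSON format is documented separately and will differ. The point is only to show one document driving both the form and the shape of the output.

```python
# Hypothetical Capture Model: NOT the real Madoc schema, just the same idea.
capture_model = {
    "label": "Identify a person",
    "fields": [
        {
            "property": "foaf:firstName",  # RDF property for the output
            "input": "text",               # UI control shown to the user
            "label": "Enter the first name of this person",
        },
    ],
}


def render_form(model):
    """Describe the form to display: one (label, input type) pair per field."""
    return [(field["label"], field["input"]) for field in model["fields"]]


def build_body(model, answers):
    """Keep only the answers the model asks for, keyed by RDF property."""
    body = {}
    for field in model["fields"]:
        prop = field["property"]
        if prop in answers:
            body[prop] = answers[prop]
    return body


print(render_form(capture_model))
print(build_body(capture_model, {"foaf:firstName": "Ada", "unexpected": "ignored"}))
```

Adding a second entry to `fields` would extend both the form and the output at once, which is what lets a single model describe a whole annotation task.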

You can create a couple of Capture Models and add them to a site. This will display as a list of options that can be chosen by the end-user, allowing many different things to be identified from a single site.

Now is the time for the crowd-sourcing to begin. The first thing you will want to do is configure which crowd-sourcing options to enable on your site. You can toggle features on or off, such as bookmarking, flagging, and marking a page as complete so it cannot be annotated any more. Next you can open the gates and enable user registrations on your new site. Once registered, users will be able to use your Capture Models and start enriching your IIIF resources.

The final step

The Madoc Platform is still in development, and is improving all of the time. However, we’re working towards a milestone. We want to close the crowd-sourcing loop, building a platform that can take raw content, enrich it and then use all of the data that’s been collected to drive user experiences. We are almost there.

When users are annotating content, all of that content is being indexed into Elasticsearch. We are currently using this to show statistics, like how many images have been annotated, or how many raw annotations are in the system. We want to go much further though. The next steps for the Madoc Platform will be to take all of the structured annotations that have been created, and build an engaging user journey through that content.

For example, let’s imagine you’ve just finished your people-identifying crowd-sourcing project and you want to use that data. Well, we’ve got enough information to group those annotations by the person’s name. So you could have a page showing all of the people that have been identified in your project. Or you could add a page grouping family names. Other options include free-text search, facets generated from the Capture Models, graphs, statistics, taxonomy integration and curation. The possibilities for showcasing the outputs of these crowd-sourcing projects are endless.
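The grouping described here is simple once the annotations are structured. The sketch below groups a handful of made-up annotations by a chosen property using plain Python; in the platform this kind of query would more likely run against the Elasticsearch index, and the flattened annotation shape and the `group_by` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Made-up, flattened annotations: each records a canvas and the captured fields.
annotations = [
    {"canvas": "https://example.org/canvas/1", "body": {"foaf:firstName": "Mary", "foaf:familyName": "Jones"}},
    {"canvas": "https://example.org/canvas/2", "body": {"foaf:firstName": "Owen", "foaf:familyName": "Jones"}},
    {"canvas": "https://example.org/canvas/3", "body": {"foaf:firstName": "Mary", "foaf:familyName": "Reid"}},
    {"canvas": "https://example.org/canvas/4", "body": {}},  # nothing captured
]


def group_by(items, prop):
    """Map each value of `prop` to the canvases where it was recorded."""
    groups = defaultdict(list)
    for item in items:
        value = item["body"].get(prop)
        if value is not None:
            groups[value].append(item["canvas"])
    return dict(groups)


by_family = group_by(annotations, "foaf:familyName")
by_first = group_by(annotations, "foaf:firstName")

for name, canvases in sorted(by_family.items()):
    print(name, len(canvases))
```

Each group maps naturally onto a page: one listing every canvas where a given family name appears, for instance.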

Image credit: The National Library of Wales & National Library of Scotland, License: CC BY 4.0

What else we’re working on

Aside from search and discovery, we are working hard to bring more options for showcasing IIIF content. We want you to be able to configure your site to use whichever viewer you want, whether it’s a custom Universal Viewer with configuration baked right into the platform, or a multi-panel Mirador viewer. We are also working to create a complete experience for transcribing. Transcribing is an art form that goes beyond a text box, with concise notation and many different ways to showcase its output.

We’re also working on making it easy to integrate Digirati’s own enrichment pipeline, so you can automate your entity extraction, and use crowd-sourcing to improve and curate it.

Getting involved

This was a speedy run through what Madoc can do. If you want more, take a look at our documentation and website, which go into more detail on how the interface works and how to use the platform.

If you want to get involved, reach out on our GitHub with your use case or contact us through our website.


Stephen.
digirati-ch

Technical Lead at Digirati, working on creating tools for displaying, enriching and exploring Digital Collections.