Publish interactive historical documents with Archivist

Daniel Beilinson
Sep 15, 2015


I am proud to announce that, after more than a year of hard work, we have launched Archivist, a platform for publishing interactive interview transcriptions online. Archivist was developed for the Memorial International Society to publish interviews with Ost-Arbeiters and WWII prisoners. The interviews are published as full transcriptions (in Russian) together with their multimedia sources (audio, video). Editors can tag subjects, locations, persons and definitions in the text and link them to records, so that the archive can later be queried by these entities. Researchers can run full-text searches and also filter interviews by related subjects and external entities.

Archivist Writer

At Data4Society we believe that public education, as well as preserving the memory of victims of political and wartime persecution, can change the world, so we took this project very seriously. We are glad that we can now present the results. I'd like to thank the Memorial International Society for inviting us to collaborate, for sharing their experience with historical sources and their approach to oral history, and for supporting us in making this vision a reality. Grants from the EVZ Foundation and the Open Society Foundations made it possible to develop this first version. Special thanks also to the Heinrich Boell Foundation for their support!

Platform overview

Archivist is a full-stack publishing solution that combines several technologies to power digital archives. Let's look at each component in more detail.

Archivist Writer

At the heart of the platform is Archivist Writer, a modern web editor that lets you annotate your text with basic markup and external data. For example, you can:

  • reference entities inside text
  • mark any piece of text as related to some set of terms from an ontology-tree
  • insert timecodes to synchronize text with media source
  • leave comments on any piece of text to support collaboration between editors and researchers
  • describe the document’s metadata
Archivist Writer
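To make the annotation idea concrete, here is a minimal sketch of annotated text represented as data. It assumes a stand-off model in which annotations live beside the text and point into it by paragraph id and character offsets. The post does not describe Archivist's actual data model: the classes `Paragraph`, `Annotation` and `Document`, the annotation type names and the `prison:buchenwald` id are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Paragraph:
    id: str
    text: str


@dataclass
class Annotation:
    # Hypothetical stand-off annotation: it points into a paragraph by offsets.
    id: str
    type: str                      # e.g. "entity_reference", "subject", "timecode", "comment"
    path: str                      # id of the annotated paragraph
    start: int                     # start offset (inclusive)
    end: int                       # end offset (exclusive)
    target: Optional[str] = None   # entity id, subject id, media time, comment id...


@dataclass
class Document:
    paragraphs: List[Paragraph] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def paragraph(self, pid: str) -> Paragraph:
        for p in self.paragraphs:
            if p.id == pid:
                return p
        raise KeyError(pid)

    def annotate(self, ann: Annotation) -> None:
        # Reject annotations whose range falls outside the paragraph text.
        text = self.paragraph(ann.path).text
        if not (0 <= ann.start <= ann.end <= len(text)):
            raise ValueError(f"range {ann.start}-{ann.end} outside paragraph {ann.path}")
        self.annotations.append(ann)

    def annotated_text(self, ann: Annotation) -> str:
        # The text is never modified; annotations only reference it.
        return self.paragraph(ann.path).text[ann.start:ann.end]


doc = Document([Paragraph("p1", "We were taken to Buchenwald in 1943.")])
ref = Annotation("a1", "entity_reference", "p1", 17, 27, target="prison:buchenwald")
doc.annotate(ref)
print(doc.annotated_text(ref))  # Buchenwald
```

Keeping annotations outside the text means an entity can be relinked or removed without rewriting the transcript itself.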

Some of the interviews in Memorial's Ost-Arbeiters archive last more than 7 hours. The resulting documents are typically very large, for example more than 10,000 paragraphs with about as many annotations and comments attached. That's like a small book, isn't it? And Archivist Writer was powerful enough to handle it. Amazing!

Tagging in Archivist: KZ Buchenwald entity reference (left), assigning subjects to text fragment (right)

Archivist Managers

Managers are a set of interfaces for entering and managing the resources stored in the database. We manage four types of entities: toponyms, prisons, persons and definitions, each with its own set of properties. In addition to entities, you can manage the subjects ontology tree and users.

Archivist managers: definitions manager (left), map view of toponyms manager (center), subjects manager — strict ordered ontology tree (right)

Here is what you can do:

  • count how often an entity or subject is mentioned in the full corpus of documents
  • see in which documents an entity or subject is mentioned, and jump into a document to see exactly where it is referenced
  • merge entities or subjects: the system replaces every reference to the merged item throughout your archive with a reference to the one you keep
  • remove entities or subjects in the same way: they are removed from all documents automatically
  • the search index is updated after each of these operations
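The bookkeeping behind these operations can be sketched as follows, assuming every reference is stored as a (document, annotation, target) record. `Reference`, `ReferenceStore` and all ids are hypothetical, and `_reindex` only records which documents would need reindexing rather than calling a real search engine.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reference:
    # Hypothetical record linking an annotation in a document to an entity/subject.
    doc_id: str
    annotation_id: str
    target: str


class ReferenceStore:
    def __init__(self, refs: List[Reference]):
        self.refs = list(refs)
        self.reindexed: List[str] = []  # documents that would be sent to the search index

    def count_mentions(self) -> Counter:
        # How often each entity or subject is mentioned across the corpus.
        return Counter(r.target for r in self.refs)

    def documents_mentioning(self, target: str) -> Dict[str, List[str]]:
        # doc id -> annotation ids, so a UI could jump to each mention.
        hits: Dict[str, List[str]] = {}
        for r in self.refs:
            if r.target == target:
                hits.setdefault(r.doc_id, []).append(r.annotation_id)
        return hits

    def merge(self, source: str, keep: str) -> None:
        # Point every reference to `source` at `keep` instead.
        touched = set()
        for r in self.refs:
            if r.target == source:
                r.target = keep
                touched.add(r.doc_id)
        self._reindex(touched)

    def remove(self, target: str) -> None:
        # Drop every reference to `target` from all documents.
        touched = {r.doc_id for r in self.refs if r.target == target}
        self.refs = [r for r in self.refs if r.target != target]
        self._reindex(touched)

    def _reindex(self, doc_ids) -> None:
        # Placeholder: a real system would update the search index here.
        self.reindexed.extend(sorted(doc_ids))


store = ReferenceStore([
    Reference("doc1", "a1", "toponym:seebach"),
    Reference("doc1", "a2", "toponym:zeebach"),  # hypothetical duplicate entity
    Reference("doc2", "a7", "toponym:seebach"),
])
store.merge("toponym:zeebach", keep="toponym:seebach")
print(store.count_mentions())                          # Counter({'toponym:seebach': 3})
print(store.documents_mentioning("toponym:seebach"))   # {'doc1': ['a1', 'a2'], 'doc2': ['a7']}
```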

Archivist Browser

Archivist Browser is the main entry point for your archive. Here you can see a list of all documents from your archive, perform full-text search queries and filter using the ontology tree.

Archivist Browser: combining ontology tree facets with full-text search (left), KZ Mauthausen entity overview (right)

In fact, Archivist Browser is simply an interface for running Elasticsearch queries. Simple and powerful. We index every document fragment as well as all entities. From the result list you can jump straight into an interview, with the subject or entity you were interested in highlighted. Archivist Browser is based on the Lens Browser from eLife. We'd like to thank eLife for developing this browsing interface and making it available as an open source project; it was easy for us to adapt it to our needs.
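As an illustration, a query combining full-text search with an ontology-tree facet could look like the sketch below. It builds an Elasticsearch query-DSL body from a `bool` query with a `match` clause and `terms` filters, plus a `highlight` section. The field names `content` and `subjects`, the subject ids and tree, and the rule that selecting a subject also matches its descendants are assumptions, not details from Archivist.

```python
import json
from typing import Dict, List, Optional

# Hypothetical ontology tree: subject id -> ordered list of child subject ids.
SUBJECT_CHILDREN: Dict[str, List[str]] = {
    "repressions": ["arrest", "deportation"],
    "deportation": ["forced-labour"],
}


def with_descendants(subject: str, children: Dict[str, List[str]]) -> List[str]:
    # Pre-order depth-first walk: the subject first, then everything below it.
    result, stack = [], [subject]
    while stack:
        s = stack.pop()
        result.append(s)
        stack.extend(reversed(children.get(s, [])))
    return result


def build_fragment_query(text: Optional[str], subjects: List[str]) -> dict:
    """Elasticsearch query-DSL body for searching indexed fragments.

    Field names ("content", "subjects") are assumptions for illustration."""
    must = [{"match": {"content": text}}] if text else [{"match_all": {}}]
    # One terms filter per selected facet: fragments must match every facet,
    # and any subject within a facet's subtree satisfies it.
    filters = [
        {"terms": {"subjects": with_descendants(s, SUBJECT_CHILDREN)}}
        for s in subjects
    ]
    return {
        "query": {"bool": {"must": must, "filter": filters}},
        # Ask for highlighted snippets of the matching text.
        "highlight": {"fields": {"content": {}}},
    }


body = build_fragment_query("Buchenwald", ["deportation"])
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the index's `_search` endpoint. Putting the facets in `filter` rather than `must` keeps them out of relevance scoring, so only the text query affects ranking.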

Archivist Reader

Archivist Reader was developed to present interviews in the best possible way. It lets you explore linked resources without losing your place in the interview. Wherever editors have placed timecodes, you can jump straight to that point in the media source, so you can read while listening to or watching the original recording, again without losing your position in the text. All resources (locations, persons, etc.) link to the browser, where you can see their full descriptions and access all interviews in which a specific term is mentioned. You can also see every location on a map using our Map Browser.

Archivist Reader: jump to media source from fragments (left), highlighting the political repressions topic throughout the document (center), highlighting mentions of the Seebach toponym throughout the document (right)
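Jumping from text to media only requires the nearest timecode at or before the current paragraph. A minimal sketch using binary search over sorted timecodes follows; representing a timecode as a (paragraph index, seconds) pair is an assumption, and the `TimecodeIndex` class is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class TimecodeIndex:
    """Maps paragraphs to media positions via editor-placed timecodes (hypothetical)."""

    def __init__(self, timecodes: List[Tuple[int, float]]):
        # Each timecode is (paragraph index, seconds into the recording).
        pairs = sorted(timecodes)
        self.positions = [p for p, _ in pairs]
        self.seconds = [s for _, s in pairs]

    def media_time(self, paragraph: int) -> Optional[float]:
        # Seek to the nearest timecode at or before this paragraph.
        i = bisect_right(self.positions, paragraph)
        return self.seconds[i - 1] if i else None

    def paragraph_at(self, t: float) -> Optional[int]:
        # Inverse lookup for following playback in the text; assumes
        # timecodes increase monotonically through the document.
        i = bisect_right(self.seconds, t)
        return self.positions[i - 1] if i else None


idx = TimecodeIndex([(0, 0.0), (12, 95.5), (40, 410.0)])
print(idx.media_time(25))        # 95.5
print(idx.paragraph_at(100.0))   # 12
```

Binary search keeps both lookups logarithmic, which matters for transcripts with thousands of paragraphs.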

Map Browser

Our map interface contains two clustered layers representing all toponym and prison entities. When you hover over a point, you see the location's name and how often it is mentioned in the documents of your archive. Clicking a point shows its full description as well as all documents that reference it, with links to where the entity appears inside each document. The Map Browser is built on the Mapbox platform, so you can customize your map tiles.

Archivist Map Browser: main view (left), location details view (right)
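Mapbox tools can cluster point data supplied as GeoJSON (for example, a Mapbox GL JS `geojson` source with `cluster: true`), so one way to feed each layer is a FeatureCollection built from the entity records. The sketch below assumes each record carries `id`, `name`, `lat` and `lon` fields and that mention counts are computed elsewhere; the record layout, ids and mention count are hypothetical, and the coordinates are approximate.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def to_feature_collection(entities: Iterable[Dict], mentions: Counter) -> Dict:
    """Build a GeoJSON FeatureCollection (RFC 7946) for one map layer."""
    features = []
    for e in entities:
        if e.get("lat") is None or e.get("lon") is None:
            continue  # entities without coordinates cannot be placed on the map
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [e["lon"], e["lat"]]},
            "properties": {
                "id": e["id"],
                "name": e["name"],                       # shown on hover
                "mentions": mentions.get(e["id"], 0),    # shown on hover
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical prison records for the prisons layer.
prisons = [
    {"id": "prison:buchenwald", "name": "KZ Buchenwald", "lat": 51.02, "lon": 11.25},
    {"id": "prison:unlocated", "name": "Unlocated camp", "lat": None, "lon": None},
]
layer = to_feature_collection(prisons, Counter({"prison:buchenwald": 4}))
print(json.dumps(layer))
```

A toponyms layer would be built the same way from toponym records, giving the two clustered layers described above.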

Powered by Substance

Realizing Archivist wouldn't have been possible without Substance, the best web-based editing platform available. I worked closely with Michael Aufreiter and Oliver Buchtala to adapt their latest technology to power Archivist. Substance is already quite successful in the field of scientific publishing, and we are very proud to bring content-as-data document authoring to the humanities.

The Future of Archivist

Archivist is a completely new technology with a lot of room for improvement. It is already used in production for the Ost-Arbeiters interview archive, but this could be just the beginning. We have many ideas for turning it into a more general and more powerful system for digital archives. What we prototyped here could serve as a model for a new generation of knowledge management systems. We look forward to applying this model to similar use cases and to further advancing this open source technology.

We would love to hear what you think about this project and any ideas you have. Please leave a comment here or write to info@data4society.org or info@substance.io.
Don’t let Archivist gather dust!
