Publish interactive historical documents with Archivist
I am proud that after more than one year of hard work, I can announce the launch of Archivist, a platform for publishing interactive interview transcriptions online. Archivist has been developed for the Memorial International Society to publish interviews of Ost-Arbeiters and WWII prisoners. They are published as full transcriptions (in Russian) complete with multimedia sources (audio, video). Editors are able to tag and link subjects, locations, persons and definitions in the text, so that the archive can be queried later in interesting ways. Researchers are able to perform a full-text search, but also filter interviews by related subjects and external entities.
At Data4Society we believe that public education, as well as the memory of victims of political and wartime persecution, can change the world, so we took this project very seriously. We are glad that we can now present the results. I’d like to thank the Memorial International Society for inviting us as collaborators, for sharing with us their experience in the field of historical sources as well as their approach to oral history, and for supporting us in making this vision a reality. Grants from the EVZ Foundation and Open Society Foundations made it possible to develop this first version. Special thanks also to the Heinrich Boell Foundation for their support!
Archivist is a full-stack publishing solution involving different technologies to power digital archives. Let’s look at each component in more detail.
At the heart of the platform there’s Archivist Writer, a modern web-editor which allows you to annotate your text with basic markup and external data, e.g. you can:
- reference entities inside text
- mark any piece of text as related to some set of terms from an ontology-tree
- insert timecodes to synchronize text with the media source
- leave comments on any piece of text for collaboration between editors and researchers
- describe the document’s metadata
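The kinds of annotation listed above can be pictured as data attached to text ranges. The following is only a minimal sketch of that idea, not the actual Archivist or Substance data model: the class names, field names and the sample sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A text range tied to external data (hypothetical structure)."""
    kind: str        # "entity", "subject", "timecode" or "comment"
    paragraph: int   # index of the paragraph the range belongs to
    start: int       # first character of the range
    end: int         # one past the last character of the range
    target: str      # entity id, subject id, "hh:mm:ss" or comment text


@dataclass
class Document:
    """An interview: metadata, paragraphs and the annotations on them."""
    metadata: dict
    paragraphs: list
    annotations: list = field(default_factory=list)

    def annotated_text(self, annotation):
        """Return the piece of text an annotation points at."""
        paragraph = self.paragraphs[annotation.paragraph]
        return paragraph[annotation.start:annotation.end]


# Sample data, invented for this sketch.
sentence = "We were taken to the camp in 1942."
doc = Document(metadata={"title": "Sample interview"}, paragraphs=[sentence])

start = sentence.index("camp")
doc.annotations.append(Annotation("entity", 0, start, start + 4, "prison-1"))
doc.annotations.append(Annotation("timecode", 0, 0, 0, "00:12:30"))

print(doc.annotated_text(doc.annotations[0]))
```

Keeping annotations separate from the text is what makes it possible to query the archive by entity or subject later on.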
Some of the interviews from Memorial’s Ost-Arbeiters archive last over 7 hours. The resulting documents are typically very large, e.g., more than 10000 paragraphs, with as many annotations and comments attached to them. That’s like a small book, isn’t it? And Archivist Writer was powerful enough to handle it. Amazing!
Managers are a set of interfaces for entering and managing resources that are stored in the database. We manage four types of entities: toponyms, prisons, persons and definitions, each with different properties. In addition to entities, you can manage the subjects ontology tree and users.
Here is what you can do:
- count how often an entity or subject is mentioned in the full corpus of documents
- see in which documents entities or subjects were mentioned and jump inside to see where exactly they were referenced
- merge any entities or subjects: in this case the system will replace every reference inside your archive with the desired one
- remove entities or subjects in the same way: they will be removed from all documents automatically
- the search index is updated after each of those operations
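To make the counting, merging and removing operations above concrete, here is a small sketch. It assumes a simplified archive in which each document is reduced to the list of entity ids it references. The function names and ids are hypothetical, and the real system also has to update the search index, which is left out here.

```python
from collections import Counter


def count_mentions(docs):
    """Count how often each entity id is referenced across all documents."""
    return Counter(ref for refs in docs.values() for ref in refs)


def documents_mentioning(docs, entity_id):
    """List the ids of the documents that reference the given entity."""
    return sorted(doc_id for doc_id, refs in docs.items() if entity_id in refs)


def merge_entities(docs, source_id, target_id):
    """Replace every reference to source_id with target_id."""
    return {
        doc_id: [target_id if ref == source_id else ref for ref in refs]
        for doc_id, refs in docs.items()
    }


def remove_entity(docs, entity_id):
    """Drop every reference to entity_id from all documents."""
    return {
        doc_id: [ref for ref in refs if ref != entity_id]
        for doc_id, refs in docs.items()
    }


# Sample archive, invented for this sketch.
docs = {
    "interview-1": ["person-1", "place-1", "person-2"],
    "interview-2": ["person-2"],
}

merged = merge_entities(docs, "person-2", "person-1")
print(count_mentions(merged)["person-1"])
print(documents_mentioning(merged, "person-1"))
```

Each function returns a new mapping instead of changing the input, which keeps the sketch easy to follow. A production system would more likely update the stored documents in place.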
Archivist Browser is the main entry point for your archive. Here you can see a list of all documents from your archive, perform full-text search queries and filter using the ontology tree.
In fact Archivist Browser is just an interface for running ElasticSearch queries. Simple and powerful. We are indexing each fragment of documents as well as all entities. From the result list you are able to jump right into an interview highlighting the subject or entity you were interested in. Archivist Browser is based on Lens Browser from eLife. We’d like to thank eLife for developing this browsing interface and making it available as an open source project. It was easy for us to adapt the implementation to our needs.
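As an illustration of what such a query might look like, the sketch below builds an ElasticSearch request body that combines a full-text match with filters on subjects and entities. The field names `content`, `subjects` and `entities` are assumptions, since the actual index mapping used by Archivist is not described here.

```python
import json


def build_query(text, subject_ids=(), entity_ids=()):
    """Build a search body: full-text match plus optional exact-id filters.

    The field names are hypothetical and would have to match the index mapping.
    """
    bool_query = {"must": [{"match": {"content": text}}]}

    filters = []
    if subject_ids:
        filters.append({"terms": {"subjects": list(subject_ids)}})
    if entity_ids:
        filters.append({"terms": {"entities": list(entity_ids)}})
    if filters:
        bool_query["filter"] = filters

    return {
        "query": {"bool": bool_query},
        # Ask for highlighted snippets of the matching fragments.
        "highlight": {"fields": {"content": {}}},
    }


query = build_query("Berlin", subject_ids=["subject-1"])
print(json.dumps(query, indent=2))
```

The filter clauses narrow the result set without affecting relevance scoring, which fits the idea of first searching by text and then narrowing by the ontology tree.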
Archivist Reader was developed to present interviews in the best possible way. It lets you explore linked resources without losing your place in the interview. You can jump straight to the media source at the places where editors inserted timecodes, so you can read and listen to or watch the original recording, again without losing your position in the text. All resources (locations, persons, etc.) have links to the browser, so you can see a full description of each resource as well as access all interviews where a specific term was mentioned. You can also see every location on a map using our map browser.
Our map interface contains two clustered layers, which represent all toponym and prison entities. When you hover over a point, you’ll see the name of the location and how often it’s mentioned in the documents of your archive. You can click on a point to get the full description as well as all referencing documents, complete with links to the places where that entity is referenced inside each document. We are using the Mapbox platform for the Map Browser, so you can customize your map tiles.
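One way to picture the link between text and media is a lookup from a paragraph to the most recent timecode before it. This is only a sketch under assumptions: timecodes are stored as (paragraph index, "hh:mm:ss") pairs sorted by paragraph, and the function names are made up for this example.

```python
from bisect import bisect_right


def to_seconds(timecode):
    """Convert an "hh:mm:ss" string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def media_position(timecodes, paragraph):
    """Return the media offset (in seconds) to play for a paragraph.

    Picks the last timecode placed at or before the paragraph, or 0 if
    the paragraph comes before the first timecode.
    """
    positions = [index for index, _ in timecodes]
    i = bisect_right(positions, paragraph)
    if i == 0:
        return 0
    return to_seconds(timecodes[i - 1][1])


# Sample timecodes, invented for this sketch.
timecodes = [(0, "00:00:00"), (12, "00:03:45"), (40, "00:15:10")]

print(media_position(timecodes, 20))  # 225, the timecode at paragraph 12
```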
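Map layers like these are commonly fed with GeoJSON, a format that Mapbox accepts. The sketch below turns location records with mention counts into a GeoJSON FeatureCollection. The record fields and sample values are assumptions, not the actual Archivist schema.

```python
import json


def to_feature_collection(locations):
    """Convert location records into a GeoJSON FeatureCollection of points."""
    features = []
    for location in locations:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [location["lon"], location["lat"]],
            },
            "properties": {
                "name": location["name"],
                "kind": location["kind"],        # "toponym" or "prison"
                "mentions": location["mentions"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Sample record, invented for this sketch.
locations = [
    {"name": "Berlin", "kind": "toponym", "lon": 13.405, "lat": 52.52, "mentions": 7},
]

collection = to_feature_collection(locations)
print(json.dumps(collection, indent=2))
```

Storing the mention count as a feature property is what would let the map show it when a point is hovered.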
Powered by Substance
Realizing Archivist wouldn’t have been possible without Substance, the best web-based editing platform available. I was working closely with Michael Aufreiter and Oliver Buchtala, adapting their latest technology to power Archivist. Substance is already quite successful in the field of scientific publishing and we are very proud to bring content-as-data document authoring to the field of humanities.
The Future of Archivist
Archivist is a completely new technology with a lot of potential for improvement. It is used in production for the Ost-Arbeiters interview archive. However, this could be just the beginning. We have a lot of ideas for how this could be turned into a more general, more powerful system for digital archives. What we prototyped here could be a model for a new generation of knowledge management systems. We look forward to applying this model to similar use cases and further advancing this open source technology.
We would very much appreciate hearing what you think about this project and any ideas you would like to share. Please leave a comment here or write to firstname.lastname@example.org or email@example.com.
Don’t let Archivist gather dust!