Recogito-TEI Working Group: from semantic annotation to minimal digital editions

Antonio Rojas Castro
Published in Pelagios
Jun 19, 2019

It is with great pleasure that we announce our Recogito-TEI Working Group!

Recogito is a web-based annotation tool developed by Pelagios Commons that lets users annotate geographic references in texts, images and data through a user-friendly online platform. Perhaps its most notable feature is that users can produce semantic data without working directly with formal languages, and can then export their annotations in formats such as TEI-XML, RDF and GeoJSON.

The Text Encoding Initiative (TEI) is a standard for representing textual material in digital form by means of text encoding. It is used to describe humanities texts digitally with XML markup.
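To make this concrete, here is a minimal sketch of what a TEI-encoded passage with tagged place names can look like, and how such tags can be read back with Python's standard library. The sample sentence, the example.org URIs and the helper name list_place_names are invented for illustration; this is not an actual Recogito export.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample document: the sentence and the example.org URIs are
# placeholders, not data produced by Recogito.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample passage</title></titleStmt>
      <publicationStmt><p>Illustrative example only</p></publicationStmt>
      <sourceDesc><p>Invented sentence</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The expedition sailed from <placeName ref="https://example.org/place/1">Cadiz</placeName> to <placeName ref="https://example.org/place/2">Buenos Aires</placeName>.</p>
    </body>
  </text>
</TEI>"""


def list_place_names(tei_xml: str) -> list[tuple[str, str]]:
    """Return (text, ref) pairs for every <placeName> in a TEI string."""
    root = ET.fromstring(tei_xml)
    places = []
    for element in root.iter(f"{{{TEI_NS}}}placeName"):
        name = "".join(element.itertext()).strip()
        places.append((name, element.get("ref", "")))
    return places


if __name__ == "__main__":
    for name, ref in list_place_names(SAMPLE_TEI):
        print(name, ref)
```

Each place name is wrapped in a placeName element whose ref attribute points to an identifier for the place, which is how a geographic annotation can end up in the encoded text.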

Recogito interface. “La Argentina manuscrita” by Ruy Diaz de Guzman (1612, Buenos Aires).

At present Recogito supports the import of plain text (.txt extension) as well as TEI/XML-encoded text (.xml extension). However, text and annotations are only partially transformed into TEI. Many users of Recogito and Pelagios Commons tools have found the transition from the annotated text in Recogito to the final TEI result very useful for developing Digital Scholarly Editions (DSE). Our WG aims to improve TEI support in Recogito, with a specific focus on integration issues, in order to make the tool more useful for creating digital scholarly editions.

The WG will start by looking into ways to improve the customization of TEI documents in Recogito’s TEI annotation environment, which uses CETEIcean (developed by one of the members of this Working Group, Hugh Cayless, with Raffaele Viglianti). It will also investigate different options for exporting the annotated TEI with standoff markup and for publishing with a pre-transformed TEI document. Lastly, it will attempt to improve integration with other open source publishing options, such as Ed/Jekyll. As a first case study, it will focus on the Recogito TEI of a specific corpus of Early or Colonial Latin American primary sources from the Pelagios Resource Development Grant 2017 project, Pelagios al Sur, together with the geographical data from the Pelagios Resource Development Grant 2018 project, Latam.
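The difference between standoff and inline markup can be sketched in a few lines. In the standoff model, annotations are stored apart from the text and point into it by character offsets; producing inline TEI means weaving them back in as elements. The PlaceAnnotation class, the to_inline_tei function and the example.org URIs below are hypothetical and are not Recogito's data model or export code. The sketch also assumes that annotations do not overlap.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class PlaceAnnotation:
    """Hypothetical standoff annotation: offsets into the plain text plus a URI."""
    start: int  # offset of the first character of the place name
    end: int    # offset just past the last character
    uri: str    # identifier of the place (placeholder URIs in this sketch)


def to_inline_tei(text: str, annotations: list[PlaceAnnotation]) -> str:
    """Weave non-overlapping standoff annotations into a TEI <p> element."""
    parts = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < cursor or ann.end > len(text) or ann.start >= ann.end:
            raise ValueError(f"overlapping or out-of-range annotation: {ann}")
        # Unannotated text before this annotation, XML-escaped.
        parts.append(escape(text[cursor:ann.start]))
        # The annotated span, wrapped in a placeName element.
        parts.append(
            f"<placeName ref={quoteattr(ann.uri)}>"
            f"{escape(text[ann.start:ann.end])}</placeName>"
        )
        cursor = ann.end
    parts.append(escape(text[cursor:]))
    return "<p>" + "".join(parts) + "</p>"


def annotate(text: str, name: str, uri: str) -> PlaceAnnotation:
    """Helper for the example: build an annotation for the first match of name."""
    start = text.index(name)
    return PlaceAnnotation(start, start + len(name), uri)


if __name__ == "__main__":
    sentence = "From Asuncion to Buenos Aires."
    standoff = [
        annotate(sentence, "Buenos Aires", "https://example.org/place/buenos-aires"),
        annotate(sentence, "Asuncion", "https://example.org/place/asuncion"),
    ]
    print(to_inline_tei(sentence, standoff))
```

Keeping annotations in standoff form leaves the source text untouched, while the inline form is what most TEI publishing tools expect, which is one reason an export step between the two matters.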

Map of “La Argentina manuscrita” by Ruy Diaz de Guzman (1612, Buenos Aires)

Our WG is global and multilingual, bringing together researchers and programmers from Argentina, Spain, Portugal, the USA and Austria.

Our Goals

· Establish a seamless pipeline for Recogito users’ texts: adding Linked Open Data (LOD) via the Recogito annotation platform, followed by publication in TEI

· Extend Pelagios methodologies into the area of DSE

· Promote the adoption of best practices for using Recogito with TEI files, so as to produce valid and well-formed TEI

· Explore the synergies between two popular digital methods for the Humanities and Social Sciences: semantic and geo-annotation and digital editions

· Explore methods for “minimally” publishing a small-scale TEI digital scholarly edition online using one of two solutions: Ed/Jekyll on GitHub Pages or CETEIcean

· Exploit the value of intuitive, user-friendly, collaborative, community-driven and open source tools and resources such as Recogito, CETEIcean and/or Ed/Jekyll

Main outputs

· Publish three reports on the Pelagios Commons blog

· Teach a workshop for undergraduate and postgraduate students in Argentina (October)

· Develop a DSE of the Pelagios al Sur texts (2017 RDG) as a beta test of the workflow from Recogito to TEI

· Present a poster at the TEI 2019 Graz conference (Austria, September)

· Write a tutorial proposal for the Programming Historian in English and Spanish

· Write a proposal to present a tutorial at the Digital Modern Languages Tutorial Writing Sprint (London, July 2019)

Members

· Hugh Cayless (Duke University, USA)

· Gimena del Rio (IIBICRIT, CONICET, Argentina)

· Nidia Hernández (HD CAICYT Lab, CONICET, Argentina)

· Romina De León (HD CAICYT Lab, CONICET, Argentina)

· Gustavo Fernández Riva (European Time Machine Project, Portugal)

· Susanna Allés (University of Miami, USA)

· Alex Gil (Columbia University, USA)

· Rainer Simon (AIT, Austria)

· Antonio Rojas Castro (BBAW, Germany)

We will keep the Pelagios community updated about the results of the workshop in a second blog post!
