#EuropeanaTech: Centralize, Decentralize, and Sailing on in the Same Boat
A curated tweet-narrative on the coolest tech event in the GLAM world
Twitter is king in the GLAM world. Despite the increasing interference of the proprietary algorithm in the timeline (which makes a mass migration to a non-commercial alternative like #Mastodon an increasingly relevant option), an event like #EuropeanaTech still generates a lively parallel dialogue on the bird platform. Here is an experimental curation of tweets recorded during the event-in-a-boat, along with some thoughts they inspired, which I share below.
The opening keynote was given by George Oates (@ukglo), the designer responsible for the original memorable interfaces of the Flickr service in the last decade, and the protagonist of Flickr Commons (2008), which attracted important collections from institutions such as the Library of Congress (US), the British Library (UK) and NASA. Her subsequent time at the Internet Archive was marked by the exploration of new forms of content discovery, and by her dissatisfaction with the centrality of search in research on “digital collections”.
Without the search, what am I forced to do? #Spelunking*
(*) Spelunking: “the recreational pastime of exploring wild (generally non-commercial) cave systems. In contrast, speleology is the scientific study of caves and the cave environment” (Wikipedia).
Spelunking can mean using facets, sub-facets, diagrams (e.g., an acquisition chart), or metadata near the digital object (thing, place, person). By alluding to the obscure universe of caving, George managed to sow one of the event’s main memes, making many of us reflect on the limitations of the ubiquitous search field, and yearn to use the SPELUNKERS’ tools and wear their team jersey. Liam Wyatt (@Wittylama) from GLAMWiki quoted Harry Verwayen (@Hverwayen) “without context”: “All these things that connect all these things that allow this spelunking to happen”.
The second speaker was Ruben Verborgh (@RubenVerborgh), from Ghent University, who spoke about the role of aggregation in a decentralized web. Ruben drew attention to an important feature of the GLAM community gathered at #EuropeanaTech:
“Experts and professionals involved with cultural heritage have a strong sense of ownership of their data. This is an important differential over society at large, which does not seem concerned about having its data appropriated by the large Internet corporations.”
Ruben’s talk emphasized the importance of aggregators in promoting network flows, and the crucial role of ‘nodes’ as sources of authority. The challenge of integrating the network can be met through a sound governance model and the development of consensus (protocols), which must be addressed in layers: “we can have up to 100% consensus on a small dataset, 80% on a larger dataset, 5% on several small datasets, and we have yet to build other consensuses on vocabularies, data formats, and interfaces.” Ruben argued that at this point the ‘identity’ of the various actors in the network should give way to a greater emphasis on the ‘role’ that each one plays in the ecosystem. As I understood it, an actor should not identify as an aggregator, which in a fixed conception would be the centralizer, but should be ready to fulfill the role of aggregation whenever the situation demands it.
Presenting himself as the Getty Trust’s (Semantic) Information Architect, Rob Sanderson (@azaroth42) launched the concept of “LOUD” (Linked Open Usable Data), or “usable LOD”. He defines usability as “the degree to which (one thing) can be used by specific actors”; that is, the degree of usability varies according to the audience we are referring to. Usability is therefore defined by the specific audience, which in the case of LOD means developers.
For your LOD to be LOUD, Sanderson emphasizes the need to give developers useful information about what a name identifies when it is looked up, and to use open standards such as RDF and SPARQL. Still, his comment on “The problem with RDF and nuclear power” did not go unnoticed ;)
“In LOD, the ontology determines the API, and therefore there is a layer to be worked on so that we are dealing with this Audience (developers) in their own terms”
Mia Ridge, a must-follow on Twitter, coordinates the digital curation sector at the British Library, and was in charge of presenting the panel of initiatives in #crowdsourcing, user-generated content and institutional transcription of data. She opened the panel by drawing attention to the fact that “crowdsourcing isn’t about putting digitization assistants out of a job”.
The panel made it clear why #IIIF (International Image Interoperability Framework) became a hot issue at EuropeanaTech 2018. Combining IIIF with Web Annotations (a W3C standard) enables some very interesting and easy-to-implement possibilities for collaborative interfaces: mapping the exact position of elements in an image, and offering an easy and intuitive way to collect and organize annotations on that specific element. By adopting the standards, the community promotes a common path of development, and the result is a creative display of tools that can be shared among initiatives and institutions.
Paul McCann (@sankesolutions), Digital Research Projects Manager at the National Library of Wales, presented his ‘CrowdSourcing Platform’, which was originally implemented as a set of modules for the Omeka S collection management system. The solution allows the creation of custom crowdsourcing interfaces from any source material in #IIIF, the selection of the information that the crowd should work on, and the saving of the results as W3C annotations.
Europeana has been working with crowdsourcing for some time, and has experimented with some tools, including the ‘Omeka’ option, but chose to follow another path. Frank Drauschke is a partner and co-founder of Facts & Files (@FFHistorians), a private historical research institute based in Berlin which since 2011 has partnered with Europeana in public history crowdsourcing projects such as Europeana 1914–1918 and Europeana 1989. He presented the ‘Europeana Transcribe’ online competition (transcribathon.eu), which was underway in Greece, where 29 students were transcribing ancient documents in Greek.
Günter Mühlberger, from the Department of German Language and Literature at Innsbruck University, presented the Transkribus platform, which aims to provide a comprehensive set of services for academics, archives, libraries and family historians for the transcription, recognition and research of historical documents. Among the crowdsourcing initiatives presented, this one brings artificial intelligence (#AI) into the collaborative process of document qualification, and Günter highlighted the tool’s ability to run these processes in a private collaborative environment. Transkribus is produced and supported as part of the READ (Recognition and Enrichment of Archival Documents) project, and the goal is to produce a tool for transcribing text from images: a Handwritten Text Recognition (HTR) model.
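To make the idea of "mapping the exact position of elements in the image" concrete, here is a minimal sketch of what such an annotation can look like. It follows the W3C Web Annotation data model, using a media-fragment selector (`xywh=`) to point at a rectangle on the image. The function name, the example URI and the sample text are hypothetical and only serve as illustration; none of the projects on the panel is claimed to build its annotations this way.

```python
import json


def region_annotation(target_uri, x, y, w, h, text, motivation="describing"):
    """Build a W3C Web Annotation (as a dict) for a rectangular image region.

    target_uri is assumed to identify a IIIF canvas or image; x, y, w, h are
    pixel coordinates of the region being annotated.
    """
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("region must have non-negative origin and positive size")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": motivation,
        # The body carries what the volunteer typed.
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        # The target says where on the image that text belongs.
        "target": {
            "source": target_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Hypothetical example: a title transcribed from a region of a playbill.
example = region_annotation(
    "https://example.org/iiif/playbill-1/canvas/1", 120, 340, 600, 80, "Theatre Royal"
)
print(json.dumps(example, indent=2))
```

Because the annotation is plain JSON with a standard shape, any tool that understands the Web Annotation model can store or display it, which is what lets tools be shared among initiatives.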
Mia Ridge presented her current initiative at the British Library — the Playbills Collection. The project asks volunteers to identify and transcribe information about playbills (230,000 theatrical posters from the late 18th century and the entire 19th century) to improve the catalog records of each item and make this historical collection more accessible to all.
#Crowdsourcing: The idea is to rediscover the popular entertainment of the last 300 years by identifying names and performances on the posters of the old theaters in Britain, exploring IIIF and Web Annotation.
The platform used in the “Playbills” project was LibCrowds, which hosts experimental crowdsourcing projects and is well documented by Alex Mendes, who works on Mia’s team at the BL. The LibCrowds Viewer was developed for this initiative, taking advantage of the flexibility allowed by the IIIF APIs. Images and metadata already maintained by the British Library can be requested, combined with some additional configuration details, and used to generate crowdsourcing task sets. This means that no additional data is needed, and that the system is not tied to the institution’s specific metadata structures. In fact, it could be used to generate crowdsourced annotations for any IIIF-compliant content.
It was interesting to notice how much of the talk on this panel revolved around the platforms developed for the engagement and collaboration process. We know that such features are not native to classical software for digital repositories, and it is interesting to see how specific projects are realizing this potential in separate developments that are nonetheless connected through open standards. Is this the time for greater coordination among developers around these issues? Do the different institutional backgrounds of these initiatives (academia, public library, private initiative, aggregating institution) interfere, positively or negatively, in the collaboration among these actors? Would it be possible to establish a GLAM-sector-wide collaboration in the process of maintaining and developing common free software tools, in tune with a coordinated participation in the definition of open standards and protocols, and also legal frameworks?
The first day concluded with the feeling that there has been a significant renewal of the topics covered compared with the previous edition of #EuropeanaTech. As Antoine Isaac (@antoine_isaac), Europeana’s R&D manager, said,
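The step from "images and metadata already maintained" to "crowdsourcing task sets" can be sketched roughly as follows. This is not the LibCrowds implementation; it only assumes a manifest shaped like the IIIF Presentation API 2.x (sequences, canvases, images, each image with a single Image API service) and uses the standard Image API URL pattern `{service}/{region}/{size}/{rotation}/{quality}.{format}`. The function name, the configuration keys and the sample manifest are hypothetical.

```python
def tasks_from_manifest(manifest, config):
    """Turn each canvas image in a IIIF Presentation 2.x manifest into a task.

    config holds the project-specific details (hypothetical keys) that are
    combined with what the institution already publishes.
    """
    tasks = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                # Assumes one Image API service per image resource.
                service = image["resource"]["service"]["@id"].rstrip("/")
                tasks.append({
                    "canvas": canvas["@id"],
                    "label": canvas.get("label", ""),
                    # Full image at the largest size the server offers
                    # ("max" is valid from Image API 2.1 onwards).
                    "image_url": f"{service}/full/max/0/default.jpg",
                    "info_url": f"{service}/info.json",
                    **config,
                })
    return tasks


# Hypothetical two-page manifest, trimmed to the fields used above.
sample_manifest = {
    "@id": "https://example.org/iiif/playbills/manifest.json",
    "sequences": [{
        "canvases": [
            {"@id": f"https://example.org/iiif/playbills/canvas/{n}",
             "label": f"Sheet {n}",
             "images": [{"resource": {"service": {
                 "@id": f"https://example.org/iiif/images/playbill-{n}"}}}]}
            for n in (1, 2)
        ]
    }],
}

for task in tasks_from_manifest(sample_manifest, {"mode": "select", "tag": "title"}):
    print(task["label"], task["image_url"])
```

Nothing here depends on the institution’s catalogue schema, which is the point made above: any IIIF-compliant manifest could be fed through the same function.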
“it’s amazing how much less we’re talking about data modeling.”
The good news is the strengthening role of information sources, institutions, projects and specialists in the field. Likewise, the great attention paid to re-decentralizing initiatives involving the development of new solutions and free applications is a sign of growing integration among the different sectors in the GLAM world.
On the morning of #EuropeanaTech’s second day, the floor was taken by Ben Vershbow (@subsublibrary), who recently joined the Wikimedia Foundation from the New York Public Library (NYPL), at a moment when the institution is repositioning itself in the GLAM world. Drawing on the strategic plan #Wikimedia2030, which projects the role of the initiative in designing a global “information commons”, Ben set out the Wikimedia Foundation’s aim of turning the Wikidata service into essential interconnection infrastructure for the knowledge ecosystem. He argued that the first generation of Linked Open Data (LOD) applications failed to scale and failed to demonstrate benefits to the various types of users, becoming academic exercises without much practical consequence.
“We have to break the corporate silos. Wikipedia is the only non-commercial web service among the Top5 in the network, which puts us in a position to break down the social, political and technical obstacles that prevent wider access to knowledge.”
The initiative is bold and meaningful, and it is no wonder that #wikidata became the event’s main hashtag. But amid reflections on the benefits of decentralization, how should we deal with the proposal of a (non-corporate) infrastructure to centralize the structured data of global cultural heritage? It seems obvious to me that we need to move forward in reflecting on the shared governance of such a common infrastructure, or on the alternative of some corresponding federated model.
Specifically, the Wikidata proposal for GLAM institutions involves: (1) expanding the already tested model of Wikimedians in residence at institutions; (2) connecting Wikidata with locally controlled vocabularies; (3) using Wikidata authorities to describe the traditional and indigenous knowledge in their collections; (4) joining efforts in the WikiCite project, which has gathered a considerable volume of bibliographic data; and (5) sharing tools and standards that promote qualified access to institutions’ collections, such as Scholia (a web browsing tool), Rights Statements, and IIIF images. Undoubtedly, a robust and interesting strategy.
As a living illustration of the “Wikimedian in residence” model in memory institutions, we had the presence of Jason Evans (@WIKI_NLW), the “National Wikimedian” of the National Library of Wales. In recent years Jason has worked as a resident and managed several “Edit-a-thon” projects to improve Welsh Wikipedia content, and he is a regular contributor to digital heritage conferences with a particular interest in Linked Open Data (LOD). His work demonstrates how relevant context information about collections, such as networks of editors, printers and engravers, can emerge through the enrichment of data in Wikidata, and how experimentation with applications such as Crotos makes it possible to explore different views of the structured data.
As an event wearing the ‘tech’ tag in 2018, and talking about decentralization, EuropeanaTech could not fail to touch on the #blockchain issue. Raivo Ruusalepp (@Raivo_Ruusalepp) from the National Library of Estonia presented possible scenarios for the use of the technology within cultural heritage institutions. Assuming that trust is a determining value for the field, he sees opportunities such as “better deals with insurers if the exchange of collection items between institutions can be registered in the blockchain”.
Raivo drew attention to the great speculative movement in the sector, which could lead to commercial proposals emerging without the proper participation of GLAM specialists. In this sense, it is important that institutions have the opportunity to experiment and develop their own use cases, and thus help steer the evolution of the technology. “We have not lost the boat yet, but it is important that we are reflecting on these possibilities.”
Another important talk on the second day came from Enno Meijers (@ennomeijers), who works at the National Library of the Netherlands, presenting the implementation of “a distributed network of information on digital heritage”. Enno highlighted the current trend in content discovery infrastructure of moving from the traditional concept of aggregation to a more distributed model. The initiative to create the network was made possible by the Dutch government’s ‘Digital Heritage Network’, a dream come true in terms of public policy for everyone working in the sector, which establishes technical and institutional sustainability for integrated digital projects in cultural heritage.
“We have some very good portals, but the fact is that each user is a portal.”
Enno talked about the many challenges in building the distributed network, such as the use of different definitions, metadata and forms of description by the original sources of information. There are actually several layers of functions and roles that need to be in tune, but according to Enno everything leads us to consider that the empowerment of the original data sources is the way to go.
EuropeanaTech’s final panel presented a debate on the dialectics of centralization vs. decentralization in the digital projects of the GLAM world, bringing the event’s main speakers back to the stage. Highlighting the polarization between the two perspectives seemed an appropriate strategy to spark the debate. After all, a certain tension was evident in an event promoted by an aggregating institution (centralizing logic) that managed to emphasize in its programme the emergence of decentralization in the field, and the panel did yield some interesting takes (below). But the panelists repeatedly noted that evaluations of decentralizing options will vary depending on the scenario.
Enno Meijers (@ennomeijers): “The essence of decentralization is the reflection that each actor can make in relation to their specific role in the ecosystem we are creating.”
Valentine Charles (@valentinec89): “We (Europeana) are a network, we aspire to decentralization, but on the other hand we are pressured to deliver qualified services, which depend on some centralization. What can we do to evolve?”
Jill Cousins (@JilCos): “Change is not easy. Yes, there are the timelines that need to be met, as well as the resource delivery structure, which leaves few options beyond doing what is demanded.”
Partha Pratim Das (@atppd), National Digital Library of India: “We are working with a project for students in India, dealing with a multiverse of data sources, so we can only work with centralization at this time. I believe decentralization will come, eventually.”
Ruben Verborgh (@RubenVerborgh): “The bigger question is: what goals do we want to achieve? For each case we have to ask ourselves if we need centralization or decentralization.”
Herbert van de Sompel (@hvdsomp): “What can be achieved through decentralization has evolved significantly. Link exchanges with partners, we could do decentralized, but we still do centralized, which ends up weakening the sources of information.
“The academic community still bases its work on PMH (protocol for metadata harvesting), and imagines that Google Scholar will solve all its problems. It won’t.”
A good way to understand the tension between centralization and decentralization in digital cultural heritage seems to be the view that each actor in the field will play different roles at different times, taking different positions on the scale from one pole to the other. The ‘identity’ of the various actors in the network would give way to a greater emphasis on the ‘role’ that each one plays in the ecosystem, in its different relations with the other actors. A major advantage of debating this specific tension at EuropeanaTech 2018 is that even when centralization is contemplated as an option (as in the role for Wikidata proposed by the Wikimedia Foundation, or Europeana’s own role as a regional GLAM aggregator), the references are non-commercial initiatives that embrace open access to knowledge as a principle. That is something very close to the civic spirit that has always guided the work of archives, libraries, museums, and public institutions of memory in general.
At the close of the event, someone in the audience commented on the figurative meaning of hosting the GLAM-TECH community event on a ship, and how we could be tempted to “close the doors, and sail away”, so that we could keep debating, collaborating and developing this ecosystem together. What seems to unite this group, in fact, is the strong connection its members have with their data, the information with which they work, and the importance they give to the cumulative process of making it better. The experience at EuropeanaTech 2018 gave me the impression that the passion that moves this community can contribute to the development of a more contemporary approach to memory as a social practice, especially in terms of new modes of management for digital cultural heritage in the information age. Sailing on through the perspectives of centralization vs. decentralization, it feels good to find ourselves in the same boat.
Related Links: EuropeanaTech, why it mattered to me (video)
José Murilo is Coordinator of Information Architecture in Museums at the Brazilian Institute for Museums (Ibram). He participated in EuropeanaTech at the invitation of the Europeana Foundation.