Transkribus & Magazines: Transkribus’ Transcription & Recognition Platform (TRP) as Social Machine: Raising the Bar for Crowdsourcing Citizen Science

Jim Salmons
FactMiners’ Musings
Jun 10, 2015

This series of Medium stories tells how my wife and #CitizenScientist partner Timlynn Babitsky and I found Kindred Spirits by exploring the vibrant communities of Citizen Scientists and Citizen Historians having “serious fun” as part of cutting-edge #DigitalHumanities applied research projects doing transcription and recognition of Historic Handwritten Text Documents.

  1. Transkribus & Magazines — #SmartData in the #DigitalHumanities
  2. A Bit of Context — About FactMiners & The Softalk Apple Project
  3. Finding Kindred Spirits among Explorers of Our Cultural Fingerprint Before Print
  4. My Wish List for a ‘Magazine-Friendly’ Edition of Transkribus
  5. Transkribus’ Transcription & Recognition Platform (TRP) as Social Machine: Raising the Bar for Crowdsourcing Citizen Science

When a tweet from @tranScriptorium alerted me to the Transkribus project, my first reaction upon initial exploration was the utter thrill of seeing a well-developed tool that could be tweaked for our use to accelerate development of our proof-of-concept/MVP of the FactMiners’ social game platform for eResearch and visitor engagement by Libraries, Archives, and Museums (LAMs). Little did I know that the Transkribus team would be “killing me softly with their song” with a vision of far greater scope for their project: a sustainable community ecosystem they call the Transkribus Transcription & Recognition Platform (TRP).

In this somewhat sterile block diagram, the Transkribus team is envisioning a thriving ecosystem where “win-win” collaborations contribute to an “innovation network effect.” Subject matter experts in the Humanities work with Computer Scientists to advance #HTR technologies while LAMs open new avenues of visitor engagement and community service. In the forthcoming paper in which this TRP platform diagram appears, I believe the authors significantly underestimate the diversity and importance of the activity that will form around the Public interface.

The current Transkribus website is very good at identifying who might be interested in using this application and associated eResearch service, and it does a good job of helping prospective users get started with the application and its platform. The site is very short, however, on information about the people involved in this brilliant contribution to the tranScriptorium project. A bit of motivated sleuthing led me to a pre-print of a forthcoming article that sent chills down my spine as I read it.

The article has the expressive title “Handwritten Text Recognition (HTR) of Historical Documents as a Shared Task for Archivists, Computer Scientists and Humanities Scholars. The Model of a Transcription & Recognition Platform (TRP)” and can be found as a 12-page PDF on Academia.edu. The article provided a comprehensive backgrounder on the Transkribus project and its platform, and I was equally thrilled that it let me identify three people behind the project: Guenter Muehlberger, Sebastian Colutto, and Philip Kahle, all associated with the Department for German Language and Literature at the University of Innsbruck, Austria.

What struck me most about these researchers’ vision, beyond the “nitty gritty” of the technology platform they are developing, was the breadth and depth of the ecosystem model they see energizing and sustaining the Transkribus Transcription & Recognition Platform (TRP). When I saw ‘Figure 1’ (reproduced above) in their forthcoming paper, I immediately recognized what my soulmate wife and project partner, Timlynn Babitsky, and I called an Entrepreneurial Community Ecosystem. In the late ’90s and into the first years of the new Millennium, we worked on this ecosystem model as Sohodojo. As a self-funded applied R&D lab, we nurtured and supported decentralized and distributed microenterprise and small business networks as a means of rural and distressed urban community development.

Social Machines are another manifestation of community ecosystems. Inspired by the ideas of Sir Tim Berners-Lee, SOCIAM is a U.K.-based research consortium exploring the theory and practice of Social Machines — “Social Machines are a characterization of technology-enabled social systems, seen as computational entities governed by both computational and social processes.” In the presentation here, Oxford’s David De Roure evokes the Social Machine aspects of the emerging eResearch domain. The Transkribus TRP is a Social Machine with Humanities Scholars and Computer Scientists functioning as Network Enablers of the “win-win” relationship between Archives and the Public.

The scale of widely available social networks and the technology infrastructure to fully realize entrepreneurial community ecosystems simply were not available 15–20 years ago when we were doing Sohodojo. Then we both had Life Interrupted with our cancer battles. As we’re re-engaged in our #PayItForward Bonus Rounds, Timlynn and I are doing our best to be a “living example” of Portfolio Life in an Entrepreneurial Community Ecosystem.

We had an opportunity to revisit our Entrepreneurial Community Ecosystem ideas during the recent Crowd Consortium #crowdcon conference. The U.S.-based Crowd Consortium is a national organization supporting research and deployment of crowdsourcing for cultural heritage institutions. While we wholeheartedly support and welcome the contributions of crowdsourcing project participants who can afford to volunteer their time and energy, we did our best to inject the idea that “free labor” is too low a bar to set for the social value of crowdsourcing models used by cultural heritage organizations.

Citizen Science/History Projects as 21st Century Job Creators & Skill Developers

When we face the fact that ageism, outsourcing, and automation will inevitably displace a significant portion of workers in today’s “good jobs,” it becomes easy to envision how Citizen Science and Citizen History projects will serve an important role in both new skill development and establishing collaborative social network relations essential to 21st Century “Job Creation.”

Passion-driven Personal Learning Networks, a natural component of successful crowdsourcing Citizen Science ecosystems, have a vital role to play in people’s lives as ever-faster change transforms our daily personal and social lives. Here’s an “infographic” we tweeted as part of the #crowdcon conversation suggesting we “shoot higher” than the “free labor” niche for crowdsourcing in #DigitalHumanities projects.

It is a relatively simple model-alignment exercise to see that the Transkribus Transcription & Recognition Platform leverages the Network Enabler roles of Humanities Scholars and Computer Scientists to support the “free market exchange” between Archives and the Public.

The Transkribus TRP as Entrepreneurial Community Ecosystem: In this re-configuration of the Transkribus Transcription & Recognition Platform (TRP), I have explicitly shown the participation of the Public, or the “crowd,” as Citizen Scientists and Citizen Historians contributing to the collective effort to transcribe historic handwritten documents. By including the Public as Citizen Scientist/Historian, we recognize the widespread use of crowdsourcing within the design of many, if not most, historic handwritten document recognition projects. While the pieces of the system are largely the same as in the original block diagram from the upcoming Transkribus TRP paper, this version of the TRP system diagram recognizes TRP-based projects as an “exchange market” where “win-win” relationships energize community participation, with each contributor gaining something of value in exchange for what they put in. In particular, this rendering of the TRP emphasizes an expanded role for the Public (AKA Individuals), as both the “World of Organizations” and the “World of the Individual” are shown to have Producer and Consumer participants in the exchange marketplace. While Digital Humanities Scholars and Computer Scientists get the benefit of a “virtuous circle” of participation in the market to advance their professional research interests, the big winners are the Cultural Institutions and the Citizen Scientists/Historians. Cultural Institutions get innovative Visitor Engagement value through their interface with the Public as Citizen Scientists/Historians, by supplying the digital collection to be transcribed. In addition, this “tier 1” community engagement opens up a “tier 2” benefit as the TRP project provides new search and related object discovery methods for the general visiting public.
But most importantly, the Citizen Scientists/Historians benefit from developing new Knowledge, Skills, and ‘Elastic Networks’ of trusted collaborators that will be needed to create “Job” opportunities in the emerging world of “post-labor” employment.

Of course, #DigitalHumanities researchers and their collaborating Computer Scientists don’t have to embrace this “extra-curricular” dimension of crowdsourcing models for cultural heritage projects. But somehow I think it would be energizing for academics to know that their formerly esoteric activity in the hallowed halls of research is now on the front lines of experiments in what “jobs” and “human work” will look like in the 21st Century.

So if all goes well moving forward, and we establish a good collaboration between FactMiners and the Transkribus project, it will be exciting “in the small” to jump-start our FactMiners fact-mining platform development through adoption of the Transkribus client application. But the really exciting thing will be if we get to help the Transkribus team to evolve the “sustainable virtuous” ecosystem they so beautifully described as a Transcription & Recognition Platform — a Social Machine for Job Creation & Skill Development in the 21st Century!

Congratulations, Guenter, Sebastian, Philip, and your unnamed colleagues, who have created an incredible technology platform as part of your wonderful vision for the Social Machine ecosystem that will empower a new level of historic handwritten text transcription and recognition. Timlynn and I look forward to an opportunity to get to know you and your project better.

Happy-Healthy Vibes,
-: Jim Salmons & Timlynn Babitsky :-
FactMiners.org and The Softalk Apple Project
At Publication: Cedar Rapids, Iowa USA
Currently: Broomfield, Colorado USA

I am a #CitizenScientist doing #DigitalHumanities & #MachineLearning research via FactMiners & The Softalk Apple Project. Medium is my #OpenAccess channel.