FactMiners’ Crowdsourcing at Zooniverse

Ground Truth, the Internet Archive, and Softalk Magazine

This “mini-story” serves two purposes:

  • as a short follow-up to my FactMiners’ Musings articles on the PRImA Research Center, its Aletheia “Ground-Truth” tool, and Aletheia’s potential use at the Internet Archive as exemplified by FactMiners’ work on the Softalk magazine collection, and
  • as an (off-site) About page for our new Zooniverse-based crowdsourcing Citizen History project, “Teach Robots to Read Magazines: Softalk Edition.”
When you do a TOC-Apart activity (we can’t honestly call these games yet) as part of the “Teach Robots to Read Magazines” community on Zooniverse, you are helping to “map” the structure of the Table of Contents (TOC) pages of Softalk magazine. The data from crowdsourced TOC-Apart classifications and measurements will be used to create “seed” pages for “ground-truth editions” of these TOC pages. These ground-truth pages then serve as the all-important “training material” for “robots” (AKA agent-based smart programs) that must learn how to read and understand magazines.
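
To give a feel for how several volunteers’ marks on the same TOC page might be boiled down to a single “seed” region, here is a minimal sketch in Python. It is an illustration only, not our actual pipeline: the `Box` type, the `consensus_box` function, the pixel coordinates, and the choice of a simple per-coordinate median are all assumptions made for this example.

```python
from statistics import median
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle drawn by one volunteer (hypothetical pixel units)."""
    x: float
    y: float
    width: float
    height: float


def consensus_box(marks: list[Box]) -> Box:
    """Combine several volunteers' rectangles for the same TOC entry.

    Takes the median of each coordinate separately, so a single
    stray mark does not drag the result far from the majority.
    """
    if not marks:
        raise ValueError("need at least one volunteer mark")
    return Box(
        x=median(m.x for m in marks),
        y=median(m.y for m in marks),
        width=median(m.width for m in marks),
        height=median(m.height for m in marks),
    )


# Three made-up marks for one TOC entry; the third is noticeably off.
marks = [
    Box(100, 200, 300, 40),
    Box(104, 198, 296, 42),
    Box(90, 260, 310, 38),
]
seed_region = consensus_box(marks)
print(seed_region)
```

A median is used here rather than a mean because one careless rectangle would pull an average off target, while the median stays with the majority of volunteers.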

To make quick work of the prior articles’ follow-up, the bottom line is this:

As students in this summer’s #HILT2015 “Crowdsourcing Cultural Heritage” course taught by Mia Ridge and Ben Brumfield, Timlynn and I had an in-class opportunity to begin prototyping what turned out to be an “MVP”-worthy (minimum viable product) crowdsourcing project to begin creating the “Ground Truth Edition” of the TOC (Table of Contents) pages of Softalk magazine! :-) This is our first real “data-generating” activity in building the FactMiners’ Fact Cloud “edition” of the full 48-issue Softalk magazine collection.
FactMiners.org is the proverbial “tip of the iceberg” in terms of the Big Picture of what the “FactMiners ecosystem” will ultimately look like. As an Open Source software stack, the FactMiners platform will provide a social-game platform for participating LAMs (Libraries, Archives, and Museums). We’ve described this “social machine” as an entrepreneurial community ecosystem. This ecosystem idea is based on our prior work as Sohodojo supporting decentralized and distributed small business supply chain systems for rural and distressed urban community and economic development. These ideas are beyond the scope of this About introduction. But if you are interested, you’ll find a nice summary of this context of our applied research interest in “Transkribus & Magazines: Transkribus’ Transcription & Recognition Platform (TRP) as Social Machine: Raising the Bar for Crowdsourcing Citizen Science.”

If you are at least somewhat familiar with FactMiners’ Citizen Science applied research agenda, we hope you will follow this link to our “Teach Robots to Read Magazines: Softalk Edition” project over on the Zooniverse platform. Once you have read a bit about what we are doing, we welcome you to join in and help do some proof-of-concept FactMiners’ “fact-mining”… starting by helping us better understand the visual language of magazine design! :-)

If you are a Zooniversian and have found your way here via the About link on our “Teach Robots to Read Magazines: Softalk Edition” project, thank you for your interest. And welcome to our combined Citizen Science and Citizen History projects! I’m Jim Salmons and my wife and project-partner is Timlynn Babitsky. We founded FactMiners.org and The Softalk Apple Project as part of our #PayItForward post-cancer Bonus Round activity.

Our new Zooniverse-based crowdsourcing mini-project is a “kickstarter” for FactMiners’ applied research developing the “Fact Cloud Edition” of the complete 48-issue Softalk magazine collection. The biggest challenge to our research, and to opening up the collections of the Internet Archive to our approach to “text as massively addressable object,” is finding an efficient way to do OCR (text recognition) within the logical document structures of the magazine. Virtually all bulk cultural heritage digitization workflows use OCR engines that simply create an undifferentiated “text soup” that hides the all-important structure of a magazine.

We’ll polish this page with additional information as our projects evolve. In the meantime, we want to give you a graduated series of links to better understand our projects and how our Zooniverse crowdsourcing project fits into our applied #DigitalHumanities, #SmartData, and #CognitiveComputing research agenda:

Until we have a chance to create a “shorter and sweeter” About page, the above links should give you all that you need to know to better understand our POC (proof of concept) crowdsourcing project, Teach Robots to Read Magazines: Softalk Edition on Zooniverse.

Happy-Healthy Vibes,
-: Jim Salmons & Timlynn Babitsky :-
FactMiners.org & SoftalkApple.com