“An Open Scanner”

a Creative Commons Venezuela Chapter proposal — Part 1

--

“An Open Scanner” identity, created for Creative Commons Venezuela — by Estefanía Sánchez Pineda is licensed under CC BY 4.0.

In 2020, while the whole planet was immersed in what seemed to be a defining event, the Creative Commons Venezuela Chapter received the news that the Community Activities Fund had accepted our proposal to create a version of an Open Source book scanner. Right away, we started to execute the plan and, more importantly, to build a task force that would explore the feasibility of finding, creating and assembling such hardware in Venezuela.

Our first approach was to reach out to young members of the team who happen to be involved with universities. It was evident that university libraries and other academic institutions would be among the first target audiences for such a tool. They are home to volumes on countless subjects and areas of expertise, many of which exist only in physical form and are therefore prone to being lost forever.

Universities also tend to have spaces and resources that could be borrowed to carry out this project. At least, that was the hypothesis. Soon after, we realised that this was a more significant challenge than we had anticipated.

The Venezuelan case & the team

While several of us in Venezuela looked for possible resources and materials, others outside the country studied several previous Open Access book scanner projects. Faithful to the principles of sharing and re-use, the original idea was to replicate an existing Open Access project!

Replicating a project would speed up the process: find the right blueprints and code, and we would be ready to buy the materials, grab some tools and start to “cut and paste”, just as others had successfully done before us.

Unfortunately, that was not the case. Through extensive conversations, many online searches and inquiries to other parties, we understood that one reason such Open Access book and paper scanner projects are challenging to reproduce is that they rely on software that is not easy to find, use or maintain.

To be clear, there are beautiful and well-documented book-scanner hardware projects. Some have evolved to the point of offering clever designs almost on par with commercial products. The software, however, is another story.

Another important lesson we learned while looking for suitable projects and materials, including the cameras, was that the hardware, i.e. the scanner itself, was designed around one particular way of capturing and processing the pictures. Several of the physical scanner proposals are built around not-so-flexible software and capture post-processing. This tight coupling between software and hardware limits modular design and construction, and it also raises costs.

The “An Open Scanner” approach

This is why we started from scratch, and with the least “expensive” part: the software.

Nowadays, Machine Learning techniques and Computer Vision tools can identify characters and features in photographs in ways that were impossible just a few years ago.

So, the new approach was this: imagine you already have the captures (i.e. photos) of the pages of a book on your computer. Can we deploy an Open Source solution that reconstructs the book, including Optical Character Recognition (OCR)?

Or, better said: if we don't care how the pictures are taken, can we reconstruct the document and extract the information it contains?

The answer to that question is yes… well… we needed it to be a “yes”. Why? Because if that is the case, we open up an entirely new set of opportunities, two of which are key:

A commodity photo camera in use when creating the first “An Open Scanner” — by Arturo Sánchez Pineda is licensed under CC BY 4.0.
  • Future users can modify, develop or re-use whatever hardware best fits their needs, including the cameras: they no longer need to be a specific pair.
  • The software can be developed independently of the hardware geometry and other characteristics, allowing a more generic product that can be delivered as an offline solution or as Software as a Service (SaaS).

Seen this way, the opportunity to create a community and give people the power to develop a sustainable and scalable business, inspired by “Made with Creative Commons”, was now a reality!

So, An Open Scanner, or AOS, is:

  • A community-driven project for the digitisation and computer-based accessibility of documents.
  • Open Access hardware and software that allows scanning almost any standard document, book or manuscript at low cost.
  • A simple, effortless design for building scanners that rely on Machine Learning and Computer Vision algorithms to manipulate and enhance the shots.
  • A project that, by design, has the potential to create a community of “Operators” and “Editors” distributed worldwide.
  • A way to make it much easier to go from a physical piece of paper to an enhanced digital document, thanks to Cloud Computing.
  • A project with a dedicated business model in the spirit of “Made with Creative Commons”.

“An Open Scanner” logo donated by Estefanía Sánchez Pineda — CC GNC member. It’s licensed under CC BY 4.0.

AOS primary targets

  • Whoever wants to digitise a document or book
  • Small & medium educational institutions
  • Schools, universities, libraries
  • Public & private museums and other cultural associations
  • Traditional publishers and other media companies
  • Scientists and researchers who need mobility and flexibility to scan delicate documents

Also:

  • Editors and others who need or want to enhance an already digitised book: this means that the software could soon be used to OCR books that were scanned before the technology was ready!
  • Organisations that want to distribute the workload of processing a large number of documents among geographically dispersed staff or volunteers.

And this is where the community is created!

A look down at version zero of the “An Open Scanner” hardware setup under test — by Arturo Sánchez Pineda is licensed under CC BY 4.0.

AOS communities

The Creative Commons Venezuela Chapter is the first community with which we would like to develop this idea further. In parallel, we are already looking for developers who can help build the software from the prototype we created. Coming back to the community, we can already identify two particular tasks and a role for each of them:

The Operators

This is the group of people who will use the hardware: they are responsible for transporting and assembling the scanner and for taking the shots. They will also train future users who borrow (or rent) an “An Open Scanner”, first from the chapter and later from any person or group that follows the idea and builds their own scanner. An army of Operators will make it possible to scan many documents in places like libraries and archives, for example to rescue information that exists only in that one place. As you can see, cheap hardware becomes crucial for scaling and reaching as many places as possible in both the developing and the developed world.

The Editors

This is an interesting group. The Editors are the people who will receive the images from the Operators and do the corresponding processing: use the software to reconstruct the book or document, perform OCR and produce a complete digitisation.

If we take a minute to think about it, we realise that in today's world the Editors don't need to be physically close to the “An Open Scanner”, nor do they need to take part in the scanning process itself.

So this task can be distributed among other people. In this case, the Editors' skills will be different from, and complementary to, those of the Operators, creating an excellent ecosystem where books are scanned in one place and digitally reconstructed in other places… of the world. In a professional project, this kind of endeavour can be a source of employment in areas with limited internet access (more on how that is possible in Part 2).

Does it look simple? Yes! That is the idea. We want it as simple, cheap and generic as possible — by Arturo Sánchez Pineda is licensed under CC BY 4.0.

So, where are we heading now? After a proof of concept of the hardware and the software during 2020, we are moving on to create the right platform for collaboration:

  • Getting the software examples into good shape (standard code cleaning) and preparing repositories that others can try out and join as developers or testers.
  • We already have a brand, or identity, which will be used on the website that will host the blueprints and documentation.
  • In the future, the same website will be the place where the community posts its success stories and lessons learned.
  • Drawing inspiration from great projects such as https://scholar.archive.org/ and many others.

This is a very important project for us, and we are excited to move forward, together with the Open Access and Open Source communities and, of course, with Creative Commons.

Let's meet again in our next post. Thanks for reading!


Arturo Sánchez Pineda
Creative Commons: We Like to Share

PhD Researcher, SysAdmin & Educator — Work(ed) @ CERN / ICTP / INFN / LAPP / CNRS / ULA — https://www.linkedin.com/in/arturo-sanchez-pineda - @Arturo_RSP