Introducing Augraphy

proofconstruction
Sep 29, 2021

Our team has been hard at work over the past few months, and we’re excited to announce the version 5 release of Augraphy, a library for generating document image training data for machine learning projects.

To stay competitive in today’s market, businesses are increasingly relying on data-driven approaches, and many are hitting the same roadblock: historically, their data was printed, and it is now trapped on paper.

Getting the data back out and into digital form has traditionally required expensive, time-consuming, and error-prone transcription. To address this, many organizations are turning to machine learning solutions to keep costs down and improve accuracy as they accelerate their digital transformations.

Enter Augraphy

Plenty of excellent image augmentation libraries exist, but most are focused on general image transformations, like adding a blur effect or compression artifacts.

Augraphy specializes in generating visually realistic document images, complete with the defects commonly encountered in the real world. To name a few:

  • the document wasn’t correctly aligned with the scanner bed, and uneven dark borders appeared on the copy
  • before copying, the document was folded, and a crease appeared in the output
  • the printer was running low on ink, and parts of the text are lighter than others
  • artifacts and significant noise were introduced when faxing the document
  • the document is an old crumpled receipt, its text fading with time

Augraphy is capable of reproducing many more effects than these (we’ve added a lot in this release!), and you can check them out now on the project page linked above. We’re also working on an article series about the different augmentations, so make sure to check back here for updates.

Edit: the first post is up; scroll to the bottom for the link!

In Action

Here’s a visual example of the power of Augraphy: after running the default Augraphy pipeline over the source image, we get new images that look like our source, as if printed on different paper by different machines, each with common problems.

First, the source image, a sample invoice letter from Apple Pages:

Here we have a print onto something like receipt or triplicate paper, with areas of low ink, lines that should be filled in, and fuzzy, lower-resolution text:

Augraphy can also “print” on entirely different surfaces, like this hemp-like texture:

Using It

If you want to leverage the power of Augraphy in your own projects, check out the project on our GitHub page, or download it from PyPI right now with pip install augraphy.
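As a rough sketch of what integration might look like, the snippet below produces several augmented copies of each source image and pairs each one with its clean original, which is one plausible way to build training data. The helper names `make_training_pairs` and `augraphy_augmenter` are made up for this example, and the Augraphy calls inside `augraphy_augmenter` (`default_augraphy_pipeline()` and `pipeline.augment(image)["output"]`) are an assumption based on the README around the time of this release; check the README of your installed version, since the API may have changed.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")


def make_training_pairs(
    augment: Callable[[Image], Image],
    sources: Iterable[Image],
    copies: int = 3,
) -> List[Tuple[Image, Image]]:
    """Return (augmented, clean) pairs, with `copies` augmented versions per source."""
    pairs = []
    for clean in sources:
        for _ in range(copies):
            # Each call is expected to yield a different result, because the
            # augmentations are meant to be randomized.
            pairs.append((augment(clean), clean))
    return pairs


def augraphy_augmenter() -> Callable:
    """Wrap Augraphy's default pipeline as a plain image -> image function.

    ASSUMPTION: the calls below follow the README from around this release
    and are not verified against every version of the library.
    """
    # Imported lazily so the helper above works without Augraphy installed.
    from augraphy import default_augraphy_pipeline  # pip install augraphy

    pipeline = default_augraphy_pipeline()
    # Assumed: augment() returns a dict holding the final image under "output".
    return lambda image: pipeline.augment(image)["output"]
```

With OpenCV installed, something like `make_training_pairs(augraphy_augmenter(), [cv2.imread("invoice.png")])` should then give three augmented versions of a (hypothetical) `invoice.png`. Keeping the augment step as a plain callable means you can swap in a custom pipeline later without touching the rest of the code.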

We’re happy to answer any questions you may have and assist you with integrating Augraphy into your work. If you have any trouble, please open a GitHub Issue and tell us about it.

Want to contribute? Check out this post where I explained some of the project structure and what we’re looking for.

More About Augmentations

As the team publishes augmentation-specific posts, I’ll update the list here:

  1. Markup
