Converting a scanned document into a compressed, searchable PDF with redactions

It isn’t as easy as I thought. But in the end, I got 35 megabytes down to under a megabyte—with no loss in quality.

Trey Harris
May 23, 2018

Note: This article is from 2018. As of the moment I write this addendum (21 Sep 2022), the instructions still work. But they’re a bit long in the tooth, and some of the tools below may be abandonware. I’ll check again periodically, and if the instructions become obsolete, I’ll make a note (and, hopefully, provide alternatives).

I recently received an 8-page document (by snail mail, on paper) from the Department of Transportation. I wanted to share it with a few different people and eventually add it to a new essay I’m writing. I figured: I’ve got a printer/scanner with an automatic document feeder, so how hard could it be? Turns out, harder than I thought.

If you just want to know how to accomplish this yourself, jump down to the bottom of this post, where you’ll find step-by-step instructions.

The requirements

I knew I wanted a final PDF that

  • had searchable text, but in an invisible layer so the original document could be seen;
  • had a small data size so it wouldn’t run into upload or sharing size limits;
  • looked good; and
  • had redactions that were true, i.e., that couldn’t be thwarted…

Trey Harris

Yet Another Geek. New York City. Formerly at Google, Amazon, Bloomberg. Gay. He/him.