Working with PDF annotations using C#: Redaction Annotation

Published in asposepdf, Sep 24, 2018

Many business documents contain confidential information. Sometimes we need to share part or all of a document, but without its sensitive or private content.
The PDF standard has supported redaction since version 1.7. Redacting a PDF permanently removes the selected information from the document while preserving the document’s formatting.

Redaction annotations are intended to support a two-step process:

Content identification. A user applies annotations that indicate the fragments or content areas that need to be deleted. Before the next step, the user can see, move and redefine these annotations.
Content removal. The user instructs the viewer application to apply the redact annotations, after which the content in the area specified by the redact annotations is removed. In the removed content’s place, some marking appears to indicate the area has been redacted. Also, the redact annotations are removed from the PDF document.

In both steps, we deal with the RedactionAnnotation class.

Content identification

There are several ways to mark content for redaction:

define a custom area on a specific page (for example, a 200x100 rectangle in the upper left corner);
find specific text content (for example, phone numbers, e-mail addresses, social identifiers);
find images or tables.

The first way is straightforward: we only need to define a rectangle and create an annotation object for it. For instance, the custom area can be the 200x100 rectangle in the upper left corner of the page mentioned above.
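A minimal sketch of this first approach follows. It assumes Aspose.PDF for .NET is referenced; the file names input.pdf and marked.pdf are placeholders, and the black fill color is just an illustrative choice, not necessarily what the original example used.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class MarkCustomArea
{
    static void Main()
    {
        // "input.pdf" and "marked.pdf" are placeholder file names.
        var document = new Document("input.pdf");
        Page page = document.Pages[1]; // page numbering starts at 1

        // PDF coordinates originate at the bottom-left corner of the page,
        // so the top edge is page.Rect.URY and the left edge is page.Rect.LLX.
        double top = page.Rect.URY;
        double left = page.Rect.LLX;

        // 200 units wide, 100 units high, anchored at the upper left corner.
        var area = new Rectangle(left, top - 100, left + 200, top);

        var annotation = new RedactionAnnotation(page, area)
        {
            FillColor = Color.Black // illustrative choice for the redaction mark
        };
        page.Annotations.Add(annotation);

        // At this point the content is only marked, not yet removed.
        document.Save("marked.pdf");
    }
}
```

Because nothing has been removed yet, the saved file can still be opened to review, move, or resize the marked area.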

Let’s look at another task: we will search the document for phone numbers and mark the last 4 digits of each one with a RedactionAnnotation. To do this, we will use a TextFragmentAbsorber with a regular expression as the search string. Additionally, we will set up error logging.

For each text fragment found, we will adjust the coordinates of its rectangle so that only the last 4 digits are marked.
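The search-and-mark step could look roughly like the sketch below. Several things here are assumptions rather than the original code: the phone format (123-456-7890), the placeholder file names, and above all the way the rectangle is narrowed, which simply treats every character as equally wide. Error logging is left out of this sketch.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

class MarkPhoneNumbers
{
    static void Main()
    {
        // "input.pdf" and "marked.pdf" are placeholder file names.
        var document = new Document("input.pdf");

        // Assumed phone format: 123-456-7890. Adjust the pattern as needed.
        // TextSearchOptions(true) tells the absorber to treat it as a regex.
        var absorber = new TextFragmentAbsorber(
            @"\d{3}-\d{3}-\d{4}", new TextSearchOptions(true));
        document.Pages.Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            Rectangle found = fragment.Rectangle;

            // Approximation (an assumption of this sketch): every character
            // of the match has the same width, so the last 4 digits occupy
            // the rightmost 4 / Length share of the fragment rectangle.
            double charWidth = found.Width / fragment.Text.Length;
            var lastFour = new Rectangle(
                found.URX - 4 * charWidth, found.LLY, found.URX, found.URY);

            var annotation = new RedactionAnnotation(fragment.Page, lastFour)
            {
                FillColor = Color.Black // illustrative choice
            };
            fragment.Page.Annotations.Add(annotation);
        }

        document.Save("marked.pdf");
    }
}
```

With a proportional font the equal-width estimate can be slightly off, so it is worth checking the marked areas visually before removing anything.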

The algorithm for tables and images is mostly the same as for text fragments, except that we use TableAbsorber and ImagePlacementAbsorber, respectively, to find them.
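As a sketch of the same idea for images and tables, the code below marks every image and every table found on the first page. Restricting the search to page 1 and marking all matches without filtering are simplifications made here for brevity; the file names are placeholders.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

class MarkImagesAndTables
{
    static void Main()
    {
        // "input.pdf" and "marked.pdf" are placeholder file names.
        var document = new Document("input.pdf");
        Page page = document.Pages[1];

        // Images: each placement reports the rectangle it occupies on the page.
        var imageAbsorber = new ImagePlacementAbsorber();
        page.Accept(imageAbsorber);
        foreach (ImagePlacement image in imageAbsorber.ImagePlacements)
        {
            page.Annotations.Add(new RedactionAnnotation(page, image.Rectangle));
        }

        // Tables: TableAbsorber visits the page and lists the tables it detects.
        var tableAbsorber = new TableAbsorber();
        tableAbsorber.Visit(page);
        foreach (AbsorbedTable table in tableAbsorber.TableList)
        {
            page.Annotations.Add(new RedactionAnnotation(page, table.Rectangle));
        }

        document.Save("marked.pdf");
    }
}
```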

Content removal

Finally, we can remove the sensitive content. This is easy: we just call the Redact() method on each annotation.
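A minimal sketch of the removal step, assuming the annotations were saved earlier to a file named marked.pdf (a placeholder). The annotations are copied into a list before redacting, on the assumption that applying a redaction may remove the annotation from the page's collection while we iterate over it.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class ApplyRedaction
{
    static void Main()
    {
        // "marked.pdf" and "redacted.pdf" are placeholder file names.
        var document = new Document("marked.pdf");

        foreach (Page page in document.Pages)
        {
            // Snapshot the redaction annotations first, so that changes made
            // to page.Annotations by Redact() cannot disturb the loop.
            List<RedactionAnnotation> marks =
                page.Annotations.OfType<RedactionAnnotation>().ToList();

            foreach (RedactionAnnotation mark in marks)
            {
                mark.Redact(); // permanently removes the content under the mark
            }
        }

        document.Save("redacted.pdf");
    }
}
```

Saving to a new file keeps the marked version around in case an area turns out to be wrong, since redaction cannot be undone.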

Written by Andriy Andruhovski