Working with PDF annotations using C#: Highlight Annotation

Andriy Andruhovski
Published in asposepdf, Sep 16, 2018

The highlight annotation is one of the most frequently used tools for PDF documents. Different systems use it to display search hits, to mark important parts of a document, and so on. Many PDF tools let the user highlight text manually, but in this post we will talk about how to do that with C#.

With Aspose.PDF, we can take two approaches: work with the Document Object Model (DOM) or create annotations using PdfContentEditor. The latter approach was described in my previous post. Now we will use the Document Object Model.

According to the PDF standard, a page object has an optional Annots entry that holds a collection of annotations. In Aspose.PDF, each annotation type is represented by a class derived from the Annotation class.
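To make the relation between a page and its annotations concrete, here is a minimal sketch that lists the annotations of the first page. The file name sample.pdf is a placeholder, and the code assumes the Aspose.PDF for .NET package is referenced.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class ListAnnotations
{
    static void Main()
    {
        // "sample.pdf" is a hypothetical input file.
        var document = new Document("sample.pdf");

        // Pages are numbered from 1 in Aspose.PDF.
        Page page = document.Pages[1];

        // Each item is an instance of a class derived from Annotation,
        // so the runtime type tells us which kind of annotation it is.
        foreach (Annotation annotation in page.Annotations)
        {
            Console.WriteLine($"{annotation.GetType().Name}: {annotation.Rect}");
        }
    }
}
```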

Highlighting text using the DOM

When using the DOM, we follow these steps:

  • Search for the text (using TextFragmentAbsorber) or get the coordinates of the text fragments in another way;
  • Get the particular page and its annotation collection;
  • Create a new instance of HighlightAnnotation and fill in the required fields;
  • Add the new annotation to the collection and save the document.

Let us take the following example. Assume we have a document in which we need to highlight all occurrences of the phrase “Adobe Acrobat Reader”. In this example, we deliberately omit setting up any other properties.

First, we will search for all occurrences using TextFragmentAbsorber. Because the words we are looking for can be split across different lines, we will use the regular expression @"Adobe\W+Acrobat\W+Reader". As a result, each found fragment can consist of 1, 2 or 3 text segments.
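The search step could look roughly like the sketch below. It assumes a placeholder input file sample.pdf and searches only the first page; the Boolean passed to TextSearchOptions turns on regular-expression mode.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class SearchPhrase
{
    static void Main()
    {
        // "sample.pdf" is a hypothetical input file.
        var document = new Document("sample.pdf");

        // The second argument switches the absorber to regular-expression mode.
        var absorber = new TextFragmentAbsorber(
            @"Adobe\W+Acrobat\W+Reader",
            new TextSearchOptions(true));

        // Search the first page only (pages are numbered from 1).
        document.Pages[1].Accept(absorber);

        // Report each match and how many text segments it spans.
        foreach (TextFragment fragment in absorber.TextFragments)
        {
            Console.WriteLine(
                $"\"{fragment.Text}\" found in {fragment.Segments.Count} segment(s)");
        }
    }
}
```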

Next, we create the highlight annotation from the coordinates of the text segments. The PDF standard describes the highlighted area with the QuadPoints array, so when a fragment has more than one text segment we must fill QuadPoints with the corner coordinates of each segment's rectangle; for a single segment, we fill only the Rect property.
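Here is a sketch of how the search and the annotation step can be put together. The helper name CreateHighlight, the file names and the yellow color are illustrative choices, not part of the library. The corner order used for QuadPoints (upper-left, upper-right, lower-left, lower-right) is the one common viewers appear to expect, so treat it as an assumption and check it against your target viewer.

```csharp
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

class HighlightPhrase
{
    static void Main()
    {
        // Hypothetical input and output file names.
        var document = new Document("sample.pdf");
        Page page = document.Pages[1];

        var absorber = new TextFragmentAbsorber(
            @"Adobe\W+Acrobat\W+Reader", new TextSearchOptions(true));
        page.Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            page.Annotations.Add(CreateHighlight(page, fragment, Color.Yellow));
        }

        document.Save("sample_highlighted.pdf");
    }

    // Illustrative helper: builds one highlight annotation for one found fragment.
    static HighlightAnnotation CreateHighlight(Page page, TextFragment fragment, Color color)
    {
        // A single segment: the fragment rectangle is enough.
        if (fragment.Segments.Count == 1)
        {
            return new HighlightAnnotation(page, fragment.Rectangle) { Color = color };
        }

        // Several segments: four corner points per segment.
        var quadPoints = new Point[fragment.Segments.Count * 4];
        var offset = 0;
        foreach (TextSegment segment in fragment.Segments)
        {
            Rectangle r = segment.Rectangle;
            quadPoints[offset + 0] = new Point(r.LLX, r.URY); // upper-left
            quadPoints[offset + 1] = new Point(r.URX, r.URY); // upper-right
            quadPoints[offset + 2] = new Point(r.LLX, r.LLY); // lower-left
            quadPoints[offset + 3] = new Point(r.URX, r.LLY); // lower-right
            offset += 4;
        }

        // The annotation rectangle is the bounding box of all corner points.
        var bounds = new Rectangle(
            quadPoints.Min(p => p.X), quadPoints.Min(p => p.Y),
            quadPoints.Max(p => p.X), quadPoints.Max(p => p.Y));

        return new HighlightAnnotation(page, bounds)
        {
            Color = color,
            QuadPoints = quadPoints
        };
    }
}
```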

Editing existing highlights

In the previous example, we created a simple highlight annotation with default values for many properties. Now we will learn how to change them. The properties most often changed in existing annotations are:

  • Name: an annotation name, a text string uniquely identifying it among all the annotations on its page;
  • Title: a text label displayed in the title bar of the annotation’s pop-up window when it is open and active. Usually, this entry is used to identify the user who added the annotation;
  • Subject: a short description of the subject being addressed by the annotation;
  • Contents: the text displayed for the annotation;
  • Opacity: a value from 0.0 to 1.0;
  • Flags: a set of flags specifying various characteristics of the annotation;
  • Modified: the date and time of the last modification.

The full list of properties is available in the HighlightAnnotation class documentation. All of these properties can also be set when creating the annotation.
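As a sketch, changing these properties on the existing highlights of the first page might look like this. The file names and all assigned values are placeholders chosen for illustration.

```csharp
using System;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class EditHighlights
{
    static void Main()
    {
        // Hypothetical file produced by an earlier highlighting step.
        var document = new Document("sample_highlighted.pdf");

        // Keep only the highlight annotations of the first page.
        var highlights = document.Pages[1].Annotations
            .OfType<HighlightAnnotation>()
            .ToList();

        foreach (HighlightAnnotation highlight in highlights)
        {
            // All values below are illustrative.
            highlight.Title = "Reviewer";
            highlight.Subject = "Product name";
            highlight.Contents = "Check the spelling of the product name";
            highlight.Opacity = 0.5;                 // from 0.0 to 1.0
            highlight.Color = Color.Green;
            highlight.Flags = AnnotationFlags.Print; // print the annotation with the page
            highlight.Modified = DateTime.Now;
        }

        document.Save("sample_edited.pdf");
    }
}
```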

Removing existing annotations

Removing existing annotations is pretty straightforward: just filter the highlight annotations in the collection and call AnnotationCollection.Delete for each filtered item.
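A minimal sketch of this filter-and-delete step is shown below, again with placeholder file names. The filtered items are copied to a list first, so that the collection is not modified while it is being enumerated.

```csharp
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class RemoveHighlights
{
    static void Main()
    {
        // Hypothetical input file containing highlight annotations.
        var document = new Document("sample_highlighted.pdf");
        Page page = document.Pages[1];

        // Take a snapshot of the highlights before deleting anything.
        var highlights = page.Annotations
            .OfType<HighlightAnnotation>()
            .ToList();

        foreach (HighlightAnnotation highlight in highlights)
        {
            page.Annotations.Delete(highlight);
        }

        document.Save("sample_without_highlights.pdf");
    }
}
```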

Extracting highlighted text

The last operation we’ll look at is extracting highlighted text. This operation is simple too, because we need only one method.
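The sketch below assumes the method in question is GetMarkedText, which returns the text covered by a highlight as a single string. The file name is a placeholder.

```csharp
using System;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class ExtractHighlightedText
{
    static void Main()
    {
        // Hypothetical input file containing highlight annotations.
        var document = new Document("sample_highlighted.pdf");

        var highlights = document.Pages[1].Annotations
            .OfType<HighlightAnnotation>();

        foreach (HighlightAnnotation highlight in highlights)
        {
            // The whole text under the highlight, as one string.
            Console.WriteLine(highlight.GetMarkedText());
        }
    }
}
```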

If you need the individual text fragments instead of the whole text, you can use the GetMarkedTextFragments method and read the Text property of each fragment.
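A short sketch of the fragment-based variant, with a placeholder file name:

```csharp
using System;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

class ExtractHighlightedFragments
{
    static void Main()
    {
        // Hypothetical input file containing highlight annotations.
        var document = new Document("sample_highlighted.pdf");

        var highlights = document.Pages[1].Annotations
            .OfType<HighlightAnnotation>();

        foreach (HighlightAnnotation highlight in highlights)
        {
            // Each highlight can cover several text fragments.
            foreach (TextFragment fragment in highlight.GetMarkedTextFragments())
            {
                Console.WriteLine(fragment.Text);
            }
        }
    }
}
```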

Conclusion

This post showed how to use highlight annotations in the most common cases. It’s not a full list of operations on these annotations, so feel free to comment, and we can return to this topic in future posts.
