Webify PDF Files in AEM using Adobe Cloud APIs

KM Robin
Published in Adobe Demo Blog
5 min read · Jul 13, 2020

Co-Author: Pankaj Singh

With the introduction of SDKs in Adobe’s Document Cloud Platform, connecting the magic to Adobe Experience Manager just feels natural.

Managing PDF files on web pages is not a new thing, but the process has never been so intuitive. Let’s take a look at the foundations by solving a sample problem statement (derived from recent conversations with a couple of Adobe customers).

Problem Statement:

An organization, XYZ, has recently recognized the need to focus on PDF content produced by its internal divisions across geographies (most of which use AEM for content management). These PDF files need to be put on the web and made searchable. About 25 percent of the files need a manual summary and an internal review before being published as web pages. XYZ would also like to analyze user behavior in depth (e.g., time spent by readers, devices used, and trends in search terms) to personalize the visitor experience. Currently, PDFs are listed as hyperlinks, which navigate users away from the portal.

Solution Approach:

Yes, there are various plugins on the market for embedding PDF files in web pages. Even the HTML specification has tags such as <object> and <embed>. But when it comes to PDF, who better than Adobe?

Key benefits of using Adobe’s DC View SDK are pixel-perfect rendering across devices, zero-code analytics, and a highly customizable UI with callbacks for user profiles. The best part is it’s easy to integrate.

Therefore, we’ll leverage Adobe DC View SDK inside AEM for the solution.

Technical Details:

You’ll need:

  • Adobe Document Cloud View SDK (It’s free, sign up for an Adobe ID)
  • Adobe Experience Manager
  • Adobe Analytics

Getting started:

Let’s divide the solution into four parts:

1/ Enhanced Side-by-Side Search (Extending AEM’s search capabilities to search for PDF files and browse them side-by-side).

2/ Bulk Page Creation (Quickly generate pages from a batch of PDFs).

3/ Page creation with Review Process (Capability for authors to extract a summary from a PDF and have the page reviewed before publishing).

4/ PDF Analytics (Integrating Adobe Analytics for detailed PDF Insights).

Let’s create:

A. Two custom components: i) PDF Viewer ii) PDF Search

B. An AEM page template (with PDF Viewer component embedded)

C. Workflows for review process & bulk activity

D. Adobe Analytics Integration for PDF as per documentation

Implementation:

A. The custom components (AEM) : —

i) PDF Viewer:

This will render HTML similar to the markup in the View SDK documentation.

  • Create a new component with drag-and-drop support for PDFs and load the SDK script through a Client Library. For a rich authoring experience, use a dialog for browsing to a PDF, along with options such as enabling or disabling Print, Download, Full Screen, etc.
  • The previewFile function of the View SDK controls these options. Refer to the documentation for all menu options & controls.

Quick Note: Make sure to update the Client ID (and the Report Suite ID, required only for PDF Analytics). You may want to expose these as an OSGi configuration, with the Client ID encrypted (or as a secret configuration value if using AEM as a Cloud Service).
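As a rough sketch of how the dialog choices could reach previewFile, the component's backing code might serialize them into the options object that the page script hands to the View SDK. The ViewerOptions class below is hypothetical and not part of AEM or the SDK. The option names embedMode, showDownloadPDF, and showPrintPDF are taken from the View SDK documentation as we understand it, so verify them against the current reference before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not an AEM or View SDK class): collects the author's
 * dialog choices and renders them as a JSON object for the page script.
 */
final class ViewerOptions {
    // LinkedHashMap keeps the options in insertion order for stable output.
    private final Map<String, Object> options = new LinkedHashMap<>();

    ViewerOptions embedMode(String mode) {
        options.put("embedMode", mode);
        return this;
    }

    ViewerOptions showDownload(boolean enabled) {
        options.put("showDownloadPDF", enabled);
        return this;
    }

    ViewerOptions showPrint(boolean enabled) {
        options.put("showPrintPDF", enabled);
        return this;
    }

    /**
     * Minimal JSON rendering. Values are booleans or fixed enum-like strings,
     * so no string escaping is attempted here.
     */
    String toJson() {
        return options.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":"
                + (e.getValue() instanceof String
                    ? "\"" + e.getValue() + "\""
                    : String.valueOf(e.getValue())))
            .collect(Collectors.joining(",", "{", "}"));
    }
}
```

For example, new ViewerOptions().embedMode("SIZED_CONTAINER").showDownload(false).showPrint(true).toJson() yields a string that the component's markup could expose as a data attribute for the client-side script to parse and pass to previewFile.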

ii) PDF Search:

We extended the OOTB Search Component (you can build a new one). The key is to divide this component into two linked containers: a) Search with its Results b) PDF Preview.

  • For a side-by-side Search & Preview experience, create a new component with two containers, e.g., a left <div> containing a Search Box and space for showing Search Results and the right <div> for previewing PDFs (re-use PDF Viewer component?).
Two containers.
Connecting the containers & loading PDFs for preview.

The component can support capabilities like a tag-based filter, Spellcheck, and Suggest for PDFs. Indexing and full-text search of content inside PDFs are already handled by Oak, which uses the Apache Tika extraction library.
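To make the search side more concrete, here is a minimal sketch of the predicates such a component might send to AEM's QueryBuilder to find PDFs by full text. The PdfSearchPredicates class and its parameters are hypothetical. The predicate names (path, type, fulltext, property, p.limit) are standard QueryBuilder predicates, while filtering on dc:format assumes your assets carry that metadata property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a QueryBuilder predicate map for PDF assets. */
final class PdfSearchPredicates {

    static Map<String, String> build(String searchTerm, String damRoot, int limit) {
        Map<String, String> predicates = new LinkedHashMap<>();
        // Restrict the search to one DAM folder tree.
        predicates.put("path", damRoot);
        // Only DAM assets, not pages or folders.
        predicates.put("type", "dam:Asset");
        // Full-text match; Oak's index covers text extracted from the PDF.
        predicates.put("fulltext", searchTerm);
        // Assumption: PDFs are identified by their dc:format metadata.
        predicates.put("property", "jcr:content/metadata/dc:format");
        predicates.put("property.value", "application/pdf");
        // Cap the number of hits returned to the results container.
        predicates.put("p.limit", String.valueOf(limit));
        return predicates;
    }
}
```

Inside AEM, a map like this would typically be wrapped with PredicateGroup.create(map) and passed to QueryBuilder.createQuery along with the user's JCR session; the resulting hit paths can then feed the preview container.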

B. Page Template (AEM) : —

  • Use the template editor to include the PDF Viewer component in the template body. This will be handy while using workflows to generate pages from PDF.

C. Workflows (AEM) : —

  • For content review and bulk tasks, let’s create two workflows: i) Bulk Page Creation ii) Request Page Creation.

i) Bulk Page Creation WF:

The workflow process step can leverage Page Creation API and the template created earlier.

Page articlePage = pageManager.create(parentPagePath, pageName, template, pageName);
// The parent page and template can be made configurable via an OSGi config or workflow arguments.
// Use the PDF's metadata (title, description, etc.) for the page.
Optionally, add a button in the UI. On click, control where pages are created and trigger the workflow.
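The page name passed to the page creation call has to be a valid node name. As a minimal sketch, and assuming you fall back to the PDF's file name when no title metadata is available (an assumption for illustration, not something the workflow requires), the name and title could be derived as below. PdfPageNames is a hypothetical helper; inside AEM, a utility such as JcrUtil.createValidName may serve the same purpose.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper: derives a page title and node name from a PDF file name. */
final class PdfPageNames {

    /** "Annual Report_2020.PDF" becomes "Annual Report 2020". */
    static String toTitle(String fileName) {
        // Drop a trailing .pdf extension, ignoring case.
        String base = fileName.replaceAll("(?i)\\.pdf$", "");
        // Treat underscores and hyphens as word separators.
        return base.replaceAll("[_\\-]+", " ").trim();
    }

    /** Lowercase, hyphen-separated name that is safe to use as a page node name. */
    static String toPageName(String fileName) {
        // Decompose accented characters, then strip the combining marks.
        String ascii = Normalizer.normalize(toTitle(fileName), Normalizer.Form.NFD)
            .replaceAll("\\p{M}", "");
        return ascii.toLowerCase(Locale.ROOT)
            // Collapse every run of other characters into one hyphen.
            .replaceAll("[^a-z0-9]+", "-")
            // Trim hyphens left at either end.
            .replaceAll("^-+|-+$", "");
    }
}
```

The workflow step could then pass toPageName(fileName) as the page name and toTitle(fileName) as the title whenever the asset has no title of its own.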

ii) Request Page Creation WF:

A multi-step workflow for page creation (using a PDF file). The AEM Projects module can further simplify task management.
  • A content author can create the page by extracting a summary from the PDF and using the AEM page template. (Alternatively, use a generic page with a query parameter for side-by-side summary creation, reusing the PDF embed and the page creation service.)
In AEM Inbox, a new button can be added for a seamless redirect to the summary creation page.
Using Embedded PDF for seamless summary (metadata) creation within the web interface.
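For the generic-page alternative, the Inbox button needs a link that tells the summary page which PDF to embed. A minimal sketch follows, assuming a hypothetical summary page path and a hypothetical query parameter named pdf; neither name comes from AEM itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds the redirect link to a generic summary creation page. */
final class SummaryPageLinks {

    /**
     * @param summaryPagePath repository path of the generic page, without extension
     * @param pdfPath         DAM path of the PDF to embed next to the summary form
     */
    static String forPdf(String summaryPagePath, String pdfPath) {
        // Encode the asset path so spaces and slashes survive as one parameter value.
        String encoded = URLEncoder.encode(pdfPath, StandardCharsets.UTF_8);
        return summaryPagePath + ".html?pdf=" + encoded;
    }
}
```

On the receiving page, it would be prudent to check that the decoded path points to an asset under the expected DAM root before embedding it.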

D. The Analytics-PDF integration (Adobe Analytics) : —

You need to specify the Analytics Report Suite ID in your PDF Viewer component. That’s it. Now follow the simple steps outlined in the documentation to set up Adobe Analytics processing rules.

Detailed Breakdown of User Interaction with the PDF content.

What’s Next:

At the time of writing this post, a standard PDF Viewer core component is also in the works. Feel free to contribute to the AEM Core Components project; it’s open source.

While we used some basic features of the View SDK in this article, its sibling, the Services SDK, is considerably more powerful, with APIs for creating, combining, and exporting PDFs, text recognition (OCR), and much more. These can lead to many interesting use cases when combined with AEM, so try them out!

Resources:

This article was originally published in Adobe Tech: https://bit.ly/AEMandPDF
