Steps to Merge XML Data in Existing PDF Forms

This article provides the steps to programmatically merge XML data into an existing offline PDF form.

Suvadeep Dey
May 4 · 4 min read
Adobe Experience Manager

The business use case

An organization wants to let its customers dynamically populate an offline PDF form with their demographic data from a 3rd party system. This saves customers time, gives them a better experience, and automates the form fill-up process.

Figure: Flow of data merge into the offline PDF form.

Prerequisites

  1. AEM 6.4+
  2. Forms Add-on
  3. AEM Maven project should have the following dependencies (artifactId):
       adobe-lc-forms-bedrock-connector
       adobe-aemds-core-docmanager
       adobe-aemfd-docassurance
       com.adobe.aemfd
  4. Acrobat PDF license to edit the PDF files

Solution

The solution uses the AEM Forms API to merge the XML data with the existing PDF forms automatically.

  1. Prepare the offline PDF for the merge by verifying that its field names are valid XML tag names, because the field names in the PDF correspond to the XML tag names. This can be done with the following steps:

a) Open the PDF file in Adobe Acrobat. You need a license that also allows editing the PDF.

b) Select the Edit PDF option from the right-side pane; this opens the file for editing.

c) Once the file is open for editing, select File -> Create -> Create Form, then click the Start button.

d) This makes all the fields of the PDF visible in the right-side pane as a tree structure, where you can edit the field names so that they meet the XML tag naming rules. For example, field names must not contain a space (' ') because XML tag names cannot contain spaces. If a field name contains spaces, a good practice is to replace them with a character such as '_' or to delete them altogether. The field During the lease could become During_the_lease or DuringTheLease.

e) Once all the field names have been fixed, save the PDF with File -> Save As.
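
When a form has many fields, the renaming rule from step d) can be applied consistently with a small helper. The following is only a sketch: the class FieldNames and the method toXmlName are hypothetical names, not part of AEM or Acrobat, and the helper deliberately accepts a conservative ASCII subset of what XML actually allows in names.

```java
final class FieldNames {
    private FieldNames() {}

    /**
     * Converts a form field label into a string that is safe to use as an XML tag name.
     * Hypothetical helper: it keeps ASCII letters, digits, '_', '-' and '.' and replaces
     * every run of other characters (spaces included) with a single underscore.
     */
    static String toXmlName(String label) {
        String name = label.trim().replaceAll("[^A-Za-z0-9_.-]+", "_");
        // An XML name must not start with a digit, '-' or '.', so prefix an underscore.
        if (name.isEmpty() || !(Character.isLetter(name.charAt(0)) || name.charAt(0) == '_')) {
            name = "_" + name;
        }
        return name;
    }
}
```

For example, FieldNames.toXmlName("During the lease") yields During_the_lease, matching the renaming suggested in step d). The fields still have to be renamed in the PDF itself; the helper only proposes the new names.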

2. The next step is to generate sample XML data to use as input for the merge. This is done as follows:

a) Open the PDF in Forms Designer ensuring that the option “Creating an Interactive Form with Fixed pages” is selected.

b) Clicking Finish loads the form in Forms Designer, with the individual form fields visible in the left pane of the tool.

c) The next step is to generate Preview Data for the form so that we have the sample XML data that will be required for further testing. This can be done by clicking on the File menu -> Form Properties -> Preview.

d) On the Preview panel, specify a file on the local filesystem where the data should be written and click the Generate Preview Data button. A dialog box shows the form elements again; upon confirmation, it creates the preview data with dummy text inside valid XML tags.

e) The dummy data has a flat, single-level hierarchy. A small snippet follows:

<topmostSubform>
<Name>Ego ille</Name>
<No>Si manu vacuas</No>
<Street>Apros tres et quidem</Street>
<Apt>Mirum est</Apt>
<Municipality>Licebit auctore</Municipality>
<Postalcode>Proinde</Postalcode>
<TelephoneNo>Am undique</TelephoneNo>
<OtherTelephoneNocellphone>Ad retia sedebam</OtherTelephoneNocellphone>
</topmostSubform>

The above XML is our test data, with which we can verify that the automated merge works.
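
In production, the same flat structure has to be built from the data returned by the 3rd party system. The sketch below shows one way to do that with the JDK's built-in DOM and Transformer APIs. The class FormDataXml and its build method are hypothetical names, and the sketch assumes the root element is called topmostSubform as in the preview data. Note that Document here is org.w3c.dom.Document, not the AEM Document class used in the merge code.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FormDataXml {
    private FormDataXml() {}

    /** Builds a single-level XML document: one child element per form field. */
    static String build(String rootName, Map<String, String> fields) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            // The key must be a valid XML name, i.e. the (renamed) PDF field name.
            Element field = doc.createElement(entry.getKey());
            // Text content is escaped on serialization, so '&' and '<' in values are safe.
            field.setTextContent(entry.getValue());
            root.appendChild(field);
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Building the XML through the DOM instead of string concatenation means special characters in customer data, such as an ampersand in a street name, are escaped automatically.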

3. The final step is to implement the code that will merge XML data with the offline PDF that is already stored in AEM assets.

a) An OSGi service class is implemented for this, with a method that takes 2 parameters: the path to the PDF file and the XML data from the 3rd party system.

// Imports needed: com.day.cq.dam.api.Asset, com.adobe.aemfd.docmanager.Document,
// com.adobe.fd.output.api.PDFOutputOptions, com.adobe.fd.output.api.AcrobatVersion,
// java.io.*, java.nio.charset.StandardCharsets
// formTemplate: path to the PDF file; xmlData: XML from the 3rd party system
// or the dummy XML generated earlier (both are the method parameters)
Asset asset = resourceResolver.getResource(formTemplate).adaptTo(Asset.class);
// Offline PDF present in content DAM
InputStream pdfInputStream = asset.getOriginal().getStream();
Document pdfDocument = new Document(pdfInputStream);
Document xmlDocument = new Document(new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
PDFOutputOptions pdfOptions = new PDFOutputOptions();
pdfOptions.setAcrobatVersion(AcrobatVersion.Acrobat_11);
// Merge the XML data into the PDF form
Document generatedDocument = this.outputService.generatePDFOutput(pdfDocument, xmlDocument, pdfOptions);
// Stream the merged PDF to the response
InputStream outputInStream = generatedDocument.getInputStream();
OutputStream out = response.getOutputStream();
int length;
byte[] buffer = new byte[4096];
while ((length = outputInStream.read(buffer)) != -1) {
    out.write(buffer, 0, length);
}
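
Because the XML arrives from an external system, it can be worth checking it before handing it to the merge call. The following sketch, which is an addition and not part of the original service, verifies that the data is well-formed and reports any element whose name does not match a known PDF field. The class XmlDataCheck, the method unknownFields, and the idea of passing in the set of known field names are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlDataCheck {
    private XmlDataCheck() {}

    /**
     * Returns the names of elements directly under the root that are not known form fields.
     * Throws an exception if the XML is not well-formed.
     */
    static Set<String> unknownFields(String xmlData, Set<String> knownFields) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The data is external input: refuse DOCTYPE declarations to rule out XXE attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        org.w3c.dom.Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlData.getBytes(StandardCharsets.UTF_8)));
        Set<String> unknown = new LinkedHashSet<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && !knownFields.contains(child.getNodeName())) {
                unknown.add(child.getNodeName());
            }
        }
        return unknown;
    }
}
```

An empty result means every element in the data maps to a field name that was prepared in step 1. A non-empty result usually points to a field that was renamed in the PDF but not in the data source, or the other way round.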

Summary

Using the above steps, a user ends up with an offline PDF form on their desktop that is pre-filled with their demographic data.

The steps above show how to programmatically merge XML data into an existing offline PDF form.

The same approach should be straightforward to apply in any AEM Forms implementation that meets the prerequisites.

Adobe Tech Blog