PDF as a Service and its architecture

Published in Valtech Switzerland, Jan 10, 2020

In this article we briefly explain what PDF as a Service (PDFaaS) is and take a deep dive into its microservice architecture.

Challenges generating PDFs

Generating business PDF documents is challenging in several ways and requires specialized know-how.

Text-only documents: Even here, third-party PDF libraries come at a cost and support the PDF standard only to a varying degree. Encoding non-English content and preventing code injection from user input are common requirements that present some difficulties.

Graphic design: Corporate identity and design come with their own layout and styling requirements. Implementing them takes real craft when working with the existing, rather low-level, PDF composition tools.

PDF forms: Parsing, creating, or filling a handful of forms with a few fields is doable. But companies often need more than that: for example, large and growing numbers of forms, forms that initially exist only as HTML, and forms with many dynamic inputs of several types.

Templating: Once you are done with one template, you will find that reusing portions of its layout in other templates is not possible, so keeping several templates up to date costs nearly as much as creating them. The same goes for PDF forms that you have entered into your system and that change from time to time. Finally, integration into existing workflows is either impossible or comes with sub-optimal UX, unless the generating solution is headless, for example behind a REST API.
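To make the injection and encoding concern concrete, here is a minimal sketch that assumes user input ends up inside an HTML-based template before it is rendered to PDF. The class and method names are hypothetical and not part of PDFaaS. It escapes the characters that carry meaning in HTML and passes everything else, including non-English characters, through unchanged.

```java
/** Hypothetical helper: escapes user input before it is placed into an HTML template. */
final class UserInputEscaper {
    private UserInputEscaper() {}

    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Iterate over code points so characters outside the BMP stay intact.
        input.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

Calling `UserInputEscaper.escapeHtml` on a string containing a `<script>` tag yields text that is displayed literally instead of being interpreted as markup.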

Users and their goals

Many non-IT users in companies need to be able to do all of this with PDFs in order to communicate effectively with their stakeholders.

Examples are people working in marketing, human resources, sales, business consulting, and administration. They need to handle PDFs for activities such as announcing their strengths, introducing their services and products, reaching potential candidates and clients, onboarding others, upselling and cross-selling, and meeting legal requirements.

Our solution: PDF as a Service (PDFaaS)

PDFaaS is our cloud solution for meeting the needs of those users while easing the challenges described above.

On the one hand, it offers authors an intuitive Template UI where they can design dynamic templates and automate most of the effort involved in filling PDF forms.

On the other hand, it offers a REST API that enables customer applications to retrieve such templates and forms combined with existing datasets.

All of this is offered in the cloud. The UI is implemented mainly with TypeScript and Vue.js, and the REST API is based on Java, Spring Boot, and MongoDB. The following section explains this design further.
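As an illustration of what "combined with existing datasets" can mean, the following sketch merges a text template with a dataset. The `{{name}}` placeholder syntax and the `TemplateMerger` class are assumptions made for this example; the article does not describe the actual PDFaaS template format.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical merger: replaces {{key}} placeholders with values from a dataset. */
final class TemplateMerger {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    private TemplateMerger() {}

    static String merge(String template, Map<String, String> data) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            // Copy the literal text between the previous placeholder and this one.
            out.append(template, last, matcher.start());
            // Unknown keys are left untouched so missing data stays visible.
            out.append(data.getOrDefault(matcher.group(1), matcher.group()));
            last = matcher.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }
}
```

Leaving unknown placeholders in the output is a design choice for this sketch: a missing value is easier to spot in the result than a silently empty field.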

Logical Architecture

The overall architecture of PDFaaS can be understood from its deployment diagram.

Note the authors and customers in the rightmost area.

Template UI

As mentioned, authors use the Template UI. This UI is in fact a mixed backend/frontend application. The backend logic generates all HTML headers and footers on the screen, and the frontend logic generates the more dynamic parts, such as lists and forms for several entities: mainly PDF templates, files (for covers and attachments), and PDF forms.

Several Spring controllers receive the requests for the different screens. They add attributes to Spring UI models, which are then read by Thymeleaf templates.

The Thymeleaf templates in turn include the JavaScript files required for the UI to work. This UI was originally based on reusable Thymeleaf fragments and jQuery objects, and is now progressively being ported to Vue.js. Both old and new components access the same microservices over the REST API, which has some public endpoints and some private ones.

REST endpoints

Several other Spring controllers receive the requests for the different REST endpoints. They can read application properties such as typefaces and font sizes via beans, read and update the MongoDB database, and generate PDFs via the PDFGenerator package.

The private endpoints handle entities exclusive to the authors, while the public endpoints handle operations that customers also need, such as fetching existing templates already combined with data, or getting a web page printed into a PDF.
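The split between private and public endpoints can be pictured as a simple access rule. The sketch below is only an illustration: the `Role` and `Visibility` enums and the `EndpointAccess` class are hypothetical, and the real service presumably relies on its framework's security features rather than a hand-written check.

```java
/** Hypothetical caller types, taken from the two user groups described above. */
enum Role { AUTHOR, CUSTOMER }

/** Hypothetical marker for whether an endpoint is exposed to customers. */
enum Visibility { PUBLIC, PRIVATE }

final class EndpointAccess {
    private EndpointAccess() {}

    /** Public endpoints are open to both roles; private ones only to authors. */
    static boolean isAllowed(Role role, Visibility visibility) {
        return visibility == Visibility.PUBLIC || role == Role.AUTHOR;
    }
}
```

Under this rule, a customer application can fetch a merged template from a public endpoint, but cannot reach the endpoints used to author templates.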

PDF Generator Facade

A few classes that illustrate what PDF Generator is all about

From a design perspective, the PDF Generator is basically a facade around Dynamic PDF. This additional level of abstraction is valuable because of the gap between Dynamic PDF’s API and some of the features that authors need to create content productively. In other words, Dynamic PDF’s API is relatively low level, and PDF Generator isolates the complexity of the richer PDFaaS API in one place, which keeps the logic reliable and lowers its development costs.

For example, Dynamic PDF offers a static “Table”, which is fine for some occasions. Hence, PDF Generator exposes it with just a thin wrapper around it. However, Dynamic PDF does not offer a table that can be combined with data to generate N rows accordingly, which is a frequent requirement from authors. PDF Generator offers such a feature (named “Dynamic Table”) and implements it by adding its own logic on top of the more primitive “Table”.
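The idea of building a “Dynamic Table” on top of a static “Table” can be sketched as follows. Both classes are hypothetical stand-ins written for this example: `Table` does not reproduce the real Dynamic PDF API, and `DynamicTable` is not the actual PDF Generator implementation. The sketch only shows the principle of expanding one row definition into N rows, one per data record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a low-level static table (not the real Dynamic PDF API). */
final class Table {
    private final List<List<String>> rows = new ArrayList<>();

    void addRow(List<String> cells) {
        rows.add(new ArrayList<>(cells));
    }

    List<List<String>> rows() {
        return rows;
    }
}

/** Hypothetical facade feature: one row definition, expanded once per record. */
final class DynamicTable {
    private final List<String> header;
    private final List<String> fields;

    DynamicTable(List<String> header, List<String> fields) {
        this.header = header;
        this.fields = fields;
    }

    /** Builds a static Table with a header row plus one row per record. */
    Table render(List<Map<String, String>> records) {
        Table table = new Table();
        table.addRow(header);
        for (Map<String, String> record : records) {
            List<String> cells = new ArrayList<>();
            for (String field : fields) {
                // Missing fields become empty cells instead of failing the render.
                cells.add(record.getOrDefault(field, ""));
            }
            table.addRow(cells);
        }
        return table;
    }
}
```

The caller only describes the columns and passes a list of records. The row-by-row work against the primitive table stays hidden behind the facade, which is the point of the extra abstraction level.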