PDF Generation in Symfony

During my day-to-day work as a Symfony developer I am asked to build a PDF every now and then. It is not my main specialism (as you all know, my real specialisms are cache invalidation and naming things), but with great tools like wkhtmltopdf and libraries like Snappy at your disposal you should only need to focus on the PDF contents, or…

I found that generating PDFs still involves a lot of boilerplate code.

  • More complex PDFs can consist of a cover, a table of contents, a header, a footer and of course the main content, often divided into sections
  • Each of these parts can originate from many different sources: some need to be rendered by a controller (as we mainly handle HTML), rendered by a templating engine or loaded from a file or URL
  • Contents often need to be cached for performance reasons
  • External resources like fonts, stylesheets and scripts can cause issues when generating many pages
  • I like coding but I don't like doing things over and over again
  • I have three kids and I want to see them grow up

These things are not covered by most PDF generation libraries so I decided to add my own abstraction layers on top of wkhtmltopdf and Snappy to make handling the use cases above easier.

My approach involves three parts: an asset factory, an embed extension and finally a PDF builder. This post will explain what these components do and how you can utilize them to make your work less repetitive and allow you to focus on the real PDF contents.

Loading PDF contents

One of the first questions you ask when you build a PDF is: where does your content come from? By content I mean not only the main contents but also the cover, table of contents, header and footer. For instance, you might want the cover to be a PDF file that is merged in, the header and footer to be static HTML files and the contents to be rendered by a controller. Each of these cases involves extra coding before you can pass the result to Snappy, even though the end result is clear and the logic for loading file, URL or controller contents is generally pretty straightforward. We don’t want to write code like this over and over again.

The endroid/asset library helps by creating an abstraction layer for retrieving these contents. We just define an asset for any type of resource and pass it to the service that needs the output. For instance, the FileAsset loads a file, the UrlAsset loads data from a URL and the ControllerAsset renders a controller. The receiving service can then simply call $asset->getData() and the data is resolved behind the scenes. You can define a controller asset like this.

$asset = new ControllerAsset(
    $kernel,
    $requestStack,
    ContentController::class,
    ['locale' => 'en'],
);
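To make the idea more concrete, here is a minimal sketch of what such an abstraction could look like. The interface and class names (AssetSketchInterface, FileAssetSketch, CallbackAssetSketch, renderPart) are made up for this illustration and are not the actual endroid/asset classes; the only thing taken from the text above is the getData() method.

```php
<?php

// Hypothetical, simplified stand-ins; not the actual endroid/asset classes.
interface AssetSketchInterface
{
    public function getData(): string;
}

// Loads its data from a file on disk.
final class FileAssetSketch implements AssetSketchInterface
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function getData(): string
    {
        $data = @file_get_contents($this->path);
        if ($data === false) {
            throw new RuntimeException(sprintf('Unable to read "%s"', $this->path));
        }

        return $data;
    }
}

// Resolves its data lazily through a callable, e.g. a template renderer.
final class CallbackAssetSketch implements AssetSketchInterface
{
    /** @var callable */
    private $callback;

    public function __construct(callable $callback)
    {
        $this->callback = $callback;
    }

    public function getData(): string
    {
        return (string) ($this->callback)();
    }
}

// The receiving service only depends on the interface, not on the source.
function renderPart(AssetSketchInterface $asset): string
{
    return $asset->getData();
}
```

The point of the design is that renderPart() never needs to know whether the data came from a file, a URL or a controller.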

Asset factory

As you can see, creating an asset can still be a tedious process. For instance, each controller asset needs the kernel and request stack to generate the response and each template asset needs a renderer.

To make asset creation easier, an asset factory is introduced that delegates asset creation to so-called factory adapters, where each adapter makes sure the necessary services are injected. For instance, the ControllerAssetFactory has a reference to the kernel and passes it to all controller assets it generates.

Now you can create the same controller asset using the factory like this.

$controllerAsset = $assetFactory->create([
    'controller' => ContentController::class,
    'parameters' => ['locale' => 'en'],
]);

Guesser

Given the configuration above, the asset factory needs a way to determine what type of asset you are building. This is achieved by a so-called guesser. When no explicit type is specified, the guesser looks through all registered factory adapters (while adhering to a guessing priority) and the first adapter matching the configuration is used.
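The factory and guesser described above can be sketched roughly as follows. This is an illustration under my own assumptions, not the endroid/asset implementation: the names FactoryAdapterSketch, TemplateAdapterSketch, AssetFactorySketch, isValid() and add() are hypothetical, and assets are represented as plain callables to keep the example self-contained.

```php
<?php

// Hypothetical sketch; the names below are not the endroid/asset API.
interface FactoryAdapterSketch
{
    // True when this adapter recognises the given options.
    public function isValid(array $options): bool;

    // Returns a callable that resolves the data when invoked.
    public function create(array $options): callable;
}

// Holds the renderer so callers never have to pass it themselves.
final class TemplateAdapterSketch implements FactoryAdapterSketch
{
    /** @var callable */
    private $renderer;

    public function __construct(callable $renderer)
    {
        $this->renderer = $renderer;
    }

    public function isValid(array $options): bool
    {
        return isset($options['template']);
    }

    public function create(array $options): callable
    {
        $renderer = $this->renderer;
        $template = $options['template'];
        $parameters = $options['parameters'] ?? [];

        return function () use ($renderer, $template, $parameters): string {
            return (string) $renderer($template, $parameters);
        };
    }
}

final class AssetFactorySketch
{
    /** @var array<int, FactoryAdapterSketch[]> adapters grouped by priority */
    private array $adapters = [];

    public function add(FactoryAdapterSketch $adapter, int $priority = 0): void
    {
        $this->adapters[$priority][] = $adapter;
    }

    public function create(array $options): callable
    {
        // Guessing: highest priority first, the first matching adapter wins.
        krsort($this->adapters);
        foreach ($this->adapters as $group) {
            foreach ($group as $adapter) {
                if ($adapter->isValid($options)) {
                    return $adapter->create($options);
                }
            }
        }

        throw new InvalidArgumentException('No adapter matches the given options');
    }
}
```

Because each adapter carries its own dependencies, the calling code only ever deals with a plain option array.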

Caching

When you need to cache some data you’d normally write some logic to retrieve the cache item, check for a hit and, on a miss, save the data.

$cacheItem = $this->cache->getItem($key);
if (!$cacheItem->isHit()) {
    $cacheItem->set($data);
    $cacheItem->tag($this->tags);
    $cacheItem->expiresAfter($this->expiresAfter);
    $this->cache->save($cacheItem);
}
return $cacheItem->get();

The endroid/asset library comes with a CacheAsset that can wrap any other asset and cache it according to modern cache standards. You can provide a key, one or more tags (so you can purge) and a lifetime. Most of the time you don’t need more than this for caching.

$cacheAsset = $assetFactory->create([
    'controller' => ContentController::class,
    'parameters' => ['locale' => 'en'],
    'cache_key' => 'content',
]);

As you can see, the logic for obtaining the controller contents and caching them is hidden and internal dependencies are automatically resolved, leaving you with a simple and clean configuration.

Here the asset factory is used in the context of generating a PDF but you can use this as a provider for any service that requires content to be loaded from different sources.
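To illustrate the wrapping idea behind a cache asset, here is a minimal decorator sketch. It is not the library's CacheAsset: the class name CachedAssetSketch is hypothetical, a static in-memory array stands in for a real cache pool, and tags are left out for brevity.

```php
<?php

// Hypothetical sketch of a caching decorator; not the library's CacheAsset.
final class CachedAssetSketch
{
    /** @var callable the wrapped asset, represented as a callable */
    private $inner;
    private string $key;
    private int $lifetime;

    /** @var array<string, array{data: string, expires: int}> in-memory store */
    private static array $store = [];

    public function __construct(callable $inner, string $key, int $lifetime = 3600)
    {
        $this->inner = $inner;
        $this->key = $key;
        $this->lifetime = $lifetime;
    }

    public function getData(): string
    {
        $entry = self::$store[$this->key] ?? null;
        if ($entry !== null && $entry['expires'] > time()) {
            // Cache hit: the wrapped asset is not resolved again.
            return $entry['data'];
        }

        // Cache miss: resolve the wrapped asset and remember the result.
        $data = (string) ($this->inner)();
        self::$store[$this->key] = [
            'data' => $data,
            'expires' => time() + $this->lifetime,
        ];

        return $data;
    }
}
```

The wrapped asset can be anything that produces a string, so the same decorator works for controller, template or file contents.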

Embedding external resources

As mentioned earlier, use of external resources can introduce issues. For instance when you generate a large PDF containing headers and footers, these are generated for every page in your PDF. So if your PDF has 400 pages, the header and footer are processed 400 times each so they are rendered with the correct page number and section name.

When your header or footer requires external resources like stylesheets or fonts, this can result in a lot of requests. Also, PDF generators are often executed from the command line where loading URLs might be problematic.

In other words: you want to avoid external requests as much as possible during PDF generation. The easiest way to do this is by making sure the resources are already embedded in your contents instead of loading them. Fortunately, for web assets we can create data URIs for practically any location where you would normally specify a URL. For instance, you can put the data URI in an img src attribute, an href attribute or use it instead of a URL in embedded stylesheets.

However, generating a data URI from a resource requires some coding: we need to load the contents and then pass it as a variable to the template. Here the endroid/embed library comes into play: it provides a Twig extension that does this automatically for you. You only need to use the embed method around your external resource URL and it automatically replaces it with the encoded contents.
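For reference, the manual work that the extension saves you boils down to something like the sketch below. The helper names toDataUri() and fileToDataUri() are my own and not part of endroid/embed; the MIME type is passed in explicitly to keep the example free of extension requirements.

```php
<?php

// Minimal sketch: turn raw bytes into a base64-encoded data URI.
function toDataUri(string $contents, string $mimeType): string
{
    return sprintf('data:%s;base64,%s', $mimeType, base64_encode($contents));
}

// Convenience wrapper for local files.
function fileToDataUri(string $path, string $mimeType): string
{
    $contents = @file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException(sprintf('Unable to read "%s"', $path));
    }

    return toDataUri($contents, $mimeType);
}
```

The resulting string can be used anywhere the template would otherwise contain a URL, for example as the href of a stylesheet link or the src of an image.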

Building the PDF

The asset factory and embed extension make work a lot easier but when you generate the PDF you still need to create the assets via the asset factory and pass them to the right options in Snappy. To improve this workflow a PDF builder is created around Snappy and the asset factory. This builder accepts option arrays, passes these to the asset factory and sets the correct options in Snappy. This way you can do everything via one single service. In the end building a PDF looks something like this.

<?php

namespace App\Controller\Pdf;

use Endroid\Pdf\Builder\PdfBuilder;
use Endroid\Pdf\Response\InlinePdfResponse;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Annotation\Route;

class GenerateController
{
    private $builder;

    public function __construct(PdfBuilder $builder)
    {
        $this->builder = $builder;
    }

    /**
     * @Route("/pdf")
     */
    public function __invoke(): Response
    {
        $this->builder
            ->setCover([
                'controller' => CoverController::class,
                'cache_key' => 'cover',
            ])
            ->setTableOfContents([
                'file' => 'table_of_contents.xml',
                'cache_key' => 'toc',
            ])
            ->setHeader([
                'template' => 'pdf/header.html.twig',
                'cache_key' => 'header',
            ])
            ->setFooter([
                'template' => 'pdf/footer.html.twig',
                'cache_key' => 'footer',
            ])
            ->setContent([
                'controller' => ContentController::class,
                'cache_key' => 'content',
            ])
        ;

        return new InlinePdfResponse($this->builder->getPdf());
    }
}

Please note that you can override any of the existing Snappy options via $builder->setOptions([…]), so you still have full control. The library also provides an InlinePdfResponse for rendering your PDF in a browser, and it automatically handles scenarios where you want a margin on your content pages but not on the cover page (a limitation of wkhtmltopdf).
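The builder idea itself can be sketched in a few lines. This is only an illustration under my own assumptions and not the endroid/pdf implementation: PdfBuilderSketch and its resolver callable are hypothetical, while cover, header-html, footer-html and margin-top are existing wkhtmltopdf option names that Snappy accepts.

```php
<?php

// Hypothetical sketch of the builder idea; not the endroid/pdf implementation.
final class PdfBuilderSketch
{
    /** @var callable resolves an asset option array to its data (assumed) */
    private $resolve;

    /** @var array<string, string> resolved parts keyed by wkhtmltopdf option */
    private array $parts = [];

    /** @var array<string, mixed> explicit option overrides */
    private array $overrides = [];

    private string $content = '';

    public function __construct(callable $resolve)
    {
        $this->resolve = $resolve;
    }

    public function setCover(array $asset): self
    {
        $this->parts['cover'] = ($this->resolve)($asset);

        return $this;
    }

    public function setHeader(array $asset): self
    {
        $this->parts['header-html'] = ($this->resolve)($asset);

        return $this;
    }

    public function setFooter(array $asset): self
    {
        $this->parts['footer-html'] = ($this->resolve)($asset);

        return $this;
    }

    public function setContent(array $asset): self
    {
        $this->content = ($this->resolve)($asset);

        return $this;
    }

    public function setOptions(array $options): self
    {
        $this->overrides = array_merge($this->overrides, $options);

        return $this;
    }

    public function getContent(): string
    {
        return $this->content;
    }

    // Options in the shape Snappy expects; explicit overrides win.
    public function getOptions(): array
    {
        return array_merge($this->parts, $this->overrides);
    }
}
```

In a real setup the content and options would then be handed to Snappy, for instance through its getOutputFromHtml($html, $options) method.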

Thank you for reading :) As I mentioned, this approach helps me build PDFs much quicker and I hope I helped someone by sharing this. If you have any comments or questions let me know.