Generating PDFs with AWS Lambda NodeJS 10+ Runtimes — The Hard and Easy Way

Joonas Laitio
Dec 18, 2019

PDFs are a curious remnant of a bygone technological era: in many ways cumbersome to work with and produce, but still widely used, partly for exactly that reason. Their perceived immutability and interoperability still make them necessary in many applications.

Producing them with modern tech stacks isn’t always without problems, particularly if you don’t have full control of the runtime environment, along with the low-level shared libraries and fonts required, as is the case with AWS Lambda functions and their predefined runtime environments.

Sorry but I’m in a hurry actually and I got PDFs to generate, could you just give me the goods and I’ll be on my way, thanks!

Very well — here’s the end result, scroll down a bit for the journey.

Provided you use the Serverless Framework and Webpack:

1. Add a compatible Chrome (yes, the browser) dependency as a layer to your Lambda function (get a pre-deployed ARN specific to your region here):

#serverless.yml
functions:
  myPdfGenerationFunction:
    layers:
      - arn:aws:lambda:eu-north-1:764866452798:layer:chrome-aws-lambda:8

2. Let Webpack know that this dependency is handled outside your function, so it won’t be bundled into the function package as well:

#webpack.config.js
module.exports = {
  [...]
  externals: ['aws-sdk', 'chrome-aws-lambda'],
  [...]
};

3. Generate your PDFs from HTML (with your desired options) in your function code (EDIT: thanks Jyrki Heikkinen for a bug fix):

#myFunctionHandler.js
const chromium = require('chrome-aws-lambda');

// Function that takes HTML and outputs a base64-encoded PDF binary
const toPdf = async (html) => {
  let browser = null;
  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless
    });
    const page = await browser.newPage();
    await page.setContent(html);
    const pdf = await page.pdf();
    return pdf.toString('base64');
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }
};

4. If you end up serving the PDF as binary from a REST API, you need to return the body base64-encoded from your Lambda function with the appropriate content type, and tell API Gateway to convert it to binary for the client:

#serverless.yml
plugins:
  - serverless-apigw-binary
custom:
  apigwBinary:
    types:
      - 'application/pdf'
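
To tie steps 3 and 4 together, here is a minimal sketch of the glue code around toPdf. The helper names (buildPdfOptions, pdfResponse, makeHandler), the default values and the placeholder HTML are assumptions made for illustration and are not part of chrome-aws-lambda. The format, printBackground and margin keys are standard options of Puppeteer's page.pdf(), and isBase64Encoded is the Lambda proxy response flag that marks the body as base64.

```javascript
// Hypothetical defaults for page.pdf(); the option names are Puppeteer's own
const DEFAULT_PDF_OPTIONS = {
  format: 'A4',
  printBackground: true, // include CSS backgrounds in the output
  margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
};

// Merge caller overrides into the defaults; margins are merged side by side
const buildPdfOptions = (overrides = {}) => ({
  ...DEFAULT_PDF_OPTIONS,
  ...overrides,
  margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
});

// Wrap a base64-encoded PDF in a Lambda proxy integration response
const pdfResponse = (base64Pdf, filename = 'document.pdf') => ({
  statusCode: 200,
  headers: {
    'Content-Type': 'application/pdf',
    'Content-Disposition': `inline; filename="${filename}"`,
  },
  body: base64Pdf,
  isBase64Encoded: true, // tells API Gateway the body is base64, not text
});

// renderPdf is any async (html) => base64 string function, such as toPdf from step 3
const makeHandler = (renderPdf) => async (event) => {
  const html = '<h1>Hello PDF</h1>'; // placeholder; build the real HTML from the event
  return pdfResponse(await renderPdf(html));
};
```

In the handler file this could be wired up as module.exports.handler = makeHandler(toPdf), with the options passed inside toPdf as page.pdf(buildPdfOptions({ format: 'Letter' })). API Gateway generally performs the binary conversion only when the request's Accept header matches a configured binary type, so the client may need to send Accept: application/pdf.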

You’re done.

Hmmm, maybe I’m also interested in what I just did and why!

Okay, let’s start from the beginning.

The Hard Way

The first instinct when starting to build something is to put it together from its basic building blocks. For PDFs in NodeJS this would mean something like PDFKit: formulating the basic elements of text and graphics and positioning them manually at specific coordinates. However, the build mechanisms utilize low-level libraries of the OS, so you might end up having trouble in a Lambda runtime, and handling the presentation pixel by pixel gets cumbersome very quickly.

What if we could use a presentation format that we are already familiar with, like HTML and CSS, and turn that into a PDF? One of the simplest ways to do that is to use a command line tool such as wkhtmltopdf. Package a compatible binary with your function and use a wrapper library to use it from NodeJS. This approach was handy for a while, but because it’s a binary command line tool that still depends on shared OS level libraries, using it has become harder lately due to AWS Lambda runtime upgrades.
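
For illustration, this is roughly what such a wrapper does under the hood: a minimal sketch, assuming the binary is bundled at a hypothetical path ./bin/wkhtmltopdf (overridable through a made-up WKHTMLTOPDF_PATH environment variable). The function names are invented for this example. Passing "-" as input and as output makes wkhtmltopdf read the HTML from stdin and write the PDF to stdout.

```javascript
const { spawn } = require('child_process');

// Hypothetical location of the bundled binary inside the deployment package
const WKHTMLTOPDF_PATH = process.env.WKHTMLTOPDF_PATH || './bin/wkhtmltopdf';

// "-" as input reads HTML from stdin, "-" as output writes the PDF to stdout
const wkhtmltopdfArgs = (pageSize = 'A4') => ['--quiet', '--page-size', pageSize, '-', '-'];

// Resolves with the PDF as a Buffer, rejects if the binary fails or is missing
const htmlToPdfBuffer = (html, pageSize) =>
  new Promise((resolve, reject) => {
    const child = spawn(WKHTMLTOPDF_PATH, wkhtmltopdfArgs(pageSize));
    const chunks = [];
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.on('error', reject); // e.g. binary missing or not executable
    child.stdin.on('error', reject); // e.g. the process died before reading input
    child.on('close', (code) =>
      code === 0
        ? resolve(Buffer.concat(chunks))
        : reject(new Error(`wkhtmltopdf exited with code ${code}`))
    );
    child.stdin.end(html);
  });
```

The sketch also shows where the fragility comes from: everything hinges on a native executable that must find its shared libraries and fonts in whatever the Lambda runtime image happens to provide.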

Because NodeJS is known for a short turnaround and fast deprecation in its versioning, upgrading Lambda runtimes is a common task for any NodeJS Serverless developer. However, with the NodeJS 10 runtime, AWS took the opportunity to also upgrade its runtime base image to Amazon Linux 2, along with a change in principle: instead of pinning a set-in-stone minor version, the runtime is now updated dynamically as we go. That means that not only are many assumptions about the lower-level workings of the runtime broken after upgrading, we also cannot expect them to stay as static as before.

While using wkhtmltopdf is possible in Amazon Linux 2, you need to jump through some serious hoops to make it work, and it’s still not guaranteed that a Lambda runtime adjustment won’t break it. So what’s next?

The Easy Way

If only we had a high-quality, completely self-contained package that is able to handle basic presentation, including fonts, along with the ability to turn that presentation into PDFs! Well, we do. It’s called a web browser. While at first glance packaging a whole browser with your code to do a seemingly simple task may seem excessive, it makes a lot of sense in this context (and many others). Headless mode and Puppeteer make it no problem to run and use without actually requiring things like GUI capabilities.

There’s a very good rundown here about such a PDF generator implementation which you should take the time to read, using Puppeteer to drive a headless Chrome to turn HTML into a PDF binary, including a repository of an example project. The only hitch is that it uses NodeJS 8, which is at the end of its road, and the simple ways to include a Chrome binary as a dependency don’t play quite as nice with the newer (NodeJS 10+) runtimes, due to the troubles detailed above.

A workaround is to include the Chrome dependency as an AWS Lambda layer. Layers are a way to include common dependencies across Lambda functions, and they can be easily shared, even publicly across accounts. A “huge self-contained dependency that is sometimes needed but definitely not all the time” is a prime candidate for layering anyway, and for non-self-evident reasons the Chrome dependency seems to work better when imported from a layer. There are simple instructions on how to build your own Chrome dependency layer, in case you feel queasy copy-pasting layer references from the Internet for your production code.

So there you have it. This is one of those problems that went from “that shouldn’t take long” to an interesting journey of discovery and learning. While such problems are annoying if you are in a rush to meet a deadline, they are the things that make software development fascinating.

