Create a PDF service with AWS Lambda

fabio bianchi
TUI MM Engineering Center
Jun 25, 2020

This article describes how we created a service that generates PDF documents from a given HTML page using AWS Lambda.

Since the function that generates the PDF is quite trivial (literally ~20 lines of code), the focus will be mainly on the operational side of things. That means the scaffolding of the project that allows the Lambda service to be built and deployed with a CI/CD pipeline.

What is AWS Lambda, actually?
You can find plenty of articles about AWS Lambda (if you are interested, see the official AWS Lambda documentation).

In a few words, Lambda lets you run arbitrary code (functions) within the Amazon services ecosystem and invoke it from a wide variety of event sources (see Supported Event Sources). The main benefit is that AWS scales automatically in response to increased traffic. The pricing model reflects this: you pay pretty much only for what you use. A further appealing point is that the first 1 million requests per month are free.
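Here is a rough illustration of the pricing model. Compute usage is measured in GB-seconds, which is memory in GB multiplied by billed duration in seconds and summed over all invocations. Besides the 1 million requests, the free tier also includes 400,000 GB-seconds per month. The sketch takes two inputs from this article: the 128 MB memory size set in serverless.yml and the roughly ten thousand PDFs per month mentioned in the conclusions. The 3-second average duration is purely an assumption for illustration. Check current AWS pricing before relying on any figures.

```python
# Rough Lambda usage estimate; the duration used below is an ASSUMPTION.
FREE_REQUESTS_PER_MONTH = 1_000_000
FREE_GB_SECONDS_PER_MONTH = 400_000


def monthly_gb_seconds(invocations: int, avg_duration_s: float, memory_mb: int) -> float:
    """Compute usage: memory (GB) x duration (s), summed over all invocations."""
    return invocations * avg_duration_s * (memory_mb / 1024)


def within_free_tier(invocations: int, avg_duration_s: float, memory_mb: int) -> bool:
    """True if both the request count and the compute usage fit in the monthly free tier."""
    return (
        invocations <= FREE_REQUESTS_PER_MONTH
        and monthly_gb_seconds(invocations, avg_duration_s, memory_mb) <= FREE_GB_SECONDS_PER_MONTH
    )


if __name__ == "__main__":
    # ~10,000 PDFs/month at 128 MB with an assumed 3 s average duration
    usage = monthly_gb_seconds(10_000, 3.0, 128)
    print(f"{usage:.0f} GB-seconds, within free tier: {within_free_tier(10_000, 3.0, 128)}")
```

Under these assumptions the workload uses 3,750 GB-seconds, less than 1% of the free compute allowance.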

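High-level architecture
At a high level, clients send HTML to an API Gateway endpoint. API Gateway invokes the Lambda function, which converts the page to a PDF and returns the document.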

Project overview
Our CloudFormation stack is composed of API Gateway and AWS Lambda. We chose the Serverless Framework to manage the stack and to run tests locally with the `invoke local` command.

The PDF engine is wkhtmltopdf, while the function code itself is written in Python (pretty much just a wrapper that invokes the CLI tool with custom arguments).
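The article doesn't show the wrapper itself, so here is a minimal sketch of what one might look like. It assumes the HTML arrives as a string and that wkhtmltopdf is reachable on the PATH. It relies on wkhtmltopdf accepting `-` to read the page from stdin and to write the PDF to stdout. The function names and the example options (A4 page size, 10 mm top margin) are hypothetical, not the authors' actual arguments.

```python
import subprocess
from typing import Dict, List, Optional


def build_command(options: Optional[Dict[str, str]] = None, binary: str = "wkhtmltopdf") -> List[str]:
    """Build the wkhtmltopdf argument list (hypothetical helper)."""
    args = [binary, "--quiet"]
    for flag, value in (options or {}).items():
        args.extend([flag, value])
    # "-" as input reads the HTML from stdin, "-" as output writes the PDF to stdout
    args.extend(["-", "-"])
    return args


def render_pdf(html: str, options: Optional[Dict[str, str]] = None, timeout_s: float = 25.0) -> bytes:
    """Run wkhtmltopdf on the given HTML and return the raw PDF bytes."""
    result = subprocess.run(
        build_command(options),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout_s,  # leave headroom below the 30 s Lambda timeout
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `render_pdf("<h1>Hello</h1>", {"--page-size": "A4", "--margin-top": "10mm"})` returns the PDF bytes, provided wkhtmltopdf is installed. Note that `check=True` treats any non-zero exit status as a failure. wkhtmltopdf can reportedly exit with a non-zero code on some non-fatal errors, such as an unreachable image, so you may need to relax that check.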

Here is what the project structure looks like:

.
├── Pipfile
├── Pipfile.lock
├── README.md
├── bin
│ └── wkhtmltopdf
├── html_to_pdf.py
├── package-lock.json
├── package.json
└── serverless.yml

and this is the serverless.yml:

service: html-to-pdf
plugins:
  - serverless-apigw-binary
custom:
  apigwBinary:
    types:
      - 'application/pdf'
provider:
  name: aws
  stage: ${env:STAGE, 'dev'}
  apiName: html-to-pdf-api
  profile: ${env:PROFILE, 'dev'}
  region: eu-west-1
  deploymentBucket:
    name: ${env:DEPLOY_BUCKET, 'dev'}
  apiGateway:
    restApiId: ${env:API_ID, 'dev'}
    restApiRootResourceId: ${env:API_RES_ID, 'dev'}
package:
  exclude:
    - node_modules/**
functions:
  htmlToPdf:
    handler: html_to_pdf.pdf_handler
    name: ${env:LAMBDA_NAME, 'html-to-pdf'}
    runtime: python3.7
    timeout: 30
    memorySize: 128
    events:
      - http:
          path: html-to-pdf
          method: post
          cors: true
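In serverless.yml, `${env:VAR, 'default'}` resolves to the environment variable when it is set and to the literal default otherwise. Defaults like `'dev'` for `restApiId` or the deployment bucket are only placeholders. A CI job might therefore want to fail fast when those variables are missing. Below is a minimal pre-flight check, assuming the pipeline can run a Python step before `serverless package`. Which variables count as mandatory is an assumption; adjust the list to your pipeline.

```python
import os
import sys
from typing import List, Mapping, Optional, Sequence

# Variables from serverless.yml whose 'dev' fallback is only a placeholder (ASSUMED list)
REQUIRED_VARS = ("STAGE", "DEPLOY_BUCKET", "API_ID", "API_RES_ID")


def missing_vars(env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def main(env: Optional[Mapping[str, str]] = None) -> int:
    """Return a CI exit code: 0 if all required variables are set, 1 otherwise."""
    missing = missing_vars(os.environ if env is None else env)
    if missing:
        print("Missing environment variables: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0
```

A CI step could call `sys.exit(main())` just before the build commands.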

At this point the build can be easily automated:


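# install Node dependencies, including the Serverless CLI
npm install
npm install serverless
# install Python dependencies next to the handler so they end up in the zip (bash process substitution)
pip install -r <(pipenv lock -r) --target .
# generate the CloudFormation template and zip artifact into .serverless/
./node_modules/serverless/bin/serverless package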

This produces the following output in the .serverless folder:


.
├── cloudformation-template-update-stack.json
├── html-to-pdf.zip
└── serverless-state.json
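Because wkhtmltopdf is shipped inside the zip, a missing file or a lost executable bit would only show up at runtime. A small CI check on the packaged artifact could catch that earlier. The sketch below is an optional addition, not part of the original pipeline. It uses Python's `zipfile` module to verify that the handler module and `bin/wkhtmltopdf` are present, and that the binary has its Unix execute bit set. That bit is stored in the upper 16 bits of each entry's `external_attr`.

```python
import stat
import zipfile
from typing import List

# Entries expected in the artifact, based on the project layout above
REQUIRED_ENTRIES = ("html_to_pdf.py", "bin/wkhtmltopdf")
EXECUTABLES = ("bin/wkhtmltopdf",)


def artifact_problems(zip_path: str) -> List[str]:
    """Return human-readable problems found in the packaged zip (empty list if none)."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        infos = {info.filename: info for info in archive.infolist()}
        for name in REQUIRED_ENTRIES:
            if name not in infos:
                problems.append(f"missing {name}")
        for name in EXECUTABLES:
            info = infos.get(name)
            # Unix permission bits live in the high 16 bits of external_attr
            if info is not None and not (info.external_attr >> 16) & stat.S_IXUSR:
                problems.append(f"{name} is not executable")
    return problems
```

For instance, `artifact_problems(".serverless/html-to-pdf.zip")` should return an empty list when the artifact looks fine.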

Although Serverless can deploy automatically (by running `serverless deploy`), we preferred to split `build` and `deploy` into two separate phases.

Deployment therefore runs only after a successful build, by:
1. uploading the generated zip artifact to S3
2. updating the CloudFormation stack with the template produced by the build
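The two deployment steps could be scripted with boto3 roughly as follows. This sketch is not the authors' actual pipeline code, and it rests on several assumptions:
- The S3 key for the zip is read from the `Code` property of the Lambda function in the generated template, where Serverless references the artifact.
- The stack name follows the Serverless default `<service>-<stage>` naming (e.g. `html-to-pdf-dev`).
- The template is small enough to pass inline. CloudFormation limits `TemplateBody` to 51,200 bytes, and larger templates need `TemplateURL`.

```python
import json
from typing import Any, Dict


def lambda_code_key(template: Dict[str, Any]) -> str:
    """Return the S3Key of the first AWS::Lambda::Function in a CloudFormation template."""
    for resource in template.get("Resources", {}).values():
        if resource.get("Type") == "AWS::Lambda::Function":
            return resource["Properties"]["Code"]["S3Key"]
    raise ValueError("no AWS::Lambda::Function found in template")


def deploy(stack_name: str, bucket: str, package_dir: str = ".serverless") -> None:
    """Upload the zip to S3, then update the stack with the template from the build."""
    import boto3  # imported here so the parsing helper works without boto3 installed

    with open(f"{package_dir}/cloudformation-template-update-stack.json") as fh:
        template_body = fh.read()
    key = lambda_code_key(json.loads(template_body))

    # 1. Upload the artifact to the key the template expects
    boto3.client("s3").upload_file(f"{package_dir}/html-to-pdf.zip", bucket, key)

    # 2. Update the stack; Serverless templates create IAM roles, hence the capabilities
    cloudformation = boto3.client("cloudformation")
    cloudformation.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )
    cloudformation.get_waiter("stack_update_complete").wait(StackName=stack_name)
```

For example, `deploy("html-to-pdf-dev", os.environ["DEPLOY_BUCKET"])` could run right after the build step (it also needs `import os`). Keeping the upload and the stack update in one script means the stack is never pointed at a key that hasn't been uploaded yet.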

Quick walkthrough of the code
When you run external executables, you need to add their folder to the PATH to make them available to your function:

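import os
# Append the bundled bin/ folder (holding wkhtmltopdf) to PATH; fall back to this file's folder outside Lambda
task_root = os.environ.get("LAMBDA_TASK_ROOT", os.path.dirname(os.path.abspath(__file__)))
os.environ["PATH"] = os.environ["PATH"] + os.pathsep + os.path.join(task_root, "bin")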

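Since the function is invoked by API Gateway through the Lambda proxy integration, the response must be a JSON object. Because the PDF body is base64 encoded, that object must include the `isBase64Encoded` property set to True: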

return {
    'statusCode': 200,
    'headers': {
        'Content-Type': 'application/pdf',
        'Access-Control-Allow-Origin': '*'
    },
    'body': pdfBase64Encoded,
    'isBase64Encoded': True
}
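The fragment above leaves out two things: how `pdfBase64Encoded` is produced and how the HTML is read from the request. With the Lambda proxy integration, the request body arrives as a string in `event["body"]`. API Gateway sets `event["isBase64Encoded"]` when it has base64-encoded that body. Below is a minimal sketch of both helpers. The helper names are hypothetical, and it assumes the HTML is sent as the raw POST body.

```python
import base64
from typing import Any, Dict


def read_html(event: Dict[str, Any]) -> str:
    """Extract the HTML page from an API Gateway proxy event (hypothetical helper)."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # API Gateway base64-encodes request bodies it treats as binary
        body = base64.b64decode(body).decode("utf-8")
    return body


def pdf_response(pdf_bytes: bytes) -> Dict[str, Any]:
    """Wrap raw PDF bytes in the proxy-integration response format."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/pdf",
            "Access-Control-Allow-Origin": "*",
        },
        # The body must be a string, so the binary PDF is base64 encoded
        "body": base64.b64encode(pdf_bytes).decode("ascii"),
        "isBase64Encoded": True,
    }
```

Inside the handler, the PDF bytes returned by the wkhtmltopdf call for `read_html(event)` would then be passed to `pdf_response`.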

We already configured API Gateway in serverless.yml to return binary content when the request's `Accept` header contains `application/pdf`:

plugins:
  - serverless-apigw-binary
custom:
  apigwBinary:
    types:
      - 'application/pdf'
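On the client side, this means callers should send `Accept: application/pdf`. Otherwise API Gateway is expected to pass the body through as base64 text rather than binary. Below is a minimal client sketch using only the standard library. The endpoint URL is a placeholder. Sending the HTML as the raw POST body with `Content-Type: text/html` is an assumption about the request format.

```python
import urllib.request

# Placeholder endpoint (hypothetical); replace with your API Gateway stage URL
ENDPOINT = "https://your-api-id.execute-api.eu-west-1.amazonaws.com/dev/html-to-pdf"


def build_request(html: str, url: str = ENDPOINT) -> urllib.request.Request:
    """POST the HTML page and ask API Gateway for binary PDF output."""
    return urllib.request.Request(
        url,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            # Must match a binary media type so API Gateway decodes the base64 body
            "Accept": "application/pdf",
        },
        method="POST",
    )


def fetch_pdf(html: str, url: str = ENDPOINT, timeout_s: float = 35.0) -> bytes:
    """Send the request and return the raw PDF bytes."""
    with urllib.request.urlopen(build_request(html, url), timeout=timeout_s) as response:
        return response.read()
```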

Conclusions
This was an interesting exercise that worked out pretty quickly. The Lambda service is now exposed internally to other services to generate documents from HTML, producing around ten thousand PDFs per month.
Of course, there are many variations that could achieve similar or better results. One of the benefits of such a small, independent service is that it can easily be rewritten from scratch, as long as its external REST interface stays compatible.

I hope you enjoyed this article!

Happy coding!
