Data Transformation Pipelines in AWS (Part 1)

Creating PDFs from HTML with AWS Lambda and API Gateway

There are quite a few cases in which we’d like to be able to output a dynamic PDF (invoices, statements, receipts, etc.). However, our experience has been that working with PDF templates and editors is fairly painful. We would rather work with the tools we’re familiar with: HTML and CSS. So we needed a mechanism that takes HTML as input and returns a PDF as output.

Creating the HTML to PDF Lambda

Creating a custom Lambda is fairly straightforward. You’ll need to log into your AWS account, head over to Services > Compute > Lambda, and click “Create a Lambda function”. You’ll be asked to select a blueprint, and you’ll choose blank. Next it will ask you to configure a trigger, but let’s skip that for now. The next section allows you to configure your function, and has a ton of options. We really only care about a handful of these.

Name, description, and runtime are self-explanatory. The handler value index.handler tells Lambda to load index.js from the root of the project and call the function it exports as handler. The only value that we need to change here is the IAM role. We’ve created lambda_s3_exec_role, which lets our Lambda read from and write to S3; we’ll need that to do our transformation.

Unfortunately, since we have external dependencies, we can’t use the inline code entry type; we’ll have to package up our code and upload it.

Create the Project

mkdir html-to-pdf
cd html-to-pdf
touch index.js
vim index.js

Let’s take a run at a first implementation. To start, we’re going to read the HTML in as base64 from the event, so that we can test it directly from the configuration screen (extended from the example code here).
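As a rough sketch of what such a first implementation might look like (not the original code): the handler decodes the HTML from the event, renders it, and returns the PDF as base64. The event key html_base64 and the helper names createHandler and renderWithWkhtmltopdf are assumptions made for illustration. This version collects the stream chunks by hand rather than using memorystream, and it leaves out the S3 upload. Keep in mind that the wkhtmltopdf npm package is a wrapper around the wkhtmltopdf command-line tool, so a Linux build of that binary would need to ship in the zip as well.

```javascript
'use strict';

// Renders HTML to a PDF Buffer using the wkhtmltopdf npm wrapper, which
// returns a readable stream of PDF bytes. The require is deferred so this
// file still loads on a machine where the package is not installed.
function renderWithWkhtmltopdf(html, done) {
  const wkhtmltopdf = require('wkhtmltopdf');
  const chunks = [];
  let finished = false;
  const finish = (err, pdf) => {
    if (finished) return; // make sure the callback fires only once
    finished = true;
    done(err, pdf);
  };
  wkhtmltopdf(html, {})
    .on('data', (chunk) => chunks.push(chunk))
    .on('error', (err) => finish(err))
    .on('end', () => finish(null, Buffer.concat(chunks)));
}

// Builds a Lambda handler around any renderer with the signature
// renderPdf(html, callback(err, pdfBuffer)).
// Assumed event shape: { "html_base64": "<base64-encoded HTML>" }.
function createHandler(renderPdf) {
  return function handler(event, context, callback) {
    if (!event || typeof event.html_base64 !== 'string') {
      return callback(new Error('Expected event.html_base64 to be a string'));
    }
    const html = Buffer.from(event.html_base64, 'base64').toString('utf8');
    renderPdf(html, (err, pdf) => {
      if (err) return callback(err);
      callback(null, { pdf_base64: pdf.toString('base64') });
    });
  };
}

// Lambda's index.handler setting resolves to this export.
exports.handler = createHandler(renderWithWkhtmltopdf);
```

Passing the renderer in as an argument keeps the decode and encode logic testable on a laptop with a stub renderer, without the binary installed.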

We need to add our dependencies to the project; we’ll do that with yarn add wkhtmltopdf memorystream aws-sdk.

Then we can zip the project, and we’ll be ready to upload:

zip --exclude \*.git\* -r release.zip .

Test the Project

We’ll start by configuring a test event.

The following base64 string PGJvZHk+SGVsbG8gd29ybGQ8L2JvZHk+ corresponds to <body>Hello world</body>. So we’ll add that to our test event.
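If you’d like to produce the base64 string for your own HTML, a small Node sketch like the following should do it. The key name html_base64 is an assumption here; use whatever key your handler reads from the event.

```javascript
'use strict';

// Wraps an HTML string in a test event for the Lambda console.
// The key name html_base64 is an assumed convention, not a Lambda requirement.
function buildTestEvent(html) {
  return { html_base64: Buffer.from(html, 'utf8').toString('base64') };
}

// Paste the printed JSON into the test event editor.
console.log(JSON.stringify(buildTestEvent('<body>Hello world</body>'), null, 2));
```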

Go ahead and click Save and Test and you should see “Execution result: succeeded” with the following output:

{ "pdf_base64": "JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7..." }
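A quick way to sanity-check that output is to decode the start of the string and look for the PDF file signature. The sketch below uses a hypothetical helper named looksLikePdf; it only inspects the first five bytes and does not validate the rest of the document.

```javascript
'use strict';

// Returns true when the decoded bytes begin with the PDF signature "%PDF-".
function looksLikePdf(pdfBase64) {
  const header = Buffer.from(pdfBase64, 'base64').subarray(0, 5).toString('latin1');
  return header === '%PDF-';
}

// The first twelve characters of the output above decode to "%PDF-1.4\n".
console.log(looksLikePdf('JVBERi0xLjQK'));
```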

If you check your S3 bucket, you should also see a test.pdf that looks like this:

Add a Trigger

Adding a trigger is really straightforward. Let’s go to the Triggers tab and click “Add trigger”. We want it to fire when an object is created in our S3 bucket, and only for files with the .html suffix.
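The console handles this for us, but for reference, the same trigger can be described as an S3 bucket notification configuration. The sketch below only builds the parameter object; the function name, bucket name, and Lambda ARN are made-up placeholders.

```javascript
'use strict';

// Builds parameters for S3's putBucketNotificationConfiguration call.
// bucketName and lambdaArn are placeholders supplied by the caller.
function buildHtmlTriggerParams(bucketName, lambdaArn) {
  return {
    Bucket: bucketName,
    NotificationConfiguration: {
      LambdaFunctionConfigurations: [
        {
          LambdaFunctionArn: lambdaArn,
          // Fire on every kind of object creation (PUT, POST, copy, ...).
          Events: ['s3:ObjectCreated:*'],
          // Only fire for keys that end in .html.
          Filter: {
            Key: { FilterRules: [{ Name: 'suffix', Value: '.html' }] },
          },
        },
      ],
    },
  };
}

console.log(JSON.stringify(
  buildHtmlTriggerParams('my-example-bucket', 'arn:aws:lambda:us-east-1:123456789012:function:html-to-pdf'),
  null,
  2
));
```

With aws-sdk, these parameters could be passed to the S3 client’s putBucketNotificationConfiguration method. The suffix filter matters when the output lands in the same bucket: without it, each new PDF would fire the trigger again.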

Finished HTML to PDF converter

Let’s fast forward a bit. I’ve added the ability to read the HTML file from a bucket and write the corresponding PDF back to that bucket. The file and bucket names can be read from the trigger event. We can now upload an HTML file to this bucket and a corresponding PDF will be created.
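To illustrate reading the file and bucket names from the trigger event, here is a minimal sketch. The helper name parseS3Event is hypothetical, and swapping the .html extension for .pdf is just one possible naming choice. It reads only the first record of the event.

```javascript
'use strict';

// Pulls the bucket and object key out of an S3 "object created" event and
// derives the key for the PDF that will be written next to the HTML file.
function parseS3Event(event) {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  // S3 URL-encodes keys in event notifications and uses "+" for spaces.
  const htmlKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  const pdfKey = htmlKey.replace(/\.html$/i, '') + '.pdf';
  return { bucket, htmlKey, pdfKey };
}

// A trimmed-down example event containing only the fields used above.
const sampleEvent = {
  Records: [
    { s3: { bucket: { name: 'my-example-bucket' }, object: { key: 'invoices/march+2017.html' } } },
  ],
};
console.log(parseS3Event(sampleEvent));
```

From there, the handler would fetch htmlKey from the bucket, render it, and write the result to pdfKey.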

Check out Part 2.


Originally published at www.drivenbycode.com on February 17, 2017.