How to generate PDFs in AWS Lambda

One of the advantages of a serverless architecture is that each function invocation runs in its own environment, so it scales very well at the function level. That makes Lambda a good fit for resource-heavy computing tasks such as generating PDFs, as long as each job stays within Lambda's memory and execution-time limits.

Our AWS development environment runs on a t2.micro instance, and the server kept crashing with out-of-memory errors whenever it ran wkhtmltopdf to generate a PDF. Since we are moving towards serverless anyway, PDF generation was a good candidate to dress in the new fashion.

There are a few posts online about how to do this, but none of them seems complete; perhaps because serverless evolves so quickly, they are already out of date. So I figured I should write this up to help people who, like me, were lost.

We use the Serverless Framework to manage the stack. The event source is API Gateway, i.e. the function is fired when a request hits the API endpoint. This is the function part of serverless.yml.
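For reference, here is a rough sketch of what the function part can look like. The Serverless Framework can also load its configuration from a JavaScript file (serverless.js) that exports the same structure as serverless.yml, so it is written that way here. The service name, the handler path `handler.pdf`, the HTTP path, the memory size and the runtime are all assumptions; adapt them to your own stack.

```javascript
// serverless.js: a hypothetical JavaScript equivalent of the function part of serverless.yml.
// Service name, handler path, HTTP path, memory and runtime are illustrative assumptions.
const serverlessConfig = {
  service: 'pdf-generator', // hypothetical service name
  provider: {
    name: 'aws',
    runtime: 'nodejs8.10', // assumption: replace with a Node.js runtime your account supports
    memorySize: 1024, // wkhtmltopdf is memory hungry; tune for your documents
    timeout: 30, // seconds; API Gateway itself times out after about 29 s by default
  },
  functions: {
    pdf: {
      handler: 'handler.pdf', // exports.pdf in handler.js (hypothetical)
      events: [
        {
          // API Gateway event source: a GET request on /pdf invokes the function
          http: { path: 'pdf', method: 'get' },
        },
      ],
    },
  },
};

module.exports = serverlessConfig;
```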

And the function code

In the function code, there are a couple of places you need to pay attention to.

isBase64Encoded: true tells API Gateway that the response body is Base64-encoded, so you need to make sure the body you return really is a Base64 string.
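Why Base64 at all? The proxy response body is a string, so raw PDF bytes cannot simply be dropped into it. This short demonstration (the bytes just mimic the start of a typical PDF, including a binary marker comment) shows that decoding the bytes as UTF-8 and re-encoding them mangles the data, while a Base64 round trip is lossless.

```javascript
// Demonstrates why a binary PDF must be Base64-encoded before it goes into a string body.
// The bytes mimic the start of a PDF: "%PDF-1.4\n%" followed by non-ASCII marker bytes.
const pdfStart = Buffer.from([
  0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34, 0x0a, // "%PDF-1.4\n"
  0x25, 0xe2, 0xe3, 0xcf, 0xd3, // "%" followed by bytes that are not valid UTF-8
]);

// Wrong: treating the bytes as UTF-8 text replaces invalid sequences with U+FFFD.
const viaUtf8 = Buffer.from(pdfStart.toString('utf8'), 'utf8');

// Right: Base64 maps every byte to ASCII characters and back without loss.
const viaBase64 = Buffer.from(pdfStart.toString('base64'), 'base64');

console.log('UTF-8 round trip intact:', viaUtf8.equals(pdfStart)); // false
console.log('Base64 round trip intact:', viaBase64.equals(pdfStart)); // true
```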

process.env['PATH'] is the key to success. To run wkhtmltopdf, the runtime environment needs the wkhtmltopdf executable (binary) on its PATH, and that becomes tricky with AWS Lambda. You need to:
1. package the correct binary,
2. set the correct permission and upload it to the correct location, and
3. configure API Gateway.

Package the correct binary

We use webpack to build the bundle for Lambda. You can get the binary with wget https://github.com/wkhtmltopdf/wkhtmltopdf/releases/download/0.12.4/wkhtmltox-0.12.4_linux-generic-amd64.tar.xz. At the time of writing I wasn't able to find a 0.12.5 build on the official site https://wkhtmltopdf.org/downloads.html, so I went with 0.12.4. And yes, the amd64 build is the one Lambda needs. Extract the archive and copy wkhtmltopdf into your project folder. The binary must be in a directory listed in process.env['PATH']; I put it in the code root, which is why you need
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT']
This makes sure Lambda sees the binary.
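When Lambda reports that it cannot find wkhtmltopdf, it helps to check what the PATH actually resolves to. The helpers below are hypothetical debugging aids (not from any library): `findOnPath` looks for an executable file in each PATH directory, and `addToPath` appends a directory such as LAMBDA_TASK_ROOT only once, mirroring the line above.

```javascript
// Hypothetical helpers for checking that a binary is reachable through PATH.
const fs = require('fs');
const path = require('path');

// Return the full path of the first executable `name` found on PATH, or null.
function findOnPath(name, envPath = process.env.PATH || '') {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty entries such as a trailing ':'
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // throws if missing or not executable
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // not here, or not executable: keep searching
    }
  }
  return null;
}

// Append `dir` to a PATH string unless it is already there (empty entries are dropped).
function addToPath(dir, envPath = process.env.PATH || '') {
  const parts = envPath.split(path.delimiter).filter(Boolean);
  if (!parts.includes(dir)) parts.push(dir);
  return parts.join(path.delimiter);
}
```

At cold start you could set `process.env.PATH = addToPath(process.env.LAMBDA_TASK_ROOT)` and log `findOnPath('wkhtmltopdf')`; a `null` result means the binary is either missing from the package or lacks the executable bit.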

Set the correct permission & upload to the correct location

You need to give the binary executable permission, and that isn't easy with webpack: I had a huge problem getting webpack to package wkhtmltopdf with the needed permission. Initially I used https://github.com/webpack-contrib/copy-webpack-plugin, but it does not seem to carry the file permission across (see the bug at https://github.com/webpack-contrib/copy-webpack-plugin/issues/35). Reading that issue, I found https://github.com/GeKorm/webpack-permissions-plugin, but it did not work for me either! Apparently someone had the same frustration a year earlier and built https://github.com/boazdejong/webpack-plugin-copy. I guess it worked when he wrote it, but it targets an older webpack release and does not work with the version I'm using. So neither copy-webpack-plugin nor webpack-plugin-copy worked, and I had to write a new NPM module, https://www.npmjs.com/package/inq-webpack-plugin-copy, which finally works! This is the webpack config.
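If you would rather not depend on a copy plugin for the permission, another option (a different technique from the module above, sketched here under assumptions) is a tiny custom plugin that sets the mode after webpack has written its output, using the `compiler.hooks.afterEmit` hook available since webpack 4. It assumes something else, such as a copy plugin, has already placed `wkhtmltopdf` in the output directory; the class name `ChmodAfterEmitPlugin` is hypothetical.

```javascript
// Hypothetical webpack plugin: after webpack emits its output, mark the listed files executable.
// Assumes the files (e.g. wkhtmltopdf) were already copied into the output directory.
const fs = require('fs');
const path = require('path');

class ChmodAfterEmitPlugin {
  constructor(files, mode = 0o755) {
    this.files = files; // paths relative to the webpack output directory
    this.mode = mode; // rwxr-xr-x by default
  }

  apply(compiler) {
    // afterEmit runs once all assets have been written to disk (webpack 4+ hooks API).
    compiler.hooks.afterEmit.tap('ChmodAfterEmitPlugin', () => {
      for (const file of this.files) {
        const target = path.join(compiler.outputPath, file);
        if (fs.existsSync(target)) {
          fs.chmodSync(target, this.mode);
        } else {
          console.warn(`ChmodAfterEmitPlugin: ${target} not found, skipping`);
        }
      }
    });
  }
}

module.exports = ChmodAfterEmitPlugin;
```

In webpack.config.js it would go after the copy step, e.g. `plugins: [/* copy plugin */, new ChmodAfterEmitPlugin(['wkhtmltopdf'])]`.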

Configure API Gateway

This one also cost me quite a few hours: because the response is application/pdf, API Gateway must be configured to treat that content type as binary.

I had to add application/pdf manually as a binary media type in the AWS API Gateway console (in the API's settings). There is a plugin, https://www.npmjs.com/package/serverless-apigw-binary, but again it did not work for me for some reason. After adding the type you must save the changes, and you MUST deploy the API again.
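If you prefer scripting this step to clicking through the console, here is a sketch using the AWS SDK for JavaScript v2 (`aws-sdk`), where `updateRestApi` takes JSON-Patch-style operations and `createDeployment` redeploys a stage. The function names, API id and stage name are placeholders. Note that the patch path uses JSON Pointer escaping, so the `/` in `application/pdf` becomes `~1`.

```javascript
// Hypothetical script: register application/pdf as a binary media type, then redeploy.
// Assumes an AWS SDK for JavaScript v2 APIGateway client is passed in.

// binaryMediaTypes is patched by JSON Pointer path: '~' -> '~0' first, then '/' -> '~1'.
function binaryMediaTypePatch(mediaType) {
  const escaped = mediaType.replace(/~/g, '~0').replace(/\//g, '~1');
  return { op: 'add', path: `/binaryMediaTypes/${escaped}` };
}

async function enableBinaryPdf(apiGateway, restApiId, stageName) {
  await apiGateway
    .updateRestApi({ restApiId, patchOperations: [binaryMediaTypePatch('application/pdf')] })
    .promise();
  // Saving is not enough: the API must be deployed again for the change to take effect.
  await apiGateway.createDeployment({ restApiId, stageName }).promise();
}

// Usage (needs the aws-sdk package and credentials; the ids are placeholders):
// const AWS = require('aws-sdk');
// enableBinaryPdf(new AWS.APIGateway(), 'abc123', 'dev').catch(console.error);
```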

Without this change, you will most likely find that the generated PDF is not readable.
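A quick way to tell what went wrong with an unreadable download: a real PDF starts with the bytes `%PDF-`, whereas if API Gateway passed the Base64 body through without converting it, the file starts with `JVBERi0`, the Base64 encoding of `%PDF-`. The helper below is a hypothetical debugging aid, not part of any library.

```javascript
// Hypothetical debugging aid: classify a downloaded response body.
function classifyPdfDownload(buf) {
  if (buf.subarray(0, 5).toString('latin1') === '%PDF-') return 'pdf';
  // Base64 of "%PDF-" begins with "JVBERi0": API Gateway returned the body unconverted.
  if (buf.subarray(0, 7).toString('latin1') === 'JVBERi0') return 'base64-encoded pdf';
  return 'unknown';
}
```

Run it on the downloaded file, e.g. `classifyPdfDownload(require('fs').readFileSync('out.pdf'))`; a `'base64-encoded pdf'` result points at the binary media type setting or the request's Accept header rather than at wkhtmltopdf.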

Conclusion

It proves once again that the AWS Lambda learning curve is steep and community support is still not great. But hey, isn't that why we need coders like you and me? Let's make the serverless community stronger!

UPDATE 26/09/2018:

Originally I had */* as the binary media type. It worked, but later I ran into a problem with API Gateway converting a PUT request's JSON body into a Base64-encoded string. You don't want API Gateway to convert to Base64 whatever the content type is; you only want that for application/pdf. One very important thing: API Gateway relies on the request header Accept: application/pdf to decide whether to convert or not, so make sure your client sets that header.
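On the client side, that means sending the Accept header explicitly. Here is a minimal sketch using the global `fetch` of Node 18 or later; the function names and URL are placeholders. Per the API Gateway documentation, when the Accept header lists several media types only the first one is honored, so put application/pdf first.

```javascript
// Hypothetical client: ask for the PDF with Accept: application/pdf so API Gateway
// converts the Base64 body back to binary. Uses the global fetch of Node 18+.
const fsPromises = require('fs/promises');

function pdfRequestInit() {
  return {
    method: 'GET',
    // API Gateway matches this header against the API's binary media types.
    headers: { Accept: 'application/pdf' },
  };
}

async function downloadPdf(url, outFile) {
  const res = await fetch(url, pdfRequestInit());
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const bytes = Buffer.from(await res.arrayBuffer());
  await fsPromises.writeFile(outFile, bytes);
  return bytes.length; // number of bytes written
}

// Example (placeholder URL):
// downloadPdf('https://abc123.execute-api.ap-southeast-2.amazonaws.com/dev/pdf', 'out.pdf');
```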