Create PDF using Pdf-lib on Serverless AWS Lambda

Crespo Wang
Published in The Startup
5 min read · Feb 25, 2020

I’ve written quite a few How-To-Create-PDF-using-X-on-Serverless-AWS-Lambda posts covering wkhtmltopdf, Chromium Puppeteer and Pdf-Kit, and I’ve been getting requests to write about other libraries. Pdf-lib is one of them.

To be honest, I wasn’t aware of Pdf-lib before. It’s a fairly new solution compared with the others, but the trend is quite promising: the code is easy to understand, the API documentation is comprehensive, and more importantly, the maintainer is clearly very active in fixing bugs and adding new features. So I decided to give it a go!

HTTP Endpoint

As always, let’s create an HTTP endpoint for the function. In your serverless.yml you must add application/pdf to binaryMediaTypes so that API Gateway passes the binary response body through.

If the Content-Type header of the response and the Accept header of the original request match an entry of the binaryMediaTypes list, API Gateway passes through the body. This occurs when the Content-Type header and the Accept header are the same; otherwise, API Gateway converts the response body to the type specified in the Accept header.
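On the client side, this means the request has to send an Accept header of application/pdf. Here is a minimal sketch of such a request, assuming Node.js 18+ where fetch is available globally. The helper name downloadPdf and the fetchImpl parameter are hypothetical and only there to make the example easy to try out.

```javascript
// Hypothetical client helper. `fetchImpl` defaults to the global fetch (Node.js 18+).
async function downloadPdf(url, fetchImpl = fetch) {
  // The Accept header has to match an entry of the binaryMediaTypes list.
  const res = await fetchImpl(url, { headers: { Accept: "application/pdf" } });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  // The body arrives as raw binary, so read it as bytes rather than text.
  return Buffer.from(await res.arrayBuffer());
}
```

You would call it with the URL of your own endpoint, for example `const pdf = await downloadPdf(endpointUrl)`, and write the returned Buffer to a file.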

> Code

Creating the PDF with Pdf-lib is pretty simple; the tricky part is returning it. According to the Pdf-lib API documentation, saving the PDF gives you a Uint8Array.

// Serialize the PDFDocument to bytes (a Uint8Array)
const pdfBytes = await pdfDoc.save();

It needs to be converted to a base64 string. Looking at the Node.js docs, we can use Buffer.from() to do the job.

const buffer = Buffer.from(pdfBytes);
return {
  statusCode: 200,
  headers: {
    "Content-type": "application/pdf"
  },
  body: buffer.toString("base64"),
  isBase64Encoded: true
};
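To make the conversion easy to reuse and to check, the snippet above can be wrapped in a small helper. This is only a sketch: the name toPdfResponse is hypothetical and is not part of Pdf-lib, and the sample bytes are a placeholder rather than a complete PDF.

```javascript
// Hypothetical helper: wrap PDF bytes (for example the Uint8Array returned by
// pdfDoc.save()) in the response shape used by the Lambda proxy integration.
function toPdfResponse(pdfBytes) {
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/pdf" },
    // API Gateway decodes this string back to binary because isBase64Encoded is true.
    body: Buffer.from(pdfBytes).toString("base64"),
    isBase64Encoded: true
  };
}

// Round trip with placeholder bytes ("%PDF" in ASCII).
const sampleResponse = toPdfResponse(Uint8Array.from([0x25, 0x50, 0x44, 0x46]));
console.log(sampleResponse.body);
console.log(Buffer.from(sampleResponse.body, "base64").toString("latin1"));
```

Decoding the body back to the original bytes is a quick way to convince yourself that nothing is lost in the base64 step.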

> Layer

If you are a perfectionist and care about every byte, you can use a Lambda Layer to reduce the function size by more than 95%!

Without a layer, the Pdf-lib related NPM modules take up most of the package size. You can check what is in the deployment package:

yarn sls package
ls -lah .serverless
# the deployment package is 11M
unzip .serverless/serverless-lambda-pdflib.zip -d unzipped
du -sh unzipped/node_modules/* | sort -h

Using a layer can significantly reduce your function size, which brings faster deployments and shorter cold starts*.

*Update: Thanks to Dave Irvine for pointing out that a layer doesn’t help reduce cold starts, as Lambda still needs to combine the layer and the function code and deploy them to a container before running.

There is a certain directory structure you must follow when making a layer file. For the nodejs runtime, the unzipped layer content must live under a folder named nodejs.

So first we create a directory named nodejs, then we create a new package.json in it and install pdf-lib.

mkdir -p nodejs
cd nodejs
npm init
npm install pdf-lib

Then go back to the parent directory and zip the nodejs directory for uploading.

cd ..
zip -9r pdflib-layer.zip nodejs

Now that we have the layer zip ready, we need to tell Serverless to use it.

Then, in the function definition, attach the layer to the function.

Remember that the layer reference in the function must follow this pattern:

To use a layer with a function in the same service, use a CloudFormation Ref. The name of your layer in the CloudFormation template will be your layer name TitleCased (without spaces) and have LambdaLayer appended to the end.
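The naming rule is easier to see with a concrete value. Below is a small sketch of it for simple layer names made of letters and digits only; the function name layerRefName and the layer name pdflib are hypothetical, and names containing dashes or underscores are not covered here.

```javascript
// Sketch of the naming rule quoted above, for simple layer names only:
// capitalize the first letter and append "LambdaLayer".
function layerRefName(layerName) {
  return layerName.charAt(0).toUpperCase() + layerName.slice(1) + "LambdaLayer";
}

console.log(layerRefName("pdflib")); // PdflibLambdaLayer
```

So a layer declared as pdflib would be referenced from the function with a Ref to PdflibLambdaLayer.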

Finally, remember to exclude the library from the function package because it now lives in the layer!

The deployment package is reduced to 160K! Yay!!

S3 Event

One of the advantages of Pdf-lib is that it can modify an existing PDF. A common use case is that a PDF is uploaded to S3 and we want to process it, which can be done by hooking up the Lambda function to an S3 event.

You can hook up the Lambda function to an existing S3 bucket, or you can let the Serverless framework manage the bucket; see the Serverless documentation for details. In this demo, I am letting Serverless do all the work.

It is important to set the correct IAM role for the function: you will need s3:GetObject for the function to read the file, and s3:PutObject to write the processed file to the bucket.

In the function definition, we want to fire the function when an s3:ObjectCreated:* event occurs, and only for PDF files created under the uploads directory.

> Code

When an event is triggered, you can extract the bucket and object key from the event record.

const s3event = event.Records[0] && event.Records[0].s3;
const bucketName = s3event && s3event.bucket && s3event.bucket.name;
const objectKey = s3event && s3event.object && s3event.object.key;
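One detail worth handling here is that object keys in S3 event notifications are URL-encoded, with spaces sent as "+". The sketch below wraps the extraction in a helper that also decodes the key. The name parseS3Event, the bucket name and the file name are made up for illustration.

```javascript
// Hypothetical helper: pull the bucket name and object key out of an S3 event.
function parseS3Event(event) {
  const record = event && event.Records && event.Records[0];
  const s3event = record && record.s3;
  const bucketName = s3event && s3event.bucket && s3event.bucket.name;
  const rawKey = s3event && s3event.object && s3event.object.key;
  if (!bucketName || !rawKey) {
    throw new Error("Not an S3 event record");
  }
  // Keys in S3 notifications are URL-encoded, with spaces sent as "+".
  const objectKey = decodeURIComponent(rawKey.replace(/\+/g, " "));
  return { bucketName, objectKey };
}

// Example with a made-up event; the bucket and key are placeholders.
const parsedEvent = parseS3Event({
  Records: [
    { s3: { bucket: { name: "my-bucket" }, object: { key: "uploads/my+report.pdf" } } }
  ]
});
console.log(parsedEvent);
```

Without the decoding step, a file uploaded as "my report.pdf" would be requested as "my+report.pdf" and the read would fail.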

Then we use aws-sdk to get the file, wrapping s3.getObject in a promise so that you can use async/await. Note that the promise must resolve with data.Body, because that is the actual file buffer.
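A minimal sketch of such a wrapper is shown here, assuming the callback style of aws-sdk v2. The helper name getObjectBody is hypothetical, and the S3 client is passed in as an argument so the function can be exercised with a stub.

```javascript
// Hypothetical helper: `s3` is expected to be an aws-sdk v2 client (new AWS.S3()).
function getObjectBody(s3, bucketName, objectKey) {
  return new Promise((resolve, reject) => {
    s3.getObject({ Bucket: bucketName, Key: objectKey }, (err, data) => {
      if (err) {
        reject(err);
      } else {
        // data.Body holds the file content as a Buffer.
        resolve(data.Body);
      }
    });
  });
}
```

Inside an async handler this reads as `const existingPdfBytes = await getObjectBody(s3, bucketName, objectKey);`. With aws-sdk v2 you can also skip the wrapper and call `s3.getObject(params).promise()`, then read Body from the result.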

The demo code will add a new page to the first page of the existing pdf, then it saves the modified pdf to a new directory.

When you are done with the manipulation, use the saveAsBase64() function to get the file data as a base64 string, then convert it back to a Buffer because that is what s3.putObject expects.
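The conversion back to a Buffer can be sketched as follows. The helper name buildPutParams and the processed/ output prefix are assumptions made for illustration; the demo may use a different directory name.

```javascript
// Hypothetical helper: build the s3.putObject parameters for the modified PDF.
// The "processed/" prefix is an assumption; use whatever output directory you prefer.
function buildPutParams(bucketName, objectKey, base64Pdf) {
  const fileName = objectKey.split("/").pop();
  return {
    Bucket: bucketName,
    Key: `processed/${fileName}`,
    // Decode the base64 string back into raw bytes.
    Body: Buffer.from(base64Pdf, "base64"),
    ContentType: "application/pdf"
  };
}

const putParams = buildPutParams("my-bucket", "uploads/report.pdf", "JVBERg==");
console.log(putParams.Key, putParams.Body.length);
```

Writing the result outside the uploads directory matters: the trigger only fires for files under uploads, so saving elsewhere keeps the function from invoking itself again.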

Conclusion

Processing PDFs is a typical use case for serverless/Lambda: it can be a memory-heavy task and is normally not time-critical, so only firing up computing resources when needed can save you a lot of dollars and a lot of ops time.

Living in the Node.js world means that you have access to a huge number of open-source libraries. When it comes to choosing one, I always stick to the following rules.

  • Choose one that is actively maintained.
  • Avoid one whose popularity is trending down.
  • Choose one with good documentation.

That is why I think pdf-lib is a good candidate for PDF manipulation.

Happy Coding!

Code lives at https://github.com/crespowang/serverless-lambda-pdflib
