Building a PDF Generator on AWS Lambda with Python3 and wkhtmltopdf

Introduction

{
"filename": "sample.pdf",
"html": "<html><head></head><body><h1>It works! This is the default PDF.</h1></body></html>"
}

Setup

  • Python3
  • AWS CLI installed and configured
  • Serverless installed

Serverless

sls create --template aws-python3

WKHTMLTOPDF Binary

./binary/wkhtmltopdf
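
pdfkit runs wkhtmltopdf as an external program, so the file at ./binary/wkhtmltopdf has to exist and carry execute permission in the deployed package. A minimal sketch of a pre-flight check, assuming a hypothetical helper named is_executable_file that is not part of the original project:

```python
import os


def is_executable_file(path):
    """Return True when path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Hypothetical usage: report a clear message if the binary is unusable.
WKHTMLTOPDF_PATH = "./binary/wkhtmltopdf"
if not is_executable_file(WKHTMLTOPDF_PATH):
    print("wkhtmltopdf is missing or not executable: " + WKHTMLTOPDF_PATH)
```

Running this locally before deploying can catch a binary that lost its execute bit when it was copied or unzipped.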

Python3 Dependencies

sls plugin install -n serverless-python-requirements
custom:
  pythonRequirements:
    dockerizePip: true
pdfkit

Ready, Set, Code

Serverless.yml

BucketName: 'your-s3-bucket-name'
service: pdf-service # name or reference to our project
provider: # It is possible to use Azure, GCloud, or AWS
functions: # Array of functions to deploy as Lambdas
resources: # S3 buckets, DynamoDB tables, and other possible resources to create
plugins: # Plugins for Serverless
custom: # Custom variables used by you or plugins during setup and deployment
  1. Create an S3 bucket called pdf-service-bucket to store our PDFs
  2. Create a function that will create the PDFs
  3. Give our function access to the S3 bucket
  4. Set up an API endpoint for our Lambda function at:
POST https://xxxx.execute-api.xxxx.amazonaws.com/dev/new-pdf
service: pdf-service
provider:
  name: aws
  runtime: python3.7
  # Set environment variable for the S3 bucket
  environment:
    S3_BUCKET_NAME: ${file(./config.yml):BucketName}
  # Gives our functions full read and write access to the S3 Bucket
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "s3:*"
      Resource:
        - arn:aws:s3:::${file(./config.yml):BucketName}
        - arn:aws:s3:::${file(./config.yml):BucketName}/*
functions:
  generate_pdf:
    handler: handler.generate_pdf
    events:
      - http:
          path: new-pdf
          method: post
          cors: true
resources:
  # Creates an S3 bucket in our AWS account
  Resources:
    NewResource:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${file(./config.yml):BucketName}
custom:
  pythonRequirements:
    dockerizePip: true
plugins:
  - serverless-python-requirements

Handler.py

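  • Context contains runtime information about the invocation, such as the function name, request ID, and remaining execution time. Environment variables are read through os.environ.
  • Event contains the request data that is sent to the Lambda function.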
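
To make the event concrete before reading the handler, here is a minimal sketch of what arrives from API Gateway, assuming Lambda proxy integration (the Serverless default for http events), where the request body is delivered as a JSON string. The sample event is trimmed down and the helper name parse_body is hypothetical:

```python
import json

# A trimmed-down, hypothetical example of the event API Gateway passes to the
# handler under Lambda proxy integration; real events carry many more keys.
sample_event = {
    "httpMethod": "POST",
    "path": "/new-pdf",
    "headers": {"Content-Type": "application/json"},
    "body": '{"filename": "sample.pdf", "html": "<h1>Hello</h1>"}',
}


def parse_body(event):
    """Return the decoded JSON body, or an empty dict when no body was sent."""
    raw = event.get("body")
    return json.loads(raw) if raw else {}


data = parse_body(sample_event)
print(data["filename"])  # sample.pdf
```

Checking for an empty body rather than only for the presence of the key matters because a request without a payload can arrive with the body set to None.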
import json
import os

import boto3
import pdfkit

client = boto3.client('s3')

# Get the bucket name from the environment variable set in serverless.yml
S3_BUCKET_NAME = os.environ.get('S3_BUCKET_NAME')


def generate_pdf(event, context):
    # Defaults
    key = 'default-filename.pdf'
    html = "<html><head></head><body><h1>It works! This is the default PDF.</h1></body></html>"

    # Decode JSON and set values for our PDF (body can be missing or None)
    if event.get('body'):
        data = json.loads(event['body'])
        key = data['filename']
        html = data['html']

    # Set file path to save the PDF on Lambda first (temporary storage)
    filepath = '/tmp/{key}'.format(key=key)

    # Create PDF
    config = pdfkit.configuration(wkhtmltopdf="binary/wkhtmltopdf")
    pdfkit.from_string(html, filepath, configuration=config, options={})

    # Upload to S3 bucket; the with block closes the file after the upload
    with open(filepath, 'rb') as pdf_file:
        client.put_object(
            ACL='public-read',
            Body=pdf_file,
            ContentType='application/pdf',
            Bucket=S3_BUCKET_NAME,
            Key=key
        )

    # Format the PDF URI
    object_url = "https://{0}.s3.amazonaws.com/{1}".format(S3_BUCKET_NAME, key)

    # Respond with the result
    response = {
        "headers": {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Credentials": True,
        },
        "statusCode": 200,
        "body": object_url
    }
    return response
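
The handler builds its temporary path by placing the requested filename directly after /tmp/. A filename that contains directory parts, such as invoices/a.pdf or ../x.pdf, would then point outside /tmp or at a folder that does not exist. One possible hardening, sketched here with a hypothetical helper named safe_tmp_path that is not part of the original handler, keeps only the final path component:

```python
import os


def safe_tmp_path(filename, tmp_dir="/tmp"):
    """Build a path inside tmp_dir, dropping any directory parts of filename."""
    # Normalize backslashes so Windows-style separators are stripped too.
    name = os.path.basename(filename.replace("\\", "/"))
    # Fall back to the handler's default name when nothing usable remains.
    if not name or name in (".", ".."):
        name = "default-filename.pdf"
    return os.path.join(tmp_dir, name)


print(safe_tmp_path("invoices/2020/report.pdf"))  # /tmp/report.pdf
```

In the handler this could replace the filepath line, while the S3 key keeps whatever name the caller sent. Whether to also restrict the S3 key is a separate design decision.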

Deploy

sls deploy
https://xxxxxx.execute-api.us-east-1.amazonaws.com/dev/new-pdf
curl -d '{"filename":"my-sample-filename.pdf", "html":"<html><head></head><body><h1>Custom HTML -> Posted From CURL as {JSON}</h1></body></html>"}' -H "Content-Type: application/json" -X POST REPLACE-WITH-YOUR-ENDPOINT
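
The same request can be sent from Python using only the standard library. This is a minimal sketch rather than a full client: the function names build_pdf_request and request_pdf are hypothetical, and it assumes the endpoint replies with the PDF URL as plain text, as the handler above does.

```python
import json
import urllib.request


def build_pdf_request(endpoint, filename, html):
    """Create a POST request carrying the JSON body the handler expects."""
    body = json.dumps({"filename": filename, "html": html}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_pdf(endpoint, filename, html):
    """Send the request and return the response body (the PDF URL) as text."""
    request = build_pdf_request(endpoint, filename, html)
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")
```

Calling request_pdf("REPLACE-WITH-YOUR-ENDPOINT", "my-sample-filename.pdf", "<h1>Custom HTML</h1>") with your deployed endpoint substituted should return the S3 URL of the generated file.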

Conclusion

Next Steps

Resources
