PDF Automation. Create a Better Document than Just a Browser Ctrl + P

Jessica Morales
Published in Yellowme
Jun 4, 2020

Provide more information about your product to your customers.

The inspiration to write this post came to me at the end of the PDF automation project to which I was assigned. At that moment, I realized that the main issue that kept slowing our progress was not the coding, or writing CSS differently than for a web page; it was finding the information on the internet. So I decided to help others by leaving a summary of the information I found and the tips I picked up.

By the way, I like to cook, so by the end this post may look like a “tech-recipe”.

The Developer Tools that we are going to use:

  • To make it look nice we use: HTML 5, CSS 3, Jinja 2.
  • For back-end: Python 3.7 and above, Jinja 2 API, wkhtmltopdf.
  • The clouds: Box, AWS-S3 Bucket.

Let’s get on with it.

We will start by installing the dependencies to set up the project. For this you must already have Python 3 installed on your system. If you do not have it, you can follow the installation instructions on the official Python website (python.org).

Cool, now to have a proper start you should create a space for your project. To do that, create a folder with any name you like (for example, html_to_pdf_101). Inside it, you are going to create a virtual environment to isolate the package dependencies locally.

Generating a Virtual Environment

In your terminal go inside your project folder and run the following command to create the virtual environment.

python3 -m venv env

When the process has finished you should have a new folder named env. Like this:

html_to_pdf_101
└── env

Now run the following command to activate the virtual environment.

source env/bin/activate

Creating Requirements file

Alright, let’s create our installation requirements file. This file helps you manage your dependencies. Copy the following list, paste it into a text file named requirements.txt, and save it in the project folder.
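A list that matches the tools named in this post could look like the one below. The package names follow from the tools mentioned here (Jinja 2, pdfkit, the Box SDK with JWT support, and boto3), but the exact contents are an assumption, and no versions are pinned, so adjust it to your project.

```text
Jinja2
pdfkit
boxsdk[jwt]
boto3
```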

html_to_pdf_101
├── env
└── requirements.txt

Now we are going to use this file to install the dependencies. Run the following command in your terminal:

pip install -r requirements.txt

Installing WKHTMLTOPDF

This tool renders the HTML into a PDF. I will show you two installation methods, but if you need more information about the tool and other ways to install it, you can read the official documentation here.

  • Fedora/CentOS:
    sudo dnf install wkhtmltopdf
  • Debian/Ubuntu:
    sudo apt-get install wkhtmltopdf

Excellent! After you have everything installed we can start…

Doing Code!

Managing SNS and API

Our main source of information is the Amazon Simple Notification Service (SNS). Because we knew the format of the notification, we were able to break it down and reach the message it carries.

The message arrives as a JSON string, so we parse it with the method json.loads() to work with it.
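As a minimal sketch of that step, assuming the notification is delivered to an AWS Lambda function in the usual SNS envelope (Records, then Sns, then Message), the message can be unwrapped like this. The function name and the fields inside the sample message are made up for illustration.

```python
import json


def extract_sns_message(event):
    """Return the SNS message body as a Python dict.

    Assumes the envelope AWS Lambda receives from SNS, where
    event["Records"][0]["Sns"]["Message"] holds a JSON string.
    """
    sns_record = event["Records"][0]["Sns"]
    # The message arrives as a JSON string, so it has to be parsed first.
    return json.loads(sns_record["Message"])


# Hypothetical notification: the fields inside Message are made up.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"customer": "Ada", "order_id": 42})}}
    ]
}
data = extract_sns_message(sample_event)
print(data["customer"])
```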

You can use an API instead of SNS, or together with it. I leave you an example below as well.
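For the API route, a small sketch with only the standard library could look like this. The post does not say which API was used, so the function name and any URL you pass in are hypothetical; the opener parameter exists only so the function can be tried without a network connection.

```python
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen, timeout=10):
    """GET `url` and return the decoded JSON body.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a real HTTP call.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The returned dict can then be used exactly like the parsed SNS message.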

BoxDev

What is Box?

It is a modern content management platform that transforms how organizations work and collaborate to achieve results faster. Box Platform provides content APIs to build secure content experiences in custom apps.

What is BoxDev?

It is a platform that allows you to seamlessly bring the full power of Box into your application.

Implementing BoxDev Server-Side.

  • Administrative Steps.

First, of course, you need access to a Box account. If you do not have one, you can go to this link and sign up for free.

Once you are logged in, you will be able to access the Dev Console.


You are now in the Box developer console. Here you can open your app or create a custom app set up with JWT. You can find the creation instructions here.

In your app, go to Configuration in the left menu. We will go through each section to select the correct options:

Authentication method section. OAuth 2.0 with JWT should be selected.

Application access. Select the Enterprise option.

Advanced features. Both options should be selected.

Add and manage public keys. The simplest way to work with this section is to use the generate a public/private key pair option. This automatically creates a JSON file and a public key ID. The file will be used to authenticate us, so you should save it in your project folder.

Once you have done the above, go back to the general panel and request authorization for the app.

Great!! What is next? … some admin steps.
If you are not the account owner you should ask them to complete the procedures for you.

In the main Box menu, click the Admin Console option.

In the Admin Console, select the Apps option.

Inside the Apps menu, click on the Custom Apps tab. There you should find the submitted app.

If you cannot find it, click the authorize new app button.

You can find the Client ID on the Configuration page of your app in the Dev Console.

Finally, to validate the authorization, click the ellipsis at the end of the app's row and choose the reauthorize app option.

You have to repeat this last step every time you change something in the app.

  • The Code

Allow Access to BoxDev

The objective of the following step is to get an access token. We will do this using the configuration file that you downloaded. You also have to create a copy of that file (you can name it as you like; I named mine user_config). In the copy, delete the entire line that contains the enterpriseID.
I will give you two examples: one using the files locally and a second one using them from an AWS-S3 bucket.

From local:
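A minimal sketch of the local variant, assuming the Box Python SDK (boxsdk with JWT support) and the JSON file generated in the Dev Console. The helper names are my own for illustration; JWTAuth.from_settings_dictionary and authenticate_instance come from the SDK.

```python
import json


def load_box_settings(path):
    """Read the JSON configuration file downloaded from the Box Dev Console."""
    with open(path, encoding="utf-8") as config_file:
        return json.load(config_file)


def without_enterprise_id(settings):
    """Return a copy of the settings with the enterpriseID entry removed."""
    return {key: value for key, value in settings.items() if key != "enterpriseID"}


def get_access_token(settings):
    """Exchange the JWT settings for an access token using the Box SDK."""
    # Imported here so the two helpers above work without the SDK installed.
    from boxsdk import JWTAuth  # pip install "boxsdk[jwt]"

    auth = JWTAuth.from_settings_dictionary(settings)
    return auth.authenticate_instance()
```

With the file saved in the project folder, get_access_token(load_box_settings("config.json")) would return the token, where config.json is a placeholder for whatever name your downloaded file has. The second helper builds the user_config variant in memory instead of editing a copy by hand.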

From AWS-S3 bucket:

It is worth mentioning that you first need an AWS account and a previous AWS CLI installation.

To interact with the AWS-S3 bucket we used boto3. I give you an example method named read_json_config_file. If you want to know more about how to work with AWS tools and Python, you can check here.
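One way such a method could look is sketched below. The signature is an assumption: here the boto3 client is passed in as an argument, which keeps the function easy to try out with a stand-in object.

```python
import json


def read_json_config_file(s3_client, bucket_name, key):
    """Download a JSON object from S3 and return it as a dict.

    `s3_client` is expected to be a boto3 S3 client, for example the
    object returned by boto3.client("s3").
    """
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # "Body" is a streaming object; read() returns the raw bytes.
    return json.loads(response["Body"].read().decode("utf-8"))
```

The resulting dict can be handed to the Box SDK in the same way as a configuration file loaded from disk.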

The next step is to write a method to access any folder and download the files.

The method access_folders helps us access any folder using the id of the parent folder (you can find it in the URL of any Box folder) and the name of the folder where the files are. For example, folder_id_parent could be “110548949863” and folder_name_child “my_summer_picture”.
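A sketch of how access_folders could be written with the Box Python SDK follows. It assumes client is a boxsdk.Client; the return value (a list of file items) and the download_file helper are my own choices for illustration.

```python
def access_folders(client, folder_id_parent, folder_name_child):
    """Return the files inside the named child folder of a parent folder.

    `client` is expected to be a boxsdk.Client. Returns an empty list
    when the parent has no child folder with that name.
    """
    for item in client.folder(folder_id=folder_id_parent).get_items():
        if item.type == "folder" and item.name == folder_name_child:
            children = client.folder(folder_id=item.id).get_items()
            return [child for child in children if child.type == "file"]
    return []


def download_file(client, file_id):
    """Return the raw bytes of a Box file."""
    return client.file(file_id).content()
```

Each returned item exposes an id and a name, so the ids can be passed on to download_file or used to build image URLs.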

To render the images in the HTML we use the following URL:

The item_id is the id of the image you want to render, and the access_token, as you may have noticed, comes from the authorization procedure.
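The exact URL is not reproduced in this text. The sketch below assumes Box's file download endpoint, which accepts the token as an access_token query parameter; the function name is hypothetical.

```python
from urllib.parse import quote, urlencode

BOX_API_BASE = "https://api.box.com/2.0"


def box_image_url(item_id, access_token):
    """Build a URL that an <img> tag can load directly from Box."""
    query = urlencode({"access_token": access_token})
    return f"{BOX_API_BASE}/files/{quote(str(item_id), safe='')}/content?{query}"


print(box_image_url("12345", "TOKEN"))
# prints: https://api.box.com/2.0/files/12345/content?access_token=TOKEN
```

Keep in mind that the token ends up inside the rendered HTML, so this fits a server-side rendering step better than a page served to the public.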

Finally, we will start working with HTML, Jinja 2, the Jinja 2 API, and wkhtmltopdf (through the pdfkit dependency) all at the same time.

As with BoxDev, I am giving you examples that use the files locally or from an AWS-S3 bucket.

Let’s start by creating the environment that Jinja 2 API uses to load the template from the local system or other location.

Locally:

The path given to FileSystemLoader can be any path inside your project folder or elsewhere on your local system.
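A minimal sketch of the local environment is shown here. The function name and the idea of passing the template folder as an argument are assumptions, and turning on autoescaping for HTML files is my addition rather than something stated in the post.

```python
from jinja2 import Environment, FileSystemLoader, select_autoescape


def build_environment(template_dir):
    """Create a Jinja environment that loads templates from a local folder."""
    return Environment(
        loader=FileSystemLoader(str(template_dir)),
        # Escape variables in .html, .htm and .xml templates by default.
        autoescape=select_autoescape(),
    )
```

For example, build_environment("templates").get_template("report.html") would load templates/report.html, where both names are placeholders.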

From AWS-S3 bucket:

We will create a class that subclasses BaseLoader and overrides get_source to implement our own loading mechanism.

Based on Vince Veselosky's code.
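The original loader is not reproduced in this text, so the following is a minimal version written for illustration, not the referenced code. It assumes templates are stored under folder_name/ in the bucket; the optional s3_client argument is my addition so the class can be tried without AWS credentials.

```python
from jinja2 import BaseLoader, TemplateNotFound


class Bucketloader(BaseLoader):
    """Jinja loader that reads templates from a folder in an S3 bucket."""

    def __init__(self, bucket_name, folder_name, s3_client=None):
        if s3_client is None:
            # boto3 is only needed when no client is passed in.
            import boto3

            s3_client = boto3.client("s3")
        self.s3 = s3_client
        self.bucket_name = bucket_name
        self.folder_name = folder_name.strip("/")

    def get_source(self, environment, template):
        key = f"{self.folder_name}/{template}"
        try:
            response = self.s3.get_object(Bucket=self.bucket_name, Key=key)
        except self.s3.exceptions.NoSuchKey:
            raise TemplateNotFound(template) from None
        source = response["Body"].read().decode("utf-8")
        # No local filename; report the template as always up to date.
        return source, None, lambda: True
```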

Now you just have to assign the Bucketloader to the loader in the Environment.
The bucket_name is your bucket’s name in AWS and the folder_name is the name of the folder where your template is.

Environment(loader=Bucketloader("bucket_name","folder_name"))

The second step is to pass all our information to the HTML. We are going to do this through a JSON.

We will use the variables from the JSON in the HTML using Jinja 2, as we would in Flask or Django templates. The variables are replaced with their values when the template is rendered.

Another thing you can do with Jinja is import other templates and pass variables to them. Another powerful part of Jinja is template inheritance.
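To make this concrete, here is a small self-contained sketch. The template names, the JSON fields, and the use of DictLoader (templates kept in memory) are all invented for the example; in the project the templates would come from the loaders described earlier.

```python
import json

from jinja2 import DictLoader, Environment

# Templates kept in memory so the example is self-contained.
templates = {
    "report.html": (
        "<h1>{{ title }}</h1>"
        "{% for item in items %}"
        "{% include 'item.html' %}"
        "{% endfor %}"
    ),
    # The included template sees the loop variable `item` from its parent.
    "item.html": "<p>{{ item.name }}: {{ item.price }}</p>",
}
env = Environment(loader=DictLoader(templates))

payload = json.loads('{"title": "Menu", "items": [{"name": "Taco", "price": 2}]}')
# Each top-level JSON key becomes a template variable.
html = env.get_template("report.html").render(**payload)
print(html)
```

Template inheritance works along the same lines, with a child template declaring extends and filling the blocks of a base layout.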

Third, we are going to use some of wkhtmltopdf’s global options via pdfkit. You can find more global options here. To use them, remove the double dash.

A little tip: if you want to remove the default margins that wkhtmltopdf applies, set the margins to zero in the options and, in the CSS, set the body’s margin and padding to zero too.

Another useful CSS tip I can give you is to use page breaks. With them, you can control when and where the content breaks and passes to a new page.
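The options and the two CSS tips could be put together roughly like this. The chosen page size and encoding, the helper function, and the new-page class name are assumptions for the sake of the example; the flag names themselves are standard wkhtmltopdf options.

```python
def to_pdfkit_options(cli_flags):
    """Convert wkhtmltopdf-style flags ("--page-size") to option keys without dashes."""
    return {flag.lstrip("-"): value for flag, value in cli_flags.items()}


options = to_pdfkit_options(
    {
        "--page-size": "Letter",
        "--encoding": "UTF-8",
        # Zero margins so the CSS below fully controls the layout.
        "--margin-top": "0",
        "--margin-right": "0",
        "--margin-bottom": "0",
        "--margin-left": "0",
    }
)

# CSS to pair with the options above: no body margin or padding, plus a
# helper class (hypothetical name) that starts a new page before an element.
PRINT_CSS = """
body { margin: 0; padding: 0; }
.new-page { page-break-before: always; }
"""
```

Any element that carries the new-page class will then begin on a fresh page of the PDF.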

We have almost finished. We are going to render the template, and turn it into a PDF.

The example saves the PDF in the root of the project. You could also use False instead of output_path to keep the PDF in a variable. One use for this is uploading the PDF to cloud storage.
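A sketch of those two steps, under the assumption that env is a Jinja Environment and options is a dict of wkhtmltopdf options. The function names are mine; pdfkit.from_string is the pdfkit call that does the conversion.

```python
def render_html(env, template_name, context):
    """Render a Jinja template to an HTML string."""
    return env.get_template(template_name).render(**context)


def html_to_pdf(html, output_path=False, options=None):
    """Convert an HTML string to PDF with pdfkit.

    With a file path the PDF is written to disk; with False the PDF
    bytes are returned instead.
    """
    # Imported here because pdfkit also needs the wkhtmltopdf binary installed.
    import pdfkit

    return pdfkit.from_string(html, output_path, options=options)
```

Calling html_to_pdf(html, "report.pdf", options) would write the file to the current folder, while html_to_pdf(html, False, options) keeps the bytes in memory; report.pdf is a placeholder name.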

Now we are going to use this variable to upload the PDF to Box, or to update a PDF that is already there. Remember that, to update a PDF, you first need the id of the file that already exists.
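A minimal sketch of both operations with the Box Python SDK, assuming client is a boxsdk.Client and pdf_bytes holds the PDF content. The function name and its arguments are hypothetical; upload_stream and update_contents_with_stream are the SDK methods doing the work.

```python
import io


def upload_or_update_pdf(client, folder_id, file_name, pdf_bytes, existing_file_id=None):
    """Upload a new PDF to a Box folder, or replace an existing file's content.

    `client` is expected to be a boxsdk.Client and `pdf_bytes` the PDF
    content as bytes.
    """
    stream = io.BytesIO(pdf_bytes)
    if existing_file_id is None:
        # First upload: create a new file inside the folder.
        return client.folder(folder_id).upload_stream(stream, file_name)
    # The file already exists: upload a new version of its content.
    return client.file(existing_file_id).update_contents_with_stream(stream)
```

Choosing between the two paths with existing_file_id keeps the caller from having to know which SDK method applies.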

Congrats, we finished. Give me five!

Here is an example of how the PDF could look.

In Conclusion

In conclusion, this is one way to create a PDF with a good variety of tools, and a good starting point for PDF creation. Along the way we learned a little about AWS S3, BoxDev, wkhtmltopdf, and other tips that help us build the PDF. Also, you do not need to know Python; you can use Node instead (except for boto3, which is a Python-only dependency, so you have to use aws-sdk instead). If you want to know more and make your PDF even cooler, I leave you some links that will help.
