Enhancing PDF Generation: Integrating Text and Graphics using PyPDF2 and reportlab.

Sarumathy P
Published in featurepreneur
6 min read · May 31, 2023

In today’s digital era, automated document generation is paramount, particularly when it comes to creating PDF files with dynamic content. Python, with its versatile libraries and robust functionality, offers an efficient way to generate dynamic PDFs. One such library is PyPDF2, which allows for the manipulation and merging of PDF files. In combination with the reportlab library, Python provides a comprehensive toolkit for creating customized PDF documents from templates.

In this article, we will explore how to leverage these libraries to add text and graphics to an existing PDF.

Installation:

pip install PyPDF2 reportlab

On Ubuntu 20.04, the reportlab installation may throw an error like this:

  note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pycairo
Failed to build pycairo
ERROR: Could not build wheels for pycairo, which is required to install pyproject.toml-based projects

This error usually means that the build of “pycairo” failed because a build dependency is missing, most commonly the “gcc” compiler.

To resolve this issue, you can try the following steps:

  1. Install the necessary build tools and development libraries by running the following command in the terminal:
sudo apt-get install build-essential

This command installs essential build tools, including the “gcc” compiler, which is required to build “pycairo”.

  2. Install the remaining dependencies with the following commands, then try installing reportlab again:

pip install wheel
pip install pillow

pip install reportlab
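
To confirm that both packages can now be imported before going further, a quick check like the one below can help. This is only a minimal sketch: missing_packages is a hypothetical helper name, not part of PyPDF2 or reportlab, and it only checks that the packages can be found, not that they work.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["PyPDF2", "reportlab"])
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("PyPDF2 and reportlab are both importable.")
```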

Once all the requirements are installed, the PDF file can be modified and generated dynamically using the following class:

from PyPDF2 import PdfWriter, PdfReader, Transformation
import io
from reportlab.pdfgen.canvas import Canvas

class GenerateFromTemplate:
    def __init__(self, template):
        self.template_pdf = PdfReader(open(template, "rb"))
        self.template_page = self.template_pdf.pages[0]

        self.packet = io.BytesIO()
        self.c = Canvas(self.packet, pagesize=(self.template_page.mediabox.width, self.template_page.mediabox.height))

    def addText(self, text, point):
        self.c.drawString(point[0], point[1], text)

    def merge(self):
        self.c.save()
        self.packet.seek(0)
        result_pdf = PdfReader(self.packet)
        result = result_pdf.pages[0]

        self.output = PdfWriter()

        op = Transformation().rotate(0).translate(tx=0, ty=0)
        result.add_transformation(op)
        self.template_page.merge_page(result)
        self.output.add_page(self.template_page)

    def generate(self, dest):
        outputStream = open(dest, "wb")
        self.output.write(outputStream)
        outputStream.close()

"""
Use as:
gen = GenerateFromTemplate("template.pdf")
gen.addText("Hello!",(100,200))
gen.addText("PDF!",(100,300))
gen.merge()
gen.generate("Output.pdf")
"""

The code can be broken down into 4 parts:

Part 1: Importing Libraries and Class Initialization

The first part of the code involves importing the necessary libraries and initializing the main class, “GenerateFromTemplate”.

def __init__(self, template):
    self.template_pdf = PdfReader(open(template, "rb"))
    self.template_page = self.template_pdf.pages[0]

    self.packet = io.BytesIO()
    self.c = Canvas(self.packet, pagesize=(self.template_page.mediabox.width, self.template_page.mediabox.height))

  • Inside the class, the “__init__” method is defined, which initializes the class attributes when an object is created.
  • The method takes the “template” parameter, which represents the path to the template PDF file.
  • The template PDF is opened using the PdfReader class and stored in the “template_pdf” attribute.
  • The first page of the template PDF is extracted and stored in the “template_page” attribute.
  • An in-memory packet is created using the io.BytesIO() function and stored in the “packet” attribute.
  • A Canvas object is created using the packet and the size of the template page, and stored in the “c” attribute.

A canvas represents a drawing surface that allows you to add various graphical elements, such as text, images, shapes, and lines, to a PDF document. It provides a way to position and draw these elements on a specific page or area within the PDF.

Part 2: Adding Text to the Template

The second part of the code involves adding text to the template PDF.

def addText(self, text, point):
    self.c.drawString(point[0], point[1], text)

  • The method takes two parameters: “text” (the content to be added) and “point” (the coordinates where the text should be placed).
  • Inside the method, the “drawString” function from the Canvas class is used to draw the text on the canvas.
  • The “drawString” function takes the x and y coordinates (point[0] and point[1]) and the text as arguments and writes the text at the specified position on the canvas instance created.
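
One thing that makes the coordinates easier to reason about: by default, a reportlab canvas measures positions in points (1/72 inch) from the bottom-left corner of the page, so y grows upwards. If you measured a position on the template from the top-left corner in millimetres, a small conversion like the one below may save some guessing. This is only a sketch; from_top_left and POINTS_PER_MM are hypothetical names introduced here, not part of reportlab.

```python
POINTS_PER_MM = 72 / 25.4  # 1 point = 1/72 inch, 1 inch = 25.4 mm


def from_top_left(x_mm, y_mm, page_height_pt):
    """Convert a position measured in millimetres from the page's top-left
    corner into (x, y) points measured from the bottom-left corner."""
    x_pt = x_mm * POINTS_PER_MM
    y_pt = page_height_pt - y_mm * POINTS_PER_MM
    return (x_pt, y_pt)


# An A4 page is 210 mm x 297 mm, which is roughly 595 x 842 points.
a4_height_pt = 297 * POINTS_PER_MM

# A spot one inch (25.4 mm) in from the left edge and down from the top edge.
point = from_top_left(25.4, 25.4, a4_height_pt)
print(point)
```

The resulting tuple can be passed as the “point” argument of addText.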

Part 3: Merging the Content

The third part of the code involves merging the template PDF with the dynamically generated content.

def merge(self):
    self.c.save()
    self.packet.seek(0)

    result_pdf = PdfReader(self.packet)
    result = result_pdf.pages[0]

    self.output = PdfWriter()

    op = Transformation().rotate(0).translate(tx=0, ty=0)
    result.add_transformation(op)

    self.template_page.merge_page(result)
    self.output.add_page(self.template_page)

  • Inside the method, the “save” function of the Canvas class is called to save the content drawn on the canvas.
  • The “seek” method of the packet object is used to reset the position of the in-memory packet to the beginning.
  • A PdfReader object is created using the packet, and the resulting PDF is stored in the “result_pdf” variable.
  • The first page of the result PDF is extracted and stored in the “result” variable.
  • A PdfWriter object is created and stored in the “output” attribute.
  • A Transformation object is created to define any transformations to be applied to the result page (in this case, no rotation or translation).
  • The result page is merged with the template page using the “merge_page” method of the template page object.
  • The merged page is added to the output using the “add_page” method.

Part 4: Generating the Final PDF

The fourth part of the code involves generating the final PDF by writing the merged content to a destination file.

def generate(self, dest):
    outputStream = open(dest, "wb")
    self.output.write(outputStream)
    outputStream.close()

  • The method takes the “dest” parameter, which represents the path and filename of the output PDF file.
  • Inside the method, an output stream is opened in write binary mode using the “open” function.
  • The merged content stored in the “output” attribute is written to the output stream using the “write” method.
  • The output stream is closed using the “close” method.
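
As a possible variation, the same open, write, close sequence can be expressed with a with statement, which closes the file even if writing fails partway. The sketch below is not the class's actual method: write_pdf is a hypothetical helper, and FakeWriter is a stand-in for PdfWriter so that the example runs without any PDF library.

```python
import os
import tempfile


class FakeWriter:
    """Hypothetical stand-in for PdfWriter; it only writes a PDF header line."""

    def write(self, stream):
        stream.write(b"%PDF-1.4\n")


def write_pdf(writer, dest):
    # `writer` is assumed to be any object with a write(stream) method,
    # for example the PdfWriter stored in self.output.
    with open(dest, "wb") as output_stream:
        writer.write(output_stream)
    # The file is already closed here, even if write() raised an exception.


dest = os.path.join(tempfile.mkdtemp(), "Output.pdf")
write_pdf(FakeWriter(), dest)
print(os.path.getsize(dest), "bytes written to", dest)
```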

For example, I have a PDF file like this, and I am going to add some text to it.

I created an instance of the class GenerateFromTemplate and wrote some text to it. I found the position (x, y coordinates) by trial and error.

gen = GenerateFromTemplate("template.pdf")

gen.addText("Hello!, I am Ensify",(270,900))

gen.merge()
gen.generate("Output.pdf")

Increasing Font Size:

You can modify the addText method as per your requirements. For example:

def addText(self, text, point, font_family="Helvetica-Bold", font_size=25):
    self.c.setFont(font_family, font_size)
    self.c.drawString(point[0], point[1], text)

The addText method now accepts two additional parameters, font_family and font_size, which set the font family and the size of the font. Increasing font_size makes the text larger; to make it look heavier, choose a bold face such as Helvetica-Bold. You can adjust both values as needed to achieve the desired appearance.

Both parameters have default values. If they are not explicitly specified in the function call, they take the values given in the function definition, i.e., Helvetica-Bold for the font family and 25 for the font size.
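
To watch the default values kick in without generating a PDF, here is a small sketch. RecordingCanvas is a hypothetical stand-in for reportlab's Canvas that only logs the calls it receives, and add_text is a free-function copy of the addText body with the canvas passed in explicitly.

```python
class RecordingCanvas:
    """Hypothetical stand-in for reportlab's Canvas that only logs calls."""

    def __init__(self):
        self.calls = []

    def setFont(self, name, size):
        self.calls.append(("setFont", name, size))

    def drawString(self, x, y, text):
        self.calls.append(("drawString", x, y, text))


def add_text(canvas, text, point, font_family="Helvetica-Bold", font_size=25):
    # Same two calls as the addText method, with the canvas as a parameter.
    canvas.setFont(font_family, font_size)
    canvas.drawString(point[0], point[1], text)


canvas = RecordingCanvas()
# Only font_size is given, so font_family falls back to "Helvetica-Bold".
add_text(canvas, "A Happy Pet Dog", (270, 850), font_size=22)
print(canvas.calls)
# [('setFont', 'Helvetica-Bold', 22), ('drawString', 270, 850, 'A Happy Pet Dog')]
```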

Now, calling the method:

gen = GenerateFromTemplate("dog.pdf")

gen.addText(text="Hello!! I am Buji",point=(270,900), font_family= "Times-Roman",font_size=25)
gen.addText(text="A Happy Pet Dog",point=(270,850), font_size=22)

gen.merge()
gen.generate("Output.pdf")

Adding Graphics:

Now, let us add some pictures to our PDF file. This can be done using the drawImage method of the Canvas class. Define a method addGraphics inside the class GenerateFromTemplate:

def addGraphics(self, point, img):
    self.c.drawImage(image=img, x=point[0], y=point[1])

You can also decrease or increase the size of the image to be added:

from reportlab.lib.utils import ImageReader  # add this import statement together with the other imports

def addGraphics(self, point, img, scale=1):
    img = ImageReader(img)
    img_width, img_height = img.getSize()

    self.c.drawImage(image=img, x=point[0], y=point[1], width=img_width*scale, height=img_height*scale)

  • The addGraphics method now takes an additional optional parameter called scale, which represents the scaling factor for the image.
  • Within the drawImage function, we multiply the original width and height of the image by the scale factor to determine the new width and height of the image in the PDF.
  • By adjusting the scale value, you can decrease or increase the size of the image proportionally. A scale value less than 1 will decrease the size, while a scale value greater than 1 will increase it.
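
If you would rather compute the scale than guess it, one option is to derive it from the space available on the template. The sketch below is only an illustration: fit_scale is a hypothetical helper, not a reportlab function, and it assumes you already know the image size (for example from getSize()) and the maximum width and height you want to allow, all in the same units.

```python
def fit_scale(img_width, img_height, max_width, max_height):
    """Return the largest scale factor (capped at 1) for which the image
    fits inside max_width x max_height without changing its aspect ratio."""
    return min(max_width / img_width, max_height / img_height, 1.0)


# A 400 x 200 image that must fit into a 100 x 100 box.
scale = fit_scale(400, 200, max_width=100, max_height=100)
print(scale)                     # 0.25
print(400 * scale, 200 * scale)  # 100.0 50.0
```

The returned value can be passed directly as the scale argument of addGraphics.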

Calling the method,

gen = GenerateFromTemplate("dog.pdf")

gen.addText(text="Hello!! I am Buji",point=(270,900), font_family= "Times-Roman",font_size=25)
gen.addText(text="A Happy Pet Dog",point=(270,850), font_size=22)

gen.addGraphics(point= (120,780),img="doglogo.jpeg", scale=0.5)
gen.merge()

gen.generate("Output.pdf")

In this example, doglogo.jpeg will be added to the PDF at the coordinates (120, 780) with its size reduced to half of its original dimensions (scale = 0.5).

Thus, with the Python libraries PyPDF2 and reportlab, automating PDF generation becomes a straightforward process. By leveraging the GenerateFromTemplate class shown above, developers can create dynamic and customized PDFs from templates. Together, these libraries let users generate professional-looking documents programmatically, saving time and effort.

References: https://stackoverflow.com/a/75675378/21978566

Hope this helps.
