How to Crop PDF Files with Python

Alice Yang
4 min read · May 17, 2024


Cropping lets you extract a specific portion of a PDF page and reuse it elsewhere. For example, you could crop an image, chart, or excerpt from a larger document and include it in a new presentation or report. This article shows how to crop PDF pages using Python. It covers the library used, cropping a single page, cropping all pages, and saving a cropped page as an image or an HTML file.

Python Library to Crop PDF Files

To crop PDF files in Python, we will use Spire.PDF for Python. It is a feature-rich and user-friendly library designed to create, read, edit, and convert PDF files within Python applications.

You can install Spire.PDF for Python from PyPI using the following pip command:

pip install Spire.Pdf

If you already have Spire.PDF for Python installed and would like to upgrade to the latest version, use the following pip command:

pip install --upgrade Spire.Pdf

For detailed installation instructions, see the official documentation: How to Install Spire.PDF for Python in VS Code.
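
To confirm that the installation succeeded before running the examples, a small check like the one below can help. It is a minimal sketch that uses only the standard library. The helper name installed_version is hypothetical, and the distribution name "Spire.Pdf" is taken from the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "Spire.Pdf" is the distribution name used in the pip commands above.
version = installed_version("Spire.Pdf")
print(version if version else "Spire.Pdf is not installed")
```

If the script reports that the package is not installed, rerun the pip install command in the same Python environment that runs your scripts.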

Crop a Specific Page in PDF with Python

You can access a specific page through the PdfDocument.Pages[index] property, where index 0 refers to the first page. To crop the page, assign a rectangle to its PdfPageBase.CropBox property. The rectangle is built from a PointF that gives its position and a SizeF that gives its width and height.

The following code shows how to crop a specific page in a PDF file and save it as a separate PDF file in Python:

from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
input_pdf = "Test.pdf"
output_pdf = "CropPage.pdf"

# Initialize an instance of the PdfDocument class
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile(input_pdf)

# Get the first page by its index
page = pdf.Pages[0]

# Crop the page to the specified area
page.CropBox = RectangleF(PointF(30.0, 280.0),SizeF(552.0, 220.0))

# Initialize another instance of the PdfDocument class to create a new PDF file
new_pdf = PdfDocument()
# Insert the cropped page into the new PDF file
new_pdf.InsertPage(pdf, 0, 0)

# Save the new PDF file
new_pdf.SaveToFile(output_pdf)
new_pdf.Close()
pdf.Close()
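
The four numbers passed to RectangleF are easier to choose if you think in terms of margins to trim. The helper below is a sketch under stated assumptions: the name crop_rect_from_margins is hypothetical, all values are in points (1/72 inch), the page is assumed to be US Letter (612 x 792 points), and y is assumed to be measured from the top edge of the page. If the cropped output shows the wrong band of the page, swap the top and bottom margins.

```python
def crop_rect_from_margins(page_width, page_height, left, top, right, bottom):
    """Return (x, y, width, height) of the area left after trimming margins.

    All values are in points. Assumes y is measured from the top edge
    of the page, which is an assumption and not confirmed by the article.
    """
    width = page_width - left - right
    height = page_height - top - bottom
    if width <= 0 or height <= 0:
        raise ValueError("margins leave no visible area")
    return (float(left), float(top), float(width), float(height))


# Assumed page size: US Letter, 612 x 792 points.
x, y, w, h = crop_rect_from_margins(612, 792, left=30, top=280, right=30, bottom=292)
print(x, y, w, h)  # 30.0 280.0 552.0 220.0
```

Under those assumptions the result matches the rectangle used in the example, so the values map onto the call as RectangleF(PointF(x, y), SizeF(w, h)).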

Crop All Pages in PDF with Python

To crop all pages in a PDF, iterate through the pages in the document and crop each one to the same area using the PdfPageBase.CropBox property.

The following code shows how to crop all pages in a PDF file and save the result as a new PDF file in Python:

from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
input_pdf = "Test.pdf"
output_pdf = "CropAllPages.pdf"

# Initialize an instance of the PdfDocument class
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile(input_pdf)

# Iterate through all pages in the file
for i in range(pdf.Pages.Count):
    # Get the current page by its index
    page = pdf.Pages[i]
    # Crop the page to the specified area
    page.CropBox = RectangleF(PointF(30.0, 280.0), SizeF(552.0, 220.0))

# Save the result to a new PDF file
pdf.SaveToFile(output_pdf)
pdf.Close()
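
Applying one fixed rectangle to every page works when all pages share the same size. If a document mixes page sizes, the rectangle may extend past the edge of a smaller page. The sketch below shows one way to guard against that by clamping the rectangle to the page bounds. The helper name clamp_rect and the page sizes are assumptions for illustration, and all values are in points.

```python
def clamp_rect(x, y, width, height, page_width, page_height):
    """Shrink a rectangle so it lies inside a page of the given size (points)."""
    x0 = min(max(x, 0.0), page_width)
    y0 = min(max(y, 0.0), page_height)
    x1 = min(max(x + width, 0.0), page_width)
    y1 = min(max(y + height, 0.0), page_height)
    return (x0, y0, x1 - x0, y1 - y0)


# The article's rectangle fits on a 612 x 792 page unchanged.
print(clamp_rect(30.0, 280.0, 552.0, 220.0, 612.0, 792.0))  # (30.0, 280.0, 552.0, 220.0)
# On a smaller 420 x 595 page the width is reduced to stay inside the page.
print(clamp_rect(30.0, 280.0, 552.0, 220.0, 420.0, 595.0))  # (30.0, 280.0, 390.0, 220.0)
```

To use this inside the loop, you would read each page's width and height from the page object first. This article does not show how to do that, so check the library documentation for the appropriate property.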

Crop a PDF Page to Image, HTML with Python

In some cases, you may need to save the cropped PDF page in another format, such as an image or an HTML file, for further use. To save the page as an image, use the PdfDocument.SaveAsImage(pageIndex) method. To save it as an HTML file or another file format, use the PdfDocument.SaveToFile(fileName, fileFormat) method.

The following code shows how to crop a specific page in a PDF file and save it as an image file in Python:

from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
input_pdf = "Test.pdf"
output_image = "CropPage.png"

# Initialize an instance of the PdfDocument class
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile(input_pdf)

# Get the first page by its index
page = pdf.Pages[0]

# Crop the page to the specified area
page.CropBox = RectangleF(PointF(30.0, 280.0), SizeF(552.0, 220.0))

# Convert the first page to an image
with pdf.SaveAsImage(0) as imageS:
    # Save the image as a PNG file
    imageS.Save(output_image)

pdf.Close()
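
When a cropped page is rendered to an image, its pixel dimensions depend on the rendering resolution. The sketch below estimates them from the crop size, assuming the crop rectangle is expressed in points (1/72 inch). The helper name points_to_pixels and the DPI values are illustrative assumptions. The article does not state which resolution SaveAsImage uses, so treat the results as estimates.

```python
def points_to_pixels(points, dpi):
    """Convert a length in PDF points (1/72 inch) to whole pixels at a given DPI."""
    return round(points * dpi / 72)


# Width and height of the crop rectangle used in the examples, in points.
crop_width_pt, crop_height_pt = 552.0, 220.0

for dpi in (72, 96, 300):
    width_px = points_to_pixels(crop_width_pt, dpi)
    height_px = points_to_pixels(crop_height_pt, dpi)
    print(dpi, width_px, height_px)
# 72 552 220
# 96 736 293
# 300 2300 917
```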

The following code shows how to crop a specific page in a PDF file and save it as an HTML file in Python:

from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
input_pdf = "Test.pdf"
output_html = "CropPage.html"

# Initialize an instance of the PdfDocument class
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile(input_pdf)

# Get the first page by its index
page = pdf.Pages[0]

# Crop the page to the specified area
page.CropBox = RectangleF(PointF(30.0, 280.0),SizeF(552.0, 220.0))

# Initialize another instance of the PdfDocument class to create a new PDF file
new_pdf = PdfDocument()
# Insert the cropped page to the new PDF file
new_pdf.InsertPage(pdf, 0, 0)

# Save the result to an HTML file
new_pdf.SaveToFile(output_html, FileFormat.HTML)
new_pdf.Close()
pdf.Close()

Conclusion

This article explained how to crop PDF pages in Python using the Spire.PDF for Python library.

The Spire.PDF library provides a wide range of other PDF manipulation capabilities, including text extraction, image extraction, table extraction, form filling, and more. Explore the library’s documentation to discover the full extent of its features.


Alice Yang

Senior software developer with five years of experience in all phases of the software development life cycle, working with .NET, Java, and C++.