5 Ways to Compress PDF or Reduce PDF File Size with Python

Alice Yang
5 min readJul 3, 2024

--

Compress PDF with Python
Compress PDF with Python

Dealing with large PDF files can be a real challenge when it comes to storage, sharing, and transmission. PDF compression provides an effective solution to reduce file size, making the documents more manageable and optimizing storage usage. Compressed PDF files offer several key advantages:

  • Reduced storage costs: Smaller file sizes mean lower long-term storage requirements and expenses.
  • Improved transfer efficiency: The reduced file size enables faster upload and download speeds, particularly beneficial for sharing files via email, cloud drives, and other channels.
  • Enhanced user experience: Lightweight PDF files load and display more quickly, especially on mobile devices, while also reducing network bandwidth consumption.
  • Increased content accessibility: Compact PDF files are more easily indexed and discovered by search engines, helping to improve the visibility and reach of your content.

In this blog post, we will demonstrate how to programmatically compress PDF files using Python. We will discuss the following topics:

Python Library to Compress PDF Files

To compress PDF files in Python, we will use Spire.PDF for Python. It is a feature-rich and user-friendly library designed to create, read, edit, and convert PDF files within Python applications.

You can install Spire.PDF for Python from PyPI using the following pip command:

pip install Spire.Pdf

If you already have Spire.PDF for Python installed and would like to upgrade to the latest version, use the following pip command:

pip install --upgrade Spire.Pdf

For more detailed information about the installation, you can check this official documentation: How to Install Spire.PDF for Python in VS Code.

Compress PDF by Optimizing Images with Python

PDF files often contain a large number of high-resolution images, which can be the primary contributor to the file’s overall size. By compressing these images and reducing their resolution, you can significantly reduce the file size of the PDFs.

The PdfCompressor class in Spire.PDF for Python is responsible for compressing PDF files. By using the PdfCompressor.OptimizationOptions.SetImageQuality(), PdfCompressor.OptimizationOptions.SetResizeImages() and PdfCompressor.OptimizationOptions.SetIsCompressImage() methods, you can easily compress PDF files by optimizing images contained within them.

Here is a simple example that shows how to compress a PDF by optimizing images using Python:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfCompressor object and specify the path of the PDF file to be compressed
input_pdf = "Sample.pdf"
compressor = PdfCompressor(input_pdf)

# Configure the compression options to optimize images in the PDF
compression_options = compressor.OptimizationOptions
compression_options.SetImageQuality(ImageQuality.Medium)
compression_options.SetResizeImages(True)
compression_options.SetIsCompressImage(True)

# Compress the PDF file and save the result to a new file
output_pdf = "OptimizingImages.pdf"
compressor.CompressToFile(output_pdf)
Compress Images in PDF with Python
Compress Images in PDF with Python

Compress PDF by Optimizing Fonts with Python

Fonts embedded in PDFs can also contribute to larger file sizes. Optimizing these fonts by compressing or unembedding them can help reduce the file size.

To compress or unembedded fonts in PDF files, you can use the PdfCompressor.OptimizationOptions.SetIsCompressFonts() or PdfCompressor.OptimizationOptions.SetIsUnembedFonts() method.

Here is a simple example that shows how to compress a PDF by compressing or unembedding fonts using Python:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfCompressor object and specify the path of the PDF file to be compressed
input_pdf = "Sample.pdf"
compressor = PdfCompressor(input_pdf)

# Configure the compression options to optimize fonts in the PDF
compression_options = compressor.OptimizationOptions
# Enable font compression
compression_options.SetIsCompressFonts(True)
# Or enable font unembedding
# compression_options.SetIsUnembedFonts(True)

# Compress the PDF file and save the result to a new file
output_pdf = "OptimizingFonts.pdf"
compressor.CompressToFile(output_pdf)

Compress PDF by Removing Attachments with Python

PDF files can sometimes contain attached files, such as images, documents, or other media. These attached files can significantly increase the overall size of the PDF. By using the PdfDocument.Attachments.Clear() method, you can remove all attachments from PDF files with ease.

Here is a simple example that shows how to reduce the file size of a PDF by removing all the attachments from it using Python:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object and specify the path of the PDF file to be compressed
input_pdf = "Sample.pdf"
pdf = PdfDocument(input_pdf)
# Disable the incremental update
pdf.FileInfo.IncrementalUpdate = False

# Remove all attachments from the PDF
pdf.Attachments.Clear()

# Save the result to a new file
output_pdf = "RemovingAttachments.pdf"
pdf.SaveToFile(output_pdf)
pdf.Close()

Compress PDF by Removing or Flattening Form Fields with Python

PDF documents that contain interactive form fields, such as text boxes, checkboxes, or dropdown menus, tend to have a larger file size compared to static PDF documents. This is because the form field data and associated metadata are stored separately within the PDF file. By either removing the form fields entirely or flattening them, you can significantly reduce the file size of the PDF.

Here is a simple example that shows how to reduce the file size of a PDF by flattening form fields using Python:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object and specify the path of the PDF file to be compressed
input_pdf = "Sample.pdf"
pdf = PdfDocument(input_pdf)
# Disable the incremental update
pdf.FileInfo.IncrementalUpdate = False

# Flatten the form fields in the PDF
pdf.Form.IsFlatten = True

# Save the result to a new file
output_pdf = "FlatteningForms.pdf"
pdf.SaveToFile(output_pdf)
pdf.Close()

Here is a simple example that shows how to reduce the file size of a PDF by removing form fields using Python:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object and specify the path of the PDF file to be compressed
input_pdf = "Sample.pdf"
pdf = PdfDocument(input_pdf)
# Disable the incremental update
pdf.FileInfo.IncrementalUpdate = False

# Get the forms in the PDF
form = pdf.Form
formWidget = PdfFormWidget(form)

# Remove all forms from the PDF
formWidget.FieldsWidget.Clear()

# Save the result to a new file
output_pdf = "RemovingForms.pdf"
pdf.SaveToFile(output_pdf)
pdf.Close()

Compress PDF by Removing Annotations with Python

PDF documents can contain various types of annotations, such as comments and highlights. These annotations are stored as separate objects within the PDF file, which can contribute to an increase in the overall file size. By removing annotations from the PDF, you can effectively reduce the file size without impacting the core content of the document.

Here is a simple example that shows how to reduce the file size of a PDF by removing annotations using Python:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object and specify the path of the PDF file to be compressed
input_pdf = "Sample.pdf"
pdf = PdfDocument(input_pdf)
# Disable the incremental update
pdf.FileInfo.IncrementalUpdate = False

# Remove all annotations from the pages of the PDF
for i in range(pdf.Pages.Count):
page = pdf.Pages[i]
page.Annotations.Clear()

# Save the result to a new file
output_pdf = "RemovingAnnotations.pdf"
pdf.SaveToFile(output_pdf)
pdf.Close()

Conclusion

This blog post demonstrated 5 different ways to compress PDF files using Python. We hope you find it helpful.

Related Topics

--

--

Alice Yang

Skilled senior software developers with five years of experience in all phases of software development life cycle using .NET, Java and C++ languages.