Split PDF Files or Pages with Python

Alice Yang
4 min read · Oct 13, 2023

Split a PDF into Multiple PDFs with Python

Splitting a PDF file into multiple smaller files is useful in many situations. Whether you want to extract specific pages, separate chapters or sections, or divide a large document into more manageable parts, splitting PDFs makes documents easier to organize and share. In this article, we will explore how to split PDF files using Python.

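We’ll discuss the following topics:

Python Library to Split PDF Files
Split a PDF File by Each Page with Python
Split a PDF File by Specific Page Ranges with Python
Split a PDF Page into Multiple Pages with Python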

Python Library to Split PDF Files

To split PDF files with Python, we can use the Spire.PDF for Python library.

Spire.PDF for Python is a feature-rich and user-friendly library for creating, reading, editing, and converting PDF files in Python applications. With it, you can perform a wide range of PDF manipulations, including adding text or images, extracting text or images, adding digital signatures, adding or deleting pages, merging or splitting PDFs, creating bookmarks, adding text or image watermarks, inserting fillable forms, and much more. It can also convert PDF files to various formats, such as Word, Excel, images, HTML, SVG, XPS, OFD, PCL, and PostScript.

You can install Spire.PDF for Python from PyPI using the following pip command:

pip install Spire.Pdf

For more detailed installation instructions, see the official documentation: How to Install Spire.PDF for Python in VS Code.
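Before running the examples, you may want to confirm that the package is importable from your environment. This is a minimal sketch that assumes the package installs under the `spire.pdf` module name, which is what the examples in this article import from; the helper name `spire_pdf_available` is hypothetical and uses only the standard library:

```python
import importlib.util


def spire_pdf_available() -> bool:
    """Return True if the spire.pdf module can be found on the import path."""
    try:
        # For a dotted name, find_spec imports the parent package ("spire") first,
        # which raises ModuleNotFoundError when the parent is not installed.
        return importlib.util.find_spec("spire.pdf") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    if spire_pdf_available():
        print("Spire.PDF for Python is installed.")
    else:
        print("spire.pdf not found; try: pip install Spire.Pdf")
```

Running it with the same interpreter you use for the examples helps catch the common mistake of installing into a different virtual environment.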

Split a PDF File by Each Page with Python

Splitting a PDF file by each page produces multiple separate files, each containing exactly one page of the original PDF. This is useful when you need to extract individual pages from a PDF document and save them as separate files.

Here is a simple code example that shows how to split a PDF file by each page using Python and Spire.PDF for Python:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Split the PDF file into multiple PDF files, with each file containing only one page from the original PDF
pdf.Split("SplitByEachPage-{0}.pdf", 1)

# Close the PdfDocument object
pdf.Close()
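The second argument to `Split` appears to be the number substituted into the `{0}` placeholder for the first output file, so a three-page `Sample.pdf` would be expected to yield `SplitByEachPage-1.pdf` through `SplitByEachPage-3.pdf`. The pure-Python sketch below reproduces that naming scheme under this assumption, which can help when checking or post-processing the output files; `expected_split_names` is a hypothetical helper, not part of Spire.PDF:

```python
from typing import List


def expected_split_names(pattern: str, page_count: int, start_number: int = 1) -> List[str]:
    """Predict output file names, assuming page i gets number start_number + i."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # str.format fills the {0} placeholder, mirroring the pattern passed to Split
    return [pattern.format(start_number + i) for i in range(page_count)]


print(expected_split_names("SplitByEachPage-{0}.pdf", 3))
```

In a real run, the page count would come from `pdf.Pages.Count`, the same property the page-range example uses.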

Split a PDF File by Specific Page Ranges with Python

In addition to splitting the PDF file by each page, you are also able to select specific page ranges and extract them into separate files. This is particularly useful when you want to work with specific sections or chapters of a large PDF document separately.

Here is a simple code example that shows how to split a PDF file by page ranges using Python and Spire.PDF for Python:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Create two new PdfDocument objects
newPdf_1 = PdfDocument()
newPdf_2 = PdfDocument()

# Insert pages 1-3 of the source file into the first PDF file
newPdf_1.InsertPageRange(pdf, 0, 2)
# Insert the remaining pages (page 4 to the last page) of the source file into the second PDF file
newPdf_2.InsertPageRange(pdf, 3, pdf.Pages.Count - 1)

# Save the resulting files
newPdf_1.SaveToFile("SplitByPageRange-1.pdf")
newPdf_2.SaveToFile("SplitByPageRange-2.pdf")

# Close the PdfDocument objects
pdf.Close()
newPdf_1.Close()
newPdf_2.Close()
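The example hard-codes the split after page 3 and assumes the source file has at least four pages. Judging by its comment, `InsertPageRange(pdf, 0, 2)` copies pages 1 to 3, so the start and end arguments look like zero-based, inclusive page indices. Assuming that convention, the pure-Python sketch below converts human-friendly 1-based ranges, or a fixed number of pages per output file, into index pairs; `to_index_range` and `fixed_size_ranges` are hypothetical helpers:

```python
from typing import List, Tuple


def to_index_range(first_page: int, last_page: int, page_count: int) -> Tuple[int, int]:
    """Convert a 1-based inclusive page range to 0-based inclusive indices."""
    if not 1 <= first_page <= last_page <= page_count:
        raise ValueError(
            f"invalid range {first_page}-{last_page} for a {page_count}-page document"
        )
    return first_page - 1, last_page - 1


def fixed_size_ranges(page_count: int, pages_per_file: int) -> List[Tuple[int, int]]:
    """Split page_count pages into consecutive 0-based inclusive chunks."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        (start, min(start + pages_per_file, page_count) - 1)
        for start in range(0, page_count, pages_per_file)
    ]


# Pages 1-3, as in the example above
print(to_index_range(1, 3, 10))   # (0, 2)
# A 10-page document split into files of 4 pages each
print(fixed_size_ranges(10, 4))   # [(0, 3), (4, 7), (8, 9)]
```

Each returned pair could then be passed to `InsertPageRange(pdf, start, end)` on a fresh `PdfDocument`, followed by `SaveToFile`, just as the example does for its two hard-coded ranges. Validating the range first turns an out-of-bounds page number into a clear error before any file is written.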

Split a PDF Page into Multiple Pages with Python

In certain cases, you may need to split a specific page of a PDF document into two or more pages. For example, if a PDF page contains a large table that doesn’t fit within the standard page size, you can split it into multiple smaller pages, each containing a portion of the original content.

Here is a simple code example that shows how to split the first page of a PDF file into two smaller pages, a top half and a bottom half, using Python and Spire.PDF for Python:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages[0]

# Create a new PdfDocument object
newPdf = PdfDocument()
# Remove all its page margins
newPdf.PageSettings.Margins.All = 0.0

# Set the page width of the new PDF to be the same as the page width of the original PDF
newPdf.PageSettings.Width = page.Size.Width
# Set the page height of the new PDF to half the page height of the original PDF
newPdf.PageSettings.Height = page.Size.Height / 2

# Add a page to the new PDF
newPage = newPdf.Pages.Add()

# Set the layout format; with Paginate, content that does not fit on the new page continues on the next one
layout = PdfTextLayout()
layout.Break = PdfLayoutBreakType.FitPage
layout.Layout = PdfLayoutType.Paginate

# Draw the content of the first page onto the page of the new PDF
page.CreateTemplate().Draw(newPage, PointF(0.0, 0.0), layout)

# Save the resulting file
newPdf.SaveToFile("SplitPage.pdf")

# Close the PdfDocument object
pdf.Close()
newPdf.Close()
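The example halves the page height and relies on the `Paginate` layout to carry whatever does not fit onto a second page. For three or more strips, the arithmetic generalizes: if a source page of height H is cut into n equal horizontal strips, each new page is H / n tall and strip k begins k * H / n down from the top. The sketch below only computes those numbers (PDF sizes are in points, 1/72 inch); `strip_layout` is a hypothetical helper. Setting `newPdf.PageSettings.Height` to the computed strip height should, under the same `Paginate` layout, produce n pages, but that is an assumption about Spire.PDF's behavior rather than something the article demonstrates:

```python
from typing import List, Tuple


def strip_layout(page_height: float, parts: int) -> Tuple[float, List[float]]:
    """Return (strip height, top offset of each strip) for equal horizontal strips."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    strip_height = page_height / parts
    # Strip k starts k strip-heights below the top edge of the source page
    offsets = [k * strip_height for k in range(parts)]
    return strip_height, offsets


# A US Letter page is 792 points tall
height, offsets = strip_layout(792.0, 2)
print(height, offsets)  # 396.0 [0.0, 396.0]
```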

Conclusion

This article demonstrated three ways to split PDF files with Python: splitting a PDF by each page, splitting it by specific page ranges, and splitting a single page into multiple pages. We hope you find it helpful.

Related Topics

Merge PDF Files or Pages into One with Python

Convert PDF to Images (PNG, JPG, BMP, EMF) with Python

Convert PDF to Word DOCX or DOC with Python

Read or Extract Text from PDF with Python — A Comprehensive Guide

Alice Yang

Skilled senior software developer with five years of experience in all phases of the software development life cycle using .NET, Java, and C++.