Create a PDF from a PDF using pikepdf

David W. Agler
Nov 2, 2023

A very common task is extracting a sequence of pages from a PDF. For example, I might want 10 pages from a 500-page PDF. You can use an Adobe tool, but I prefer pikepdf for several reasons. In this article, I’ll show a simple example in which I extract a sequence of pages, e.g., 1–10, from a larger PDF and then save that sequence as a new PDF.

First, let’s install pikepdf using pip:

pip install pikepdf

Next, let’s import Pdf from pikepdf into our Python file. We only need the Pdf class. For more on this, see the pikepdf documentation for the Pdf class.

from pikepdf import Pdf

Let’s specify the file name we want to extract pages from. In my case, it is a PDF of some course handouts. In addition, let’s open this file:

pdf_file = "handouts.pdf"
pdf = Pdf.open(pdf_file)

To test whether everything is working correctly, I’ll print the pages property. Printing it shows a PageList object whose len value is the number of pages in the PDF. If you only want the number itself, len(pdf.pages) returns it as an integer.

print(pdf.pages)
# <pikepdf._core.PageList len=52>

Using pdf.pages we can do several operations, e.g., appending, inserting, and deleting pages. In our case, we only want to create a new file and append a range of pages to it. To this end, we need two variables, start_page and end_page. Let’s write a little something to get the pages we want.

pages_needed = input("What pages you want? (e.g., 3-10): ")
start_page, end_page = map(int, pages_needed.split("-"))
start_page -=1

Here we create a variable that holds the user’s input (our range of pages). input returns a string, so we convert it into two integers: start_page and end_page. We don’t want the hyphen, which is why we use split: it splits the input string into a list of strings, ["3", "10"]. The map function then applies int to each string in the list.

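We then decrement start_page by 1 to account for zero-based indexing in Python. We leave end_page alone because range stops just before its upper bound. Alternatively, we could keep both variables as the 1-based page numbers and subtract 1 where the range is built.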
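The one-line parse above assumes well-formed input. As a minimal sketch, the same steps could be wrapped in a function that also rejects obviously bad ranges. The name parse_range and its error messages are illustrative assumptions, not part of pikepdf:

```python
def parse_range(text: str) -> tuple[int, int]:
    """Turn a string like "3-10" into (zero-based start index, end page)."""
    parts = text.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected START-END, got {text!r}")
    start_page, end_page = map(int, parts)  # int() tolerates surrounding spaces
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"invalid page range: {text!r}")
    # Pages are numbered from 1, Python indexes from 0. The end page is
    # left as-is because range(start, end) already excludes its upper bound.
    return start_page - 1, end_page


start_index, end_page = parse_range("3-10")
print(list(range(start_index, end_page)))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Pages 3 through 10 map to indices 2 through 9, which is exactly what the loop later in the article iterates over.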

Now let’s create a new pdf using Pdf.new().

pdf_ext = Pdf.new()

Let’s use a for loop over the range that starts at start_page and stops just before end_page. For each index in that range, we append the corresponding page of the PDF we are extracting from to the pages of the PDF we are creating.

for i in range(start_page, end_page):
    pdf_ext.pages.append(pdf.pages[i])
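
As an alternative to the loop, a slice can select the same pages in one step. The sketch below assumes that pdf.pages supports slicing and extend the way a Python list does, which is how the pikepdf documentation describes it. copy_page_range is a hypothetical helper name, and plain lists stand in for the page lists so the example runs without a PDF file:

```python
def copy_page_range(src_pages, dst_pages, start_page, end_page):
    """Copy 1-based pages start_page..end_page (inclusive) onto dst_pages."""
    dst_pages.extend(src_pages[start_page - 1:end_page])


# Plain lists stand in for pdf.pages and pdf_ext.pages in this demo.
source = [f"page {n}" for n in range(1, 6)]
target = []
copy_page_range(source, target, 2, 4)
print(target)  # ['page 2', 'page 3', 'page 4']
```

With real objects the call might look like copy_page_range(pdf.pages, pdf_ext.pages, 1, 10). Note that this helper takes the 1-based start page and does the subtraction itself.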

Finally, let’s ask the user for a name for this PDF and then save it using an f-string of the form name_start_page_end_page. We increment start_page again so that the page numbers in the file name are correct.

pdf_out_name = input("Enter the name of the output file: ")
start_page +=1
pdf_ext.save(f"{pdf_out_name}_{start_page}_{end_page}.pdf")

So, if our original file is handouts.pdf, we want the first 10 pages, and we name the new file handout_selection, the code will produce a file named “handout_selection_1_10.pdf” that consists of the first ten pages of handouts.pdf.
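
One small wrinkle: if the user types the extension themselves, the f-string would produce a name ending in ".pdf.pdf". A minimal sketch of one way to guard against that follows. build_output_name is a hypothetical helper introduced only for illustration:

```python
def build_output_name(name: str, start_page: int, end_page: int) -> str:
    """Return e.g. 'handout_selection_1_10.pdf' for ('handout_selection', 1, 10)."""
    stem = name.strip()
    # Drop a trailing ".pdf" the user may have typed, so it is not doubled.
    if stem.lower().endswith(".pdf"):
        stem = stem[:-4]
    return f"{stem}_{start_page}_{end_page}.pdf"


print(build_output_name("handout_selection", 1, 10))  # handout_selection_1_10.pdf
```

The result could then be passed straight to pdf_ext.save in place of the inline f-string.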

Code in its entirety:

from pikepdf import Pdf

pdf_file = "handouts.pdf"
pdf = Pdf.open(pdf_file)
print(pdf.pages)

pages_needed = input("What pages you want? (e.g., 3-10): ")
start_page, end_page = map(int, pages_needed.split("-"))
start_page -=1

pdf_ext = Pdf.new()
for i in range(start_page, end_page):
    pdf_ext.pages.append(pdf.pages[i])
pdf_out_name = input("Enter the name of the output file: ")
start_page +=1
pdf_ext.save(f"{pdf_out_name}_{start_page}_{end_page}.pdf")
pdf.close()

Extracting non-sequential pages

But wait! In the previous example, we examined how to create a file that extracts a sequential set of pages from a pdf (e.g., 1–10). What if we want a non-sequential set of pages (e.g., 1–10, 15, 20–21, 50)?

The first thing we’ll do is modify this piece of code:

pages_needed = input("What pages you want? (e.g., 3-10): ")
start_page, end_page = map(int, pages_needed.split("-"))

We will get the user input again, but this time use split to get a list of strings, each naming either a range of pages or a single page. We’ll split at the comma:

pages_needed = input("What pages you want? (e.g., 1-10,15-17,19,20): ")
pages_needed = pages_needed.split(",")

So, if we were to type “1-10,12-15,18” as our input, our pages_needed variable would be the following list of strings: ['1-10', '12-15', '18']. Next, we’ll create a new PDF and use a for loop over our pages_needed list. If an item in that list contains a hyphen, we define start_page and end_page as two integers. We then use another for loop to append that range of pages to our new PDF. This time we subtract 1 from start_page inside the range call:

pdf_ext = Pdf.new()
for pages in pages_needed:
    if "-" in pages:
        start_page, end_page = map(int, pages.split("-"))
        for i in range(start_page-1, end_page):
            pdf_ext.pages.append(pdf.pages[i])

This accounts for items in the list that are an unbroken sequence of pages. What about single pages? If an item in the list doesn’t contain a hyphen, then it is just a single page that we want. We can handle this with an else:

    else:
        pdf_ext.pages.append(pdf.pages[int(pages)-1])

In the above, we are simply appending that single page. Now, just as before, we want to name our file. Since we likely don’t want a file name that looks like extracted_handout_pages_1_10_12_15_18, I’ll simplify the name of the output file:

pdf_out_name = input("Enter the name of the output file: ")
pdf_ext.save(f"{pdf_out_name}.pdf")

Here is the entire code:

from pikepdf import Pdf

pdf_file = "handouts.pdf"
pdf = Pdf.open(pdf_file)
print(pdf.pages)

pages_needed = input("What pages you want? (e.g., 1-10,15-17,19,20): ")
pages_needed = pages_needed.split(",")

pdf_ext = Pdf.new()
for pages in pages_needed:
    if "-" in pages:
        start_page, end_page = map(int, pages.split("-"))
        for i in range(start_page-1, end_page):
            pdf_ext.pages.append(pdf.pages[i])
    else:
        pdf_ext.pages.append(pdf.pages[int(pages)-1])

pdf_out_name = input("Enter the name of the output file: ")
pdf_ext.save(f"{pdf_out_name}.pdf")
pdf.close()
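
The script above trusts the user to ask only for pages that exist. A request past the end of the document would most likely fail partway through with an index error. As a hedged sketch, the parsing could be separated from the copying and validated first. parse_page_spec and its page_count parameter are assumptions made for illustration, not pikepdf functions:

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a spec like "1-3,5" into zero-based indices [0, 1, 2, 4]."""
    indices = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty items such as a trailing comma
        if "-" in item:
            start_page, end_page = map(int, item.split("-"))
        else:
            start_page = end_page = int(item)
        if not 1 <= start_page <= end_page <= page_count:
            raise ValueError(f"{item!r} is outside pages 1-{page_count}")
        indices.extend(range(start_page - 1, end_page))
    return indices


print(parse_page_spec("1-3,5", page_count=52))  # [0, 1, 2, 4]
```

Given the raw input string and len(pdf.pages) as the page count, the copying step would reduce to a single loop that appends pdf.pages[i] for each returned index.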

Resources

  1. pikepdf Documentation: https://pikepdf.readthedocs.io/
  2. pikepdf Tutorial: https://pikepdf.readthedocs.io/en/latest/tutorial.html
  3. pikepdf GitHub: https://github.com/pikepdf/pikepdf

Written by David W. Agler

Assistant Teaching Professor - Philosophy. I make logic and philosophy videos at https://www.youtube.com/@LogicPhilosophy
