Local File Operations Made Easy: Convert Images to PDFs and Merge PDFs Securely with Python.

Sarumathy P
Published in featurepreneur
May 29, 2023

If you are worried about uploading files to online services to convert images to PDFs or merge PDFs, this article is written for you. It provides code that performs these tasks on your own computer, so your files and documents stay on your machine and you don't have to rely on online platforms.

AGENDA:

  • Image to PDF
  • PDF to Images
  • Merge PDFs
  • Split PDFs

1. Image(s) to PDF:
#images are converted to a single pdf.

import os
import img2pdf

name = "your_pdf_name.pdf"
images = ["image1.png","image2.png", "image3.jpg"]

with open(name, "wb") as f:
    f.write(img2pdf.convert(images))


file_size = os.path.getsize(name)
file_size_kb = file_size / 1024
print(f"Size of {name}: {file_size_kb:.2f} KB")

Edit the variable name and the list images to suit your needs. This code uses the img2pdf Python library, which you have to install.

Installation:

pip install img2pdf

Code Explanation:

  • Using a with statement and the open() function, a new PDF file is created with the specified name ("your_pdf_name.pdf"). Note that the file must be opened in binary write mode, which is specified as "wb".
  • The img2pdf.convert() function converts the images listed in the images variable into PDF data (bytes).
  • The converted PDF data is then written to the newly created PDF file using the write() method of the opened file object (f).
  • The code uses the os.path.getsize() function to retrieve the size of the generated PDF file.
  • The file size is then divided by 1024 to convert it from bytes to kilobytes and stored in the variable file_size_kb.
  • Finally, the code prints the size of the PDF file by formatting the file_size_kb variable and displaying it with two decimal places, along with a message.
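The bytes-to-kilobytes step described above is repeated in every script in this article. A minimal sketch of a reusable helper, assuming you are happy to add a small function of your own (size_kb and print_size are hypothetical names, not part of img2pdf or any other library):

```python
import os

def size_kb(path):
    """Return the size of the file at `path` in kilobytes (1 KB = 1024 bytes)."""
    return os.path.getsize(path) / 1024

def print_size(path):
    """Print the size with two decimal places, in the same format as the scripts here."""
    print(f"Size of {path}: {size_kb(path):.2f} KB")
```

Calling print_size(name) after the PDF has been written would replace the three size lines at the end of the script.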

Changes you have to make:

  • Change the name of the PDF: the name variable.
  • Change the contents of images (relative or absolute paths of the images): the images variable.
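Instead of typing every file name into images, you can build the list from a folder. A minimal sketch, assuming all the images sit in one folder and that sorting by file name gives the page order you want (collect_images and the folder name "my_scans" are hypothetical):

```python
import os

def collect_images(folder, extensions=(".png", ".jpg", ".jpeg")):
    """Return the paths of the image files in `folder`, sorted by file name.

    Only regular files whose extension (case-insensitive) is in `extensions`
    are kept, so stray files such as notes or PDFs are skipped.
    """
    paths = []
    for entry in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, entry)
        if os.path.isfile(full_path) and entry.lower().endswith(extensions):
            paths.append(full_path)
    return paths

# Hypothetical usage: images = collect_images("my_scans")
```

Plain name sorting puts image10.png before image2.png, so zero-padded names (image01.png, image02.png, ...) keep the page order predictable.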

Note:

  • Execute the program using
python <python_file_name>.py
  • You can convert a single image or multiple images into one PDF.
  • For the pip install command to work, you need to have pip installed on your system. If the command fails even though Python is installed, you can reinstall Python from www.python.org (its installers include pip by default) or install pip separately.
  • If you see a message about an alpha channel and a soft mask (/SMask) when executing the program, there is no need to worry. Converting an image with an alpha channel (transparency) and computing a separate soft mask (/SMask) image to store that transparency in the PDF does not compromise the PDF's validity. Transparency and soft masks are common in PDF documents and image manipulation.

2. PDF to Images:

# PDF to images

# POPPLER_PATH = os.path.join(os.getcwd(), r"dependencies\poppler\Library\bin")
# The above line is only for Windows users

from pdf2image import convert_from_path
import os

name = 'your_file_name.pdf'
pages = convert_from_path(name, 200)  # 200 is the resolution in DPI

i = 1
for page in pages:
    # Use the same file name for saving the image and for measuring its size
    output = f'output_image{i}.jpg'
    page.save(output, 'JPEG')

    file_size = os.path.getsize(output)
    file_size_kb = file_size / 1024
    print(f"Size of {output}: {file_size_kb:.2f} KB")
    i += 1

Installation:

pip install pdf2image

Windows and Mac users will also have to install Poppler, which pdf2image depends on. Install it according to your OS; the pdf2image documentation lists the steps for each platform.

For Windows users,

After downloading Poppler, navigate to the bin folder (inside the Library folder) and copy its path. Store this path as POPPLER_PATH and pass it as the poppler_path argument of convert_from_path(). For example:

POPPLER_PATH = r"C:\path\to\poppler-xx\library\bin"
pages = convert_from_path(name, poppler_path=POPPLER_PATH)
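pdf2image works by calling Poppler's command-line tools (such as pdfinfo and pdftoppm), so a common cause of errors is that these tools cannot be found. A small sketch that checks for them before converting, assuming the standard tool names; poppler_available is a hypothetical helper, while shutil.which() comes from the Python standard library:

```python
import shutil

def poppler_available(poppler_path=None):
    """Return True if Poppler's pdfinfo and pdftoppm tools can be found.

    poppler_path: optional folder to search, such as the POPPLER_PATH used
    above on Windows. When it is None, the system PATH is searched instead.
    """
    for tool in ("pdfinfo", "pdftoppm"):
        if shutil.which(tool, path=poppler_path) is None:
            return False
    return True

if not poppler_available():
    print("Poppler not found: install it or pass poppler_path to convert_from_path().")
```

On Windows you would call poppler_available(POPPLER_PATH) to check the folder you copied.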

Code Explanation:

  • The code uses the convert_from_path() function from the pdf2image module to convert the pages of the PDF file given by the name variable into images.
  • Each page is rendered at a resolution of 200 DPI (dots per inch), which is the second argument of convert_from_path().
  • A loop is used to iterate through each converted image.
  • Inside the loop, the save() method is used to save each image with a unique file name, such as "output_image1.jpg", "output_image2.jpg", etc.
  • The code then retrieves the size of each saved image using the os.path.getsize() function.
  • The file size is divided by 1024 to convert it from bytes to kilobytes and stored in the variable file_size_kb.
  • Within the loop, the code prints the size of each saved image by formatting the file_size_kb variable and displaying it with two decimal places, along with a message.
  • The loop continues for each image, incrementing the counter i each time to provide a unique name for the images.
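Because the second argument is a resolution in DPI rather than a pixel count, the size of each output image depends on the page size. PDF page sizes are measured in points (1 point = 1/72 inch), so the approximate pixel dimensions can be worked out as below. page_pixels is a hypothetical helper that only does the arithmetic; pdf2image's own rounding may differ by a pixel.

```python
def page_pixels(width_pt, height_pt, dpi):
    """Approximate pixel size of a page rendered at `dpi`.

    PDF sizes are in points and 72 points = 1 inch, so
    pixels = points / 72 * dpi.
    """
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)

print(page_pixels(612, 792, 200))  # US Letter (8.5 x 11 in): (1700, 2200)
print(page_pixels(595, 842, 200))  # A4: (1653, 2339)
```

Doubling the DPI doubles each dimension, so each image has roughly four times as many pixels, which is why higher values produce noticeably larger JPEG files.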

Changes you have to make:

  1. The name variable: the path of the PDF file you want to convert into images (a relative path is enough).

3. Merge PDFs:


#MERGE ALL PDFS INTO SINGLE PDF



from PyPDF2 import PdfMerger
import os

merger = PdfMerger()

name = "merged_pdf_file_name.pdf"

pdfs = ['my_file1.pdf','my_file2.pdf','my_file3.pdf']

for pdf in pdfs:
    merger.append(pdf)

merger.write(name)


file_size = os.path.getsize(name)
file_size_kb = file_size / 1024
print(f"Size of {name}: {file_size_kb:.2f} KB")


merger.close()

Installation:

pip install PyPDF2

Code Explanation:

  • The code creates a PdfMerger object named merger from the PyPDF2 module. This object will be used to merge multiple PDF files.
  • The variable name is set to "merged_pdf_file_name.pdf". You can change it to the desired name for your merged PDF file.
  • The variable pdfs is a list that contains the names of the input PDF files that you want to merge.
  • Using a loop, each PDF file listed in the pdfs variable is appended to the merger object using the append() method. This adds the content of each PDF to the merger, combining them into a single PDF.

Other useful methods of the PdfMerger class:

merger.merge(<position>, "pdf_file.pdf")

This method merges the specified PDF document (pdf_file.pdf) into the merged document at a specific position. The first argument is the index at which the new pages are inserted: the pages of pdf_file.pdf are placed after the first <position> pages of the merged document, so 0 inserts them at the very beginning. All existing pages from that index onward are shifted back to make room for the new pages.

merger.append("pdf_file_name.pdf", pages=(0, 3))  # pages 1, 2 and 3 are added at the end

The append() method can also add specific pages from a PDF document (pdf_file_name.pdf) to the end of the merged document. In this example, pages=(0, 3) selects the pages with indices 0 to 3 (3 excluded), that is, pages 1, 2 and 3 of "pdf_file_name.pdf". Only the selected pages are added to the end of the merged document, and the order of the existing pages is unchanged.

merger.append("pdf_file_name.pdf", pages=(0, 6, 2))  # pages 1, 3 and 5 are added at the end

This variant of append() lets you select non-contiguous pages. Here, pages=(0, 6, 2) selects the pages with indices (not page numbers) 0, 2 and 4, which are appended to the merged document. The tuple follows the same syntax as Python's range() function, where the third value is the step between pages.
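The index rules for append() and merge() can be tried out without any PDF files. The sketch below models pages as strings in plain Python lists. It only mimics the index arithmetic described above, assuming that pages tuples behave like range() arguments (as stated) and that merge() inserts the new pages starting at the given index; selected_indices and insert_at are hypothetical helpers, not PyPDF2 functions.

```python
def selected_indices(pages):
    """Indices chosen by a pages=(start, stop[, step]) tuple, following range()."""
    return list(range(*pages))

def insert_at(merged, position, new_pages):
    """Return a copy of `merged` with `new_pages` starting at index `position`."""
    return merged[:position] + new_pages + merged[position:]

source = ["p1", "p2", "p3", "p4", "p5", "p6"]  # a 6-page document

print(selected_indices((0, 3)))     # [0, 1, 2] -> pages 1, 2, 3
print(selected_indices((0, 6, 2)))  # [0, 2, 4] -> pages 1, 3, 5
print([source[i] for i in selected_indices((0, 6, 2))])  # ['p1', 'p3', 'p5']

merged = ["A1", "A2", "A3"]
print(insert_at(merged, 2, ["B1", "B2"]))  # ['A1', 'A2', 'B1', 'B2', 'A3']
print(insert_at(merged, 0, ["B1"]))        # ['B1', 'A1', 'A2', 'A3']
```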

  • The write() method is called on the merger object, with the name variable specifying the file name for the merged PDF. This writes the merged PDF content to the specified file.
  • The code uses the os.path.getsize() function to retrieve the size of the generated merged PDF file.
  • The file size is then divided by 1024 to convert it from bytes to kilobytes and stored in the variable file_size_kb.
  • Finally, the code prints the size of the merged PDF file by formatting the file_size_kb variable and displaying it with two decimal places, along with a message.
  • The merger object is closed using the close() method, which releases any resources associated with it.

Changes you have to make:

  1. The variable name: change it to the desired name for your merged PDF file.
  2. The list variable pdfs: the names (if the files are in the same directory) or the paths (relative or absolute) of the PDF files you want to merge.

4. Split PDFs:

# split pdfs

from PyPDF2 import PdfReader, PdfWriter
import os

def split(path, name_of_split):
    pdf = PdfReader(path)
    for page in range(len(pdf.pages)):

        pdf_writer = PdfWriter()
        pdf_writer.add_page(pdf.pages[page])

        output = f'{name_of_split}{page}.pdf'
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)

        file_size = os.path.getsize(output)
        file_size_kb = file_size / 1024
        print(f"Size of {output}: {file_size_kb:.2f} KB")


if __name__ == '__main__':
    path = 'file_to_be_split.pdf'
    name_of_the_split = "any_name"
    split(path, name_of_the_split)

This code saves each page of the specified PDF as a separate PDF file.

Installation:

pip install PyPDF2

Code Explanation:

  • The code defines a function called split that takes two parameters: path (the path of the PDF file to be split) and name_of_split (a string representing the base name for the split files).
  • The code uses the PdfReader class from PyPDF2 to read the PDF file specified by the path parameter.
  • Using a loop, the code iterates over each page of the input PDF using the range() function and the length of pdf.pages (The number of pages in the PDF).
  • Inside the loop, a new PdfWriter object is created using PdfWriter(). This object will be used to write each individual page of the split PDF files.
  • The current page is added to the pdf_writer using the add_page() method, obtained from the pdf.pages[page] reference.
  • An output file name is created by appending the current page index (starting at 0) to the name_of_split parameter.
  • The pdf_writer content is written to the output PDF file using a with statement and the write() method of the opened file object.
  • Then the size of the file is calculated and printed.
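With the index used directly in the file name, a document of 10 or more pages produces names such as any_name10.pdf, which a plain alphabetical sort places before any_name2.pdf. A minimal sketch of an alternative naming scheme, assuming you prefer 1-based, zero-padded numbers (split_name is a hypothetical helper); it could replace the output = ... line inside the loop:

```python
def split_name(name_of_split, page, total_pages):
    """Build a 1-based, zero-padded file name for page index `page`.

    The padding width equals the number of digits in total_pages, so every
    name has the same length and alphabetical order matches page order.
    """
    width = len(str(total_pages))
    return f"{name_of_split}{page + 1:0{width}d}.pdf"

# Inside split(): output = split_name(name_of_split, page, len(pdf.pages))
print([split_name("any_name", p, 12) for p in (0, 1, 11)])
# ['any_name01.pdf', 'any_name02.pdf', 'any_name12.pdf']
```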

Suppose you want to split a 6-page PDF into two files, one containing the first 5 pages and the other containing only the last page:

from PyPDF2 import PdfReader, PdfWriter
import os

def split(path, name_of_split):
    pdf = PdfReader(path)

    # Split PDF into two parts: 5 pages and 1 page
    pdf_part1 = PdfWriter()
    for page in range(5):
        pdf_part1.add_page(pdf.pages[page])

    pdf_part2 = PdfWriter()
    pdf_part2.add_page(pdf.pages[5])

    # Write the split PDFs to separate files
    output1 = f'{name_of_split}_part1.pdf'
    with open(output1, 'wb') as output_pdf1:
        pdf_part1.write(output_pdf1)

    output2 = f'{name_of_split}_part2.pdf'
    with open(output2, 'wb') as output_pdf2:
        pdf_part2.write(output_pdf2)

    # Determine the sizes of the split PDF files
    file_size1 = os.path.getsize(output1)
    file_size_kb1 = file_size1 / 1024
    print(f"Size of {output1}: {file_size_kb1:.2f} KB")

    file_size2 = os.path.getsize(output2)
    file_size_kb2 = file_size2 / 1024
    print(f"Size of {output2}: {file_size_kb2:.2f} KB")

if __name__ == '__main__':
    path = 'file_to_be_split.pdf'
    name_of_split = "any_name"
    split(path, name_of_split)

Changes you have to make:

  1. The path variable: the name or path of the PDF to be split.
  2. name_of_split: a base name used to identify the split files.
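The 5 + 1 split above hard-codes its page numbers. If you would rather cut a PDF into chunks of a fixed size, the page ranges can be planned first. A minimal sketch of that planning step (chunk_ranges is a hypothetical helper; each (start, stop) pair is 0-based with stop excluded, like range()):

```python
def chunk_ranges(total_pages, chunk_size):
    """Split page indices 0..total_pages-1 into (start, stop) ranges.

    Every range holds chunk_size pages except possibly the last one,
    which holds whatever is left over.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size, total_pages))
            for start in range(0, total_pages, chunk_size)]

print(chunk_ranges(6, 5))   # [(0, 5), (5, 6)] -> the 5 + 1 split above
print(chunk_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Each range could then be written out as in the split() function above: create a PdfWriter, call add_page(pdf.pages[i]) for every i in range(start, stop), and write it to its own file, using chunk_ranges(len(pdf.pages), 5) to get the ranges.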

GitHub Repository for this:

https://github.com/SarumathyPrabakaran/pdf-file-manipulation-scripts

Hope it helps.
