Local File Operations Made Easy: Convert Images to PDFs and Merge PDFs Securely with Python.
If you are concerned about uploading files online to convert images to PDFs or merge PDFs, this article is specifically written to address your concerns. In it, I will provide you with code that allows you to perform these tasks on your local computer. This way, you can ensure the security of your files and documents without having to rely on online platforms.
AGENDA:
- Image to PDF
- PDF to Image
- Merge PDFs
- Split PDFs
1. Image(s) to PDF:
#images are converted to a single pdf.
import os
import img2pdf
name = "your_pdf_name.pdf"
images = ["image1.png","image2.png", "image3.jpg"]
with open(name, "wb") as f:
    f.write(img2pdf.convert(images))
file_size = os.path.getsize(name)
file_size_kb = file_size / 1024
print(f"Size of {name}: {file_size_kb:.2f} KB")
Make sure you edit the variable name and the list images to suit your needs. This code uses the img2pdf Python library, which you have to install.
Installation:
pip install img2pdf
Code Explanation:
- Using a with statement and the open() function, a new PDF file is created with the specified name ("your_pdf_name.pdf"). Note that the file must be opened in binary write mode, which is specified as "wb".
- The img2pdf.convert() function converts the images listed in the images variable into PDF data.
- The converted PDF data is then written to the newly created PDF file using the write() method of the opened file object (f).
- The code uses the os.path.getsize() function to retrieve the size of the generated PDF file.
- The file size is then divided by 1024 to convert it from bytes to kilobytes and stored in the variable file_size_kb.
- Finally, the code prints the size of the PDF file, formatting file_size_kb to two decimal places, along with a message.
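Every script in this article repeats the same three lines to report the size of its output. A small helper, sketched below, keeps that logic in one place. The names size_in_kb and report_size are my own and not part of any library.

```python
import os


def size_in_kb(path):
    """Return the size of the file at `path` in kilobytes (1 KB = 1024 bytes)."""
    return os.path.getsize(path) / 1024


def report_size(path):
    """Print the size in the same format the scripts in this article use."""
    print(f"Size of {path}: {size_in_kb(path):.2f} KB")
```

After writing a file, calling report_size(name) would replace the getsize, divide, and print lines.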
Changes you have to make:
- Change the name of the PDF: the name variable.
- Change the contents of the image list (relative or absolute paths of the images): the images variable.
Note:
- Execute the program using python <python_file_name>.py
- You can convert a single image or multiple images to PDF.
- For the pip install command to work, pip must be installed on your system. If the command fails even though you have Python, you can reinstall Python from www.python.org or install pip separately.
- If you get a message about an image with an alpha channel and a separate soft mask (/SMask) image being computed to store transparency in the PDF, the output is still a valid PDF. Transparency and soft masks are common in PDF documents and image manipulation.
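If you have many images, typing each path into the images list gets tedious. Here is a minimal sketch that collects every image in a folder instead. The helper names, the extension set, and the choice to sort by file name are my assumptions; the only img2pdf call used is the same img2pdf.convert() shown above.

```python
from pathlib import Path

# Assumption: these are the image types you care about; extend as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return the image paths in `folder`, sorted by file name."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def folder_to_pdf(folder, pdf_name):
    """Convert every image in `folder` into one PDF named `pdf_name`."""
    import img2pdf  # imported here so list_images() works without it installed

    images = list_images(folder)
    if not images:
        raise ValueError(f"No images found in {folder}")
    with open(pdf_name, "wb") as f:
        f.write(img2pdf.convert(images))
    return images
```

Note that the sort is alphabetical, so page10.png comes before page2.png. Zero-padded names such as page02.png keep the pages in the order you expect.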
2. PDF to Images:
# pdf to images
# POPPLER_PATH = os.path.join(os.getcwd(), r"dependencies\poppler\Library\bin")
# The above line is only for Windows users
from pdf2image import convert_from_path
import os
name = 'your_file_name.pdf'
pages = convert_from_path(name, 200)  # 200 is the resolution in DPI
i = 1
for page in pages:
    page.save(f'output_image{i}.jpg', 'JPEG')
    # Measure the file that was just saved
    file_size = os.path.getsize(f'output_image{i}.jpg')
    file_size_kb = file_size / 1024
    print(f"Size of output_image{i}.jpg: {file_size_kb:.2f} KB")
    i += 1
Installation:
pip install pdf2image
Windows and Mac users also have to install Poppler. Refer to the link below and install the dependencies for your OS.
For Windows users:
After downloading Poppler, navigate to the bin folder (inside the Library folder) and copy its path. Store this path as POPPLER_PATH and pass it to convert_from_path() as the poppler_path argument. For example,
POPPLER_PATH = r"C:\path\to\poppler-xx\library\bin"
pages = convert_from_path(name ,poppler_path = POPPLER_PATH)
Code Explanation:
- The code uses the convert_from_path() function from the pdf2image module to convert the pages of the PDF file given in the name variable ("your_file_name.pdf") into images.
- Each page is rendered at a resolution of 200 DPI.
- A loop is used to iterate through each converted image.
- Inside the loop, the save() method is used to save each image with a unique file name, such as "output_image1.jpg", "output_image2.jpg", etc.
- The code then retrieves the size of each saved image using the os.path.getsize() function.
- The file size is divided by 1024 to convert it from bytes to kilobytes and stored in the variable file_size_kb.
- Within the loop, the code prints the size of each saved image, formatting file_size_kb to two decimal places, along with a message.
- The loop continues for each image, incrementing the counter i each time to give each image a unique name.
Changes you have to make:
- Change the name variable: the path of the PDF file you want to convert into images. (A relative path is enough.)
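The script above can be wrapped in a function so that the Poppler path is optional and the output names sort correctly for PDFs with ten or more pages. This is only a sketch: image_name, pdf_to_jpegs, and the zero-padding scheme are my own choices, while convert_from_path() and its dpi and poppler_path arguments come from pdf2image.

```python
import os


def image_name(index, total, stem="output_image"):
    """Build a zero-padded file name so the images sort in page order."""
    width = len(str(total))
    return f"{stem}{index:0{width}d}.jpg"


def pdf_to_jpegs(pdf_path, dpi=200, poppler_path=None):
    """Save each page of `pdf_path` as a JPEG and return the file names."""
    from pdf2image import convert_from_path  # needs Poppler installed

    # poppler_path=None lets pdf2image look for Poppler on the system PATH
    pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_path)
    saved = []
    for i, page in enumerate(pages, start=1):
        output = image_name(i, len(pages))
        page.save(output, "JPEG")
        print(f"Size of {output}: {os.path.getsize(output) / 1024:.2f} KB")
        saved.append(output)
    return saved
```

On Windows you would call pdf_to_jpegs('your_file_name.pdf', poppler_path=POPPLER_PATH); on other systems the poppler_path argument can usually be left out.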
3. Merge PDFs:
#MERGE ALL PDFS INTO SINGLE PDF
from PyPDF2 import PdfMerger
import os
merger = PdfMerger()
name = "merged_pdf_file_name.pdf"
pdfs = ['my_file1.pdf','my_file2.pdf','my_file3.pdf']
for pdf in pdfs:
    merger.append(pdf)
merger.write(name)
file_size = os.path.getsize(name)
file_size_kb = file_size / 1024
print(f"Size of {name}: {file_size_kb:.2f} KB")
merger.close()
Installation:
pip install PyPDF2
Code Explanation:
- The code creates a PdfMerger object named merger from the PyPDF2 module. This object will be used to merge multiple PDF files.
- The variable name is set to "merged_pdf_file_name.pdf". You can change it to the desired name for your merged PDF file.
- The variable pdfs is a list that contains the names of the input PDF files that you want to merge.
- Using a loop, each PDF file listed in the pdfs variable is appended to the merger object using the append() method. This adds the content of each PDF to the merger, combining them into a single PDF.
Other useful methods of the PdfMerger class:
merger.merge(<position>, "pdf_file.pdf")
This method inserts the specified PDF document (pdf_file.pdf) into the merged document at a specific position. The first argument is the page index at which the new pages are inserted. With a position of 2, for example, they go in after the first two pages of the merged document. All existing pages from that index onward are shifted back to make room for the new pages.
merger.append("pdf_file_name.pdf", pages=(0, 3)) # pages 1, 2 and 3 will be added at the end
The append() method can also add only specific pages from a PDF document (pdf_file_name.pdf) to the end of the merged document. In this example, pages=(0, 3) specifies that the pages with indices 0 up to, but not including, 3 will be appended. Only the selected pages are added, and the order of the existing pages is not affected.
merger.append(pdf, pages=(0, 6, 2)) # pages 1, 3 and 5 will be added at the end
This variant of the append() method lets you select a non-contiguous range of pages. In this example, pages=(0, 6, 2) means that the pages with indices (not page numbers) 0, 2, and 4 will be appended to the merged document. The syntax mirrors Python's range() function, where the third argument is the step between pages.
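Since the pages tuples follow range() semantics, plain Python is enough to check which pages a tuple selects before you run the merge. The sketch below uses no PDF library, and the helper name selected_page_numbers is mine.

```python
def selected_page_numbers(pages):
    """Return the 1-based page numbers picked by a (start, stop[, step]) tuple."""
    return [index + 1 for index in range(*pages)]


print(selected_page_numbers((0, 3)))     # [1, 2, 3]
print(selected_page_numbers((0, 6, 2)))  # [1, 3, 5]
```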
- The write() method is called on the merger object, with the name variable specifying the file name for the merged PDF. This writes the merged PDF content to the specified file.
- The code uses the os.path.getsize() function to retrieve the size of the generated merged PDF file.
- The file size is then divided by 1024 to convert it from bytes to kilobytes and stored in the variable file_size_kb.
- Finally, the code prints the size of the merged PDF file, formatting file_size_kb to two decimal places, along with a message.
- The merger object is closed using the close() method, ensuring that any resources associated with it are released.
Changes you have to make:
- The variable name: change it to the desired name for your merged PDF file.
- The list variable pdfs: the names (if the files are in the same directory) or the paths (relative or absolute) of the PDF files you want to merge.
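A wrong path in the pdfs list only shows up as an error partway through the merge. The sketch below checks the inputs first and closes the merger even if something fails. The function names are mine; it uses only the PdfMerger methods already shown (append(), write(), close()).

```python
import os


def missing_files(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]


def merge_pdfs(pdfs, name):
    """Merge `pdfs` in order into `name`, failing early if an input is missing."""
    from PyPDF2 import PdfMerger  # imported here so missing_files() works alone

    missing = missing_files(pdfs)
    if missing:
        raise FileNotFoundError(f"Cannot merge, missing: {missing}")
    merger = PdfMerger()
    try:
        for pdf in pdfs:
            merger.append(pdf)
        merger.write(name)
    finally:
        merger.close()  # release file handles even if a step above fails
```

A call such as merge_pdfs(['my_file1.pdf', 'my_file2.pdf'], 'merged_pdf_file_name.pdf') would then behave like the script above, with a clearer error when a file name is mistyped.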
4. Split PDFs:
# split pdfs
from PyPDF2 import PdfReader, PdfWriter
import os
def split(path, name_of_split):
    pdf = PdfReader(path)
    for page in range(len(pdf.pages)):
        pdf_writer = PdfWriter()
        pdf_writer.add_page(pdf.pages[page])
        output = f'{name_of_split}{page}.pdf'
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)
        file_size = os.path.getsize(output)
        file_size_kb = file_size / 1024
        print(f"Size of {output}: {file_size_kb:.2f} KB")
if __name__ == '__main__':
    path = 'path_file_to_be_splited.pdf'
    name_of_the_split = "any_name"
    split(path, name_of_the_split)
This code saves each page of the specified PDF as a separate PDF. The output files are numbered from 0, so the first page is saved as any_name0.pdf.
Installation:
pip install PyPDF2
Code Explanation:
- The code defines a function called split that takes two parameters: path (the path of the PDF file to be split) and name_of_split (a string representing the base name for the split files).
- The code uses the PdfReader class from PyPDF2 to read the PDF file specified by the path parameter.
- Using a loop, the code iterates over each page of the input PDF using the range() function and the length of pdf.pages (the number of pages in the PDF).
- Inside the loop, a new PdfWriter object is created using PdfWriter(). This object will be used to write one individual page to its own PDF file.
- The current page, obtained from pdf.pages[page], is added to pdf_writer using the add_page() method.
- An output file name is created by appending the current page index to the name_of_split parameter.
- The file is opened with a with statement, and the write() method of pdf_writer writes its content to the opened file.
- Then the size of the file is calculated and printed.
Suppose you want to split a 6-page PDF into two files, one with the first 5 pages and the other with only the last page:
from PyPDF2 import PdfReader, PdfWriter
import os
def split(path, name_of_split):
    pdf = PdfReader(path)
    # Split PDF into two parts: 5 pages and 1 page
    pdf_part1 = PdfWriter()
    for page in range(5):
        pdf_part1.add_page(pdf.pages[page])
    pdf_part2 = PdfWriter()
    pdf_part2.add_page(pdf.pages[5])
    # Write the split PDFs to separate files
    output1 = f'{name_of_split}_part1.pdf'
    with open(output1, 'wb') as output_pdf1:
        pdf_part1.write(output_pdf1)
    output2 = f'{name_of_split}_part2.pdf'
    with open(output2, 'wb') as output_pdf2:
        pdf_part2.write(output_pdf2)
    # Determine the sizes of the split PDF files
    file_size1 = os.path.getsize(output1)
    file_size_kb1 = file_size1 / 1024
    print(f"Size of {output1}: {file_size_kb1:.2f} KB")
    file_size2 = os.path.getsize(output2)
    file_size_kb2 = file_size2 / 1024
    print(f"Size of {output2}: {file_size_kb2:.2f} KB")
if __name__ == '__main__':
    path = 'path_file_to_be_splited.pdf'
    name_of_split = "any_name"
    split(path, name_of_split)
Changes you have to make:
- The path variable: the name or the path of the PDF to be split.
- The name_of_split variable: the base name used to identify the split files.
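The two-part split above hard-codes a 6-page PDF. A sketch of a more general version, which splits after any number of pages, might look like this. The names split_points and split_at are hypothetical; the PyPDF2 calls are the same PdfReader, PdfWriter, add_page(), and write() used earlier.

```python
def split_points(total_pages, first_part_pages):
    """Return two ranges of page indices: the first part and the remainder."""
    if not 0 < first_part_pages < total_pages:
        raise ValueError("first_part_pages must be between 1 and total_pages - 1")
    return range(first_part_pages), range(first_part_pages, total_pages)


def split_at(path, name_of_split, first_part_pages):
    """Split the PDF at `path` into two files after `first_part_pages` pages."""
    from PyPDF2 import PdfReader, PdfWriter

    pdf = PdfReader(path)
    parts = split_points(len(pdf.pages), first_part_pages)
    outputs = []
    for number, indices in enumerate(parts, start=1):
        writer = PdfWriter()
        for index in indices:
            writer.add_page(pdf.pages[index])
        output = f"{name_of_split}_part{number}.pdf"
        with open(output, "wb") as output_pdf:
            writer.write(output_pdf)
        outputs.append(output)
    return outputs
```

Calling split_at('path_file_to_be_splited.pdf', 'any_name', 5) on a 6-page PDF should produce the same two files as the script above.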
GitHub Repository for this:
https://github.com/SarumathyPrabakaran/pdf-file-manipulation-scripts
Hope it helps.