Convert PDF to Excel with Python

Alice Yang
Oct 25, 2023


PDF files are widely used for sharing and viewing documents, but when it comes to manipulating and analyzing data, Excel provides a more flexible and powerful platform. There are numerous situations where converting PDF to Excel becomes indispensable. For instance, you may receive a PDF report or financial statement that requires further analysis or necessitates the extraction of specific data. By converting these files into Excel format, you can effortlessly manipulate the data, perform calculations, and create interactive charts or graphs to uncover valuable insights. In this article, we will explore how to convert PDF files to Excel format using Python.

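We'll discuss the following topics: the Python library used for the conversion, a basic PDF to Excel conversion, and converting a multi-page PDF to a single Excel worksheet.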

Python Library to Convert PDF to Excel

To convert PDF files to Excel with Python, we can use the Spire.PDF for Python library.

Spire.PDF for Python is a feature-rich and user-friendly library that enables creating, reading, editing, and converting PDF files within Python applications. With this library, you can perform a wide range of manipulations on PDFs, including adding text or images, extracting text or images, adding digital signatures, adding or deleting pages, merging or splitting PDFs, creating bookmarks, adding text or image watermarks, inserting fillable forms and many more. In addition, you are also able to convert PDF files to various file formats, such as Word, Excel, images, HTML, SVG, XPS, OFD, PCL, and PostScript.

You can install Spire.PDF for Python from PyPI using the following pip command:

pip install Spire.Pdf

For more detailed information about the installation, you can check this official documentation: How to Install Spire.PDF for Python in VS Code.
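After installing, a quick way to confirm that the package is visible to your interpreter is to look up its import name. The sketch below assumes the import name is `spire.pdf`, as used in the examples in this article; `is_importable` is a hypothetical helper written for illustration and is not part of Spire.PDF.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package
        # (here "spire") is missing, so treat that as "not installed".
        return False


if __name__ == "__main__":
    if is_importable("spire.pdf"):
        print("Spire.PDF for Python is installed.")
    else:
        print("Spire.PDF not found; run: pip install Spire.Pdf")
```

If the check fails inside a virtual environment, the usual cause is that pip installed the package into a different interpreter than the one running the script.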

Convert PDF to Excel in Python

Converting a PDF file to Excel format is straightforward with Spire.PDF for Python: create a PdfDocument object, load the PDF, and save it in XLSX format.

Here is a simple example that shows how to convert a PDF file to Excel format using Python and Spire.PDF for Python:

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Save the PDF file to Excel XLSX format
pdf.SaveToFile("ToExcel.xlsx", FileFormat.XLSX)
# Close the PdfDocument object
pdf.Close()
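The same three calls can be applied to a whole folder of PDFs. The following is a minimal sketch, not an official Spire.PDF recipe: `xlsx_path_for`, `convert_folder`, and `SPIRE_AVAILABLE` are hypothetical names introduced here, and the only library calls used are the ones shown in the example above.

```python
try:
    # Same imports as the example above
    from spire.pdf.common import *
    from spire.pdf import *
    SPIRE_AVAILABLE = True
except ImportError:
    SPIRE_AVAILABLE = False

# Imported after the star imports so this name cannot be shadowed by them
from pathlib import Path


def xlsx_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map e.g. 'reports/q1.pdf' to '<out_dir>/q1.xlsx'."""
    return out_dir / (pdf_path.stem + ".xlsx")


def convert_folder(src_dir: str, out_dir: str) -> list:
    """Convert every *.pdf in src_dir and return the paths written."""
    if not SPIRE_AVAILABLE:
        raise RuntimeError("Spire.PDF is not installed; run: pip install Spire.Pdf")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_file in sorted(Path(src_dir).glob("*.pdf")):
        target = xlsx_path_for(pdf_file, out)
        pdf = PdfDocument()
        try:
            pdf.LoadFromFile(str(pdf_file))
            pdf.SaveToFile(str(target), FileFormat.XLSX)
        finally:
            # Release the document even if loading or saving fails
            pdf.Close()
        written.append(target)
    return written
```

Calling `convert_folder("pdfs", "excel")` would write one workbook per PDF into the `excel` folder; both folder names are placeholders. The `try`/`finally` keeps `Close()` from being skipped when a single file fails to load.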

Convert a Multi-Page PDF to a Single Excel Worksheet in Python

When dealing with multi-page PDFs containing tabular data, it can be beneficial to consolidate the information into a single Excel worksheet for easier analysis and manipulation.

With Spire.PDF for Python, you can convert PDFs to Excel while customizing the conversion options to suit specific requirements. For example, you can specify whether to convert a multi-page PDF to multiple Excel worksheets or a single worksheet, whether to wrap text in cells of the converted Excel, and whether to display overlapping text. Here is a simple example:

from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
pdf = PdfDocument()
# Load a PDF file
pdf.LoadFromFile("Sample.pdf")

# Create an XlsxLineLayoutOptions object to specify the conversion options.
# Constructor parameters, in order:
#   convertToMultipleSheet: whether to convert a multi-page PDF to multiple Excel sheets (True) or a single sheet (False)
#   rotatedText: whether to show rotated text
#   splitCell: whether a PDF table cell containing text spanning multiple lines is split into multiple rows in Excel
#   wrapText: whether to wrap text in an Excel cell
#   overlapText: whether to display overlapping text
convertOptions = XlsxLineLayoutOptions(False, True, False, True, False)

# Set the conversion options
pdf.ConvertOptions.SetPdfToXlsxOptions(convertOptions)

# Save the PDF file to Excel XLSX format
pdf.SaveToFile("ToExcelWithAdvancedOptions.xlsx", FileFormat.XLSX)
# Close the PdfDocument object
pdf.Close()
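Five positional booleans are easy to mix up, so one option is to give them names before passing them to the constructor. The sketch below is an illustration only: `LineLayoutSettings` and `as_args` are hypothetical names, not part of Spire.PDF, and the defaults mirror the values used in the example above rather than any library defaults.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class LineLayoutSettings:
    """Named stand-ins for the five positional booleans.

    Field order matches the constructor order described above:
    convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText.
    """
    convert_to_multiple_sheet: bool = False
    rotated_text: bool = True
    split_cell: bool = False
    wrap_text: bool = True
    overlap_text: bool = False

    def as_args(self) -> tuple:
        # astuple() preserves the field declaration order
        return astuple(self)


settings = LineLayoutSettings(convert_to_multiple_sheet=False, wrap_text=True)
print(settings.as_args())  # (False, True, False, True, False)

# With Spire.PDF imported as in the example above, the tuple unpacks
# straight into the constructor:
#   convertOptions = XlsxLineLayoutOptions(*settings.as_args())
#   pdf.ConvertOptions.SetPdfToXlsxOptions(convertOptions)
```

Keeping `convert_to_multiple_sheet=False` is what produces the single worksheet; switching it to `True` would yield one worksheet per PDF page.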

Conclusion

This article demonstrated how to convert PDF to Excel using Python. We hope you find it helpful.

Related Topics

Convert PDF to Images (PNG, JPG, BMP, EMF) with Python

Convert PDF to Word DOCX or DOC with Python

Read or Extract Text from PDF with Python — A Comprehensive Guide

Merge PDF Files or Pages into One with Python

Split PDF Files or Pages with Python

Add Watermarks to PDF with Python


Alice Yang

Skilled senior software developer with five years of experience in all phases of the software development life cycle using .NET, Java, and C++.