Convert Word DOC or DOCX to PDF with Python: A Comprehensive Guide

Alice Yang
6 min readAug 14, 2023

--

Convert Word to PDF with Python
Convert Word to PDF with Python

Converting Word files to PDF format is a task frequently undertaken by a multitude of users. The allure of PDF files lies in their ability to preserve the visual aesthetics and formatting of the original Word documents while offering cross-platform compatibility across various operating systems and devices. This compatibility guarantees that your document will maintain its intended appearance for all recipients, irrespective of the software or device they utilize.

In this article, we will explore various options for converting Word DOC and DOCX files to PDF format using Python. We will cover both basic and advanced conversion options to suit different needs:

Install the Required Library

To convert Word files to PDF, we can use the Spire.Doc for Python library. which is a Word library for creating, reading, editing, and converting Word files within Python applications. With this library, you can work with a wide range of Word formats, including Doc, Docx, Docm, Dot, Dotx, Dotm, and more.

You can install Spire.Doc for Python from pypi by running the following commands in your terminal:

pip install Spire.Doc
pip install plum-dispatch==1.7.4

Convert Word to PDF in Python

Converting a Word document to PDF format allows for easy sharing, printing, and viewing of the document across different platforms and devices.

The following code snippet shows how to convert a Word DOC or DOCX file to PDF in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Save the file to a PDF file
document.SaveToFile("WordToPdf.pdf", FileFormat.PDF)
document.Close()

Convert Word to PDF with Bookmarks in Python

Bookmarks are especially useful for lengthy documents, eBooks, or manuals as they can enhance the user’s ability to navigate through the content efficiently.

When converting Word to PDF, Word styles like headings or bookmarks can be mapped as bookmarks in PDF during conversion. The following code snippet shows how to convert a Word DOC or DOCX file to PDF with bookmarks in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Create a ToPdfParameterList object
parameters = ToPdfParameterList()

# Create PDF bookmarks using Word bookmarks
parameters.CreateWordBookmarks = True
# Or create PDF bookmarks using Word headings
#parameters.CreateWordBookmarksUsingHeadings = True

#Save the file to a PDF file
document.SaveToFile("WordToPdfWithBookmarks.pdf", parameters)
document.Close()

Convert Word to PDF with Fonts Embedded in Python

Embedding fonts in PDF during conversion ensures the recipient sees the text as intended regardless of the fonts installed on their system.

The following code snippet shows how to convert a Word DOC or DOCX file to PDF with fonts embedded in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Create a ToPdfParameterList object
parameters = ToPdfParameterList()

# Embed all used fonts in Word into PDF
parameters.IsEmbeddedAllFonts = True

#Save the file to a PDF file
document.SaveToFile("WordToPdfWithFontsEmbedded.pdf", parameters)
document.Close()

The above code snippet shows how to embed all used fonts in Word into the converted PDF. If you want to embed uninstalled fonts in the converted PDF, you can refer to the following code snippet:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Create a ToPdfParameterList object
parameters = ToPdfParameterList()

# Embed uninstalled fonts into PDF
fonts = []
fonts.append(PrivateFontPath("PT Serif Caption", "PT_Serif-Caption-Web-Regular.ttf"))
parameters.PrivateFontPaths = fonts

#Save the file to a PDF file
document.SaveToFile("WordToPdfWithFontsEmbedded.pdf", parameters)
document.Close()

Convert Word to Password-Protected PDF in Python

Adding a password to the resulting PDF is an effective solution for protecting sensitive or confidential information from unauthorized access.

The following code snippet shows how to convert a Word DOC or DOCX file to password-protected PDF in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Create a ToPdfParameterList object
parameters = ToPdfParameterList()

# Set the open password, permission password and permissions for PDF
parameters.PdfSecurity.Encrypt("OpenPassword", "PersmissionPassword", PdfPermissionsFlags.Default, PdfEncryptionKeySize.Key128Bit)

#Save the file to a PDF file
document.SaveToFile("WordToEncryptedPdf.pdf", parameters)
document.Close()

Convert Word to PDF/A in Python

PDF/A is a specialized subset of the PDF format that is specifically designed for reliable and consistent preservation of electronic documents. Converting a Word document to PDF/A format ensures that the document remains accessible and readable after long-term preservation.

The following code snippet shows how to convert a Word DOC or DOCX file to PDF/A-1a in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Create a ToPdfParameterList object
parameters = ToPdfParameterList()

# Set the conformance level for PDF
parameters.PdfConformanceLevel = PdfConformanceLevel.Pdf_A1A

#Save the file to a PDF file
document.SaveToFile("WordToPdfA.pdf", parameters)
document.Close()

In addition to the PDF/A-1a conformance level, you have the flexibility to choose from a range of other conformance levels when converting your Word files to PDF format. These conformance levels include PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-3a, PDF/A-3b, and PDF/X-1a:2001.

Disable Hyperlinks when Converting Word to PDF in Python

Disabling active hyperlinks when converting Word to PDF can help you avoid clutter in the resulting PDF and provides a cleaner, more focused reading experience for recipients.

The following code snippet shows how to disable hyperlinks when converting a Word DOC or DOCX file to PDF in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

# Create a ToPdfParameterList object
parameters = ToPdfParameterList()

# Disable the hyperlinks in PDF
parameters.DisableLink = True

#Save the file to a PDF file
document.SaveToFile("WordToPdfWithoutHyperlinks.pdf", parameters)
document.Close()

Adjust Image Quality when Converting Word to PDF in Python

During the conversion process from Word to PDF, you have the option to adjust the quality of images in the resulting PDF file. This allows you to optimize the PDF for different intended uses, such as reducing the overall file size or preparing the file for web publishing.

The following code snippet shows how to adjust image quality when converting a Word DOC or DOCX file to PDF in Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()
# Load a Word DOCX file
document.LoadFromFile("Sample.docx")
# Or load a Word DOC file
#document.LoadFromFile("Sample.doc")

#Set the image quality in PDF to 40% of the original. The default image quality is 80% of the original.
document.JPEGQuality = 40

#Save the file to a PDF file
document.SaveToFile("WordToPdfWithSpecificImageQuality.pdf", FileFormat.PDF)
document.Close()

Conclusion

In this article, we have outlined 7 different scenarios for converting Word files to PDF format with Python. We hope you can find it helpful.

Related Topics

--

--

Alice Yang

Skilled senior software developers with five years of experience in all phases of software development life cycle using .NET, Java and C++ languages.