Convert Word to HTML and Vice Versa in Python

Alexander Stock
3 min read · Dec 1, 2023

HTML (Hypertext Markup Language) is primarily used for web pages, while Word documents (.doc or .docx files) are commonly used for text documents. Converting between these two popular formats is useful for sharing information across platforms, publishing content online, and keeping web pages consistent with offline documents. In this article, I will show how to convert Word to HTML and vice versa in Python using the Spire.Doc for Python library.

  • Convert Word to HTML in Python
  • Convert an HTML File to Word in Python
  • Convert an HTML String to Word in Python

Install Dependency

This solution depends on Spire.Doc for Python, a library for reading, creating, and manipulating Word documents in Python programs. You can install it with the following pip command:

pip install Spire.Doc

P.S. Spire.Doc for Python is a commercial library that requires a paid license. There is a red watermark in the generated document if no license is applied.
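Before running the examples, you can confirm that the package is visible to the interpreter you are using. The following is a minimal standard-library sketch; it assumes the installed distribution is named Spire.Doc, as in the pip command above, and `installed_version` is a hypothetical helper, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "Spire.Doc") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper; the default name assumes `pip install Spire.Doc` was used.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("Spire.Doc is not installed; run: pip install Spire.Doc")
    else:
        print(f"Spire.Doc {found} is installed")
```

A None result usually means pip installed the package into a different environment from the one running your script.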

Convert Word to HTML in Python

Spire.Doc for Python provides the Document class, which represents a Word document. Use the Document.LoadFromFile method to load an existing Word document, then save it in another format, such as HTML, with the Document.SaveToFile method. Before converting, you can also configure the conversion through the Document.HtmlExportOptions object. These options control, for example, whether CSS styles are written into the HTML file or exported to a separate style sheet, whether images are embedded in the HTML, and whether text input form fields are exported as plain text.

from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Write CSS styles into the HTML file (internal style sheet)
document.HtmlExportOptions.CssStyleSheetType = CssStyleSheetType.Internal

# Embed images in the HTML code
document.HtmlExportOptions.ImageEmbedded = True

# Export form fields as plain text
document.HtmlExportOptions.IsTextInputFormFieldAsText = True

# Save the document as an HTML file
document.SaveToFile("output/ToHtml.html", FileFormat.Html)
document.Close()
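With CssStyleSheetType.Internal the CSS is expected to end up in a <style> element of the page, and with ImageEmbedded = True the images are expected to be written into the HTML itself; embedded images are conventionally encoded as base64 data: URIs. The standard-library sketch below checks an exported page for those two properties. It assumes the data-URI encoding, and `ExportInspector` and `inspect_html` are hypothetical names, not part of Spire.Doc.

```python
from html.parser import HTMLParser


class ExportInspector(HTMLParser):
    """Collects a few facts about an HTML document's styles and images (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.has_style_element = False   # a <style> block inside the file
        self.external_stylesheets = []   # href values of <link rel="stylesheet">
        self.image_sources = []          # src values of every <img>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "style":
            self.has_style_element = True
        elif tag == "link" and (attributes.get("rel") or "").lower() == "stylesheet":
            self.external_stylesheets.append(attributes.get("href"))
        elif tag == "img":
            # Self-closing <img/> tags also reach this method via handle_startendtag.
            self.image_sources.append(attributes.get("src") or "")


def inspect_html(html_text: str) -> dict:
    """Report whether CSS is internal and whether every image is an embedded data: URI."""
    parser = ExportInspector()
    parser.feed(html_text)
    parser.close()
    return {
        "internal_css": parser.has_style_element,
        "external_css": parser.external_stylesheets,
        "images": len(parser.image_sources),
        # all() is True for an empty list, so a page without images also counts as embedded.
        "all_images_embedded": all(src.startswith("data:") for src in parser.image_sources),
    }


if __name__ == "__main__":
    sample = ('<html><head><style>p {color: red}</style></head>'
              '<body><img src="data:image/png;base64,iVBORw0KGgo="></body></html>')
    print(inspect_html(sample))
```

To check a real export, read the generated file (for example output/ToHtml.html) into a string and pass it to inspect_html.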

Convert an HTML File to Word in Python

The Document.LoadFromFile method can load HTML files as well as .doc and .docx files. Load an HTML file with this method, then save it as a Word document with the Document.SaveToFile method.

from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
document = Document()

# Load an HTML file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.html", FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file to a .docx file
document.SaveToFile("output/ToWord.docx", FileFormat.Docx2016)
document.Close()
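As its name suggests, the third argument, XHTMLValidationType.none, appears to load the file without XHTML validation. That matters because everyday HTML is often not well-formed XML; void elements such as <br> are commonly left unclosed, for example. If you want to know whether a file would survive strict parsing, the standard-library sketch below checks XML well-formedness, which is a necessary but not sufficient condition for valid XHTML. The function name is hypothetical, and treating this check as a proxy for Spire.Doc's own validation is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_wellformedness_error(markup: str) -> Optional[str]:
    """Return None if `markup` parses as well-formed XML, else the parser's error message.

    Hypothetical helper: well-formedness is only a prerequisite for valid XHTML,
    so a document can pass this check and still fail stricter validation.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    samples = {
        "self-closed <br/>": "<html><body><p>Line one<br/>Line two</p></body></html>",
        "unclosed <br>": "<html><body><p>Line one<br>Line two</p></body></html>",
    }
    for label, markup in samples.items():
        error = xml_wellformedness_error(markup)
        print(f"{label}: {'well-formed' if error is None else error}")
```

Note that HTML named entities such as &nbsp; are also reported as errors, because a plain XML parser does not know the HTML entity set.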

Convert an HTML String to Word in Python

Spire.Doc for Python offers the Paragraph.AppendHTML method, which renders relatively simple HTML strings (usually text and its formatting) in a Word document. The following code snippet gives an example.

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Add a section to the document
sec = document.AddSection()

# Add a paragraph to the section
paragraph = sec.AddParagraph()

# Specify the HTML string
htmlString = """
<html>
<head>
<title>HTML to Word Example</title>
<style>
body {
font-family: Arial, sans-serif;
}
h1 {
color: #FF5733;
font-size: 24px;
margin-bottom: 20px;
}
p {
color: #333333;
font-size: 16px;
margin-bottom: 10px;
}
ul {
list-style-type: disc;
margin-left: 20px;
margin-bottom: 15px;
}
li {
font-size: 14px;
margin-bottom: 5px;
}
table {
border-collapse: collapse;
width: 100%;
margin-bottom: 20px;
}
th, td {
border: 1px solid #CCCCCC;
padding: 8px;
text-align: left;
}
th {
background-color: #F2F2F2;
font-weight: bold;
}
td {
color: #0000FF;
}
</style>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
<p>Here's an unordered list:</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<p>And here's a table:</p>
<table>
<tr>
<th>Name</th>
<th>Age</th>
<th>Gender</th>
</tr>
<tr>
<td>John Smith</td>
<td>35</td>
<td>Male</td>
</tr>
<tr>
<td>Jenny Garcia</td>
<td>27</td>
<td>Female</td>
</tr>
</table>
</body>
</html>
"""

# Append the HTML string to the paragraph
paragraph.AppendHTML(htmlString)

# Save the result document
document.SaveToFile("output/HtmlStringToWord.docx", FileFormat.Docx2016)
document.Close()
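In practice, the table rows in an HTML string often come from data rather than being typed by hand. The sketch below builds a comparable table with Python's html.escape, so that characters such as < or & in the data cannot break the markup. The rows_to_html_table helper is hypothetical, and the styles from the example's <style> block are omitted for brevity.

```python
from html import escape
from typing import Iterable, Sequence


def rows_to_html_table(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Build a simple <table> from a header row and data rows, escaping every cell.

    Hypothetical helper, not part of Spire.Doc.
    """
    parts = ["<table>", "<tr>"]
    parts += [f"<th>{escape(str(h))}</th>" for h in headers]
    parts.append("</tr>")
    for row in rows:
        parts.append("<tr>")
        parts += [f"<td>{escape(str(cell))}</td>" for cell in row]
        parts.append("</tr>")
    parts.append("</table>")
    return "\n".join(parts)


if __name__ == "__main__":
    people = [("John Smith", 35, "Male"), ("Jenny Garcia", 27, "Female")]
    table_html = rows_to_html_table(["Name", "Age", "Gender"], people)
    html_string = f"<html><body><p>And here's a table:</p>\n{table_html}\n</body></html>"
    print(html_string)
```

The resulting html_string can be passed to paragraph.AppendHTML in the same way as htmlString in the example above.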

I'm Alexander Stock, a software development consultant and blogger with 10+ years' experience, specializing in office document tools and knowledge sharing.