Add or Extract Headings in Word Documents with Python

Alice Yang
Apr 2, 2024


When working on lengthy and complex Word documents, organizing the content becomes crucial for readability and navigability. Headings play a vital role in structuring your document, making it easier for readers to find specific sections and comprehend the overall flow of information. In this article, we will explore how to add and extract headings in Word documents using Python.

Python Library to Add and Extract Headings in Word Documents

To add and extract headings in Word documents with Python, we can use the Spire.Doc for Python library.

Spire.Doc for Python is a feature-rich and easy-to-use library for creating, reading, editing, and converting Word files within Python applications. It works with a wide range of Word formats, including Doc, Docx, Docm, Dot, Dotx, Dotm, and more. It can also convert Word documents to other file formats, such as PDF, RTF, HTML, Text, Image, SVG, ODT, PostScript, PCL, and XPS.

You can install Spire.Doc for Python from PyPI by running the following command in your terminal:

pip install Spire.Doc

For more detailed information about the installation, you can check this official documentation: How to Install Spire.Doc for Python in VS Code.
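
One quick way to confirm that the installation succeeded is to check whether the spire.doc module can be found before running the examples. The sketch below uses only the Python standard library; the helper name is_importable is illustrative and is not part of Spire.Doc.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example "spire") is itself missing
        return False


if is_importable("spire.doc"):
    print("Spire.Doc for Python is available.")
else:
    print("Spire.Doc was not found; run: pip install Spire.Doc")
```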

Add Headings in Word Documents with Python

The simplest way to add headings to a Word document is to use the built-in heading styles. Microsoft Word provides nine of them, ranging from Heading 1 to Heading 9.

Spire.Doc for Python supports all these levels of headings. Here is a simple example that shows how to add headings to a Word document using Python and Spire.Doc for Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()
# Add a section
section = document.AddSection()

# Set page margins
section.PageSetup.Margins.All = 72

# Add paragraphs to the section and apply heading styles to the paragraphs
# Heading 1
paragraph = section.AddParagraph()
paragraph.Text = "Heading 1"
paragraph.ApplyStyle(BuiltinStyle.Heading1)

# Heading 2
paragraph = section.AddParagraph()
paragraph.Text = "Heading 2"
paragraph.ApplyStyle(BuiltinStyle.Heading2)

# Heading 3
paragraph = section.AddParagraph()
paragraph.Text = "Heading 3"
paragraph.ApplyStyle(BuiltinStyle.Heading3)

# Heading 4
paragraph = section.AddParagraph()
paragraph.Text = "Heading 4"
paragraph.ApplyStyle(BuiltinStyle.Heading4)

# Heading 5
paragraph = section.AddParagraph()
paragraph.Text = "Heading 5"
paragraph.ApplyStyle(BuiltinStyle.Heading5)

# Heading 6
paragraph = section.AddParagraph()
paragraph.Text = "Heading 6"
paragraph.ApplyStyle(BuiltinStyle.Heading6)

# Heading 7
paragraph = section.AddParagraph()
paragraph.Text = "Heading 7"
paragraph.ApplyStyle(BuiltinStyle.Heading7)

# Heading 8
paragraph = section.AddParagraph()
paragraph.Text = "Heading 8"
paragraph.ApplyStyle(BuiltinStyle.Heading8)

# Heading 9
paragraph = section.AddParagraph()
paragraph.Text = "Heading 9"
paragraph.ApplyStyle(BuiltinStyle.Heading9)

# Save the result file
document.SaveToFile("AddHeadings.docx", FileFormat.Docx2016)
document.Close()
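
When the headings come from data rather than being typed out one by one, the same three calls (AddParagraph, Text, ApplyStyle) can be driven by a loop. The following is a minimal sketch, not part of the library: add_headings is a hypothetical helper, and it assumes the style members are named Heading1 through Heading9, as in the example above. The section and the BuiltinStyle enum are passed in as arguments so the helper itself does not import anything.

```python
def add_headings(section, outline, builtin_style):
    """Append one heading paragraph per (level, text) pair in `outline`.

    `section` is the object returned by document.AddSection() and
    `builtin_style` is Spire.Doc's BuiltinStyle enum (assumed to expose
    members named Heading1 ... Heading9).
    """
    for level, text in outline:
        if not 1 <= level <= 9:
            raise ValueError(f"Heading level must be 1-9, got {level}")
        paragraph = section.AddParagraph()
        paragraph.Text = text
        # Look up BuiltinStyle.Heading1 ... BuiltinStyle.Heading9 by name
        paragraph.ApplyStyle(getattr(builtin_style, f"Heading{level}"))


# Usage with Spire.Doc (same calls as in the example above):
#   document = Document()
#   section = document.AddSection()
#   add_headings(section, [(1, "Introduction"), (2, "Background")], BuiltinStyle)
#   document.SaveToFile("AddHeadings.docx", FileFormat.Docx2016)
#   document.Close()
```

Passing the outline as a list of (level, text) pairs keeps the document structure in one place, which is easier to review than nine near-identical blocks.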

Extract Headings from Word Documents with Python

Headings provide a structured outline of the document’s content, making it easier to navigate and understand the overall structure. By extracting headings, you can create summaries, build tables of contents, or perform further analysis on the structure and content of your documents.

Here is a simple example that shows how to extract all headings from a Word document using Python and Spire.Doc for Python:

from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()
# Load a Word document containing headings
document.LoadFromFile("AddHeadings.docx")

# Create a list to store the extracted headings
headings = []

# Iterate through all sections in the document
for i in range(document.Sections.Count):
    section = document.Sections[i]
    # Iterate through all paragraphs in each section
    for j in range(section.Paragraphs.Count):
        paragraph = section.Paragraphs[j]
        # Check if the style name of the paragraph contains "Heading"
        if paragraph.StyleName is not None and "Heading" in paragraph.StyleName:
            # Get the text of the paragraph and append it to the list
            headings.append(paragraph.Text)

# Save the content of the list to a text file (UTF-8 so non-ASCII headings are preserved)
with open("ExtractedHeadings.txt", "w", encoding="utf-8") as file:
    for heading in headings:
        file.write(heading + "\n")

document.Close()
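
A substring check such as "Heading" in StyleName also matches any custom style whose name happens to contain that word, and it discards the heading level. As a hedged sketch, the helpers below parse the level from the style name and render an indented outline. They assume the built-in styles are reported as "Heading1" through "Heading9" (optionally with a space); heading_level and build_outline are illustrative names, and the (style name, text) pairs would come from paragraph.StyleName and paragraph.Text.

```python
import re

# Assumption: built-in heading styles are named "Heading1" ... "Heading9"
HEADING_STYLE = re.compile(r"Heading ?([1-9])")


def heading_level(style_name):
    """Return the heading level (1-9) for a style name, or None if it is not a heading."""
    if not style_name:
        return None
    match = HEADING_STYLE.fullmatch(style_name.strip())
    return int(match.group(1)) if match else None


def build_outline(styled_paragraphs, indent="  "):
    """Turn (style_name, text) pairs into an indented plain-text outline."""
    lines = []
    for style_name, text in styled_paragraphs:
        level = heading_level(style_name)
        if level is not None:
            # Indent each heading by one step per level below Heading 1
            lines.append(indent * (level - 1) + text.strip())
    return "\n".join(lines)


sample = [
    ("Heading1", "Introduction"),
    ("Normal", "Body text is skipped."),
    ("Heading2", "Background"),
    ("TOC Heading", "Contents"),
]
print(build_outline(sample))
```

For the sample data, this prints "Introduction" followed by "Background" indented by two spaces; the "Normal" and "TOC Heading" entries are skipped because they do not match the assumed naming pattern.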

The above code shows how to extract all headings from a Word document. If you wish to extract specific headings, such as Heading 1, you can utilize the following code snippet:

from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()
# Load a Word document containing headings
document.LoadFromFile("AddHeadings.docx")

# Create a list to store the extracted Heading 1 paragraphs
headings = []

# Iterate through all sections in the document
for i in range(document.Sections.Count):
    section = document.Sections[i]
    # Iterate through all paragraphs in each section
    for j in range(section.Paragraphs.Count):
        paragraph = section.Paragraphs[j]
        # Check if the style name of the paragraph is "Heading1"
        if paragraph.StyleName is not None and paragraph.StyleName == "Heading1":
            # Get the text of the paragraph and append it to the list
            headings.append(paragraph.Text)

# Save the content of the list to a text file (UTF-8 so non-ASCII headings are preserved)
with open("Heading1.txt", "w", encoding="utf-8") as file:
    for heading in headings:
        file.write(heading + "\n")

document.Close()
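
If several levels are needed at once, for example Heading 1 and Heading 2, the equality check can be generalized to a set membership test. The function below is a sketch rather than a library feature: extract_headings is a hypothetical name, and it relies only on the members already used above (Sections, Paragraphs, Count, StyleName, Text).

```python
def extract_headings(document, style_names):
    """Return the text of every paragraph whose style name is in `style_names`."""
    wanted = set(style_names)
    headings = []
    for i in range(document.Sections.Count):
        section = document.Sections[i]
        for j in range(section.Paragraphs.Count):
            paragraph = section.Paragraphs[j]
            # Paragraphs with no style name (None) never match
            if paragraph.StyleName in wanted:
                headings.append(paragraph.Text)
    return headings


# Usage with a loaded Spire.Doc document:
#   top_levels = extract_headings(document, {"Heading1", "Heading2"})
```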

Conclusion

This article demonstrated how to add various levels of headings to Word documents and how to extract headings from Word documents using Python. We hope you find it helpful.


Alice Yang

Skilled senior software developer with five years of experience in all phases of the software development life cycle using .NET, Java, and C++.