Convert Word Documents to Excel with Python: A Formatting-Preserving Approach

Alexander Stock
4 min readJun 17, 2024

--

Convert Word to Excel.

In many work and academic settings, it is common to need to transfer data from a Word document into an Excel spreadsheet. This could be for performing calculations, creating charts and visualizations, or organizing the data in a more structured format.

This post provides a Python-based approach for converting Word documents to Excel spreadsheets while preserving the original layout and formatting.

Install Required Libraries

This solution relies on two libraries, namely Spire.Doc for Python and Spire.XLS for Python. The former is used to read and extract content from a Word document, and the latter is used to create an Excel document and write data into specific cells.

You can install them from PyPI using the following pip commands.

pip install spire.doc
pip install spire.xls

Method to Export Data from Word to Excel

To enhance the readability and maintainability of this code example, I have implemented the following four custom methods, each of which performs a specific function:

  • ExportTableInExcel() — Export data from a Word table to specified Excel cells.
  • CopyContentInTable() — Copy content from a table cell in Word to an Excel cell.
  • CopyTextAndStyle() — Copy text with formatting from a Word paragraph to an Excel cell.
  • Html_to_argb() — Convert an Html color to an Argb color.

The key steps to convert Word to Excel are as follows:

  • Load the Word Document: Begin by loading the Word document into a Document object.
  • Create an Excel Workbook: Create a new Workbook object and add a worksheet to it.
  • Traverse the Word Document: Iterate through the sections in the Word document, then the document objects within each section, and determine whether each document object is a paragraph or a table.
  • Handle Paragraphs: If the document object is a paragraph, use the CopyTextAndStyle() method to write the paragraph contents to a specified cell in the Excel worksheet, preserving the formatting.
  • Handle Tables: If the document object is a table, use the ExportTableInExcel() method to export the table data from the Word document directly into the Excel worksheet.
  • Save the Excel Workbook: Finally, save the updated Excel workbook to a file.

Convert Word to Excel in Python

The following code demonstrates how to leverage the Spire.Doc and Spire.XLS libraries to automate the process of converting a Word document to an Excel spreadsheet while preserving the original text formatting.

from spire.doc import *
from spire.doc.common import *
from spire.xls import*
from spire.xls.common import*

# Copy text with formatting from a paragraph in Word to a cell in Excel
def CopyTextAndStyle(worksheet: Worksheet, cell: CellRange, paragraph: Paragraph):
richText = cell.RichText
richText.Text = paragraph.Text
startIndex = 0
for i in range(paragraph.ChildObjects.Count):
documentObject = paragraph.ChildObjects[i]
if(isinstance(documentObject, TextRange)):
textRange = TextRange(documentObject)
fontName = textRange.CharacterFormat.FontName
isBold = textRange.CharacterFormat.Bold
textColor = textRange.CharacterFormat.TextColor
argb_color = Html_to_argb(textColor.Name)
fontSize = textRange.CharacterFormat.FontSize
textRange_text = textRange.Text
strLength = len(textRange_text)
font = worksheet.Workbook.CreateFont()
font.Color = Color.FromArgb(argb_color[0], argb_color[1], argb_color[2], argb_color[3])
font.IsBold = isBold
font.Size = fontSize
font.FontName = fontName
endIndex = startIndex + strLength
richText.SetFont(startIndex, endIndex, font)
startIndex += strLength
if(isinstance(documentObject, DocPicture)):
picture = DocPicture(documentObject)
cell.Worksheet.Pictures.Add(cell.Row, cell.Column, picture)
cell.Worksheet.SetRowHeightInPixels(cell.Row, 1, picture.Height)
if(paragraph.Format.HorizontalAlignment == HorizontalAlignment.Left):
cell.Style.HorizontalAlignment = HorizontalAlignment.Left
elif(paragraph.Format.HorizontalAlignment == HorizontalAlignment.Center):
cell.Style.HorizontalAlignment = HorizontalAlignType.Center
else:
cell.Style.HorizontalAlignment = HorizontalAlignType.Right

# Copy content from a table cell in Word to an Excel cell
def CopyContentInTable(tableCell: TableCell, cell: CellRange, worksheet: Worksheet):
newParagraph = Paragraph(tableCell.Document)
for i in range(tableCell.ChildObjects.Count):
documentObject = tableCell.ChildObjects[i]
if(isinstance(documentObject, Paragraph)):
paragraph = Paragraph(documentObject)
for j in range(paragraph.ChildObjects.Count):
childObject = paragraph.ChildObjects[j]
newParagraph.ChildObjects.Add(childObject.Clone())
if(i < tableCell.ChildObjects.Count - 1):
newParagraph.AppendText("\n")
CopyTextAndStyle(worksheet, cell, newParagraph)

# Export data with formatting from a Word table to specified Excel cells
def ExportTableInExcel(worksheet: Worksheet, row: int, table: Table):
cell = None
column = None
for i in range(table.Rows.Count):
column = 1
tableRow = table.Rows[i]
for j in range(tableRow.Cells.Count):
cell = worksheet.Range[row, column]
cell.BorderAround(LineStyleType.Thin, Color.get_Black())
tableCell = tableRow.Cells[j]
CopyContentInTable(tableCell, cell, worksheet)
column = column + 1
row = row + 1
return row

# Convert Html color to Argb color
def Html_to_argb(html_color):

# Remove the '#' symbol
html_color = html_color.lstrip('#')

# Convert the hexadecimal color code to integer
a = int(html_color[0:2], 16)
r = int(html_color[2:4], 16)
g = int(html_color[4:6], 16)
b = int(html_color[6:8], 16)

return a, r, g, b

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Invoice.docx")

# Create a Workbook object
workbook = Workbook()

# Remove the default worksheets
workbook.Worksheets.Clear()

# Create a worksheet named "WordToExcel"
worksheet = workbook.CreateEmptySheet("WordToExcel")

# Declare two variables
row = 1
column = 1

# Loop through the sections in the Word document
for i in range(document.Sections.Count):
section = document.Sections[i]

for j in range(section.Body.ChildObjects.Count):
documentObject = section.Body.ChildObjects[j]
if(isinstance(documentObject, Paragraph)):
cell = worksheet.Range[row, column]
paragraph = Paragraph(documentObject)
CopyTextAndStyle(worksheet, cell, paragraph)
row = row + 1
if(isinstance(documentObject, Table)):
table = Table(documentObject)
currentRow = ExportTableInExcel(worksheet, row, table)
row = currentRow

# Wrap text in cells
worksheet.AllocatedRange.IsWrapText = True

# Auto fit column width
worksheet.AllocatedRange.AutoFitColumns()

# Save the workbook to an Excel file
workbook.SaveToFile("output/WordToExcel.xlsx", ExcelVersion.Version2016)

# Dispose resources
document.Dispose()
workbook.Dispose()
Convert Word to Excel with text formatting preserved

Conclusion

In this article, we’ve demonstrated a Python-based approach for seamlessly converting Microsoft Word documents into Excel spreadsheets. The code snippet presented can efficiently transfer both paragraphs of text and tabular data into corresponding Excel cells, preserving the formatting and structure of the original Word document.

See Also

--

--

Alexander Stock

I'm Alexander Stock, a software development consultant and blogger with 10+ years' experience. Specializing in office document tools and knowledge introduction.