How I Automated Microsoft Word Templating in Mendix Using The Apache POI API

The world loves Microsoft Office. As a result, most Mendix developers will end up working on a project in which Word documents need to be generated from the app.

Ivo Sturm
Published in Mendix Community
Nov 7, 2022


Mendix Word document generation

When it comes to Word documents, Mendix has its own built-in document template design system. Even though it’s a bit clunky to use, you can create great documents from your Mendix app’s data. The main concept is that Mendix creates the Word document together with all its data.

Word Templates

Over the past eight years, I have seen several use cases where the template needs to be maintained in Word itself, with Mendix only adding to or changing the Word document. This is especially interesting if many types of documents need to be generated for different clients within a Mendix app: each client needs its own details in the generated Word document, but the structure of the template is mostly the same for all clients. How great would it be if we could maintain the basics of the template in Word and let Mendix automatically populate it to make it client-specific? In other words:

Dynamically change words in the template and add rich text, images and lists
Populate a chart template with data from your Mendix app

Apache POI

The go-to Java library for manipulating MS Office documents is the Apache POI library. It has out-of-the-box APIs to create and manipulate the most common Office documents, such as Excel and Word. Both Mendix modules for exporting to and importing from Excel depend heavily on this library.

I started to read up on the possibilities of Apache POI, and it seemed that accessing Word templates wouldn’t be too hard. Indeed, that wasn’t the hard part. In some five lines of code, a Word document can be opened and its content manipulated.

I needed four additions that Mendix document generation was missing:

1. Word replacements: replacing a specific word (the bookmark value) with another, client-specific value coming from the Mendix app

2. Placing content (table, (un)ordered lists, image) in an existing Word document

3. Rich Text support

4. Manipulating Charts

Below I will cover all four challenges. I will conclude with a fallback scenario on what to do if the Apache POI library is missing some direct function to get what you need.

Word replacements

What seemed like a very standard feature turned out to be fairly complex. This is one of the main reasons why several paid libraries have built their own IP around accessing words in a Word document.

The challenge comes from the fact that any word in a Word document will be divided into so-called ‘runs’. One can’t know upfront how many runs a word will consist of.

Changing the font or font size within a word will certainly increase its number of runs. The more complex the word is, both in length and in formatting, the more runs will be created for it. So, if you want to search the Word document programmatically, you have to check all possible runs a word can be part of, combine the texts of these runs and compare the result with the value the logic is searching for. This is quite challenging, as the number of permutations of runs to look into can grow quickly.

Happily, someone has already published a solution for a word consisting of up to three runs. I implemented this library by adding the WordReplacer helper class, which has several functions for replacing words in plain text as well as in tables.

As a side note: since a maximum of three runs is supported for word replacements, stick to simple (no formatting) and short bookmark values without spaces or special characters.
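To make the run problem more concrete, here is a minimal plain-Java sketch of the "combine the run texts, then compare" idea. It is not the WordReplacer class or the library mentioned above: runs are modelled as simple strings, and the class and method names are hypothetical. With Apache POI, the run texts would typically come from `XWPFParagraph.getRuns()` and `XWPFRun.getText(0)`, and be written back with `XWPFRun.setText(text, 0)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: each String stands for the text of one run in a paragraph. */
final class RunTextReplacer {

    /**
     * Replaces the first occurrence of target, even if it is spread over several runs.
     * The replacement is written into the run where the match starts; the matched
     * characters are cut out of every later run the match touches.
     */
    static List<String> replaceAcrossRuns(List<String> runs, String target, String replacement) {
        List<String> result = new ArrayList<>(runs);
        if (target.isEmpty()) {
            return result;
        }
        // Step 1: combine the texts of all runs and search in the combined text.
        StringBuilder joined = new StringBuilder();
        for (String run : runs) {
            joined.append(run);
        }
        int start = joined.indexOf(target);
        if (start < 0) {
            return result;
        }
        int end = start + target.length();

        // Step 2: map the match back onto the individual runs.
        int offset = 0;
        boolean written = false;
        for (int i = 0; i < runs.size(); i++) {
            String text = runs.get(i);
            int runStart = offset;
            int runEnd = offset + text.length();
            offset = runEnd;
            if (runEnd <= start || runStart >= end) {
                continue; // this run does not overlap the match
            }
            int from = Math.max(start, runStart) - runStart;
            int to = Math.min(end, runEnd) - runStart;
            String middle = written ? "" : replacement;
            result.set(i, text.substring(0, from) + middle + text.substring(to));
            written = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // "ClientName" is split over three runs, as Word might do after an edit.
        List<String> runs = List.of("Dear ", "Client", "Na", "me", ",");
        System.out.println(replaceAcrossRuns(runs, "ClientName", "Acme"));
    }
}
```

Writing the replacement into the first matching run means the new text inherits that run's formatting, which is one reason to keep bookmark values free of mixed formatting.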

Placing content

If you want to add some (rich) text, a table, an (un)ordered list, or a chart to an existing document, you need a mechanism to control where the content will be added. Fortunately, tables are very well supported within the Apache POI library, and exact positions can be retrieved by using tables as placeholders. The idea is to link the Mendix content to a name that is used as the content of a Word table cell. The code then opens the Word document and searches for a 1-by-1 table (so only one cell) with exactly that name as its content. It adds the dynamic content at that position and removes the placeholder table altogether.

Code Snippet 1: Moving to the position of the dummy table with the XMLCursor and adding content there with the help of this XMLCursor.
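The sketch below is not the module's snippet. It illustrates the same placeholder idea one level lower, on the document XML itself, using only the JDK's DOM parser instead of Apache POI's XmlCursor. The class name, method name and the simplified `w:body` input are assumptions made for illustration. With Apache POI, the equivalent steps would roughly be to open a cursor on the table (`table.getCTTbl().newCursor()`), call `XWPFDocument.insertNewParagraph(cursor)` and then remove the table from the body.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: swaps a 1-by-1 placeholder table for a paragraph of text. */
final class PlaceholderTableSwap {

    // Namespace of WordprocessingML elements such as w:tbl, w:tc, w:p, w:r and w:t.
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String replacePlaceholderTable(String bodyXml, String placeholder, String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(bodyXml)));

            NodeList tables = doc.getElementsByTagNameNS(W, "tbl");
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                boolean singleCell = table.getElementsByTagNameNS(W, "tc").getLength() == 1;
                if (singleCell && placeholder.equals(table.getTextContent().trim())) {
                    // Build <w:p><w:r><w:t>text</w:t></w:r></w:p> ...
                    Element t = doc.createElementNS(W, "w:t");
                    t.setTextContent(text);
                    Element r = doc.createElementNS(W, "w:r");
                    r.appendChild(t);
                    Element p = doc.createElementNS(W, "w:p");
                    p.appendChild(r);
                    // ... and put it exactly where the placeholder table was.
                    table.getParentNode().replaceChild(p, table);
                    break;
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process document XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:body xmlns:w=\"" + W + "\">"
                + "<w:p><w:r><w:t>Intro</w:t></w:r></w:p>"
                + "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>PLACEHOLDER_TERMS</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
                + "</w:body>";
        System.out.println(replacePlaceholderTable(xml, "PLACEHOLDER_TERMS", "Hello from Mendix"));
    }
}
```

Requiring exactly one cell keeps regular tables that happen to contain the same word from being treated as placeholders.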

Rich Text

How nice would it be if text formatted as rich text in your Mendix app could be shown in the same manner in the Word document? Rich text on a webpage is typically HTML-formatted text, and the font formatting to apply can be deduced from the HTML tags. I have found a very consistent library, called Jsoup, which handles rich text properly. Jsoup lets you iterate over a nested HTML structure, which is exactly what rich text is.

Just enter your text as rich text with the CK Editor for Mendix. The module will use the Jsoup library to navigate and iterate the HTML structure and the Apache POI library to enforce specific formatting based on the type of HTML node being iterated over:

Code Snippet 2: Part of the RichTextParser class I created for handling rich text
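The following sketch is not the RichTextParser class. It only illustrates the principle of walking nested markup while carrying the formatting of the enclosing tags along. To stay dependency-free it uses the JDK's XML parser as a stand-in for Jsoup, so it only accepts well-formed markup without HTML entities such as `&nbsp;`. The class, method and field names are hypothetical. In the real module, each resulting segment would correspond to a Word run, for example an `XWPFRun` with `setBold(...)` and `setItalic(...)` applied.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical walker that flattens nested markup into formatted text segments. */
final class RichTextWalker {

    /** One piece of text plus the formatting inherited from its enclosing tags. */
    static final class Segment {
        final String text;
        final boolean bold;
        final boolean italic;

        Segment(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }

        @Override
        public String toString() {
            return "'" + text + "' bold=" + bold + " italic=" + italic;
        }
    }

    static List<Segment> toSegments(String markup) {
        try {
            // Wrap the fragment so that it always has a single root element.
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + markup + "</root>")))
                    .getDocumentElement();
            List<Segment> out = new ArrayList<>();
            walk(root, false, false, out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }

    private static void walk(Node node, boolean bold, boolean italic, List<Segment> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            if (!node.getNodeValue().isEmpty()) {
                out.add(new Segment(node.getNodeValue(), bold, italic));
            }
            return;
        }
        // Formatting is cumulative: children inherit what their ancestors switched on.
        String tag = node.getNodeName().toLowerCase(Locale.ROOT);
        boolean nowBold = bold || tag.equals("b") || tag.equals("strong");
        boolean nowItalic = italic || tag.equals("i") || tag.equals("em");
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, nowBold, nowItalic, out);
        }
    }

    public static void main(String[] args) {
        for (Segment s : toSegments("<p>Hello <strong>bold <em>both</em></strong> plain</p>")) {
            System.out.println(s);
        }
    }
}
```

Passing the formatting state down the recursion, rather than storing it globally, means it is automatically "switched off" again when the walker leaves a tag.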

Manipulating Charts

A feature lacking in Mendix document generation, and a real “nice to have”, is support for charts in a Word document. The idea is that a default chart is already added to the Word template. Based on Mendix data, the chart is then built, possibly with multiple series. The Apache POI library can access the embedded Excel data table that a Word chart uses as its data source:

The underlying Excel worksheet of a chart in Word

It can then directly manipulate this data set and trigger a refresh of the chart based on the new data.

Code Snippet 3: Fill the underlying Excel chart data with series/category data coming from your Mendix app.

If the template contains more categories than the Mendix data, the obsolete series are removed.

Currently, only (stacked) bar and line charts are supported.

What if Apache POI is lacking?

The Apache POI library has a feature-rich set of functions for manipulating MS Office documents with Java. It can happen, however, that Apache POI does not yet offer a directly available Java method for a specific feature, for instance changing some specific formatting in a Word chart. Always try to cover it within the template first. If a real extension to the Java code is needed, it is important to understand that a Word file, just like an Excel file, is nothing more than a zipped folder of XML files. By unzipping the file and inspecting the XML, it can be quite easy to find the exact XML tag needed for a specific feature: add the feature in Word, save the file, and check which XML tag was added or changed. See below for the unzipped Word template file which comes with the project:

A Word document is actually a zipped XML archive
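You do not need an unzip tool to look inside. As a small sketch that uses only the JDK's `java.util.zip` classes, the code below reads the XML parts of an archive into memory, keyed by their path inside the zip. The class and method names are hypothetical, and `sampleArchive()` merely builds a tiny in-memory stand-in so the example runs without a real template; with a real .docx you would pass a `FileInputStream` for the file instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for inspecting the XML parts inside a .docx file. */
final class DocxParts {

    /** Returns the XML and relationship parts of the archive, keyed by their path in the zip. */
    static Map<String, String> readXmlParts(InputStream docx) {
        Map<String, String> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                String name = entry.getName();
                // Skip folders and binary parts such as embedded images.
                if (!entry.isDirectory() && (name.endsWith(".xml") || name.endsWith(".rels"))) {
                    // readAllBytes() stops at the end of the current entry.
                    parts.put(name, new String(zip.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    /** Builds a tiny zip in memory as a stand-in for a real Word template. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/media/image1.png"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> parts = readXmlParts(new ByteArrayInputStream(sampleArchive()));
        parts.forEach((name, xml) -> System.out.println(name + " (" + xml.length() + " characters)"));
    }
}
```

Comparing the output for a template saved before and after a change in Word is a quick way to spot which part and which tag the change ended up in.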

Once the specific XML tag has been located, there is a fallback scenario: via the Apache POI library, these lower-level tags can be accessed and manipulated directly. See for instance below a fix that sets the numbering for item lists by accessing the lower-level XML tags (CTP, PPR, NumPr):

Code Snippet 4: Accessing lower-level XML tags of Word document

Demo Project

The fully working demo project which demonstrates all the above-mentioned features is available within the Mendix Marketplace. Just download it, run the project and upload the test template that comes with the project to quickly see its capabilities. The test template can be found in the resources folder.

https://marketplace.mendix.com/link/component/111539/Valcon/Word-Template-Demo-Project

Known bugs

Currently, there are no known bugs. It is good to understand that this module uses the same Java library, Apache POI, which the Mendix modules ‘Excel Importer’ and ‘Excel Exporter’ use. Be sure to keep those libraries aligned, so all modules use the same version of the Apache POI library, currently 5.2.2.
