Creating Complex Legal Documents with Google Services

Vladimir Katz
Policygenius
Mar 16, 2021

Recently Policygenius introduced a new Will & Trust product line. You can download the application from the App Store or Google Play. Using the app, you can create your Will (and/or Trust) within minutes by answering a few simple questions. Once it is completed, you receive a copy of your will via email, and a hard copy is sent to your physical address.

Behind the scenes, the application is served by about 10 microservices and a few databases. One of the most interesting services in this collection is what we call the text-merge service: the service that creates Word and PDF documents from data provided in JSON format.

Due to different legal requirements in each state, a will for a person in New York can differ substantially from a Texas will, even though the underlying data structure is the same. Some clauses also need to be inserted or removed depending on the person’s dependents, family, financial circumstances, and so on. The formatting of the document depends on the data as well (for example, some sentences need to be emphasized in bold or ALL CAPS).

We calculated that there are thousands of permutations of the document structure, depending on the underlying data. In this article I want to present the challenges of generating legal documents that require a lot of “presentation” logic, and the solutions we found.

History

Our first step was to look for third party vendors that solved our use case. After trying several options, we weren’t able to find one that could be integrated in a reliable or observable way. We clearly needed a better solution.

The second proposal revolved around using the Google Docs API for document generation. Google Docs offered text-merge functionality: given a collection of key-value pairs, it would search a template document and replace each {{key}} with its value.
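
For context, the Docs API performs this kind of merge with `replaceAllText` requests sent to `documents.batchUpdate`. The sketch below only builds such a request body from key-value pairs using the standard library; the Go type names and `buildMergeBody` are hypothetical, while the JSON field names follow the public REST API. Sending the request (with OAuth2 credentials, typically through the official `google.golang.org/api/docs/v1` client) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Hypothetical Go types mirroring the JSON body of a Docs API
// documents.batchUpdate call made of replaceAllText requests.
type substringMatchCriteria struct {
	Text      string `json:"text"`
	MatchCase bool   `json:"matchCase"`
}

type replaceAllTextRequest struct {
	ContainsText substringMatchCriteria `json:"containsText"`
	ReplaceText  string                 `json:"replaceText"`
}

type request struct {
	ReplaceAllText replaceAllTextRequest `json:"replaceAllText"`
}

type batchUpdateBody struct {
	Requests []request `json:"requests"`
}

// buildMergeBody creates one replaceAllText request per key, so that every
// {{key}} placeholder in the template document is replaced with its value.
func buildMergeBody(values map[string]string) ([]byte, error) {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic request order

	body := batchUpdateBody{Requests: make([]request, 0, len(keys))}
	for _, k := range keys {
		body.Requests = append(body.Requests, request{
			ReplaceAllText: replaceAllTextRequest{
				ContainsText: substringMatchCriteria{Text: "{{" + k + "}}", MatchCase: true},
				ReplaceText:  values[k],
			},
		})
	}
	return json.Marshal(body)
}

func main() {
	b, err := buildMergeBody(map[string]string{"name": "Jane Doe"})
	if err != nil {
		panic(err)
	}
	// {"requests":[{"replaceAllText":{"containsText":{"text":"{{name}}","matchCase":true},"replaceText":"Jane Doe"}}]}
	fmt.Println(string(b))
}
```

Every request is a plain string substitution, which is why any conditional formatting had to be handled in code with this approach.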

However, there did not seem to be a way to put any logic in the template document itself, for example, “given a certain value, highlight the text, otherwise just insert it”. This type of logic would have to be handled in code while generating the document with the help of the Docs SDK. Initially we thought this would not be very complicated, but as the documents grew more and more complex, the whole solution started to fall apart. It became apparent that maintaining the template logic in code would become unmanageable.

The third proposal was creating an HTML-based document first and then somehow converting it to Word and PDF formats (due to legal requirements the will and trust documents must be presented in an editable format to the consumer, thus the need for Word documents).

HTML templating is as old as the internet itself, and there are many libraries in most languages that help with this task. We use Go as our language of choice for creating fast and reliable services, and Go provides an excellent “html/template” package as part of its standard library. Like most libraries of this sort, it works by merging a template with data, which in our case arrives as JSON.

How to convert HTML to DOCX (Word) was not initially obvious, and we could not find a good library right away. As we kept digging, we found a poorly documented feature of the Google Drive API that allows uploading documents in certain formats, including HTML, and saving them in Google’s internal format, which I assume is actually the Office Open XML standard. We can then download the Google Doc in many formats, including PDF and DOCX. BINGO!

Implementation Details

This is how we implemented the third approach and solved some of the issues that came with it. In a nutshell, we created a Go service with a single endpoint that accepts data in JSON format along with the template(s) to use with that data. We then generate the documents and upload them to a Google Cloud Storage bucket. If all goes smoothly, we put a message on a topic to let other services (such as the email service) know that the documents are ready for further processing.

Merge the template and data to create HTML represented as a byte array:
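
A minimal sketch of this step, assuming the JSON payload is a single object decoded into a generic map (a real service might well decode into typed structs instead); `mergeTemplate` is a hypothetical name:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"html/template"
)

// mergeTemplate decodes the JSON payload, executes the named html/template
// against it, and returns the rendered HTML as a byte slice.
// "missingkey=error" makes a missing field fail loudly instead of silently
// rendering an empty or placeholder value, which seems prudent for legal text.
func mergeTemplate(name, tmplText string, jsonData []byte) ([]byte, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(jsonData, &data); err != nil {
		return nil, fmt.Errorf("decode data: %w", err)
	}
	tmpl, err := template.New(name).Option("missingkey=error").Parse(tmplText)
	if err != nil {
		return nil, fmt.Errorf("parse template %s: %w", name, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return nil, fmt.Errorf("execute template %s: %w", name, err)
	}
	return buf.Bytes(), nil
}

func main() {
	html, err := mergeTemplate("greeting", `<p>Hello, {{.name}}!</p>`, []byte(`{"name": "Jane"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(html)) // <p>Hello, Jane!</p>
}
```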

Create a Google Doc from the HTML content using the Google API:

Add page numbers to the document:

This step proved tricky because, at the time of writing, the Google Docs SDK does not support programmatically adding page numbers to existing documents. We first looked into Unioffice, a pure Go library for creating documents. Its API was straightforward, but it was not free, so before committing to it we wanted to see whether a free solution existed.

As the name suggests, the OpenXML format uses XML to represent a document’s content and styling. The XML files are then zipped, and the resulting archive is given the .docx extension. What if we compared the XML of a document without page numbers to that of the same document with page numbers, programmatically inserted the missing information into the uncompressed files, and then zipped them back up?

It turns out this is not that complicated. This project we found on GitHub gave us ideas on how to accomplish it.

1. Download the Google Doc in MS Word format (which is OpenXML) using the Google Drive API:

2. Unzip the file using Go’s standard archive/zip package:

3. Insert the missing page-number information:

The footer XML files referred to in the above code are taken from an empty Word document with page numbers. They can be reused unchanged for all documents.

4. Zip the updated files back into a DOCX using the same archive/zip package.

5. Update the document in Google Docs using the same Google Drive API:

  • Download the Google Doc as PDF and DOCX using the Google Drive API:

Use the same export call as in step 1, with the following MIME types:
  PDF: application/pdf
  DOCX: application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • Store the documents in a Google Cloud Storage bucket using the Cloud Storage API.
  • Broadcast the completion of document generation. We do this by posting a message to a topic, whose subscribers are notified and take the appropriate actions.

That’s it. All of the above operations happen in memory, without ever writing to disk, which improves reliability.

Conclusion

As you can see, many requests go out to Google’s servers, so document creation takes about 30 seconds on good days, and we have sometimes seen it take up to 2 minutes. Network connections are also not the most reliable means of communication.

That is why we put a queue in front of this service, so that a document can be retried if an exception occurs during generation. We don’t really care how fast these documents are generated, so the delay is acceptable to us. During stress testing we also noticed that Google rate-limits calls to its APIs, but in practice we are nowhere near that limit, and so far the service has been very reliable.
