How to Support Platforms That Clash With the Docs as Code Philosophy?

Kristijan Puljek
ReversingLabs Engineering
7 min read · Dec 12, 2022
Cover image source: freephotocc @ pixabay.com

When faced with thousands of pages of documentation that not only has to be regularly updated, but also delivered in multiple significantly different output formats on a monthly basis, your first instinct might be to start updating your CV.

While daunting, this task is not impossible, and does not require expensive tools and complicated solutions. If anything, the opposite is true. By storing content in a simple plain text markup format and employing the Docs as Code philosophy, you can streamline the documentation process and let writers focus on the writing itself, rather than on unnecessary overhead.

However, this overhead can painfully resurface if there’s an unexpected requirement to support a platform that directly clashes with the Docs as Code approach.

Proprietary, closed platforms can reintroduce the necessity of performing various tasks manually and maintaining multiple copies of the documentation, which inevitably leads to inconsistencies and increases the chance of human error.

There are a number of different solutions for such situations. You can try to talk your way out of the task by arguing it’s impossible, decide to update both platforms as separate entities, abandon the established workflows and make this new platform the new “master”, or attempt to force the new platform to play by your rules using automation and scripting. Which is exactly what we did.

But first, some background.

He Who Controls the Docs Controls the Universe

If language is a living thing, documentation is a wild animal. It loves to multiply, fork, spawn new instances, spread over email in suspicious (and editable) formats, and create confusion.

To keep the documentation at bay, we have completely abandoned the idea of storing it in any editable format other than reStructuredText, and the output formats used in deployment are all strictly read-only. The reStructuredText files are kept in our Git repositories, version controlled and safe from happy little accidents. Content is king; it needs to be stored safely in a single, controlled environment.

Locking and taking ownership of the documentation ensures that all changes go through the technical writing team before they get merged and published and that everyone is using the same version of the documentation with all of the latest updates.

This approach mitigated situations where multiple people would have personal copies of documentation, containing their own tweaks and improvements that should have been added to the master copy. We would sometimes even get bug reports for outdated clones of documents we weren’t even aware existed.

Our documentation is deployed in a couple of different formats. We use Sphinx to generate the HTML version of the documentation, and Weasyprint takes care of our PDFs. These are all published to previously agreed upon and well-known locations, so that everyone knows where to find the latest docs.

Both Sphinx and Weasyprint are amazingly powerful tools, and they deserve a shout-out. They are convenient to use and easy to debug when something goes wrong. Their openness also allowed us to create a coherent theme that is used across the board to make the documentation visually consistent and recognizable.

This system is not exactly new and it works flawlessly, as proven by hundreds, if not thousands, of other technical writing teams around the world.

Hard Tasks Need Hard Ways

When we originally received the request to publish the entirety of our documentation to Zendesk to make it more integrated with our support ticketing system, it looked like there was no way around going back to the stone age, AKA copy-pasting the content by hand, duplicating it, and having it stored in two places.

And that’s what we did, for that one release cycle. It took us weeks of planning, creating categories and fixing all of the formatting that got lost in the process. Human error was also a significant factor because, as it turns out, copying 2000+ pages of documentation by hand can cause some problems and omissions.

Spending weeks to manually publish documentation whenever a new product release goes out was not sustainable, and directly clashed with our established procedures.

If a workflow is only as strong as its weakest link, this thing was a catastrophe.

We had to find a new approach that would allow us to maintain the existing setup, but also somehow update this new platform as well.

There was always the option of explaining why this task seriously overcomplicates things, but sometimes you just have to justify the “technical” in “technical writing”, and our curiosity got the best of us.

The Power to Automate a Thing is the Absolute Control Over It

Our bottom line and guiding principle when tackling this issue was that it’s unacceptable to have a duplicate version of the documentation that has to be manually updated every time we change something in the reStructuredText master.

We wanted to preserve Git as the single source of truth and to automate the publishing process as much as possible.

Luckily, Zendesk has APIs and we have all of our docs in plain text that can be hacked and slashed however we want. The basic idea was to use Python to push the docs from Git repositories straight to Zendesk.
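To make the idea concrete, here is a minimal sketch of what "pushing a page to Zendesk" boils down to. The subdomain, section ID, and credentials are placeholders, and the exact fields Zendesk requires (for example permission_group_id or user_segment_id on newer accounts) depend on your instance, so check the Help Center API reference rather than treating this as our exact code:

import requests

ZENDESK_SUBDOMAIN = "example"            # placeholder
SECTION_ID = 123456789                   # placeholder target section
API_USER = "docs-bot@example.com/token"  # placeholder, API token auth
API_TOKEN = "s3cr3t"                     # placeholder


def publish_article(title: str, html_body: str) -> dict:
    # Create a new article in the target section and return it as a dict.
    url = (
        f"https://{ZENDESK_SUBDOMAIN}.zendesk.com"
        f"/api/v2/help_center/sections/{SECTION_ID}/articles.json"
    )
    payload = {"article": {"title": title, "body": html_body, "locale": "en-us"}}
    response = requests.post(url, json=payload, auth=(API_USER, API_TOKEN))
    response.raise_for_status()
    return response.json()["article"]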

Since we had little background in development, the first iterations of our scripts were glorified proofs of concept that only performed minor steps of the publishing process, and things progressed slowly.

Some of the smaller hurdles we had to solve were how to transform the output from Sphinx into something that would pass as a valid API payload, how to convince Sphinx to build every documentation chapter as a separate self-contained project, and how to extract titles from reStructuredText files to use as Zendesk article titles.
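For illustration, a rough sketch of two of those smaller steps: stripping a Sphinx-built page down to its main content so it can serve as an article body, and lifting the first heading out of a reStructuredText file to use as the article title. The "body" selector is theme-dependent and the heading detection is simplified, so treat both as assumptions:

from pathlib import Path
from bs4 import BeautifulSoup  # pip install beautifulsoup4


def extract_body(html_file: Path) -> str:
    # Keep only the main content of a Sphinx-built page, dropping the
    # navigation, sidebar, and footer that Zendesk has no use for.
    soup = BeautifulSoup(html_file.read_text(encoding="utf-8"), "html.parser")
    main = soup.find("div", class_="body") or soup.body  # theme-dependent
    return main.decode_contents()


def extract_title(rst_file: Path) -> str:
    # Use the first heading (a line followed by a punctuation underline)
    # as the article title, falling back to the file name.
    lines = rst_file.read_text(encoding="utf-8").splitlines()
    for current, following in zip(lines, lines[1:]):
        underline = following.strip()
        if current.strip() and underline and len(set(underline)) == 1 \
                and underline[0] in '=-~^"#*':
            return current.strip()
    return rst_file.stem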

Additionally, there were bigger problems that didn’t have obvious solutions, like including inline images in articles, and links referencing other articles that would completely break once published to Zendesk.

Removing the links was not an option, as it would significantly degrade the experience of navigating through the documentation. Articles also looked out of place and broken once published to Zendesk, as the custom Sphinx theme we used to build the HTML documentation wasn't available there.

Issue 1: Inline Images

Without explaining why screenshots are nice to have, let’s just skip to the part where we decided that we absolutely have to preserve them.

The solution to this problem was to build the HTML documentation using Sphinx as we always do, parse the entire output and base64 encode the images, making them part of the HTML file and, consequently, the API payload.
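A sketch of that image-inlining pass, assuming the built HTML and its images live under the same build directory (the real script also has to handle the _images/ folder Sphinx creates and a few MIME edge cases):

import base64
import mimetypes
from pathlib import Path
from bs4 import BeautifulSoup


def inline_images(html: str, base_dir: Path) -> str:
    # Replace every local <img> source with a base64 data URI so the
    # screenshots travel inside the article body itself.
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        image_path = base_dir / img.get("src", "")
        if not image_path.is_file():
            continue  # leave external or missing images untouched
        mime = mimetypes.guess_type(image_path.name)[0] or "image/png"
        encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
        img["src"] = f"data:{mime};base64,{encoded}"
    return str(soup)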

Issue 2: Links

This second issue almost made us give up on the entire automation idea. In the HTML documentation Sphinx outputs, links to other pages consist of the target file name followed by an anchor, for example search.html#keywords. When published to Zendesk, such a relative link points to nowhere and results in a 404.

To work around this issue, we have taken advantage of one important characteristic of every Sphinx project: all anchors within a Sphinx project have to be unique. So, if there can only be one #keywords anchor in our repository, it can be used as a unique ID to identify the file/article that contains it.

After the documentation is published, our script queries Zendesk for the article list in that category and creates a dictionary of all article URLs and their titles. It also creates a second dictionary of all local files, their titles, and the links contained within.

Once both dictionaries are ready, the script cross-references them by title and anchor, replaces the relative location before the anchor with the link to the actual Zendesk article, adds those links to the local copy of the files, and republishes them to Zendesk.
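In simplified form, the rewriting step looks roughly like this; anchor_to_title maps every anchor to the title of the page that defines it (relying on Sphinx's uniqueness guarantee), and title_to_url comes from the article list fetched from Zendesk. Both names are illustrative, not lifted from our actual scripts:

from bs4 import BeautifulSoup


def fix_links(html: str,
              anchor_to_title: dict[str, str],
              title_to_url: dict[str, str]) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.startswith(("http://", "https://")) or "#" not in href:
            continue  # external links and plain page references stay as-is
        anchor = href.rsplit("#", 1)[1]
        title = anchor_to_title.get(anchor)
        if title and title in title_to_url:
            # Keep the anchor so the browser still jumps to the right section.
            link["href"] = f"{title_to_url[title]}#{anchor}"
    return str(soup)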

This means that the script has to execute the publishing process twice, but preserving the reference links inside the documentation is well worth it.

Issue 3: Theming

Our documentation uses a custom theme, including CSS for API endpoints, tables, code snippets and other elements one might find inside technical documentation. Publishing those builds to Zendesk resulted in a complete mess, as Sphinx created CSS classes and IDs that weren’t covered by the Zendesk theme.

The solution was simple, if tiring: we ported our entire theme stylesheet to Zendesk and expanded the selectors to target everything that comes out of Sphinx, as well as the existing Zendesk elements. The resulting theme makes articles visually consistent regardless of whether they were manually created in the Zendesk GUI or published using our script.

After some testing and experimenting, we produced a rather elegant (as long as you don't look at the code!) solution that takes the HTML output from Sphinx, transforms it, and publishes all of the content to a predetermined category, updating articles with matching titles or creating new ones when necessary. During its second pass, it fixes and updates the links.
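Tying the earlier sketches together, the overall flow for one chapter could look something like this; publish_article(), extract_body(), extract_title(), inline_images(), and fix_links() are the hypothetical helpers from above, and update_article() would be their PUT counterpart that matches an existing article by title:

from pathlib import Path
from bs4 import BeautifulSoup


def collect_anchors(html: str, title: str, anchor_to_title: dict) -> None:
    # Remember which page (by title) defines each id, relying on Sphinx's
    # guarantee that anchors are unique across the project.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(id=True):
        anchor_to_title[tag["id"]] = title


def publish_chapter(build_dir: Path, source_dir: Path) -> None:
    anchor_to_title: dict = {}
    title_to_url: dict = {}
    pages = []

    # First pass: transform, upload, and remember where everything ended up.
    for html_file in sorted(build_dir.glob("*.html")):
        title = extract_title(source_dir / f"{html_file.stem}.rst")
        body = inline_images(extract_body(html_file), build_dir)
        collect_anchors(body, title, anchor_to_title)
        title_to_url[title] = publish_article(title, body)["html_url"]
        pages.append((title, body))

    # Second pass: rewrite cross-references now that every URL is known.
    for title, body in pages:
        update_article(title, fix_links(body, anchor_to_title, title_to_url))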

There were other issues, major and minor, that came up over the years. Some were our own doing, others were caused by Zendesk changing things like maximum article sizes, making us regroup and reorganize how the projects are built and published. This is always a risk when relying on solutions you don’t directly control.

All in all, these scripts are not without fault and will always be a work in progress, but they still turned weeks of planning and keyboard and mouse torture into a 5-minute task where we get to sip coffee and look at the nicely formatted terminal output.

Conclusion

After successfully tackling this task, we have only reaffirmed our belief in Docs as Code, and we don’t even humor similar requests (more about this in the upcoming sequels!) by questioning our established workflows and content structures, or by considering doing things by hand.

Content might be king, but owning the content structure and being able to programmatically transform it at will is just as important as being able to change the content itself, and is probably your best chance of integrating and automating a closed platform alongside an already established one.

If there’s a single thing to take away from this, it’s that using simple formats and FLOSS tools as the basis for storing technical documentation is always a benefit when compared to proprietary solutions that take control away from the writers to make things “easy to use”.
