How we manage documentation at Funding Circle for our Data Platform

Nikolajs Skrjabins
Published in Funding Circle
7 min read · Apr 4, 2023

Documentation (docs) brings enormous benefits, yet only 4% of companies always document their processes. I would like to share our experience at Funding Circle of managing documentation for our Data Platform.

The Data Platform is a collection of software services that provide the capabilities required to process and deliver data to advance business insights and decisions.

Background

Historically, when developing the Data Platform at Funding Circle we didn’t have a culture of writing docs. For too long, we prioritised faster feature releases over docs. Docs were rarely written and those that were written were of inconsistent quality.

This approach started to cause several issues:

  • New joiners spent a lot of time figuring out how to use the Data Platform.
  • Every day we were seeing more and more questions in our Slack help desk channel.
  • The number of data-related incidents and the time to resolve them increased.

With the growing number of tools and features in our Data Platform, these issues were only getting worse.

The problem was aggravated by the fact that each team had a very different approach to managing docs:

  1. Docs in Confluence.
  2. Docs as a collection of Markdown files in a GitHub repository. Sometimes these were automatically published to Confluence (as part of a CI/CD pipeline, e.g. using the sphinx-confluence plugin).
  3. Docs in a GitHub wiki.
  4. Docs in a dedicated GitHub Pages website.

Having multiple approaches to managing docs confused the engineers who had to create them and made it hard for users to find the docs they needed.

In some sense, we are grateful for all of these problems, because everyone in the organisation agreed that we needed to fix our approach to docs. This meant that everyone was motivated to fix the problem and there was little resistance to changing our ways.

Documentation Principles

Our first step was to create a set of universal principles that would apply to all docs we write. We created a working group with representatives from all data-related teams and agreed on the following principles.

Write for the reader

Our number one complaint about our docs was that the content was too technical or irrelevant. We decided to try the Diataxis documentation framework to fix this. In the Diataxis framework, docs are organised into 4 types of content (tutorials, how-to guides, explanations and references). Each type of content has a specific purpose. For example, tutorials are written in the form of a lesson because they are aimed at newcomers.

The Diataxis framework helps users find content more quickly and know what to expect from it. The framework also helps authors of documentation by specifying how each type of content should be written.

Restructuring our docs according to this framework immediately uncovered some gaps. For example, we have a lot of how-to guides but not that many tutorials. This is great for users who have previous experience with our tools, but not so great for users who are only learning the tool.
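A gap like this can be spotted with a simple count of pages per content type. The snippet below is a minimal sketch, not our actual tooling: the page paths, the type labels and the 15% threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# The four Diataxis content types, used here as plain labels (labels are an assumption).
DIATAXIS_TYPES = ("tutorial", "how-to", "explanation", "reference")


def coverage_by_type(pages):
    """Count pages per Diataxis type; types with no pages are reported as 0."""
    counts = Counter(doc_type for _path, doc_type in pages)
    unknown = sorted(set(counts) - set(DIATAXIS_TYPES))
    if unknown:
        raise ValueError(f"unknown doc types: {unknown}")
    return {doc_type: counts[doc_type] for doc_type in DIATAXIS_TYPES}


def underrepresented(coverage, min_share=0.15):
    """Return the types whose share of all pages is below min_share (hypothetical threshold)."""
    total = sum(coverage.values())
    if total == 0:
        return list(coverage)
    return [doc_type for doc_type, n in coverage.items() if n / total < min_share]


# Hypothetical inventory of (path, type) pairs.
pages = [
    ("tool-a/getting-started.md", "tutorial"),
    ("tool-a/schedule-a-job.md", "how-to"),
    ("tool-a/rerun-a-job.md", "how-to"),
    ("tool-b/grant-access.md", "how-to"),
    ("tool-b/add-a-dataset.md", "how-to"),
    ("tool-b/rotate-credentials.md", "how-to"),
    ("platform/architecture.md", "explanation"),
    ("tool-a/cli-reference.md", "reference"),
    ("tool-b/config-reference.md", "reference"),
]

coverage = coverage_by_type(pages)
print(coverage)
print(underrepresented(coverage))
```

With an inventory skewed towards how-to guides, tutorials and explanations fall below the threshold, which mirrors the kind of gap described above.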

Can you figure out which of the 4 documentation types in the Diataxis framework this blog post falls into? Leave your answer in the comments section!

Write less

More words are not always better. Instead of investing a lot of time into writing a giant how-to guide, it is better to simplify the product from a UX perspective.

This is especially important because users are put off by the length of documentation.

When we started the overhaul of our documentation, we could clearly see which tools/features in our Data Platform could be streamlined by looking at our largest tutorials/how-to guides.
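One way to find those largest guides is to rank pages by word count. The following is a minimal sketch under assumptions: `docs` is a hypothetical mapping of page path to raw text, and a real setup would read the files from the docs repository instead.

```python
import re


def word_count(text):
    """Count whitespace-separated tokens as a rough proxy for document length."""
    return len(re.findall(r"\S+", text))


def longest_docs(docs, top_n=3):
    """Return the top_n (path, word_count) pairs, longest first; ties are sorted by path."""
    counts = {path: word_count(text) for path, text in docs.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical pages; in practice the text would come from the Markdown files.
docs = {
    "tool-a/schedule-a-job.md": "step " * 1200,
    "tool-b/grant-access.md": "step " * 300,
    "tool-a/getting-started.md": "step " * 800,
}

for path, words in longest_docs(docs, top_n=2):
    print(f"{words:>6} words  {path}")
```

Pages at the top of this list are candidates for simplifying the underlying tool rather than for more writing.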

Docs as Code

Documentation as Code (Docs as Code) refers to the philosophy of writing documentation with the same tools as code. This includes version control with git, code reviews and automated tests.

Automated tests catch common errors like typos and broken links, while code reviews of docs help to increase the overall quality of the docs.

GitLab and Baseten are good examples of Docs as Code implementations by large and small organisations.

The biggest challenge with Docs as Code is ensuring that docs Pull Requests (PRs) are reviewed to the same standard as traditional feature PRs. It is often tempting to rubber-stamp a docs PR and move on to more exciting things. There is no silver bullet to make people change their mindset about this, especially once it has become a habit.

At Funding Circle, we found that as with anything it takes time and practice to get this approach ingrained. With time, we saw engineers reviewing docs more thoroughly.

Central docs website

Our previous experience of fragmented docs led us to adopt a centralised approach — one docs website in one GitHub repository for all Data Platform related docs. So far this has been working quite well for us. There was no need to decentralise prematurely.

Technical Implementation

Here is how we implemented our docs website from a technical perspective.

Static website generator

Several static website generators are used in the industry. The 3 most popular are Jekyll, Hugo and MkDocs. All of them can build a large static website quickly. The choice is often determined by the team's preference for the Ruby (Jekyll), Go (Hugo) or Python (MkDocs) ecosystem. The themes available for each generator also factor into the decision.

We chose Jekyll (with the Just the Docs theme), because we already had experience of working with it in the organisation. We had also seen some successful implementations of docs websites using Jekyll.

Screenshot: our docs website.

Google Analytics

The Just the Docs theme makes it straightforward to set up Google Analytics for your site. You can get top-level as well as page-level metrics for your site.

Figure: User metrics over time
Figure: Page-level metrics

Hosting

Because our initial target user base was engineers and analysts, we used GitHub Pages to host our website. But then other types of users became keen to read our docs (Product Managers, Audit, Compliance, etc.). GitHub Pages requires users to have a GitHub account to access private sites.

So we moved away from GitHub Pages to AWS CloudFront, making the internal website available to all Funding Circle employees (without requiring a GitHub account). We chose CloudFront for 2 reasons:

  1. It supports HTTPS (unlike hosting the static website directly on an S3 website endpoint).
  2. We already use AWS for all of our cloud needs, and some of our engineers were familiar with CloudFront.

Development environment

Users generally spin up the docs website locally to see how their changes look. The test suite can also be run locally.

There is also an option to deploy the changes to a UAT version of the website. This helps the PR reviewer, as there is no need to pull the changes locally to see how they look on the website. The deployment is part of the CI/CD pipeline and takes 2–3 minutes.

Test suites

We use 2 tools to test our docs website: HTMLProofer and Vale.

HTMLProofer checks that referenced links are valid. It checks internal links (generated by Jekyll) and external links (everything else). This helps keep your references up to date.
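To illustrate the idea behind the internal link check, here is a simplified sketch in Python. It is not how HTMLProofer is implemented: the function names are hypothetical, links without a scheme or host are assumed to be internal, and `known_paths` stands in for the set of pages the site generator produced.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def split_links(html):
    """Split links into internal paths and external http(s) URLs."""
    collector = LinkCollector()
    collector.feed(html)
    internal, external = [], []
    for href in collector.links:
        parsed = urlparse(href)
        if parsed.scheme in ("http", "https"):
            external.append(href)
        elif not parsed.scheme and not parsed.netloc and parsed.path:
            # Keep only the path; fragments such as "#install" are dropped.
            internal.append(parsed.path)
    return internal, external


def broken_internal_links(html, known_paths):
    """Return internal link paths that do not match any generated page."""
    internal, _external = split_links(html)
    return [path for path in internal if path not in known_paths]


page = (
    '<a href="/guides/setup.html#install">Setup</a> '
    '<a href="/guides/missing.html">Gone</a> '
    '<a href="https://example.com/">External</a> '
    '<a href="#top">Top</a>'
)
print(broken_internal_links(page, {"/guides/setup.html"}))
```

External links would additionally need an HTTP request per URL, which is why that part of a link check tends to be slower and more prone to transient failures.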

Vale is a syntax-aware linter, which can be configured to check writing style and grammatical structure. Several organisations (GitLab, for example) publish their Vale rules, encouraging reuse and alignment.

We started small, setting only a few rules to generate an error (and fail the CI/CD step). All other Vale rules in our setup generate a warning or suggestion. These do not fail the CI/CD step, and it is up to the PR author to decide whether they are worth acting on.

We chose this setup because a lot of Vale rules are subjective and it can be hard to agree on a wide range of rules that should output an error. For example, should common Latin abbreviations (e.g., i.e., etc.) generate an error? At the end of the day, it comes down to a matter of preference (just like brace placement).
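The policy of failing the build only on errors can be sketched in a few lines of Python. This is an illustration of the logic, not our pipeline code: the linter applies this policy itself through its configuration, the input shape (file path mapped to a list of alerts with a `Severity` field) is an assumption loosely modelled on a linter's JSON report, and the rule names are hypothetical.

```python
from collections import Counter


def summarise(alerts_by_file):
    """Count alerts per severity across all files."""
    return Counter(
        alert["Severity"]
        for alerts in alerts_by_file.values()
        for alert in alerts
    )


def ci_exit_code(alerts_by_file, failing=("error",)):
    """Return 1 if any alert has a failing severity, otherwise 0."""
    summary = summarise(alerts_by_file)
    return 1 if any(summary[severity] > 0 for severity in failing) else 0


# Hypothetical report: one agreed error-level rule and two advisory rules.
report = {
    "docs/how-to/grant-access.md": [
        {"Check": "Example.Spelling", "Severity": "error", "Line": 12},
        {"Check": "Example.LatinAbbreviations", "Severity": "warning", "Line": 30},
    ],
    "docs/tutorials/getting-started.md": [
        {"Check": "Example.SentenceLength", "Severity": "suggestion", "Line": 4},
    ],
}

print(dict(summarise(report)))
print(ci_exit_code(report))
```

Promoting a rule from warning to error then only means changing its severity in the linter configuration once the team agrees on it.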

Conclusion

For too long, we prioritised faster feature releases over docs. Furthermore, our documentation was fragmented. This caused multiple issues.

Our documentation strategy, a set of universal principles such as Docs as Code, is aimed at fixing these issues.

We implemented a central docs website using Jekyll and the Just the Docs theme, which is hosted on AWS CloudFront. HTMLProofer and Vale are used to run a suite of automated tests.

It is hard to prove that improved documentation led to a reduction in data incidents, in questions in the Slack help desk channel, or in the time it takes new joiners to get comfortable with the Data Platform. These metrics have to be considered in the wider context, e.g. bug fixes, overall improvement of the Data Platform, the experience level of users, etc. One thing is certain: Data Platform users are loving our docs now.

Thank you for reading! Make sure to follow us on Medium and on our new engineering page on LinkedIn.
