Confluence is ill-suited for data-based reporting

A critique of Confluence and other general knowledge hubs for the purpose of technical reporting & documentation

Kyle
Published in The Kyso Blog
7 min read · Aug 18, 2020

The Role of Knowledge Hubs

Knowledge hubs like Confluence, Notion, Guru and BoostHQ facilitate general knowledge management and sharing within companies by providing for the collection, organisation and retrieval of knowledge all in one place. The goal of these platforms is to drive project development & collaboration around this knowledge and impact decision-making across a variety of departments and functions.

These tools:

  1. Are generally used around the entire business.
  2. Allow contributors to connect with their colleagues.
  3. Allow readers to learn new information that can be applied to their respective roles.

For businesses of all sizes, the value proposition of these tools is that they improve the productivity of the workforce. They do this by capturing information and organising it into a single source of truth. Information becomes something much more powerful when it’s iterative, trackable, and effortlessly accessible. In this way, information becomes knowledge, which is now actionable.

However, while these types of solutions are great for managing, sharing and discussing general knowledge about the business, they are not ideal for technical, data-based reporting.

Disclaimer: I am one of the co-founders at Kyso, which is our solution to this issue — a central hub for technical reporting. Naturally, we would love for you & your company to use our platform, but in this article I am going to make the general argument as to why an organisation needs some system specifically designed for data-based content — regardless of what that system is.

Technical Reporting

Within organisations of all kinds and sizes, the technical side of the company (the data team, made up of your data engineers, data scientists and business analysts) uses tools, like data-science notebooks and GitHub, that are accessible only to them.

This means that once they have conducted their analysis & drawn business conclusions from the data, they tend to separately share or publish their latest findings. What does this workflow look like? One of a few different scenarios tends to happen:

  1. The data scientist or engineer copies and pastes snippets of their report to share on Slack or to send around by email. These insights a) get siloed within the various sub-groups in which they are shared and b) are generally lost to the email archive or Slack history, making them hard to find later when someone new joins the team or when a colleague wants to refer to a particular analysis or request an update.
  2. They use one of the many conversion tools available to them to render the report into a more readable format, like PDF or HTML, again to share around by email, Slack or some other communication medium. The same issues from the first option apply here.
  3. They start a Markdown document on Confluence, Notion or some other knowledge hub and write up the report from scratch, copying and pasting graphs in as appropriate. The problem here is that this is a manual conversion that takes time: the data scientist is effectively duplicating their own content.
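
To make the second scenario concrete, here is a minimal sketch of what a conversion step does under the hood. It assumes only the standard nbformat v4 layout of an .ipynb file (a JSON document with a list of cells). The function names and the sample notebook are hypothetical, and real converters handle far more output types (images, tables, errors) than this does.

```python
def _text(source):
    # nbformat v4 stores text either as one string or as a list of line strings.
    return source if isinstance(source, str) else "".join(source)


def notebook_to_report(nb):
    """Render a parsed .ipynb dict as plain text: prose and results, no code."""
    parts = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            parts.append(_text(cell.get("source", "")).strip())
        elif cell.get("cell_type") == "code":
            # Skip the code itself and keep only what it printed or returned.
            for out in cell.get("outputs", []):
                if out.get("output_type") == "stream":
                    parts.append(_text(out.get("text", "")).strip())
                elif out.get("output_type") in ("execute_result", "display_data"):
                    plain = out.get("data", {}).get("text/plain")
                    if plain is not None:
                        parts.append(_text(plain).strip())
    return "\n\n".join(p for p in parts if p)


# Hypothetical notebook, inlined so the example is self-contained.
example_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Weekly signups\n", "Signups rose this week."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print(120 - 100)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["20\n"]}
            ],
        },
    ],
}

report = notebook_to_report(example_nb)
print(report)
```

The output is a static snapshot. Once it is pasted into an email or a Slack thread, nothing ties it back to the notebook it came from, which is exactly the problem with the first two scenarios.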

The Result

Regardless of which option the data scientist or engineer goes with, the end result will be the same — the knowledge becomes completely desynchronised. There is a disconnect between the tools used by different sides of the business.

Also, what happens if the report in question is one that is produced weekly or monthly? Future versions of the same analysis will always have to be manually updated. If no one keeps the reports updated at the team level, everyone loses trust in the knowledge hub, rendering it useless with regard to improving business decision-making.

So what is happening here is that data scientists are analysing company data day-to-day but then struggle to make these generated insights widely available inside the business in a simple way.

While hubs like Confluence & Notion do help spread knowledge across the company, they are not compatible with data-science tools & so are not ideal for technical reporting. What is needed is something that makes this process, from computation to publication, automatic, and that bridges the gap between technical and non-technical members of a team or business.
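
As a rough illustration of what "automatic" could mean here, the sketch below fingerprints a notebook's contents and flags when a published report no longer matches its source. This is an assumption about how such a system might detect stale reports, not a description of any particular product. The function names and the sample data are hypothetical.

```python
import hashlib
import json


def notebook_fingerprint(nb):
    """Hash a parsed notebook's cells so any change to code, prose or outputs is detected."""
    canonical = json.dumps(nb.get("cells", []), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_republish(nb, published_fingerprint):
    """Return True when the notebook differs from the version the report was built from."""
    return notebook_fingerprint(nb) != published_fingerprint


# Hypothetical weekly analysis: the same notebook, one week apart.
last_week = {"cells": [{"cell_type": "markdown", "source": "Signups: 100"}]}
this_week = {"cells": [{"cell_type": "markdown", "source": "Signups: 120"}]}

# Fingerprint recorded when last week's report was published.
published = notebook_fingerprint(last_week)

print(needs_republish(last_week, published))  # notebook unchanged since publication
print(needs_republish(this_week, published))  # notebook updated since publication
```

A check like this could run whenever a notebook is saved or pushed, so that republishing happens without anyone remembering to do it by hand.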

Workarounds

We know this is a widely-recognised problem — particularly on the technical side of the business — and that people are already looking for ad-hoc ways to solve it. How do we know this? Because there are many workarounds in the form of open-source tools & paid integrations that attempt to incorporate the tools used by data scientists into existing knowledge hubs. Just a few examples are listed below.

  • nbconflux is a tool to convert Jupyter Notebooks into Atlassian Confluence pages using nbconvert (another open-source tool that converts notebooks to various other formats).
  • There’s also a Confluence macro to render Jupyter/IPython notebooks inside Confluence pages.

The problem with both these tools is that they are add-ons to the existing workflow of data scientists: additional steps that must be carried out before people can access the results.

You will also run into the issue of knowledge desynchronisation mentioned earlier, whereby the finished “reports” shared on the general knowledge hub are completely disconnected from where the original reports (the notebooks themselves) are hosted.

  • An organisation could also host nbviewer on their own servers, which renders Jupyter notebooks into static HTML. However, this is easier said than done & will entail more advanced configuration than the documentation covers, as well as a lot of maintenance.

There are many other ways data scientists attempt to disseminate their results throughout the business. What this shows us is that people are actively searching for a solution, a better way to publish, share, and manage technical, data-based reports. However, general knowledge hubs like Confluence just don’t cut it.

A Better Way

What is needed is a knowledge hub specifically designed for the discovery of and collaboration on technical content. A solution that appreciates the complexity of organisational data science and data-based reporting, and that removes the bottlenecks that exist in the space between the technical and non-technical side of the company.

This knowledge hub must:

  • Be compatible with all the different tools used by data scientists. This includes both the tools used for running the analyses (e.g. Jupyter notebooks) and those used for maintaining the history of results (e.g. GitHub).
  • Be accessible to both technical and non-technical members of an organisation.
  • Facilitate discovery of full reports that previously would have been shared around by email, Slack or upon request.
  • Drive communication between these two sides of the business. Notebooks must not only be rendered as reports; readers must also be able to collaborate on the content & discuss results.

Such a solution may be what your business needs if:

  • Your data scientists use Jupyter notebooks (or any other data-science tool) to author computational narratives.
  • Your organisation is currently stuck using general knowledge hubs — like Confluence — to store institutional technical knowledge.
  • The technical side of the business wants (and needs) an easier way to publish their work.
  • You are not sure if business agents around the company are actually data-informed — are they utilising data insights created by the data team in their decision-making? If not, this could be because technical knowledge remains siloed within the business.
  • There is currently no communication happening between different departments and business agents in relation to data and data insights.

Solutions

Some examples of companies and tools that have attempted to solve these issues are:

  • Airbnb’s Knowledge Repo
    Open-source. Originally meant to facilitate the sharing of knowledge between data scientists and other technical roles, but can also be used for sharing content with non-technical members of the team.
  • Azure Notebooks
    Azure Notebooks is a free hosted service to develop and run Jupyter notebooks in the cloud. You can then share these notebooks as links to Markdown or HTML versions of the notebook.
  • Kyso (our solution)
    Kyso is your company’s data insights journal — a central knowledge hub where data scientists can post reports so everyone can learn from and take action on data insights. With support for all the go-to data-science tools like Jupyter and R notebooks, we help businesses exchange dated one-to-one sharing of results for a more comprehensive, unified knowledge management system for all data-based reports.

Final Thoughts

Today, as more and more companies attempt to become data-driven and to scale their ability to make decisions using data, inefficient and ill-suited knowledge hubs slow down communication and the speed at which insights can be put to good use.

By continuing to use legacy solutions that were not built to support current tools used by the technical side of the business, these companies will fail to take advantage of the value their very own data team generates.

To this end, there is a need for more streamlined solutions through which data-based knowledge is easily shared and managed. By centering data-informed discussion and decisions around knowledge repositories specifically designed to support such actions, organisations will reap the benefits and begin to truly turn data science into business impact.

Title Photo by Luke Tanis on Unsplash


CMO & Data Science at Kyso. Feel free to contact me directly at kyle@kyso.io with any issues and/or feedback!