Documentation as a Data Source

Corey Butler
Published in Metadoc
6 min read, Jan 26, 2021

Software is a living ecosystem. It’s composed of code libraries, logic, data streams, and other detailed components that work together.

Documentation is supposed to be a “guide” for navigating this technical ecosystem. However, truly useful documentation is still difficult to produce and consume.

Documentation’s intrinsic value is dependent on the context in which it is consumed. For example, an authoritative book or wall-of-text provides anything and everything about a subject. A tooltip in a code editor will describe options for a function. Which is more useful in the context of programming? The whole book or just the code reference you need?

Most documentation systems generate too much. They often produce entire websites, books, PDF files, etc. They generate everything.

What if we generated JSON instead?

When documentation is treated as a data source, it can adapt to any context. This is a powerful concept that has been discussed in the tech community before, yet few standards have been adopted to support this approach. This is the challenge we’re addressing with Metadoc.

https://metadoc.io

Thinking About Documentation

The current documentation “landscape” is challenging from a bird’s-eye view.

Problem 1: Location
Documentation is scattered. People must first be aware of what information they need and where it is located before they can do anything useful with it. Locating good documentation isn’t always a matter of a simple web search. Search engines often provide an overwhelming number of non-discerning results to sift through. How can one really tell which search results contain the information they truly need without reviewing each result?

Problem 2: Inconsistencies
Every software community manages its own documentation. This leads to inconsistent layouts, formats, and content. Consider these screenshots for popular languages and runtimes. There is nothing consistent about the layouts or structure of the information.

[Screenshots: Go, Node.js, Rust, Python, Java, and C++ documentation websites]

This type of inconsistency may remind experienced developers of the times before Git and Subversion. Software was distributed in many different ways, saved in many different places, and was still very much the “wild west” of development. Documentation is a new frontier in need of the same attention software creation received.

Documentation websites are a departure from the days when books were the main mechanism for learning. A book had a preface, an index, and so on. Books followed a shared set of writing standards. Websites should, but rarely do, follow such standards.

The inconsistent and vague nature of documentation across and within communities creates confusion. People have to learn “the way to learn” before any information of real value is communicated. This is exacerbated when “the way” is very foreign or doesn’t make sense to someone. It’s a common reason why some developers are so resistant to learning new technology. Statements like, “I don’t want to have to learn another entirely new way of programming,” are far too common.

Problem 3: Information Decay
Documentation goes stale quickly. Software authors often get caught up fixing bugs and building features. Meanwhile, the documentation isn’t updated or falls behind.

Oftentimes developers don’t update documentation because their code doesn’t change. The false perception that “no code changes” equates to “no change at all” ignores outside factors. This is a potent problem in web development, where browsers are updated roughly every six weeks whether or not developers update their own code.

This usually leads to misleading documentation. The only thing worse than no documentation is misleading documentation.

Root Cause
The problems are fairly well known, but often under-analyzed. The widespread belief that documentation should be a single process is failing.

Developers often generate documentation from code, which can be a valiant but futile attempt to prevent decay. We sympathize with these folks. We are these folks! Sometimes documentation manifests as a website. Other times it is a PDF or README file. Sometimes it’s even an eBook. Each of these is a consumable, static product.

Consumable products are built in mental silos for a particular purpose. This is how inconsistency is created. Each “product” is maintained separately.

An Alternative
Instead of producing consumable products, consider a world where documentation is a data source.

Data differs from information. On its own, data is meaningless. Data with context is meaningful. For example, 42. That’s data, but what does it mean? Is it someone’s age? A speed limit? The answer to the ultimate question? By adding a label like “42 years old”, context is established.

The goal of documentation is to provide valuable information. However, there is no rule stating data and context must always be produced and maintained together. Separating these processes yields significant benefits.

Separation of Concerns

Data sources are semi-consumable, yet flexible. They can be aggregated in a single location. Generating documentation products (like websites) already requires internal data generation, plus the creation of a UI or alternative visualization. By stripping it down to data generation only, the process is simpler and easier to automate and maintain. Finally, data sources can be used to produce consistent interfaces. Think of it as “rehydrating” the data.
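To illustrate what “rehydrating” could look like, here is a minimal Python sketch that renders one documentation record in two different contexts: an editor tooltip and a reference-page entry. The record and its field names (`name`, `signature`, `summary`, `params`) are hypothetical and chosen only for illustration. They are not Metadoc's actual schema.

```python
import json

# Hypothetical documentation record. The field names are illustrative
# assumptions, not Metadoc's actual schema.
RECORD_JSON = """
{
  "name": "read_text",
  "signature": "read_text(path, encoding='utf-8')",
  "summary": "Read a file and return its contents as a string.",
  "params": [
    {"name": "path", "description": "Location of the file to read."},
    {"name": "encoding", "description": "Text encoding used to decode the file."}
  ]
}
"""


def as_tooltip(record: dict) -> str:
    """Render the record as a one-line editor tooltip."""
    return f"{record['signature']}: {record['summary']}"


def as_markdown(record: dict) -> str:
    """Render the same record as a reference-page entry."""
    lines = [f"## {record['name']}", "", record["summary"], "", "Parameters:"]
    lines += [f"- `{p['name']}`: {p['description']}" for p in record["params"]]
    return "\n".join(lines)


record = json.loads(RECORD_JSON)
print(as_tooltip(record))
print(as_markdown(record))
```

The data is written once. Each renderer adds only the context its audience needs, so the tooltip and the reference page cannot drift apart.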

In our own experiments, we simply asked, “what if we generated JSON?”

[Figure: partial snippet produced by Metadoc for the go-fsutil library]

From the snippet above, we have all the data necessary to add context and visualize it in an intuitive user interface.
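To make the idea of generating JSON from code concrete, here is a small sketch that uses Python's standard `inspect` module to pull a function's name, signature, and docstring into a JSON document. This is not Metadoc's tooling or its output format. The `describe` helper, the `copy_file` example function, and the field names are assumptions made for illustration, and Python introspection stands in for whatever a Go library would actually require.

```python
import inspect
import json


def describe(func) -> dict:
    """Collect documentation data for one function (illustrative field names)."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "signature": f"{func.__name__}{signature}",
        "summary": inspect.getdoc(func) or "",
        "params": [
            {
                "name": name,
                # A parameter without a default value must be supplied by the caller.
                "required": param.default is inspect.Parameter.empty,
            }
            for name, param in signature.parameters.items()
        ],
    }


def copy_file(source, destination, overwrite=False):
    """Copy a file from source to destination."""


doc_data = describe(copy_file)
print(json.dumps(doc_data, indent=2))
```

Because the output is plain JSON, it can be stored, aggregated with data from other projects, or handed to any interface that knows how to present it.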


The Power of Metadata

At Metadoc, we’re building the best developer experience we can imagine. Our goal is to document all code using a consistent and beautiful interface while adding useful features like notes, bookmarks, and cross-references. There are many other ways the documentation metadata can be used.

For example, the Metadoc API could be used within an integrated development environment (IDE) to provide context-relevant detail. It could be used within DevOps processes to validate code. Obtaining information without screen scraping a documentation portal is yet another use case. There are many use cases where a documentation API could bring value.
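As one hedged example of the DevOps use case, the sketch below checks a list of documentation records for missing summaries and undocumented parameters. The record shape and the `find_doc_gaps` helper are hypothetical. They do not reflect the Metadoc API, only the kind of check that becomes easy once documentation is available as data.

```python
def find_doc_gaps(records: list) -> list:
    """Return one message per missing summary or parameter description."""
    problems = []
    for record in records:
        if not record.get("summary", "").strip():
            problems.append(f"{record['name']}: missing summary")
        for param in record.get("params", []):
            if not param.get("description", "").strip():
                problems.append(
                    f"{record['name']}: parameter '{param['name']}' is undocumented"
                )
    return problems


# Hypothetical records, shaped like the output of a documentation generator.
records = [
    {
        "name": "copy_file",
        "summary": "Copy a file.",
        "params": [{"name": "source", "description": ""}],
    },
    {"name": "remove_file", "summary": "", "params": []},
]

problems = find_doc_gaps(records)
for problem in problems:
    print(problem)
```

A build pipeline could run a check like this on every commit and fail when `problems` is not empty, which catches documentation gaps before they ship.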

We’ve gone so far as to build a static API, enabling ourselves and the developer community to create more useful learning resources.

Metadoc

Our exploration of the documentation landscape, combined with our desire to see a more data-oriented approach to documentation, led to a new way of thinking about documentation and its purpose.

Metadoc offers tooling to automatically generate significant portions of documentation metadata (JSON). It also provides tools that authors can use to consistently produce effective learning materials. Reducing these traditionally burdensome tasks encourages creation.

Margin notes = meaningful context

Developers can share insights, tips, mini-lessons, and other impactful content. This is done through comments, highlights, bookmarks, and notes in the margin. We envision a world where software creators share these notes with others, creating a knowledge cascade. It is a way for developers to teach without being a teacher while students learn as they work.

If Metadoc succeeds in its mission, it will effectively help the global development community transition knowledge from one generation to another. Developers will build upon legacies of know-how instead of mere lines of code. It is a future we are excited to help shape and be a part of.

If you are interested in contributing to this future, please consider contributing through GitHub Sponsors or reaching out at https://metadoc.io.

Corey Butler
Metadoc

I build communities, companies, and code. Cofounder of Author Software, Inc. Created Fenix Web Server & NVM for Windows.