Why Internal Technical Documentation is Hard
Raise your hand if you’re not happy with your company’s internal technical documentation.
You are not alone. I have been working on internal documentation at a large tech company for three years, and I have never met anyone satisfied with their company’s solution. This includes people at companies with excellent external documentation. It’s flabbergasting.
What’s more, no one seems to know what the problem is. I’ve been to documentation conferences where the speakers described solutions that my company was trying to replace. Other speakers said they didn’t know what the answer was, but they were trying some stuff and… maybe it would work?
Indeed, people are trying. For example, Etsy recently published an article on immutable documentation. The FYI tool it described was so inspiring that one of our engineers implemented it within two weeks. Many at my company got excited; I was not one of them. To be clear, I don’t think the FYI tool is a bad idea. It’s an elegant solution to a specific problem, but that’s the problem. We have so many elegant solutions to specific problems that we have new problems caused by having too many solutions.
There are so many elegant solutions available that we think we can solve internal documentation by picking one, or a few, of them. In the excitement over the FYI tool I described above, I saw the start of a pattern that I’ve now seen many times. Here is the pattern:
- Documentation is bad.
- A person who cares about documentation notices that internal docs are bad. I call this person a Documentation Evangelist.
- That person proposes and implements a specific elegant documentation solution. The solution might be:
  - In-Repo. After all, it’s easier to maintain quality, up-to-date docs if you have to get them reviewed. Plus, more people will write them because they can do it in the IDE and repo they’re already working in!
  - WYSIWYG Wiki. After all, it’s easier to maintain quality, up-to-date docs if anyone can make changes inline. Plus, more people will write docs because it’s so easy to create them!
  - A Question & Answer Tool. After all, it’s easier to maintain quality, up-to-date docs when you only answer questions. Plus, more people will write docs because they don’t actually have to write a doc! They only have to answer a question!
  - Some Custom Solution. After all, it’s easier to maintain quality, up-to-date docs when we have full control of everything.
- Some documentation is good, but not for the reason given! The main reason docs improve is that the Documentation Evangelist cares and is writing good docs.
- The Documentation Evangelist switches teams, leaves the company, or otherwise stops keeping docs up to date.
- Documentation is bad. Blame the platform.
- Repeat, but with a different Documentation Evangelist and solution.
I don’t have anything against Documentation Evangelists. After all, I am one. It’s great when engineers care about documentation. But these individual efforts never include high-level planning. There is a sense that internal documentation doesn’t need high-level engineering investment. There is a general lack of understanding, at all levels, about why internal technical documentation is so hard.
So… why is internal technical documentation so hard?
I want to set the context. I have been saying “technical documentation”, but that’s a subset of “technical knowledge.” In this article, knowledge is any information that helps a human to use, develop, administrate, and/or operate technology.
This encompasses more than you’d think.
For example, say you had a product intended for internal use. You need information for people who use, develop, administrate, and/or operate it. Here are some questions they might have that you’d need a solution for:
What should I use? Why should I use this? What problems does this solve?
Wikis, Blogging Platforms (e.g. Confluence, Google Docs, OMG why won’t Medium make an enterprise solution lol, etc.), Tech Talks / Recorded Presentations, Slide Decks
How do I get started?
This is where a lot of people create custom tooling. Sometimes an example is enough. Usually, you want examples interspersed with prose explaining what is going on.
Wikis and blogging platforms are also sometimes used for this.
How can I learn more about the underlying technologies?
Learning Management System (Absorb, etc)
What is everything this can do?
Generated API / Reference docs (pydocs, javadocs, godocs, etc)
How is it implemented? Are there examples of using (API / code)?
Sourcegraph or another code search solution.
I have a question about (specific thing)? Can I do (specific thing)? How?
Stack Overflow, Etsy’s FYI tool
Why doesn’t X work? What doesn’t work? What’s coming soon?
Bug/Ticket/Issue Tracker (Jira, Phabricator, etc)
Who owns this?
Some sort of custom internal ownership or on-call platform.
How has this changed? How is this changing?
Wiki platforms or static docs for migration guides, Google Groups or other newsletter announcements
How well is X working?
Logging / Metrics / Alerts / Dashboards
How do I fix (specific user problem)?
Wiki — Cookbooks?
How do I fix (specific technical problem)?
Wiki — Runbook
How do I start working on this?
Developer README (environment setup, testing, deploying)
How has this changed?
In-Code Changelist, Git Commit History
How was this made?
Google Doc or Wiki Page for engineering requirements, Google Slides for product requirements, Lucidchart for design docs
What problems has this had?
Google Docs or Wikis for post mortems and incident reports
Wow, eh? That’s a lot, and this is only one type of thing that you’d need to track knowledge about. We can make this harder. Many of these solutions have different hard requirements.
Let’s narrow the scope to four different documentation types:
Getting Started Guides — Communicate complex topics that must be understandable by humans. With WYSIWYG collaborative editors you can quickly review and iterate drafts. I wrote this article in Google Docs for that reason.
API Docs / Technical Examples — Must be accurate. Should only ever be generated from the source code.
Getting Started — Should be able to run the example. If you have docs for UI components, you should see the UI components.
Runbooks — Must have 100% uptime — or, you know, at least more available than the services they support.
That is only a subset of requirements and no current off-the-shelf product meets them all.
Many products mentioned above are elegant solutions with specific use cases. Sourcegraph immediately makes your codebase more navigable and is always up-to-date. Stack Overflow is a battle-tested Q&A tool. Google Docs is the best collaborative authoring tool that exists. The custom doc generation script you wrote makes your team’s docs look exactly like you want them. Each of these individually adds immediate value, and you need to fix your docs immediately. You have tremendous pressure to use one (or more) of these solutions, so you end up with a fragmented solution.
So… what is the problem with fragmentation?
Some think fragmentation doesn’t matter as long as everything is discoverable. If you solve discoverability, you’ll find that it is only one of the five non-trivial concerns that you needed to address.
1. Discovery
You don’t know how to find content. This is the most obvious problem, and it doesn’t only mean search (more info provided later in the article).
Fragmentation means that you have to ingest every platform. You need to ensure accurate search metadata for every platform. This is a non-trivial amount of work to set up, optimize, and maintain.
2. Authorship / Creation
You don’t know where to create content.
Fragmentation creates confusion and stress about what documents should go where. This increases the barriers to creating documentation. Also, the same group of people will author and maintain content for a topic. Supporting content on multiple platforms will degrade the quality of the docs on each of them, because authors only have so much time.
3. Operations
Like any service, you need to be able to ensure your documentation platform is working as it should be. You need metrics for analyzing user behavior and alerts for when things break.
Fragmentation increases the operational support load. You have to set up and maintain metrics/alerts for each platform. Many out-of-box solutions will not easily integrate into your company’s infrastructure. You also have to maintain documentation on how to use each documentation platform. Meta.
4. Quality (Ratings / Feedback)
Documentation quality is notoriously difficult to track. It is not as simple as running an A/B test on button colors to see what increases the click-through rate. (More info provided later in the article). All documentation systems should support inline questions, comments, and/or star ratings.
Fragmentation leaves you two options. You can integrate your feedback tool into each platform. Or, you can increase the number of places your team needs to track quality issues and feedback. You might have to track Confluence Comments, Google Docs Comments, Stack Overflow Questions, Google Groups, an FYI tool, email, and chat. Your generated docs and wiki pages probably have no way to report quality issues.
5. Ownership
Docs need team-level owners. Individuals switch teams and leave the company. Without owners, you don’t know where ratings and feedback should go. You can’t create a feedback loop of “comment on doc” to “task for stakeholder” to hold teams accountable.
Fragmentation means that you need to integrate with each platform’s ownership model. You’re writing tooling to trigger a warning when a published Google Doc isn’t owned by a team. You’re customizing the Confluence ACL so that it uses your LDAP or AD model. Either that, or you don’t know who owns your docs, and they degrade.
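To make the ownership idea concrete, here is a minimal sketch of the kind of tooling described above: flag docs that aren’t owned by a team, and turn a reader comment into a task for the owning team. Everything in it is an assumption for illustration. The `TEAMS` set stands in for whatever your LDAP or AD directory would return, and `Doc`, `unowned_docs`, and `route_feedback` are hypothetical names, not part of any real platform’s API.

```python
from dataclasses import dataclass

# Hypothetical team directory; in practice this might be loaded from LDAP or AD.
TEAMS = {"payments", "search-infra"}


@dataclass
class Doc:
    title: str
    platform: str  # e.g. "wiki" or "gdocs" (illustrative labels only)
    owner: str     # a team name, or an individual's username


def unowned_docs(docs, teams=TEAMS):
    """Return the docs whose owner is not a known team."""
    return [d for d in docs if d.owner not in teams]


def route_feedback(doc, comment, teams=TEAMS):
    """Turn a reader comment into a task for the owning team.

    If the doc has no team owner, the task is returned unassigned
    with a warning so someone can triage it.
    """
    task = {"doc": doc.title, "task": comment, "assignee": None}
    if doc.owner in teams:
        task["assignee"] = doc.owner
    else:
        task["warning"] = "no team owner"
    return task


docs = [
    Doc("Payments Runbook", "wiki", "payments"),
    Doc("Search Onboarding", "gdocs", "alice"),  # owned by an individual
]

for d in unowned_docs(docs):
    print(f"WARNING: '{d.title}' on {d.platform} is not owned by a team")
```

The hard part is not this logic. It is getting an accurate `owner` field out of every platform you run, which is exactly the per-platform integration cost described above.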
More info, provided
I promised I’d dive deeper into discovery and quality. Well, I’m keeping my promises. I’ll even throw in another section I didn’t know where else to put about content platform types.
Discovery has become synonymous with search. For internal docs, I argue it shouldn’t be.
One of the reasons external docs are easier to solve than internal docs is that you publish external docs. Publishing is like having a docs whitelist. You only publish one set of docs, and they’re on your domain, so you have authority. Since it’s on the internet, Google handles discovery. Users will search for your docs, find the docs on your domain, and know they are in the right place.
For internal docs, every wiki page, whether it’s personal, team-level, or otherwise, is “Published” by default. An active team wiki will almost always be edited more often than a User Guide, polluting search result rankings based on freshness.
Search is a hard problem to solve for internal docs. It’s so hard that Google’s internal search has a feature to manually recommend a search result for a search term. These manual results appear above the natural search results. This is Google. The company whose name is synonymous with “search” uses manual link-curation for internal discovery.
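I don’t know how that feature is implemented, so treat the following as a rough sketch of the general idea only: a hand-maintained table maps search terms to recommended results, and those are listed before whatever the ordinary search returns. The `PINNED` table, the `search` function, and the naive substring matching are all assumptions made for illustration.

```python
# Hypothetical curated table: search term -> hand-picked result.
PINNED = {"deploy": "Deploy Guide (official)"}


def search(query, index, pinned=PINNED):
    """Return curated results first, then naive substring matches."""
    q = query.strip().lower()
    curated = [pinned[q]] if q in pinned else []
    # "Natural" results: a stand-in for a real ranking algorithm.
    natural = [
        title for title in index
        if q in title.lower() and title not in curated
    ]
    return curated + natural


index = ["Team Foo deploy notes", "Deploy Guide (official)", "Lunch menu"]
print(search("deploy", index))
```

Without the pinned entry, the team’s scratch notes would rank ahead of the official guide simply because they come first in the index.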
Fortunately, you don’t have to solve the search problem. Google-style search is designed to impose order on the most chaotic and open information platform in the world. There are many content sources, new things are added all the time, there is duplicated content, and old pages don’t get deleted.
Your internal documentation platform is a closed system, like Wikipedia. The Wikimedia Foundation can delete, edit, and organize all content however it likes. Your company has the same control over its internal docs. Wikipedia imposes order by only allowing one page per topic. As a reader, you know you’re on the right page because there is only one page. If two topics have similar names, you see clear disambiguation. Search is now a fuzzy key lookup.
With this authoritative system, authors have confidence that they are creating and editing content in the right place. It’s the only place. Readers have confidence that they have found the correct document. If information is missing or out of date, they know it’s not a failure of discovery, it’s a failure of content.
Do not take this as advice to spin up an enterprise version of Wikipedia. Wikipedia is still only a wiki platform. Your “Authoritative Discovery Engine” needs to support other types of documents. Additionally, Wikipedia has an army of constant gardeners auditing, pruning, and rearranging content. It is effective on the open web. It will be inefficient for internal documentation and probably become disorganized. Your main takeaway is the “one topic, one source” rule. Figure out how to implement that across your different document types.
Further Note. This does not mean that there is no place for search, especially when finding content that exists on a page, like specific error messages. Your authority engine can boost these search results.
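Here is one way the “one topic, one source” rule and the “fuzzy key lookup” could look in code. It is a minimal sketch under my own assumptions: `TopicRegistry` is a hypothetical name, topics are plain strings, and fuzzy matching uses `difflib.get_close_matches` from the Python standard library. A real system would also need disambiguation pages and support for multiple document types.

```python
import difflib


class TopicRegistry:
    """Maps each topic to exactly one authoritative location."""

    def __init__(self):
        self._pages = {}  # normalized topic -> location

    @staticmethod
    def _key(topic):
        # Normalize case and whitespace so "Deploy  Pipeline" == "deploy pipeline".
        return " ".join(topic.lower().split())

    def register(self, topic, location):
        """Register a topic; refuse a second source for the same topic."""
        key = self._key(topic)
        if key in self._pages:
            raise ValueError(f"topic already has a source: {self._pages[key]}")
        self._pages[key] = location

    def lookup(self, query, cutoff=0.6):
        """Return the location of the closest matching topic, or None."""
        matches = difflib.get_close_matches(
            self._key(query), list(self._pages), n=1, cutoff=cutoff
        )
        return self._pages[matches[0]] if matches else None


registry = TopicRegistry()
registry.register("Deploy Pipeline", "wiki/deploy-pipeline")
print(registry.lookup("deploy pipelin"))  # a typo still finds the page
```

The interesting behavior is in `register`: creating a second page for an existing topic fails loudly, so the author is pointed to the page they should be editing instead.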
Now people can find the documents they are looking for. Time to worry about how good those documents are.
Tech companies have a weird attitude toward writing.
I would bet money that you don’t give writing exercises during engineering interviews. I’d also bet money that you think documentation is everyone’s responsibility.
Writing is hard; it requires skill and practice. Most of your engineers have little to no practice. Of the 115 credits in my software engineering degree, two were for a course in Technical Writing — I have no other formal writing training. Professional writers have their own four-year degrees and/or years of experience.
Many of your engineers will write bad docs.
But, how can you tell which docs are bad so you can iterate on them? Tech companies want to measure things automatically. For documentation, you are trying to measure the effectiveness of prose at changing long-term human behavior on a different platform. There are way too many confounding factors to do this scientifically. Most solutions end up relying on heuristics like “document age”. These are incomplete quality indicators. You could incorporate something like Grammarly or HemingwayApp, but those only process grammar. Ensuring good grammar is valuable, but your doc can still have a host of problems: missing or wrong information, bad structure, misleading diagrams, and non-informative buzzwords.
However, if you are trying to learn something from a document, you know right away how good it is. You either get the information you need, or you don’t. Or you do, but it’s kind of confusing. Or you thought you did, but it was out of date.
A documentation platform needs to make it super easy to capture reader feedback. You’re not going to get that information any other way.
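As a sketch of what you could do with that feedback once you have it, the snippet below aggregates simple “was this helpful?” votes and lists the docs that most need attention. The vote format, the `docs_needing_attention` name, and the thresholds are assumptions chosen for illustration, not a recommendation of specific values.

```python
from collections import defaultdict


def docs_needing_attention(feedback, threshold=0.5, min_votes=3):
    """Return doc titles whose helpful ratio is below `threshold`, worst first.

    `feedback` is an iterable of (doc_title, helpful) pairs, where
    `helpful` is True or False. Docs with fewer than `min_votes`
    votes are skipped because the signal is too weak.
    """
    votes = defaultdict(lambda: [0, 0])  # title -> [helpful count, total count]
    for title, helpful in feedback:
        votes[title][0] += int(helpful)
        votes[title][1] += 1

    flagged = [
        (helpful / total, title)
        for title, (helpful, total) in votes.items()
        if total >= min_votes and helpful / total < threshold
    ]
    return [title for _, title in sorted(flagged)]


feedback = [
    ("Getting Started", True), ("Getting Started", False), ("Getting Started", False),
    ("API Reference", True), ("API Reference", True), ("API Reference", True),
    ("Runbook", False),  # only one vote, so not enough signal yet
]
print(docs_needing_attention(feedback))
```

A ratio like this only tells you where to look. The inline comments and questions are what tell you what is actually wrong with the doc.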
The careful zone.
Many elegant specific solutions do add value, but certain platforms have non-obvious drawbacks. You may have a good use case; just be careful.
Learning Management Systems (LMS)
The first is a Learning Management System (LMS). Online curriculum takes time and skill to produce. Internal technical content changes rapidly. Your curriculum developers will need to constantly maintain each course. I recommend you don’t use an LMS for technical content unless it’s as stable as “Intro to Golang”.
Side note. The best ideas I’ve heard about designing curriculum for a fast moving environment were from Jen Gilbert during her Teach Like an Engineer talk. I’ll let her write that article.
Video, Audio, and other expensive content
Be careful with content that is expensive to produce, or duplicates information. If you have docs, tutorials, slides, and videos that all contain the same information, you have to maintain all those sources. Think about maintenance before adding content.
Confluence
Confluence was released fifteen years ago and remains the only big player in the internal documentation space. As such, there will definitely be a point when you consider using it as your solution. It even got a facelift in 2018, and its collaboration capabilities are approaching the quality of Google Docs. So what do you have to be careful of? What do you have to look out for?
It doesn’t have a solution for generated docs, which is necessary for engineering documentation. This wouldn’t be a problem if it was easy to extend, but it’s not. Confluence plugin development has a steep learning curve. Its unique stack may be incompatible with your company’s engineering skill set or infrastructure.
You can try to get around developing for Confluence by incorporating it as one of many solutions. If you use Confluence for wikis and host other content elsewhere, you’ll lose many of the benefits of a closed system. You’ll start seeing the fragmentation problems described above. For example, Confluence search won’t index your off-platform docs. You’ll have to do complex integrations anyways.
Confluence is currently the best out-of-the-box option, but it doesn’t do everything. If you want it to, you’ll need a team of engineers with a specific skill to make it suit any technical use case. At that point, you should wonder if it would be cheaper in the long run to build something that runs on your own stack.
Standalone Question & Answer platforms
This is a tough one. StackOverflow has been a tremendous boon to my personal productivity over the years. But, a Q&A platform can end up degrading the health of your internal documentation ecosystem, while seeming like it’s adding value.
To be fair, Q&A platforms do add immediate value. First of all, the closed system of the Q&A platform will have better search out of the box than any custom search you build. Also, it takes less time to answer a question once on a Q&A platform than it does to write a document or answer it 100 times in chat. But, this value can be offset in the long run by maintenance costs. On the open web, Stack Overflow has a huge community of engaged moderators to update answers when they change. Internally, your Q&A moderators for a topic will be the same people who are also writing the docs for that topic. Time spent doing one can detract from the other.
This is not unique to Q&A platforms; all documentation requires maintenance. But Q&A platforms can become as heavy an operational burden as a poorly organized wiki. Here are other things to watch out for:
- They can collect content that belongs on other platforms (e.g. “What does this function do?” should be addressed by code comments or generated API docs.)
- They can become a catch-all for discoverability issues (e.g. “Where can I find more info on X?”)
- They can be used to report bugs (“Why doesn’t X work?”).
The above problems all confuse the source of truth for content seekers and add bookkeeping burdens to content authors.
I do think Q&A systems can be powerful and useful. I’d like to see a solution that augments existing documentation, rather than one that duplicates or distracts from it.
So… what do I do now?
I know there aren’t a lot of solutions here, but the good news is that no one really has the solutions yet. By reading and understanding this article, you have taken the fast lane to the 99th percentile of people in the internal documentation problem space. Hit me up with my Thought Leader Certificate and 1% of your new documentation startup.