How we did it: Getting to Structured Content

Published in Known Item · 6 min read · Apr 8, 2020

Inspired by a question on Twitter, I want to provide an example of the power of structured content as we have leveraged it on Docs.microsoft.com.

First, a little background:

Microsoft Docs is the successor to MSDN, TechNet, etc. and is the platform on which Microsoft is centralizing (nearly) all of its technical documentation, training, and learning content. We use GitHub as our back end, and nearly every document is stored as a Markdown file. Most of the site follows a pretty old-school documentation pattern of articles organized into tables of contents. In 2017, the team began work on a new interactive learning platform which would become Microsoft Learn.

(Un)structure:

We already had some interactive tutorials on the site, but they offered interactivity without presenting much structure to the user:

They weren’t cutting it for a few reasons:

The interactivity didn’t add much value. Mostly it breaks up the content so it can’t all be seen at once and hands out some green checkmarks.

They were disconnected from other content. The tutorial linked above mentions in its text that it’s a good idea to have taken the Meetings in Teams tutorial before this one, but that relationship exists only in natural language and we can’t expose it in any other way.

They lacked semantic value. It isn’t clear what this thing is from looking at it, what its component parts are, or how it differs from everything else on the site.

When we started talking about launching Microsoft Learn, we considered using this interactive tutorial format we already had, but decided against it because of these limitations. Instead, we decided to introduce a new suite of content types that would have more valuable interactivity, clear relationships with other content, and legible semantic value.

Getting to structure:

After the business strategy document, we started with the content model. To get our heads around the problem to solve, we did a teardown of the content models of a few competitors and figured out the minimum number of content types and relationships we would need for the experience we had in mind. We then presented this MVP content model to an audience of product managers, designers, and engineers. This initial information architecture work took less than a week of effort and was implemented largely as is.

The v1 content model for Microsoft Learn

Reverse engineering an existing content model is always worth the time it takes, because it helps you see the design choices that will help something scale and function over time. For instance, it helped us see that we needed to keep the achievements we were awarding to users at the same level as the “remixable” piece in our content model. If we allowed authors to reuse units but only awarded achievements for modules, users would have to go through the same units repeatedly, which isn’t a good use of their time. Similarly, we realized that we should have a single “achievement” content type which is related to the content, rather than having the badge directly on the content. This gives us flexibility in the future to introduce different kinds of achievements or award achievements for actions other than completing content. This is a minimal additional engineering investment that keeps us from building ourselves into a corner.
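To make the achievement design concrete, here is a minimal Python sketch. It is not the Microsoft Learn implementation: the class names `Piece` and `Collection`, the `kind` field, and all uids are hypothetical. It assumes only what the paragraph above describes: an achievement is its own type, related to the reusable piece of content, so a learner who has earned it in one place does not have to repeat the piece elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Achievement:
    uid: str
    kind: str  # e.g. "badge"; a separate type leaves room for other kinds later


@dataclass(frozen=True)
class Piece:
    """The reusable ("remixable") piece of content (hypothetical name)."""
    uid: str
    achievement: Achievement


@dataclass
class Collection:
    """A container that reuses pieces, possibly shared with other containers."""
    uid: str
    pieces: List[Piece] = field(default_factory=list)


def remaining(collection: Collection, earned: Set[str]) -> List[str]:
    """Return uids of pieces whose achievement the learner has not earned yet."""
    return [p.uid for p in collection.pieces if p.achievement.uid not in earned]


shared = Piece("intro", Achievement("intro.badge", "badge"))
path_a = Collection("a", [shared, Piece("a-only", Achievement("a-only.badge", "badge"))])
path_b = Collection("b", [shared, Piece("b-only", Achievement("b-only.badge", "badge"))])

# The learner finished everything in collection "a".
earned = {p.achievement.uid for p in path_a.pieces}
print(remaining(path_b, earned))  # ['b-only']
```

Because the achievement sits on the same level as the reused piece, the shared `intro` piece is already counted when the learner opens the second collection.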

But what about authoring?

Interestingly, there wasn’t that much of a difference in the way the content was authored:

Both “unstructured” (left) and “structured” (right) content experiences are written in YAML.

Both kinds of content are written in YAML; they even have a lot of the same metadata. The main differences are:

We have named the different content elements of the structure. Through close partnership among our teams, we’ve made the language as clear and ubiquitous as possible. In this case, instead of generic “content” and “items,” a module has an “abstract” and “units.”

There’s a lot more back-end work generating and displaying relationships between content. The engineering team developed several services that manage and publish the hierarchy of content types, the achievements they’re associated with, and the user’s progression through them. They were able to build these services, in part, because of how strongly structured the content is and the semantic value each of its parts has. It provides a foundation to build upon.

The design leverages the structure. Because we were able to develop the content model early, the visual and interaction design of the site was able to use it as a starting point. We know what our content types are and can express it to users through the design, helping them understand where they are and what’s going on at every moment.
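The effect of naming the elements can be sketched in a few lines of Python. This is an illustration under assumptions, not the site's rendering code: the `duration_minutes` field, the two render functions, and the sample values are all hypothetical. The only parts taken from the text are that a module has an “abstract” and “units.”

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    title: str
    duration_minutes: int  # hypothetical field, for illustration only


@dataclass(frozen=True)
class Module:
    title: str
    abstract: str
    units: List[Unit]


def render_page(module: Module) -> str:
    """Full view: title, abstract, and a numbered list of units."""
    lines = [module.title, module.abstract]
    lines += [f"{i}. {u.title}" for i, u in enumerate(module.units, start=1)]
    return "\n".join(lines)


def render_card(module: Module) -> str:
    """Compact view built from the same fields as the page view."""
    total = sum(u.duration_minutes for u in module.units)
    return f"{module.title} | {len(module.units)} units | {total} min"


example = Module(
    title="Example module",
    abstract="What you will learn.",
    units=[Unit("Introduction", 3), Unit("Exercise", 10)],
)
print(render_card(example))  # Example module | 2 units | 13 min
```

Since both views read from the same named fields, a design can rely on every module having them, which is harder to guarantee with generic “content” and “items.”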

There are some downsides to deploying this level of structure at this scale, and the burden has largely fallen on the authors. Specifically, because we use GitHub as our back end, folders and files need to be named and arranged precisely in order for the system to infer the proper hierarchies and display the content correctly. We’re able to validate this so incorrect content doesn’t get published, but it is onerous and frustrating for authors.
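A validation step of this kind might look roughly like the following Python sketch. The naming rules here are invented for illustration (an `index.yml` per folder, unit files named like `1-introduction.yml`); the actual conventions and validation tooling are not described in this post.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Hypothetical naming rule: "<number>-<lowercase-slug>.yml"
UNIT_FILE = re.compile(r"^\d+-[a-z0-9-]+\.yml$")


def validate_layout(paths: Iterable[str]) -> List[str]:
    """Return human-readable errors; an empty list means the layout passes."""
    by_folder: Dict[str, List[str]] = {}
    for raw in paths:
        p = PurePosixPath(raw)
        by_folder.setdefault(str(p.parent), []).append(p.name)

    errors: List[str] = []
    for folder, names in sorted(by_folder.items()):
        if "index.yml" not in names:
            errors.append(f"{folder}: missing index.yml")
        for name in sorted(names):
            if name != "index.yml" and not UNIT_FILE.match(name):
                errors.append(f"{folder}/{name}: unit files must look like '1-introduction.yml'")
    return errors


print(validate_layout(["intro-module/index.yml", "intro-module/1-introduction.yml"]))  # []
for problem in validate_layout(["other-module/Intro.yml"]):
    print(problem)
```

A check like this can block publishing, but it only reports the problem after the author has already made the mistake, which is part of why the burden still lands on authors.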

The consolations of structured content:

We’ve been through a few design iterations, and a module on Microsoft Learn currently looks like this:

A module as represented in the content model and as currently rendered on the site.

Having structured elements in the content model lets us display information in a predictable way.

This more structured experience was better for us and for customers in several ways:

It’s durable. We shipped the alpha experience based on this content model in April 2018, and it’s only undergone small revisions in the past two years. Since this is only one of a dozen sites our engineering team is responsible for and a small portion of the overall content our design and PM teams handle, we can’t be making constant IA changes. It also means our content teams can set up durable processes to create this content on an ongoing basis.

It’s scalable. The model has grown well for us, scaling from 20 modules in 2018 to nearly 1,000 now. We look forward to making authoring improvements, but the basic user-facing structure is still holding up. It also helps us show the same content in multiple contexts without introducing inconsistency, which is essential as the catalog grows.

The same structured elements from the content model are also displayed on cards that can be browsed and filtered.

It’s modular. We have added content types to the site as its success has helped it grow and more teams have wanted to get on board. If you go to Microsoft Learn these days, you’ll also see information about certifications and the exams required to earn them. We’ve been able to add on to the initial content model modularly, rather than needing to replace it wholesale, even as the intention and audience of the site have shifted.

It lets us build things people want. It’s enabled us to ship new features more easily, like a catalog API which lets partners display the content catalog in their own learning management systems. Many people have requested this for the regular Microsoft Docs content, but it hasn’t yet been feasible because of the lack of structure in that content.

The content model structure as expressed in the Microsoft Learn Catalog API.
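For a sense of why structure makes this kind of reuse easy, here is a small Python sketch of a consumer working with catalog-shaped data. The JSON shape and field names below are assumptions made up for this example, not the actual Catalog API schema, and no network call is made.

```python
import json
from typing import Any, Dict, List

# Hypothetical catalog payload; the real API's fields may differ.
SAMPLE = """
{
  "modules": [
    {"uid": "m1", "title": "First module", "units": ["u1", "u2"]},
    {"uid": "m2", "title": "Second module", "units": ["u2"]}
  ],
  "units": [
    {"uid": "u1", "title": "Introduction"},
    {"uid": "u2", "title": "Exercise"}
  ]
}
"""


def modules_with_units(catalog: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Resolve each module's unit uids into unit titles."""
    titles = {u["uid"]: u["title"] for u in catalog["units"]}
    return [
        {"title": m["title"], "units": [titles[uid] for uid in m["units"]]}
        for m in catalog["modules"]
    ]


for row in modules_with_units(json.loads(SAMPLE)):
    print(row["title"], "->", ", ".join(row["units"]))
```

Because every item has a known type and explicit references to related items, a partner system can rebuild the hierarchy without parsing any prose.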

It’s clear. The limited and explicit relationships between content types make it easy for users to understand where they are and where they can go. When users are supposed to be learning complicated concepts, it’s essential that the site stays out of the way as much as possible.

While we still have a huge backlog of improvements to make, the platform has been successful with customers, and we get great feedback every time we participate in an event. I hasten to add that this success is not because of the information architecture; rather, it is a great example of the information architecture doing its job and getting out of the way.