On building product at Medium

Medium uses a lot of tools, infrastructure, techniques, and people-power to deliver incredible content to you every day. Behind the scenes are outstanding engineers, designers, product managers, user happiness specialists, and more who craft the simple and elegant Medium experience including content distribution, personalization, and interaction capabilities.

I’m a product manager at Medium and this is a talk that I gave at Canvas Conference in Birmingham in October 2015. You can watch the video below, or skip through to read what we do and view the slides from the presentation.

Note: This information was current as of the talk in October 2015, although it is surely out of date by the time you’re reading this. We move fast at Medium!


What is Medium?

Before we get started, it helps to have some context about what we’re trying to do at Medium.

It’s the place where Bono lays out a Marshall Plan for Africa and Melinda Gates responds. It’s where a gay Catholic, Aaron, discovers his voice, which leads to meeting the pope. It’s where former Amazon employees give Jeff Bezos feedback. It’s where Michael Pollan lays out a food policy for the country, calling on Washington to respond. And it’s where Chelsea Manning can share her life in prison, inspire others, and respond directly from prison — last night. And all of this happened in the last few weeks.


Interactions

You may have seen Susan Crawford write about Uber’s social impact. Medium, at its core, provides ways for readers and writers to interact and push ideas forward.

As you scroll through Susan’s story, you’ll see highlights that your network has left. In the margins, you’ll find little asterisks (*). This is where people have responded to a particular part of the story. When you tap on one, you’ll see that Kai Sosceles responded and the author recommended it. And at the end, Tim O'Reilly writes a full-on response, to which Susan responds. You can get lost in this web of responses to responses, highlights, and responses to particular passages.

What’s really interesting is how Medium determines which responses to show you. It is all based on your network, so someone you follow has to write or recommend the response. Or the author can whitelist a response by recommending it herself. Here are those rules:

Everything else is rolled up under “There are 50 responses outside your network.” You can always get to the full list of responses, but we’ve tried to distribute and raise the visibility of the ones most relevant to you.
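To make those rules concrete, here is a minimal sketch in Python. It is not Medium’s actual code: the `Response` class, the `split_responses` function, and the field names are all hypothetical, and it assumes a response is shown when someone you follow wrote or recommended it, or when the story’s author recommended it.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical model of a response; field names are assumptions.
    author: str
    recommended_by: frozenset = frozenset()


def split_responses(responses, following, story_author):
    """Split responses into those shown inline and those rolled up."""
    visible, rolled_up = [], []
    for response in responses:
        # Rule 1: someone you follow wrote or recommended the response.
        in_network = (
            response.author in following
            or bool(response.recommended_by & following)
        )
        # Rule 2: the story's author whitelisted it by recommending it.
        author_approved = story_author in response.recommended_by
        if in_network or author_approved:
            visible.append(response)
        else:
            rolled_up.append(response)
    return visible, rolled_up
```

The length of `rolled_up` would be the number shown in the “responses outside your network” line.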

This system has an amazing additional value. It keeps out the trolls that you find on every other commenting system throughout the Internet. By not seeing their responses, you feel like there’s always high quality content on Medium.

And through our response system, we don’t feed the trolls.

It’s just one way we use personalization to tailor your experience and increase the quality of interactions.


Making data-driven product decisions

Product Science is Medium’s data analytics organization. It is in service to product, and the business as a whole. Here’s what happens when you visit Medium:

Basic statistics about your activity are sent back as events to Medium, like what you clicked on or what story you navigated to (this data is not shared). And all of our tables of production data are periodically imported too. This is all stored in a Redshift data warehouse.

Internally we have a tool called “go/sql” where anyone in the company can write SQL queries, like the one below, and get back a set of results to analyze. You can look at the top post returned for this simple query, a story by Daniel Venegas called “Check the Zine” that has the word canvas in it.

Using this data warehouse, we can compute the results of A/B tests. There’s a tool that automatically calculates common statistics for control and experiment groups. It tells you if you have enough samples to detect a difference in the means, and whether the result is statistically significant.
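As a rough illustration of the kind of statistics such a tool might compute, here is a sketch using only Python’s standard library. It is an assumption on my part that a two-sample z-test and the usual normal-approximation sample size formula are reasonable stand-ins; the function names and default thresholds are hypothetical and are not taken from Medium’s internal tool.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def compare_groups(control, experiment, alpha=0.05):
    """Two-sample z-test on the difference in means (normal approximation).

    Assumes both groups have some variance; identical constant samples
    would make the standard error zero.
    """
    mean_c, mean_e = mean(control), mean(experiment)
    std_err = sqrt(
        variance(control) / len(control)
        + variance(experiment) / len(experiment)
    )
    z = (mean_e - mean_c) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "control_mean": mean_c,
        "experiment_mean": mean_e,
        "z": z,
        "p_value": p_value,
        "significant": p_value < alpha,
    }


def required_samples_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Samples per group needed to detect a difference `delta` in means,
    given a common standard deviation `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)
```

Comparing the number of samples you actually have against `required_samples_per_group` is one way to answer the “do we have enough samples?” question before reading anything into the p-value.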

There are far more powerful things we can do with our product science tools. For example, the stream should always be filled with diverse and compelling stories based on your network. We can chart what the stream looks like for different groups of users, and develop product enhancements to try to move users towards a more engaging stream experience. Here’s an example of two of our user groups:


How we work

Medium has an internal version of Medium that we call Hatch. It is a place where we can communicate and collaborate, using all the Medium tools. We also conduct “highlight polls”, where we can tally how many times people have highlighted each option and count it as a vote.

Every once in a while, posts from Hatch are published in our Inside Medium publication where you can publicly see the inner workings. Other companies have started creating their own “inside” publications on Medium too.

Recall how our product science infrastructure works, where it imports events and tables that we can query. For a “simple” writing platform, Medium is actually fairly complex. You’d think that it is just requesting a record from the database when you visit the site or open the app, but it’s much more. Here’s the sequence of things that can happen on a single request to Medium’s homepage:

And it gets even more complicated than that, with lots of third-party tools, monitoring services, and deployment tools. Dan has written a much deeper analysis on how it all works.


Personalization

“Top stories for you” sits at the top of your home stream when you visit Medium on web, iOS, and Android. It shows the best stories in your network. To figure out what to put in Top stories, a variety of personalized lists are computed, ranked, and merged to generate just 3 stories.
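The talk did not go into how the lists are merged, so the following is only a sketch of one possible approach, assuming a simple reciprocal-rank score. The function name `merge_ranked_lists` and the scoring rule are my own illustration, not Medium’s ranking.

```python
from collections import defaultdict


def merge_ranked_lists(ranked_lists, k=3):
    """Merge several ranked lists of story ids into a single top-k list.

    Each appearance of a story contributes 1 / (position + 1), so stories
    that rank highly in many lists rise to the top.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for position, story_id in enumerate(ranked):
            scores[story_id] += 1.0 / (position + 1)
    # Sort by score descending, breaking ties by story id for stability.
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]
```

For example, merging `["a", "b", "c"]`, `["b", "a", "d"]`, and `["b", "e"]` puts `"b"` first, because it ranks at or near the top of all three lists.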

Stories in your stream also “bubble up”. Normally the stream is just a reverse-chronological listing of stories published or recommended by your network. But what happens when you don’t see a story in your stream and it starts gaining momentum? Maybe more people in your network start recommending it. What we do is “bubble” the story back up to the top of your stream if you haven’t seen it yet. Once you see the story, we fix its position in your stream so it doesn’t change on you.
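Here is a small sketch of how that bubbling behavior could be modeled, assuming each stream item carries a single sort timestamp that new recommendations can bump until the reader has seen the story. The `StreamItem` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamItem:
    story_id: str
    sort_time: int  # timestamp used to order the stream
    seen: bool = False


def record_recommendation(item, at):
    """Bubble an unseen story up; leave seen stories where they are."""
    if not item.seen:
        item.sort_time = max(item.sort_time, at)


def mark_seen(item):
    """Once seen, the story's position is fixed."""
    item.seen = True


def render_stream(items):
    """Reverse-chronological listing by sort time."""
    ordered = sorted(items, key=lambda item: item.sort_time, reverse=True)
    return [item.story_id for item in ordered]
```

In this model, a story published long ago jumps to the top when someone in your network recommends it, but only if you have not already scrolled past it.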

With Medium 2.0, there’s a new “explore” section in the mobile apps. It consists of many lists of lists that let you discover content outside your network. For many of these lists, their content is computed offline and stored in an RDS database for super quick access when you open the tab in the app.

One of the lists is “Conversation starters,” which has stories with a deep tree of responses. Those responses can be visualized and you can look at their characteristics. In James Richardson’s “My Generation is Just Awful, and Colleges are Making it Worse,” there are a lot of responses around the main story. Many of these are “me too” responses. But what’s interesting is that one response created its own gravity, with responses around it, something we call “Top Responses.” And that response led to a back-and-forth conversation between Domenick Yoney and Charley Vu. These characteristics can be visualized, and we can develop algorithms to extract these conversations:


Distribution with tags

Tags were introduced early in 2015 as a way to replace Medium’s concept of channels. Tags are user-generated and can be much more descriptive, fine-grained, and defined by the community. Tags launched with tag pages and the ability to tag your post.

Tags are a content distribution mechanism within Medium. Posts that are tagged are viewed 3 times more than untagged posts on average.

A lot has changed since tags first launched, including:

The evolution of tag features on Medium

But we had a problem — if tagged posts distribute through the network more effectively than untagged posts, how do we get people to tag their posts when they publish? Only 36% of posts on Medium were tagged a few months ago.

Jacob developed a machine learning solution that suggests tags to you before you publish. It generates a similarity score to other posts, aggregates the tags from those posts, and then picks the top tags that recur in those posts.
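The three steps above (score similarity, aggregate tags, pick the most frequent) can be sketched in a few lines. This is only an illustration: it assumes Jaccard similarity over word sets as the similarity score, which is a stand-in for whatever the real model uses, and the function names and parameters are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(draft_text, tagged_posts, neighbors=5, max_tags=3):
    """Suggest tags for a draft from the most similar tagged posts.

    `tagged_posts` is a list of (text, tags) pairs.
    """
    draft_words = set(draft_text.lower().split())
    # Step 1: score every tagged post against the draft.
    scored = [
        (jaccard(draft_words, set(text.lower().split())), tags)
        for text, tags in tagged_posts
    ]
    # Keep only the closest posts that share at least one word.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 2: aggregate the tags from those posts.
    counts = Counter(tag for _, tags in scored[:neighbors] for tag in tags)
    # Step 3: pick the tags that recur most often.
    return [tag for tag, _ in counts.most_common(max_tags)]
```

Because the suggestions come from posts that already carry tags, the model keeps adapting as authors accept or change the defaults.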

The share of posts tagged jumped from 36% to 82%.

It turned out that this tag suggestion model had much more widespread impact than just suggesting tags to authors when they publish. As any successful consumer platform knows, you’ve made it when the spammers hit you.

An interesting thing happened though — the spammers would copy and paste content, and then hit the publish button. They’d accept the default suggested tags. As they published more and more, the model learned that these types of posts were correlated with specific tags and actually reinforced those tags. It meant that spam actually became concentrated in a few select tags on Medium. The adaptive learning of the tag suggestion model made it easy for us to identify and remove spam posts.

Then we had an idea: why not build a parallel version of the tag suggestion service that instead labels posts as “spam” or “not spam”? It could learn what’s spam over time, and automatically flag it in our system.

There’s a lot more I’d like to tell you about Medium and how we work, but my talk at Canvas was limited to 30 minutes. There are many more interesting things to read about how Medium works, including:

This is how we’re building an idea network at Medium, with a level playing field, that moves us all forward.

If you want to work with us to move ideas forward, we’re hiring too.