Medium’s Got a Large Problem
Medium is a fashionable choice for content creators who are looking for an easy and accessible way to publish articles.
Although Medium is #152 in terms of popularity among content management systems (CMSs), it is used by 2.4% of the most popular websites, which indicates that it’s the medium of choice for large brands in particular.
At Onely, we too use Medium to publish long-form content, and it has served its purpose well. However, if we had to rely solely on Medium to get our content out there, we would probably choose a different CMS.
Why?
Because of the massive issues that Medium has with getting indexed by Google.
Our data speaks for itself: from a random sample of 1,011 articles extracted from Medium’s sitemap, 166 aren’t indexed.
That’s over 16%!
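For readers who want to check the arithmetic, here is a minimal sketch of how that share is computed. The function name `unindexed_share` is ours, chosen for illustration; only the two counts come from our sample.

```python
def unindexed_share(sampled: int, not_indexed: int) -> float:
    """Return the fraction of sampled URLs that are not indexed."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return not_indexed / sampled


# Counts from the sample described above: 1,011 URLs, 166 not indexed.
share = unindexed_share(1011, 166)
print(f"{share:.1%}")  # prints "16.4%"
```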
Managing the Sitemap
The sitemap is one of the primary ways for Google to find new links belonging to a website.
It’s a simple XML file that every website should have. It lists the URLs that the webmaster considers important, along with some basic additional information about each one.
Googlebot, the crawler Google uses to map out the web, sets out every day to visit as many pages as it can. As well as following the links it finds in every crawled HTML file, it consults the sitemap to learn which URLs the website owners consider important for it to crawl, and for Google to index.
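To make the format concrete, here is a rough sketch of reading a sitemap with Python's standard library. The sample XML, the example.com URLs, and the `parse_sitemap` helper are made up for illustration; the namespace is the one defined by the sitemaps.org protocol. This is not how Googlebot processes sitemaps, only a way to see what the file contains.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sitemap; the URLs are placeholders, not real Medium URLs.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/articles/technical-seo</loc>
    <lastmod>2020-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/articles/crawl-budget</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples; lastmod is None if absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc.strip(), lastmod))
    return entries


for loc, lastmod in parse_sitemap(SAMPLE_SITEMAP):
    print(loc, lastmod)
```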
When it comes to Medium, their sitemap is… well, it’s a hot mess.
For some reason, every comment made under every article that’s published on Medium exists under a separate URL, and all of those URLs are in their sitemap.
To put it mildly, including all those additional links in the sitemap is a waste of Googlebot’s time.
A significant part of the resources assigned by Google to crawling Medium is spent crawling through pages that offer little value. Because of that, many articles that should be prioritized for indexing end up waiting in line behind a series of low-value comment pages.
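One way to picture the fix is a filter that keeps comment pages out of the list of URLs submitted in a sitemap. The sketch below is an assumption-heavy illustration: we don't know how Medium structures its comment URLs, so the `/comments/` path segment, the example.com URLs, and the helper names `is_comment_url` and `prioritized_urls` are all hypothetical.

```python
from urllib.parse import urlparse


def is_comment_url(url: str) -> bool:
    """Hypothetical rule: comment pages contain a 'comments' path segment."""
    segments = urlparse(url).path.strip("/").split("/")
    return "comments" in segments


def prioritized_urls(urls):
    """Keep only the URLs worth listing in the sitemap, in original order."""
    return [u for u in urls if not is_comment_url(u)]


candidate_urls = [
    "https://example.com/articles/technical-seo",
    "https://example.com/articles/technical-seo/comments/1",
    "https://example.com/articles/crawl-budget",
    "https://example.com/articles/crawl-budget/comments/2",
]

for url in prioritized_urls(candidate_urls):
    print(url)
```

On a real site the rule would depend on the actual URL scheme, and the same goal could be reached in other ways, for example by not generating separate comment URLs at all.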
One of the basic elements of technical SEO is helping business owners prioritize which parts of their pages should be indexed, and making sure that web crawlers reach them in a timely manner.
Watch Out if You’re Using JavaScript
Another problem that Medium has is with the “More from Medium” section at the very bottom of every article page.
This section contains links to related articles that are generated with JavaScript.
For the user, this is a convenient feature: readers can see the images and simply click through to the article they want to read next.
Googlebot, however, must perform an extra step and render the JavaScript in order to discover the links.
Unfortunately, this is a step Googlebot is not always willing to take. When there are thousands of other URLs waiting to be visited, the crawler prefers all of the links to be visible in a plain HTML file.
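The difference is easy to demonstrate with a parser that, like a crawler that skips rendering, only reads the raw HTML. The sketch below uses Python's built-in `html.parser`; the sample markup and the names `LinkCollector` and `links_in_raw_html` are ours, invented for illustration, and do not reflect Medium's actual page source.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_raw_html(html):
    """Return links discoverable without executing any JavaScript."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Illustrative page: one plain link, one link that only exists after
# the script runs in a browser (or in a rendering crawler).
RAW_HTML = """
<article>
  <a href="/articles/static-link">Linked in plain HTML</a>
  <div id="more"></div>
  <script>
    document.getElementById("more").innerHTML =
      '<a href="/articles/js-only-link">Injected by JavaScript</a>';
  </script>
</article>
"""

print(links_in_raw_html(RAW_HTML))
```

Only the plain HTML link is collected. The injected link sits inside the script as text, so it stays invisible until something executes the JavaScript.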
This is one more issue that we can correlate with Medium’s difficulties in getting all of their content indexed by Google.
And it’s not just Medium — most large domains that we’ve tested struggle with similar issues to a varying degree.
Wrapping Up
This article isn’t meant to single out Medium.com.
Because the web is so large, it’s very hard for search engines to keep up with all the content that’s perpetually being released.
Factor in other issues like JavaScript and crawl budget, and it’s safe to say that even a search engine as ubiquitous as Google has its work cut out for it.
That’s why any website as big and dynamic as Medium needs to take technical SEO into consideration.