How to Work Effectively with Google Search Console Data to Analyze Google Updates
Google Search Console has come a long way since its early days, when it was still called Google Webmaster Tools. With the ever-growing number of “not provided” searches, in May of 2015 Google not only renamed Webmaster Tools to Search Console but also started showing more precise data in the Search Analytics report. It then became possible to break down your site’s search data and filter it in many different ways to analyze it more precisely.
A couple of months later, Google introduced the Search Analytics API to enable integration of this valuable data into custom dashboards and tools. Just a few weeks later, we released a whole tool around GSC Search Analytics data, called “Impact”, now Ryte Search Success. So we have more than three years of experience with the Search Analytics API and over 100,000 Ryte users have already connected their Google Search Console accounts. We really learned a lot within these three years.
In January, Google launched the new Google Search Console and also revamped Search Analytics, now the Search Performance report. In doing so, it gave access to more data than ever before, showing 16 months of data instead of just 90 days.
We @ Ryte are huge fans of the Search Performance data Google provides within Search Console, but there are still lots of SEOs who prefer evaluating their search performance with 3rd party data, which consists of scraped Google SERPs. I cannot understand SEOs who would rather trust 3rd party data than the real Google data provided in Search Console.
Google’s fight against SERP scrapers
Google doesn’t like tools scraping their results. It’s against Google’s terms & conditions. Lately, Google has been pretty vocal condemning SEO tools that use scraped Google data, even going as far as calling them “blackhat scrapers”:
Popular SEO blog Search Engine Journal was also called out by John for publishing a guide on how to break Google’s terms of service and webmaster guidelines:
Bots scraping Google results is a big issue for Google. In July 2016, Gary Illyes revealed in an interview with Woj Kwasi that 70% of all search queries are coming from bots! That’s costing Google billions of dollars for search queries they don’t even want to answer.
Google has always been active against webmasters who are using automated queries to scrape the results. There’s a funny story in the excellent book “I’m Feeling Lucky: The Confessions of Google Employee Number 59”. The story is about Ray Sidney, who used to be “Google’s first line of defense against webmasters who pummeled Google with automated queries.”
“Webmasters and SEOs wanted to make sure their sites showed up near the top of Google results and so used monitoring software to conduct repeated automatic searches for keywords important to them. In periods of high volume, automated queries slowed down Google for everyone, which is why we considered them a violation of our terms of service. […]
Ray took unauthorized automated queries very personally. If he could figure out the spammer’s email address, he sent a terse cease-and-desist warning. If he couldn’t find an email address, he blocked the spammer’s IP (internet protocol) address — the unique number assigned to a computer connected to the Internet — from accessing Google altogether. […]
If Ray couldn’t identify a specific IP address, he contacted the spammer’s Internet service provider (ISP) and asked that they track down the offender themselves and sever his access to Google. If the ISP refused to play along, Ray upped the ante — he blocked access to Google from all of the ISP’s addresses. That usually got their attention. It was how Ray shut down access to Google for most of France. The French ISP definitely noticed.”
With billions of dollars at stake, I expect Google to become even more aggressive when it comes to shutting down tools and service providers which are scraping Google results.
I’m not the only SEO noticing this. In a spectacular article, old-school SEO legend Glen Allsopp (also known by his moniker ViperChill) predicts that Google will soon limit prominent SEO software providers from crawling Google search results. Glen has a hunch “that Google will start to crack down on the most popular companies which profit from tracking their search results.”
SEO tool providers are already feeling Google breathing down their necks. In July, a well known SEO tool sent out an email informing their customers that they are decreasing “the frequency of keyword updates in Rank Tracker” because “it’s getting increasingly harder to get SERPs data for large volume of keywords and our data partners (we have quite a few) have reported experiencing a major downfall recently.”
This well-known SEO tool provider is far from being the only one experiencing this strong headwind, but they’ve definitely been the first to make this public. Google has already shut down several scraped data providers and will continue to do so. This is especially tough on cheaper tools, since the cost of aggregating ranking data is going up exponentially with service providers having to shut down their SERP scraping services.
Of course, SEO tool providers that rely on scraped Google data are hoping for a “Google SERP API”, that is, a way to get Google search results via an API. I think this is never going to happen, mainly because SERPs are so dynamic nowadays, with so many different ranking and UX experiments happening with almost every single search. In their manifesto “Improving Search for the next 20 years”, Google says “Last year alone, we ran more than 200,000 experiments that resulted in 2,400+ changes to search. Search will serve you better today than it did yesterday, and even better tomorrow.”
With this amount of fluctuation, it would be very hard to deliver “the real result” for a certain query.
Analyzing the difference between real and scraped data
Due to the ever-changing SERPs, data quality is already going downhill. However, few people notice, as most don’t compare the results of their scraped data tool of choice with the real Google data provided via Search Console.
I regularly compare the data, and the results are disappointing. Here’s a comparison for a medium-sized blog of mine:
As you can see, the best SEO tool still only has 14.58% exact matches, with an average deviation of 6 positions (so a position 2 could just as well be a position 8, a top-ranked result could also be a number 7, and so on). The worst tool gets it right for less than 8% of all keywords, with an average deviation of 8.17, meaning you’d be better off guessing a ranking position.
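The article does not spell out how these two figures were calculated. A minimal sketch of one plausible way to do it, assuming an "exact match" means the tool's position equals the rounded Search Console average position and "deviation" means the mean absolute difference in positions, could look like this (the function name and the sample data are hypothetical):

```python
from statistics import mean


def compare_positions(gsc_positions, tool_positions):
    """Compare tool-reported rankings with Search Console rankings.

    Both arguments map keyword -> position. Only keywords present in
    both data sets are compared. Assumption: Search Console average
    positions (floats) are rounded to whole ranks before comparing.
    """
    shared = sorted(set(gsc_positions) & set(tool_positions))
    if not shared:
        raise ValueError("no keywords in common")
    deviations = [
        abs(round(gsc_positions[kw]) - tool_positions[kw]) for kw in shared
    ]
    exact = sum(1 for d in deviations if d == 0)
    return {
        "keywords": len(shared),
        "exact_match_rate": exact / len(shared),
        "avg_deviation": mean(deviations),
    }


# Hypothetical sample data, not taken from the article.
gsc = {"seo audit": 2.0, "robots.txt": 5.4, "targeting": 1.0, "canonical": 12.0}
tool = {"seo audit": 2, "robots.txt": 9, "targeting": 7, "canonical": 12, "other": 3}

result = compare_positions(gsc, tool)
print(result)
```

Keywords that only one of the two sources reports (such as "other" above) are ignored here; counting them as misses would make the tool look worse still.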
Depending on the tool you use, you have either ok’ish data:
Or bad data that seems a bit random:
Of course, the smaller the domain, the more accurate the data gets. For example, here’s a very keyword-focused SEO affiliate site:
On the other hand, the bigger the domain gets, the more keywords it ranks for, therefore the less accurate the data is going to be.
One of the aspects factoring into these poor results is the number of keywords scraped data SEO tools use for their visibility indices. We did a little study with a set of sample sites comparing the number of keywords you get via Search Console versus the keywords SEO tools are using, and the results are staggering:
With Search Console you are likely to get at least 5x, if not even 10x, the number of keywords! Furthermore, it looks like different SEO tool providers are tracking my site with keywords that don’t even have a single impression, which is why they are missing from the GSC keyword set. So I continued comparing all the different keyword sets used by the major SEO tool providers:
Out of almost 290,000 keywords provided by Search Console, only 2,199 of these, so just 0.75% (!), are reported by all of the SEO tools. It really becomes apparent that you’re getting a completely different picture of your site’s search performance depending on which SEO tool you’re using!
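This kind of overlap check is easy to reproduce with exported keyword lists. The following is a rough sketch, not the method used for the study above; the function names, the normalization (lowercasing and trimming) and the sample data are assumptions:

```python
def normalize(keywords):
    """Lowercase and trim keywords so trivial differences don't hide overlap."""
    return {kw.strip().lower() for kw in keywords}


def shared_keyword_share(gsc_keywords, tool_keyword_sets):
    """Return (count, share) of GSC keywords reported by every tool."""
    gsc = normalize(gsc_keywords)
    if not gsc:
        raise ValueError("empty Search Console keyword set")
    common = set(gsc)
    for keywords in tool_keyword_sets.values():
        common &= normalize(keywords)
    return len(common), len(common) / len(gsc)


def keywords_without_impressions(gsc_keywords, tool_keywords):
    """Keywords a tool tracks that never appear in the GSC keyword set."""
    return normalize(tool_keywords) - normalize(gsc_keywords)


# Hypothetical sample data, not taken from the article.
gsc_keywords = ["seo", "Targeting", "ascii code", "robots.txt"]
tools = {
    "tool_a": ["seo", "targeting", "foo"],
    "tool_b": ["targeting", "seo ", "bar"],
    "tool_c": ["seo"],
}

count, share = shared_keyword_share(gsc_keywords, tools)
untracked = keywords_without_impressions(gsc_keywords, tools["tool_a"])
print(count, share, untracked)
```

With the figures quoted above, 2,199 shared keywords out of almost 290,000 works out to a share of roughly 0.0076.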
If you want to do competitive analysis by checking out the rankings of a competitor’s domain, you have no other choice than to rely on these kinds of SEO tools. Of course, I’d rather use flawed data than no data at all. Since all these flaws hold true for all domains, I at least get a glimpse of a competitor’s search performance. Just recently, Juan from Sistrix published a great article on how scraped data can be used for competitive research.
However, if you’re analyzing your own domain, with access to all that juicy search analytics data in Search Console, there’s just no good reason to use 3rd party data!
How to analyze the real impact of Google updates
Google Search Console has become a lot better and easier to use, but it still lacks a lot of things we SEOs are used to from scraped data SEO tools. Fellow SEO expert Aleyda Solis wrote a great article about “Using Google Data Studio for a more actionable Google Search Console Performance Dashboard” to cope with the current limitations of Google Search Console.
If you don’t want to do this yourself with Google Data Studio, you can also use Ryte Search Success, which, for example, offers keyword monitoring based on Search Console data. This means you can finally ditch your 3rd party keyword monitoring and start monitoring your most important keywords with real Google data.
This is most useful when you’re trying to quickly analyze the real impact of a Google update you’ve been hit by. This was the case with our Spanish subdomain, es.ryte.com. On August 13th, the subdomain was hit by the so-called “Medic Update”:
I have to admit, checking Sistrix on that particular Monday was a hard blow. While I don’t really care about our visibility index myself, I know a lot of other people judge our SEO by looking at this graph. So I was in desperate need to quickly find out what really happened.
Logging into Search Console, I can see the real Google visibility of our Spanish subdomain.
If you superimpose the Sistrix visibility index it seems to be congruent with the real Google visibility:
Of course, the amplitude is a bit more extreme, but keep in mind that Sistrix is calculating a “general” visibility, whereas the real Google visibility is based on real search behavior, which can vary a lot on a day-to-day basis.
If I really want to find out what happened with this Google update, it doesn’t make much sense to just look at visibility. It’s more important to look at the number of clicks generated. For the most part, an increased number of impressions goes hand in hand with more earned clicks, but there was one time frame where this usual behavior was way off:
I knew exactly where to dig deeper.
With the help of Search Success’s “Changes” report, I compared the search performance from one month before the update with the month after. The results were astounding:
Sistrix was right: our Spanish subdomain did indeed suffer a pretty big hit in terms of visibility, but while we lost about 10% of visibility, we gained 33% more clicks in the same time frame! That’s incredible and I sure didn’t expect this at all, so I was eager to dig even deeper.
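If you only have raw Search Console exports, the same before/after comparison can be approximated by hand. This is a minimal sketch and not how the “Changes” report is implemented; the data layout (keyword mapped to clicks and impressions per period), the function names and the sample numbers are assumptions:

```python
def percent_change(before, after):
    """Relative change in percent, or None when there is no baseline."""
    if before == 0:
        return None
    return (after - before) / before * 100


def compare_periods(before, after):
    """Compare two periods of keyword data (keyword -> clicks/impressions)."""
    empty = {"clicks": 0, "impressions": 0}
    keywords = []
    for query in sorted(set(before) | set(after)):
        b = before.get(query, empty)
        a = after.get(query, empty)
        keywords.append({
            "query": query,
            "clicks_delta": a["clicks"] - b["clicks"],
            "impressions_delta": a["impressions"] - b["impressions"],
        })
    # Largest click gains first, similar to sorting a clicks column by trend.
    keywords.sort(key=lambda row: row["clicks_delta"], reverse=True)

    def total(rows, metric):
        return sum(row[metric] for row in rows.values())

    return {
        "clicks_change_pct": percent_change(
            total(before, "clicks"), total(after, "clicks")),
        "impressions_change_pct": percent_change(
            total(before, "impressions"), total(after, "impressions")),
        "keywords": keywords,
    }


# Hypothetical sample data, chosen to mirror the pattern described above.
month_before = {
    "targeting": {"clicks": 10, "impressions": 1000},
    "codigo ascii": {"clicks": 5, "impressions": 9000},
}
month_after = {
    "targeting": {"clicks": 19, "impressions": 1200},
    "codigo ascii": {"clicks": 1, "impressions": 7800},
}

report = compare_periods(month_before, month_after)
print(report["clicks_change_pct"], report["impressions_change_pct"])
```

Looking at both percentages side by side is the point: a drop in impressions can coincide with a rise in clicks, which a visibility index alone would not show.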
I sorted the clicks column by trend to find all keywords that gained new clicks within this time frame:
Take for example the keyword “targeting”:
You can really see how an improved CTR makes a page soar in the rankings.
Still, I wanted to find out where I lost the most impressions and therefore visibility, so I sorted by the negative trend:
The lost impressions for our Spanish wiki page about Ascii code alone accounted for quite a significant chunk of all lost impressions:
Looking at the top 18 keywords with the highest impression loss, it became apparent that we only lost rankings for keywords where no one clicked through to our site. So the loss was well deserved: why would Google keep a page in the top 10 when nobody clicks on it?
The top 18 keywords were responsible for over 110,000 lost impressions:
but only 294 (!) clicks (of which 171 clicks were lost by the Ascii code wiki page):
The same holds true for our English subdomain, en.ryte.com, which also got hit pretty hard with the Google Medic update:
Again, the only rankings we lost were keywords with zero CTR.
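The check used in this section, taking the keywords with the biggest impression loss and asking how many clicks they actually cost, can be sketched as follows. The data layout, the function name and the sample numbers are hypothetical; only the default of 18 keywords is borrowed from the analysis above:

```python
def impression_losers(before, after, top_n=18):
    """Return the top_n keywords by lost impressions plus their totals.

    before/after map keyword -> {"clicks": int, "impressions": int}.
    Keywords missing from the later period count as zero.
    """
    empty = {"clicks": 0, "impressions": 0}
    losses = []
    for query, b in before.items():
        a = after.get(query, empty)
        lost_impressions = b["impressions"] - a["impressions"]
        if lost_impressions > 0:
            losses.append({
                "query": query,
                "lost_impressions": lost_impressions,
                "lost_clicks": b["clicks"] - a["clicks"],
                # CTR in the earlier period shows how little was at stake.
                "ctr_before": b["clicks"] / b["impressions"],
            })
    losses.sort(key=lambda row: row["lost_impressions"], reverse=True)
    top = losses[:top_n]
    total_impressions = sum(row["lost_impressions"] for row in top)
    total_clicks = sum(row["lost_clicks"] for row in top)
    return top, total_impressions, total_clicks


# Hypothetical sample data, not the real es.ryte.com figures.
period_before = {
    "codigo ascii": {"clicks": 200, "impressions": 60000},
    "tabla ascii": {"clicks": 3, "impressions": 30000},
    "targeting": {"clicks": 10, "impressions": 1000},
}
period_after = {
    "codigo ascii": {"clicks": 29, "impressions": 5000},
    "tabla ascii": {"clicks": 0, "impressions": 2000},
    "targeting": {"clicks": 19, "impressions": 1200},
}

top, lost_impressions, lost_clicks = impression_losers(
    period_before, period_after, top_n=2)
print(lost_impressions, lost_clicks)
```

A large total of lost impressions next to a small total of lost clicks is the pattern described above: visibility drops, but hardly any traffic goes with it.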
Let’s roll it back?
Fortunately, roughly two months later the visibility index shot up again:
One might think this could be a rollback of the earlier 1st Google Medic update, but since all the keywords I lost rankings for had an abysmal CTR, it just wouldn’t make sense to roll back the update and reinstate pages which clearly didn’t entice users to click through. Using the “Changes” report again, I was able to see that the keywords with the most gain in impressions were totally different from the ones that lost visibility during the 1st Medic Update.
In fact, it seems there was something else being “rolled back.”
It concerned all of those brand / navigational queries we had been increasingly ranking for ever since mid-June. So Medic 1 was connected to an earlier uplift in impressions. If I check out the keywords which gained the most visibility, surprise, surprise:
Seven different Codigo Ascii variations within the top 10 results :)
Now impressions and clicks run synchronously again.
Medic 2 doesn’t really look like a specific update to me; it’s just the website ranking better and better over time. The “high impressions” phase, which ended with Medic 1, seems to have been a test, which has now been adjusted starting with Medic 2.
We also gained a couple of new position zero results like this one:
With the help of our new advanced filters in Search Success, you can easily filter for keywords that have a CTR higher than 40%, which, in my experience, is indicative of a new position zero result:
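The same filter can be approximated on exported keyword data. This is a rough sketch rather than the Search Success implementation; the function name, the data layout and the minimum-impressions cutoff (added here to keep one-off queries from dominating) are assumptions:

```python
def high_ctr_keywords(rows, min_ctr=0.40, min_impressions=100):
    """Return keywords whose CTR exceeds min_ctr, highest CTR first.

    rows maps keyword -> {"clicks": int, "impressions": int}.
    min_impressions is an assumed noise filter, not part of the article.
    """
    matches = []
    for query, row in rows.items():
        if row["impressions"] < min_impressions or row["impressions"] == 0:
            continue
        ctr = row["clicks"] / row["impressions"]
        if ctr > min_ctr:
            matches.append((query, ctr))
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches


# Hypothetical sample data, not taken from the article.
keyword_rows = {
    "what is targeting": {"clicks": 450, "impressions": 1000},
    "ryte": {"clicks": 90, "impressions": 100},
    "codigo ascii": {"clicks": 29, "impressions": 5000},
    "rare query": {"clicks": 1, "impressions": 2},
}

candidates = high_ctr_keywords(keyword_rows)
print(candidates)
```

Brand and navigational queries tend to have a very high CTR as well (like "ryte" in the sample), so the output is best read as a list of candidates to check manually in the SERPs.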
Medic Update in reverse?
While our Ryte.com subdomains got hit with the 1st Medic Update and rose up again with the 2nd Medic Update, there are also a lot of domains which showed the exact opposite behavior. One example is t3n, the German flagship magazine for all things digital, which is also one of the largest independent publishers in Germany and a beacon for tech journalism.
When the 1st Medic Update hit, t3n.de gained a lot of visibility, but the visibility went down again with the 2nd Medic Update:
Although t3n.de displayed the exact opposite behavior, the cause of the visibility drop was exactly the same as for our subdomains: it was mostly brand keywords with an abysmal or even zero CTR:
The top 14 keywords that suffered the highest loss in visibility accounted for an overall of 2,700,000 lost SERP impressions:
These lost rankings only accounted for roughly 5,000 clicks (the largest chunk also seemed to be a news article about kinox.to, which obviously lost impressions when the news passed):
Search Console performance data to the rescue
What looked like a major update and overhaul of the SERPs was just another adjustment of SERP quality by Google. When you don’t get clicked, you’re out of the top 10.
Of course, it’s not just about the CTR: Google is looking for results that generate long clicks. A long click is a very important quality metric for Google, as opposed to short clicks, where users quickly jump back to the search results. It’s not just about luring someone to click through to your site; you have to keep the promise you make with your snippet.
Most importantly, don’t worry about visibility. The more efficient your site is, the better.
This also works like a charm against Panda. Make sure that people who click through get the best possible user experience, and that their exact search need is met. In the end, this is what will make your site shoot to the top, and stay there.
Good SEO might get you into the top 10, but all you get is a chance: a chance to prove you’re top 10 material and worthy of a permanent stay. So when you get the chance, don’t suck.
Originally published at Ryte Magazine.