Google Explains Reasons for Crawled but Not Indexed Errors

Published in Hireawriter on Jul 4, 2024

Google’s Gary Illyes recently provided insights into the dreaded “crawled but not indexed” errors during an interview at SERP Conf 2024 in Bulgaria. This issue, in which Google crawls a page but does not index it, has puzzled many webmasters. Illyes offered several possible causes that can help debug and fix it.

Background

The interview occurred in May, but the video went underreported until Olesia Korobka (@Giridja) highlighted it in a recent Facebook post. Despite the interview occurring months ago, the information remains timely and useful.

Reasons for Crawled but Not Indexed

“Crawled but not indexed” is a status in the Page indexing report in Google Search Console, where it appears as “Crawled - currently not indexed”. It indicates that Google crawled a page but did not index it. During the interview, someone asked:

“Can crawled but not indexed be a result of a page being too similar to other stuff already indexed? So is Google suggesting there is enough other stuff already and your stuff is not unique enough?”

Gary Illyes confirmed that similarity to existing content could be one reason but noted several other potential causes.

Duplicate Content and Other Causes

Illyes explained:

“Yeah, that could be one thing that it can mean. Crawled but not indexed is, ideally, we would break up that category into more granular chunks, but it’s super hard because of how the data internally exists.

It can be a bunch of things, dupe elimination is one of those things, where we crawl the page and then we decide to not index it because there’s already a version of that or an extremely similar version of that content available in our index and it has better signals. But yeah, it can be multiple things.”
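To make “an extremely similar version” more concrete, here is a minimal sketch of one common way to measure how similar two pages are: comparing their sets of overlapping word sequences (shingles) with Jaccard similarity. This is an illustration only and not a description of how Google performs dupe elimination. The function names and the three-word shingle size are assumptions chosen for the example.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "Our guide explains how to fix indexing problems quickly"
    page_b = "Our guide explains how to fix crawling problems quickly"
    print(f"similarity: {jaccard_similarity(page_a, page_b):.2f}")
```

Running a check like this across your own pages can surface near-duplicates worth consolidating. Any cutoff for “too similar” is a judgment call, since Google has not published one.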

Site Quality Impact

Another significant factor Illyes mentioned is the overall quality of the site. He said:

“And the general quality of the site can matter a lot regarding how many of these crawled but not indexed you see in Search Console. If the number of these URLs is very high, that could hint at general quality issues.

I’ve seen that a lot since February, where suddenly we just decided that we are indexing a vast amount of URLs on a site just because our perception of the site has changed.”

Technical Issues

Illyes also pointed out that technical issues could cause URLs to be crawled but not indexed:

“…And one possibility is that when you see that number rising, Google’s perception of the site has changed, that could be one thing. But then there could also be an error, for example, on the site, and then it served the same exact page to every single URL on the site. That could also be one of the reasons that you see that number climbing. So yeah, there could be many things.”
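The failure Illyes describes, where a site serves the same page at every URL, is something you can check for yourself. The sketch below assumes you have already fetched the HTML for a sample of URLs into a dictionary. It groups URLs whose content is byte-for-byte identical after whitespace normalization. The function names and sample data are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def content_fingerprint(html: str) -> str:
    """Hash the page body after collapsing runs of whitespace."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_identical_pages(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs that serve identical content.

    `pages` maps each URL to the HTML that was returned for it.
    Only groups with more than one URL are reported.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical sample: two URLs return the same body, one does not.
    sample = {
        "https://example.com/a": "<h1>Home</h1>",
        "https://example.com/b": "<h1>Home</h1>  ",
        "https://example.com/c": "<h1>About</h1>",
    }
    for group in find_identical_pages(sample):
        print("identical content:", ", ".join(group))
```

If most of the sampled URLs land in a single group, a server or template error is a more likely explanation than a content quality problem. An exact hash only catches identical responses, so pages that differ by a timestamp or session token would need a looser comparison.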

Key Takeaways

Gary Illyes’ insights can help webmasters debug why a webpage might be crawled but not indexed by Google. The primary reasons include:

Similarity to content that is already in Google’s index
Existence of exact same content on another site with better signals
General site quality issues
Technical errors

Illyes didn’t elaborate on what he meant by a version of the content with better signals, but he was likely referring to situations where a site syndicates its content to another site. In that case, Google may index and rank the other site’s copy instead of the original publisher’s.

Understanding these factors can help webmasters address indexing issues more effectively and improve their site’s overall performance in search results.

Originally published at https://www.hireawriter.us.