Advanced Image Indexing Checklist for Google Search
A closer look at the code and technical considerations
I’ve been running the app, Image Sitemap for Shopify, for two years now. It works pretty sweet: lots of happy customers, and naturally, a few grumpy ex-customers too! My app currently submits over 1M images per month to Google for indexing on behalf of Shopify customers. This post will focus on Shopify sites that use jQuery, but the concepts are applicable to all sites regardless of client stack.
My three other posts about image indexing and Image Sitemap for Shopify can serve as a good intro to optimizing your site/images for indexing by Google.
- Introducing — Image Sitemap for Shopify
- Lessons from submitting 500k images from Shopify stores to Google for indexing
- Why Isn’t Google Indexing My Images?
And so, you’ve optimized a bunch of stuff and you’re still having trouble getting your images indexed? Let’s dive a bit deeper into strategies not discussed in my other posts.
1) Get rid of ALL JavaScript console errors.
This seems like a no-brainer, yet I’m consistently shocked by how many console errors I see on our customers’ sites: gross misconduct that would make even a fresh young developer cringe.
Console errors are reported inside Google Search Console. That alone tells you Google tracks them, so I would consider them bad. Just fix them. If you don’t know how to fix them, find someone who can.
If you’re throwing JavaScript errors in your client code, we know one thing for certain: something unexpected happened in your code. That is bad. Google’s job is to provide the most relevant results to its search customers. If Google sees that something unexpected is happening in your client code, generating errors, maybe they choose to send visitors somewhere else, to a site with no errors. Google can’t know the severity of those errors, yet it reports on them, so that’s probably something to worry about. That’s my logic; maybe I’m crazy, maybe not. But get these errors fixed asap.
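Before you can fix errors, you need to see all of them, including the ones that only fire on real devices. Here is a minimal sketch of a global error listener; the /report_error endpoint is a hypothetical placeholder, so point it at whatever logging you already have:
<script>
// Catch uncaught errors anywhere on the page.
window.addEventListener('error', function (e) {
  // e.message, e.filename and e.lineno tell you what broke and where.
  var payload = { message: e.message, source: e.filename, line: e.lineno }
  // '/report_error' is a hypothetical endpoint; swap in your own logger.
  navigator.sendBeacon('/report_error', JSON.stringify(payload))
})
// Catch promise rejections that nobody handled.
window.addEventListener('unhandledrejection', function (e) {
  navigator.sendBeacon('/report_error', JSON.stringify({ message: String(e.reason) }))
})
</script>
Once the noisy errors are all visible in one place, fixing them stops being guesswork.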
2) Be very careful with your lazyloading libraries and code
I see a lot of sites using libraries that support the new lazyloading capability of browsers. Most are using lazysizes. I don’t trust them for image indexing, but I’m jaded; I have seen some sites perform very well with their image indexing using these libraries.
If you use these libraries, please make sure to specify a good-sized default image in your code. I see lots of sites default to a very small 300x300 px image in the code. In my view, we would want to give Google the best default src we can. Don’t make them work! Also, don’t specify many excess options in your data-srcset, just a few.
Furthermore, if your code does not include src= in favor of data-src= for the core image src, I pray for you and your indexing 🙏
<img
  src="image_800x800.jpg"
  srcset="image_800x800.jpg"
  data-srcset="image_400x400.jpg 400w,
               image_800x800.jpg 800w,
               image_1200x1200.jpg 1200w,
               image_1600x1600.jpg 1600w"
  data-sizes="auto"
  class="lazyload" />
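As an aside (my suggestion, not something the lazyloading libraries require), modern browsers can also lazyload natively via the loading attribute. That approach keeps a real src in place with no library at all, which is exactly what we want to hand Google:
<img
  src="image_800x800.jpg"
  srcset="image_400x400.jpg 400w,
          image_800x800.jpg 800w,
          image_1200x1200.jpg 1200w"
  sizes="(max-width: 800px) 100vw, 800px"
  loading="lazy"
  alt="Product photo" />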
3) Completely fix your Google Structured Data on your product pages.
I prefer using the JSON-LD spec as opposed to microdata because it separates the structured data from the markup. Microdata makes HTML markup much more verbose and harder to debug.
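For reference, here is a minimal sketch of a JSON-LD Product block as it would sit in your product template; the product name, URLs, SKU and price are all hypothetical placeholders:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "image": [
    "https://yourstore.example/images/product_800x800.jpg",
    "https://yourstore.example/images/product_1600x1600.jpg"
  ],
  "description": "A short, human-readable product description.",
  "sku": "EXAMPLE-SKU-001",
  "brand": { "@type": "Brand", "name": "Example Brand" },
  "offers": {
    "@type": "Offer",
    "url": "https://yourstore.example/products/example-product",
    "priceCurrency": "USD",
    "price": "29.00",
    "availability": "https://schema.org/InStock"
  }
}
</script>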
There are two levels of optimum compliance here:
- No errors and some warnings
- No errors and no warnings
We want #2, “No errors and no warnings.” I don’t have real data about how this affects your image indexing in Google, but my anecdotal experience and gut instinct say that getting to compliance level #2 can improve indexing for many sites, particularly if your site does not get a ton of organic and paid traffic.
If you buy a lot of traffic for your site, FIX these! Google reports on structured data errors in a variety of places now, including Google Merchant Center (Shopping) and Search Console. That must mean this data is important!
4) Render simple HTML for googlebot (no jQuery).
Many sites are using a staggering amount of lazyloading, JavaScript fades, show/hides and animations. Google does NOT like this stuff. Google officially supports JavaScript rendering, but I’ve seen many cases where it has a serious negative impact on image indexing. In my own experiments it’s been fun to see improvements when rendering special code just for googlebot. For googlebot, you want the client code to be as dumb and simple as possible: the most rudimentary semantic HTML you can generate.
This is one of my experimental techniques. I’m not ready to say how effective this strategy is, but with the optimizations in my previous articles and those on this page, I took https://www.gbyultra.com from 0 to 91% images indexed by Google in 90 days. gbyultra.com was a brand new domain with no previous ranking whatsoever.
My strategy for rendering special googlebot markup is pretty simple. I have two divs: one for humans, one for bots. Both are hidden by default, and the appropriate div is shown with raw JavaScript before jQuery finishes loading, i.e. before the $(function(){}) callback. The bot div is generated with Liquid and semantic markup, super simple: not a single bit of extra layout or CSS if it can be prevented. The human div is rendered purely with jQuery, and has lots of show/hide, overflow:hidden, scrolling events, etc. Google hates my jQuery code, for sure.
Here is a sketch of my basic setup:
<div class="human" style="display:none">
  <div class="product_human"></div> <!-- RENDERED WITH JQUERY -->
</div>
<div class="bot" style="display:none">
  {% include 'product_bot' %} <!-- ALL LIQUID + HTML -->
</div>
<script>
function bot_handler () {
  var bot_pattern = "Googlebot|Googlebot-Mobile|Googlebot-Image|Google favicon|Mediapartners-Google|bingbot|slurp|java|wget|curl|Commons-HttpClient|Python-urllib|libwww|httpunit|nutch|phpcrawl|msnbot|jyxobot|FAST-WebCrawler|FAST Enterprise Crawler|biglotron|teoma|convera|seekbot|gigablast|exabot|ngbot|ia_archiver|GingerCrawler|webmon |httrack|webcrawler|grub.org|UsineNouvelleCrawler|antibot|netresearchserver|speedy|fluffy|bibnum.bnf|findlink|msrbot|panscient|yacybot|AISearchBot|IOI|ips-agent|tagoobot|MJ12bot|dotbot|woriobot|yanga|buzzbot|mlbot|yandexbot|purebot|Linguee Bot|Voyager|CyberPatrol|voilabot|baiduspider|citeseerxbot|spbot|twengabot|postrank|turnitinbot|scribdbot|page2rss|sitebot|linkdex|Adidxbot|blekkobot|ezooms|dotbot|Mail.RU_Bot|discobot|heritrix|findthatfile|europarchive.org|NerdByNature.Bot|sistrix crawler|ahrefsbot|Aboundex|domaincrawler|wbsearchbot|summify|ccbot|edisterbot|seznambot|ec2linkfinder|gslfbot|aihitbot|intelium_bot|facebookexternalhit|yeti|RetrevoPageAnalyzer|lb-spider|sogou|lssbot|careerbot|wotbox|wocbot|ichiro|DuckDuckBot|lssrocketcrawler|drupact|webcompanycrawler|acoonbot|openindexspider|gnam gnam spider|web-archive-net.com.bot|backlinkcrawler|coccoc|integromedb|content crawler spider|toplistbot|seokicks-robot|it2media-domain-crawler|ip-web-crawler.com|siteexplorer.info|elisabot|proximic|changedetection|blexbot|arabot|WeSEE:Search|niki-bot|CrystalSemanticsBot|rogerbot|360Spider|psbot|InterfaxScanBot|Lipperhey SEO Service|CC Metadata Scaper|g00g1e.net|GrapeshotCrawler|urlappendbot|brainobot|fr-crawler|binlar|SimpleCrawler|Livelapbot|Twitterbot|cXensebot|smtbot|bnf.fr_bot|A6-Indexer|ADmantX|Facebot|Twitterbot|OrangeBot|memorybot|AdvBot|MegaIndex|SemanticScholarBot|ltx71|nerdybot|xovibot|BUbiNG|Qwantify|archive.org_bot|Applebot|TweetmemeBot|crawler4j|findxbot|SemrushBot|yoozBot|lipperhey|y!j-asr|Domain Re-Animator Bot|AddThis";
  var bot_regex = new RegExp(bot_pattern, 'i')
  // Show every element that matches the chosen class.
  var these_show = function (these) {
    for (var i = 0; i < these.length; i++) {
      these[i].style.display = 'block'
    }
  }
  var these_type = 'human'
  if ( navigator.userAgent.match(bot_regex) ) {
    window.is_bot = true
    console.log('googlebot detected')
    these_type = 'bot'
  }
  // Poll for the divs until they appear in the DOM, up to 30 tries.
  var counter = 0
  var loop_me = function () {
    if ( counter++ > 30 ) return
    if ( document.getElementsByClassName(these_type).length === 0 ) {
      setTimeout(loop_me, 50)
      return
    }
    these_show(document.getElementsByClassName(these_type))
  }
  loop_me()
}
window.is_bot = false
bot_handler()
</script>
*The loop_me function is necessary to make sure we can find the elements on the page and show them regardless of network speed. With slow mobile connections it can take a long time for the DOM to start rendering, so it’s hard to rely on any standard page load events. I prefer to watch for the elements with a timeout. We might also have a handful of human or bot divs, and they could render at different times, so I just keep watching for 30 * 50ms, which is 1.5 seconds. Remember, this is before jQuery loads, so we’re getting into the DOM very early.*
In closing, I encourage you to explore the ideas above and let me know if you have success. Image indexing is a pretty tough sport. It takes a lot of patience and persistence, just like all SEO disciplines. Stick to it and find what works for your sites.
Check out Image Sitemap for Shopify today. We offer a 21-day free trial, with plans starting at $4/month.
Find me at WilliamBelk.com. Follow me on Twitter.