How to Force Google to Index Your Website and Ways to Fix Common Crawling Issues
This article covers the first steps to getting your website listed on Google. I’ll give a quick overview of what Google looks for when it indexes websites for its search results pages, followed by a step-by-step tutorial for getting your site into Google search. It’s written for readers with little background in Search Engine Optimization (SEO), but experienced readers can use it as a checklist. If you need help, get in touch with me to discuss a more advanced analysis of your search rankings.
How Google indexes websites.
Search engines return results for a search query, and Google wants the best, most accurate results to appear as high as possible on the results page. To do this, Google crawls billions of webpages, catalogs the information it finds, ranks it, and places what it judges to be the best results on the search engine results page in order of relevance to the user. Many factors affect how well your website ranks for a given query. My goal isn’t to explain every tactic for ranking on Google, but to give you a working understanding of the basic best practices for making your website show up in Google search.
How to use Google Webmaster Tools to index your website with Google.
Webmaster Tools is critical for the health of your website on Google’s search properties. It’s free and takes very little time to set up. You’ll get everything from alerts about malware and hacks to the extremely useful ability to ask Google to crawl your website on demand.
The very first thing you need to do is register for a Google Webmaster Tools account (Google has since renamed the service Google Search Console). Webmaster Tools is a powerful search engine optimization tool that connects your website to Google’s search engine. It also reports data from the crawls that Google’s search bots perform on your website, which shows you how healthy your website looks to those bots and suggests things to improve. To add a site, click “Add Site” and enter your domain. Note that Google treats http://yoursite.com, http://www.yoursite.com and https://www.yoursite.com as separate sites, so enter the exact version your visitors land on. Follow the on-screen prompts to verify that you own the site and connect it to your Google account. The simplest verification methods use an existing Google Analytics account or access to your server. Accessing your server is covered in the next section.
Common issues that prevent Google from indexing your website.
To get the most out of this guide, you’ll need server access so you can edit files on the back end of your website. Speak with your website administrator or hosting company. If you use a hosting company like SiteGround, you’ll have access to cPanel for editing these files. I prefer using an FTP client like FileZilla for editing server files.
FTP quick overview
FTP access lets you transfer files to and from your server. Among other things, it lets you check and edit your robots.txt file to make sure it isn’t set up to block search bots from crawling your website.
Using an FTP client is like navigating your operating system’s file explorer: there are folders and files. Be careful when editing files on your server, though, because you can do a lot of damage to a website this way. Make a copy of every file before you edit it so you have a quick backup to restore if you make an unintended change. If you’re unsure how to edit files on a web server over FTP, get help from your website administrator.
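If you download a file with your FTP client to edit it locally, a tiny script can keep a timestamped copy before you touch it. Here’s a minimal Python sketch; the backup_file helper and its .bak naming scheme are my own illustration, not a feature of FileZilla or cPanel.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file(path: str) -> Path:
    """Copy a file to a timestamped .bak file next to it and return the copy's path."""
    source = Path(path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # robots.txt becomes robots.txt.<timestamp>.bak in the same folder
    backup = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, backup)  # copy2 also keeps the original modification time
    return backup
```

For example, backup_file("robots.txt") would create something like robots.txt.20240131-093000.bak next to the original, which you can upload again if an edit goes wrong.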
Ensure your robots.txt file isn’t blocking Google’s search bots.
Search engine robots need to be able to crawl your website easily. The robots.txt file is powerful: a single line, “Disallow: /”, under a “User-agent: *” line can block your entire website from search bots. In the screenshot of my own robots.txt file, it’s blocking the WordPress core directories /wp-admin/ and /wp-includes/. Blocking these is no longer recommended, since /wp-includes/ holds CSS and JavaScript files Google needs to render your pages, so don’t copy that part. It does at least show you what a robots.txt file looks like.
The robots.txt file may look like nothing more than a plain text file, but it’s incredibly powerful. It lives in the root directory of your website and controls how search engine bots crawl your site; a single line can stop Google from crawling your entire website. If your business isn’t showing up on Google and you’ve just launched or redesigned your website, this file could be blocking Google’s crawl bots. Developers often add a blocking rule here while they build a site, so that visitors don’t stumble onto a partially finished website. The rule is supposed to be removed before launch, but it’s easy to overlook, and as long as it stays, search engine bots will not crawl your website. This is a very, very common mistake.
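If you want to check this without eyeballing the file, Python’s standard library includes a robots.txt parser. Below is a minimal sketch, assuming you paste in your file’s contents; the googlebot_allowed helper and the two sample files are hypothetical. Python’s parser follows the original robots.txt rules and doesn’t handle every Google extension (such as * wildcards inside paths), so treat it as a quick sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if these robots.txt rules let Googlebot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have, no download
    return parser.can_fetch("Googlebot", url)


# Hypothetical leftover "block everything" rule from development.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

# Hypothetical rules that only block the WordPress admin area.
ADMIN_ONLY = """User-agent: *
Disallow: /wp-admin/
"""

print(googlebot_allowed(BLOCK_ALL, "https://www.example.com/"))   # False
print(googlebot_allowed(ADMIN_ONLY, "https://www.example.com/"))  # True
```

To test your live file instead, RobotFileParser also has set_url() and read() methods that download robots.txt from your site before you call can_fetch().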
Have a relevant, SEO-optimized meta title, meta description and headings.
The HTML code behind a website holds some serious search engine goodies. A search robot is just code, after all, so make sure your HTML elements are doing you favors.
Your website and webpages are built with a markup language called HTML. HTML has several tags that mark information as important, and Google’s search engine scans these tags and logs what they contain. Two very important ones are the title tag (often called the meta title) and the meta description tag. These two tags usually determine what shows up for your page on Google’s search results pages and help identify your page’s general topic. They also act as “advertisements” of sorts, so the more effective the copy, the more likely someone is to click through to your webpage. Your heading tags also signal to Google what your page is about. Make sure each page has one H1 tag for its on-page title, and it doesn’t hurt to add an H2 subheading that includes important keywords.
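To check these elements on a page, you can pull them straight out of its HTML. Here’s a rough Python sketch using the standard library’s html.parser; the OnPageSEOParser class, the audit function and the sample page are hypothetical, and it only reads static HTML, so it won’t see a title that JavaScript sets after the page loads.

```python
from html.parser import HTMLParser


class OnPageSEOParser(HTMLParser):
    """Collects the <title> text, the meta description and the number of <h1> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "h1":
            self.h1_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> dict:
    parser = OnPageSEOParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "h1_count": parser.h1_count,  # aim for exactly one
    }


page = """<html><head><title>Small Business Accounting | My Site</title>
<meta name="description" content="Bookkeeping and tax help for small businesses."></head>
<body><h1>Small Business Accounting</h1><h2>Services</h2></body></html>"""

print(audit(page))
# {'title': 'Small Business Accounting | My Site', 'description': 'Bookkeeping and tax help for small businesses.', 'h1_count': 1}
```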
Use natural keywords in your webpage URLs.
Your webpage URL is part of how Google’s search engine ranks your website. When you build a webpage, give it a natural-sounding, descriptive URL that loosely summarizes the page content. For example:
Good: mysite.com/services/small-business-accounting
Bad: mysite.com/small-medium-enterprise-business-company-accounting-financing-services-company
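If your site builder doesn’t create clean URLs for you, a small helper can turn a page title into a short, hyphenated slug. This is a minimal Python sketch; the slugify name and the five-word limit are my own assumptions. The limit is only a blunt guard against keyword-stuffed URLs, so you should still pick the words by hand.

```python
import re
import unicodedata


def slugify(text: str, max_words: int = 5) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL slug."""
    # Strip accents (é becomes e) and drop any other non-ASCII characters.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters and digits as words; punctuation and spaces are dropped.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])


print(slugify("Small Business Accounting"))  # small-business-accounting
```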
A user-friendly, Google-bot-friendly page hierarchy.
In this example navigation, the Services menu has a dropdown highlighting the site’s most important services. This is a strong signal to Google that these pages are important and should carry more weight than lesser pages on the site.
How your pages are organized on your website matters. Google’s search engine bots need to crawl your website easily and repeatedly, and the structure they find should make sense. Group your content into categories and create clear paths to it. For example, place all your services under a “Services” tab in your navigation menu, and if you have several business locations, place them all under a “Locations” tab. Good housekeeping goes a long way toward ensuring that visitors and Google’s search engine bots find what they’re looking for.
Logical internal linking between pages on your website and off-site.
Just like a person visiting your site, Google’s search engine bot follows links to reach pages. Make sure your webpages link to relevant pages: every webpage you want ranked should have at least one link pointing to it. The next step helps with this, but try to include at least one link in your main page copy. Don’t go overboard, though; too many links can look like spam and hurt your rankings on the search results page.
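Because a bot only finds pages by following links, you can model your site as a map of which pages link where and see how many clicks each page is from your homepage. The Python sketch below does a breadth-first search over a hypothetical link map; in practice you’d fill the map from a crawl of your own site. Any page the search never reaches has no internal path to it, so Google’s bot can’t find it by following links either.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage: clicks needed to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: each page maps to the internal pages it links to.
site = {
    "/": ["/services/", "/locations/"],
    "/services/": ["/services/small-business-accounting"],
    "/locations/": [],
    "/services/small-business-accounting": ["/"],
    "/old-promo": ["/"],  # links out, but nothing links to it
}

depths = click_depths(site)
orphans = sorted(set(site) - set(depths))
print(depths)
print(orphans)  # ['/old-promo']
```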
Have an HTML sitemap in the footer of your website.
A good way to make sure every page has at least one link pointing to it is an HTML sitemap: a webpage that links to every page and post on your website. It helps people find what they need, and it helps Google’s search engine bot reach every page on your site. A great use for an HTML sitemap is embedding it on your 404 error page, so a visitor who hits a dead end on your site is given a full map of places to go. You’ll also want generic privacy policy and terms of service pages, as those are often cited as trust signals.
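Many CMS plugins can generate an HTML sitemap for you. If yours can’t, the core of one is just a list of links. Here’s a minimal Python sketch that builds that list from (URL, title) pairs; the html_sitemap helper and the sample pages are hypothetical, and you’d paste the output into your sitemap page or 404 template.

```python
from html import escape


def html_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build an HTML <ul> with one link per (url, title) pair."""
    items = "\n".join(
        # escape() keeps characters like & and quotes from breaking the markup
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>' for url, title in pages
    )
    return "<ul>\n" + items + "\n</ul>"


pages = [
    ("/", "Home"),
    ("/services/small-business-accounting", "Small Business Accounting"),
    ("/privacy", "Privacy Policy"),
    ("/terms", "Terms & Conditions"),
]
print(html_sitemap(pages))
```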
Submit an XML sitemap to Google.
An XML sitemap is similar to an HTML sitemap, but it’s a machine-readable file that you submit directly to Google through Google Webmaster Tools. Submitting one is important because it gives Google’s search engine bots a map of your website. It may seem repetitive, but there’s no harm in making sure search engines have more than enough information to find all your webpages.
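XML sitemaps follow the sitemaps.org format: a urlset element containing one url entry, with a loc, per page. Most CMSs and SEO plugins produce this file automatically. If you need to build one yourself, here’s a minimal Python sketch using the standard library’s ElementTree; the xml_sitemap helper and the example URLs are hypothetical. You’d typically save the result as sitemap.xml in your site’s root directory and submit that URL in Webmaster Tools.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def xml_sitemap(urls: list[str], lastmod: Optional[str] = None) -> str:
    """Return a minimal sitemaps.org-style XML sitemap for a list of absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and < for us
        if lastmod:
            # Optional last-modified date in W3C format, e.g. "2024-01-31".
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(xml_sitemap([
    "https://www.example.com/",
    "https://www.example.com/services/small-business-accounting",
]))
```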
Have proper schema.org markup and tag your data with Google’s Data Highlighter.
Google uses the vocabulary published by Schema.org to understand page content and show it to people in different ways. One example is the rich snippet a news article gets in Google News. If you’re a local business, I highly recommend using the Data Highlighter tool in Webmaster Tools to tag the business information on your contact page. Combined with Google Local, this can be an important way to get visits from people searching for related terms near your business location.
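The Data Highlighter is a point-and-click option. An alternative it doesn’t require is to embed schema.org markup in your page yourself as JSON-LD, a format Google also reads. Here’s a minimal Python sketch that builds a LocalBusiness block; the helper name and every business detail are hypothetical placeholders for your own information, and you’d paste the resulting script tag into your contact page’s HTML.

```python
import json


def local_business_jsonld(name, url, phone, street, city, region, postal_code):
    """Build schema.org LocalBusiness markup wrapped in a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


# Hypothetical business details for illustration only.
print(local_business_jsonld(
    name="Example Accounting",
    url="https://www.example.com/",
    phone="+1-555-010-0000",
    street="123 Main St",
    city="Springfield",
    region="IL",
    postal_code="62701",
))
```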
How to get Google to index your site with one click.
In Google Webmaster Tools there’s a feature called Fetch as Google that lets you manually ask Google’s search engine bot to crawl your website or a specific page. (In the current Google Search Console, it has been replaced by the URL Inspection tool and its “Request Indexing” button.) This is a great way to get new content indexed quickly. Select your site, click “Crawl” in the left sidebar and select “Fetch as Google.” Enter the URL of the page you’d like crawled (excluding the domain, which is already listed) and click “Fetch.” I make it a habit to manually fetch every new page I create on my websites.
Whenever I edit a URL, add a redirect or make any significant change to a page’s structure, I re-fetch the page with Fetch as Google. Changes can take anywhere from a few minutes to several days to show up in Google. To see whether your pages have been indexed, type site:yourwebsite.com into Google’s search bar, using your domain without the www. or http://. This lists the pages Google has indexed for your domain and is a great way to spot pages missing from Google’s search index. If you have comments or questions, or need help at all, feel free to reach out to me.
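For the site: check described above, the only fiddly part is trimming your address down to the bare domain. Here’s a small Python sketch that does that and builds the Google search link; the site_query_url helper is hypothetical, and it simply strips the scheme, any path and a leading www. before URL-encoding the query.

```python
from urllib.parse import urlencode, urlparse


def site_query_url(address: str) -> str:
    """Turn a URL or bare domain into a Google search link for a site: query."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in address:
        address = "http://" + address
    host = urlparse(address).hostname or ""  # hostname is lowercased, path dropped
    if host.startswith("www."):
        host = host[len("www."):]
    return "https://www.google.com/search?" + urlencode({"q": f"site:{host}"})


print(site_query_url("http://www.example.com/services/"))
```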