SEO with the Google Search Console API and Python

The thing I enjoy most about SEO is thinking at scale. Postmates is fun because sometimes its more appropriate to size opportunities on a logarithmic scale than a linear one.

But there is a challenge that comes along with that: opportunities scale logarithmically, but I don’t really scale… at all. That’s where scripting comes in.

SQL, Bash, Javascript, and Python regularly come in handy to identify opportunities and solve problems. This example demonstrates how scripting can solve some of the challenges of having a lot of potentially mutable data.

Scaling SEO with the Google Search Console API

Most, if not all, big ecommerce and marketplace sites are backed by databases. And the bigger these places are, the more likely they are to have multiple stakeholders managing and altering data in the database. From website users to customer support, to engineers, there are a number of ways that database records can change. As a result, the content of the site grows, changes, and sometimes disappears.

It’s very important to know when these changes occur and what effect the changes will have on search engine crawling, indexing and results. Log files can come in handy but the Google Search Console is a pretty reliable source of truth for what Google sees and acknowledges on your site.

Getting Started

This guide will help you start working with the Google Search Console API, specifically with the Crawl Errors report but the script could easily be modified to query Google Search performance data or interact with sitemaps in GSC.

To get started, clone the Github Repository: https://github.com/trevorfox/google-search-console-api and follow the “Getting Started” steps on the README page. If you are unfamiliar with Github, don’t worry. This is an easy project to get you started.

Make sure you have the following:

Now for the fun stuff!

Connecting to the API

This script uses a slightly different method to connect to the API. Instead of using the Client ID and Client Secret directly in the code. The Google API auth flow accesses these variables from the client_secret.json file. This way you don’t have to modify the webmaster.py file at all, as long as the client_secret.json file is in the /config folder.

try:
credentials = pickle.load(open("config/credentials.pickle", "rb"))
except (OSError, IOError) as e:
flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', scopes=OAUTH_SCOPE)
credentials = flow.run_console()
pickle.dump(credentials, open("config/credentials.pickle", "wb"))

webmasters_service = build('webmasters', 'v3', credentials=credentials)

For convenience, the script saves the credentials to the project folder as a pickle file. Storing the credentials this way means you only have to go through the Web authorization flow the first time you run the script. After that, the script will use the stored and “pickled” credentials.

Querying Google Search Console with Python

The auth flow builds the “webmasters_service” object which allows you to make authenticated API calls to the Google Search Console API. This is where Google documentation kinda sucks… I’m glad you came here.

The script’s webmasters_service object has several methods. Each one relates to one of the five ways you can query the API. The methods all correspond to verb methods (italicized below) that indicate how you would like to interact with or query the API.

The script currently uses the “webmaster_service.urlcrawlerrorssamples().list()” method to find how many crawled URLs had given type of error.

gsc_data = webmasters_service.urlcrawlerrorssamples().list(siteUrl=SITE_URL, category=ERROR_CATEGORY, platform='web').execute()

It can then optionally call “webmaster_service.urlcrawlerrorssamples().markAsFixed(…)” to note that the URL error has been acknowledged- removing it from the webmaster reports.

Google Search Console API Methods

There are five ways to interact with the Google Search Console API. Each is listed below as “webmaster_service” because that is the variable name of the object in the script.

webmasters_service.urlcrawlerrorssamples()

This allows you to get details for a single URL and list details for several URLs. You can also programmatically mark URL’s as Fixed with the markAsFixed method. *Note that marking something as fixed only changes the data in Google Search Console. It does not tell Googlebot anything or change crawl behavior.

The resources are represented as follows. As you might imagine, this will help you find the source of broken links and get an understanding of how frequently your site is crawled.

{
"pageUrl": "some/page-path",
"urlDetails": {
"linkedFromUrls": ["https://example.com/some/other-page"],
"containingSitemaps": ["https://example.com/sitemap.xml"]
},
"last_crawled": "2018-03-13T02:19:02.000Z",
"first_detected": "2018-03-09T11:15:15.000Z",
"responseCode": 404
}

webmasters_service.urlcrawlerrorscounts()

If you get this data, you will get back the day-by-day data to recreate the chart in the URL Errors report.

Crawl Errors

webmasters_service.searchanalytics()

This is probably what you are most excited about. This allows you to query your search console data with several filters and page through the response data to get way more data than you can get with a CSV export from Google Search Console. Come to think of it, I should have used this for the demo…

The response looks like this with a “row” object for every record depending on you queried your data. In this case, only “device” was used to query the data so there would be three “rows,” each corresponding to one device.

{
"rows": [
{
"keys": ["device"],
"clicks": double,
"impressions": double,
"ctr": double,
"position": double
},
...
],
"responseAggregationType": "auto"
}

webmasters_service.sites()

Get, list, add and delete sites from your Google Search Console account. This is perhaps really useful if you are a spammer creating hundreds or thousands of sites that you want to be able to monitor in Google Search Console.

webmasters_service.sitemaps()

Get, list, submit and delete sitemaps to Google Search Console. If you want to get into fine-grain detail into understanding indexing with your sitemaps, this is the way to add all of your segmented sitemaps. The response will look like this:

{
"path": "https://example.com/sitemap.xml",
"lastSubmitted": "2018-03-04T12:51:01.049Z",
"isPending": false,
"isSitemapsIndex": true,
"lastDownloaded": "2018-03-20T13:17:28.643Z",
"warnings": "1",
"errors": "0",
"contents": [
{
"type": "web",
"submitted": "62",
"indexed": "59"
}
]
}

Modifying the Python Script

You might want to change the Search Console Query or do something with response data. The query is in webmasters.py and you can change the code to iterate through any query. The check method checker.py is used to “operate” on every response resource. It can do things that are a lot more interesting than printing response codes.

Query all the Things!

I hope this helps you move forward with your API usage, python scripting, and Search Engine Optimization… optimization. Any question? Leave a comment. And don’t forget to tell your friends!