Automating PageSpeed Tests with Python & Google PageSpeed Insights API

Benjamin Burkholder
Feb 2, 2019 · 8 min read

This article is the second in a series of tutorials on using Python to connect with Google APIs for SEO purposes. I will provide you with a Python script I’ve developed that automates bulk pagespeed checks by leveraging Google’s PageSpeed Insights API.

The purpose of this article is to give you access to this script and to walk through the steps needed to download it and run it locally on your machine.

This script allows the user to:

  • Run any desired URLs against the Google PageSpeed Insights API in bulk.
  • Filter the JSON results to only include the most important pagespeed metrics.
  • Save the results to a local csv file for further analysis.

What is the Google PageSpeed Insights API?

Google’s PageSpeed Insights API allows the user to directly query the PageSpeed Insights database, with results being returned in JSON (JavaScript Object Notation) format. In essence, the same data you will see running the tests in a browser are presented in JSON for web transfer purposes.
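
As a minimal sketch of what such a query looks like, the snippet below builds a request URL for the API. The v5 `runPagespeed` endpoint, the `strategy=mobile` parameter, and the helper name `build_psi_url` are assumptions for illustration and are not necessarily what the script in the repository uses.

```python
from urllib.parse import urlencode

# Assumed endpoint: version 5 of the PageSpeed Insights API.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile"):
    """Return the API request URL for one page (hypothetical helper)."""
    # urlencode percent-escapes the target URL so it is safe as a query value.
    query = urlencode({"url": page_url, "strategy": strategy})
    return f"{PSI_ENDPOINT}?{query}"


print(build_psi_url("https://www.example.com/"))
```

The resulting string can be passed to `requests.get(...)`, and the response’s `.json()` method returns the parsed result as a Python dictionary.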

There are a variety of URL parameters you can append to a request, but this script only returns:

  • Mobile test data.
  • Timestamp of the request.
  • URL being called.
  • Two chief metrics: First Contentful Paint (seconds) and Time to Interactive (seconds).

Why only this data?

As mentioned earlier, the metrics from the API call are all transferred via JSON. As a result, the sheer amount of data returned without filtering is extremely unwieldy (1000+ lines).

In order to successfully target certain branches of a JSON object, you need to specify the entire path to the target object. It can be very time-consuming to trace the path to multiple metrics. As a result, for now I’ve focused on extracting only the metrics listed above, which are the ones important for my purposes.
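
To illustrate what specifying the entire path means, here is a small sketch that pulls values out of a nested dictionary. The sample data is a hand-made, heavily trimmed stand-in for an API response, and the key names (`lighthouseResult`, `audits`, `first-contentful-paint`, `interactive`, `displayValue`) are assumptions based on the v5 response format; the script in the repository may follow different paths.

```python
# Trimmed, made-up stand-in for a parsed API response (values are illustrative).
sample_response = {
    "id": "https://www.example.com/",
    "lighthouseResult": {
        "audits": {
            "first-contentful-paint": {"displayValue": "1.8 s"},
            "interactive": {"displayValue": "4.2 s"},
        }
    },
}


def extract_metrics(data):
    """Follow the full key path down to each metric (hypothetical helper)."""
    audits = data["lighthouseResult"]["audits"]
    return {
        "url": data["id"],
        "first_contentful_paint": audits["first-contentful-paint"]["displayValue"],
        "time_to_interactive": audits["interactive"]["displayValue"],
    }


print(extract_metrics(sample_response))
```

Every key along the path has to exist, which is why a single missing branch raises a `KeyError`.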

API Request Limits

As is the case with most APIs, there is a limit on how many requests can be made. As of 2019, the default limit is:

  • 25,000 requests / day.
  • 1,000 requests per 100 seconds.
  • 60 requests per 100 seconds.

This should be more than enough requests for running tactical bulk checks. I would just advise keeping these limits in mind if you start scaling up and running multiple full site checks per day.
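
One simple way to stay under a per-window limit is to pause between requests. The sketch below derives the minimum delay from a limit and a window length; the function names and the fixed-sleep approach are my own illustration and are not part of the script.

```python
import time


def min_delay(max_requests, window_seconds):
    """Seconds to wait between requests to stay within the given limit."""
    return window_seconds / max_requests


def throttled(items, delay):
    """Yield items one at a time, sleeping `delay` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(delay)
        yield item


# 60 requests per 100 seconds works out to one request every ~1.67 seconds.
print(round(min_delay(60, 100), 2))
```

Looping over `throttled(urls, min_delay(60, 100))` would then space the calls out evenly.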

How is this API Useful for SEO?

Google PageSpeed Insights is a pretty standard tool for most SEOs these days, providing a plethora of data on how a webpage renders and potential performance blockers. However, while this tool’s browser version is very useful for limited in-depth analysis, it’s just not built for bulk checks.

This is where the Google PageSpeed Insights API comes into play.

With this script, you can simply load up the URLs you want to fetch pagespeed insights data for and run it. The script will run quietly in the background while you tackle other work, then you can analyze the data once it completes.

In order to get the list of seed URLs, I typically just run a Screaming Frog crawl and then paste them into the .txt file referenced by the script.

I may add a custom crawler (via Scrapy) to pull the URLs and then add to the file without running Screaming Frog, but that will be a later release.

Getting Started

Now for the nuts and bolts. In this section I’ll cover the prerequisites you’ll need in order to use the script. I have pulled these steps from my previous tutorial on leveraging the Knowledge Graph API, since the steps are basically the same.

Here is the basic breakdown:

  • Install the latest Python release (3.7.2 currently).
  • Install git.
  • Download Visual Studio Code (or a suitable equivalent to run the code).
  • Clone the script from my GitHub repository to your machine.
  • Install the Python ‘requests’ module.
  • (Optional) Get a Google Cloud API key.

I will not be going into minute detail on all aspects of peripheral configuration. If you hit a snag, a simple Google search should help you as most of these issues are heavily documented. Or ask a developer for assistance if you have access to one.

Installing Python

In order to install the latest Python release, simply navigate to Python.org and look under “Downloads”. There should be releases available for all major operating systems.

In this example, we’ll download for Windows and select one with an executable installer. Simply click the option you want and follow the installation prompts until it indicates Python has been installed onto your machine.

Installing Git

We’ll be using git in order to communicate with GitHub, so we first need to install it. You can find directions for installing git on different operating systems on the official git website.

If you aren’t very familiar with installing packages directly from the command line, they also provide downloadable versions for ease of use.

Downloading Visual Studio Code (or equivalent)

Next we need a platform in which to run the script. I typically work in Visual Studio Code since it’s free and very robust. However, if you have an equivalent you prefer, feel free to use that. All you need is a platform in which to open and run the script.

Clone the Repository from GitHub

Now we need to clone the repository from my GitHub that contains the script. For this, we’ll be using git (which we installed earlier) as well as using the command terminal.

Here are the steps:

  1. On your machine, use the start search bar to look for the command terminal native to your OS. On Windows it’s PowerShell and on Mac it’s typically Terminal. A simple Google search should help you determine which one you have.
  2. Navigate to my GitHub repository containing the script. On the far right you’ll see a green button that says “Clone or download”; we’re going to be cloning. Now click the small clipboard icon to copy the repository path.

3. Open the command line terminal native to your machine. We’re going to clone the project to the desktop for ease of access. You can accomplish this by entering:

cd desktop

Then simply hit enter and you should see the desktop folder appended to the path.

This is how you navigate in and out of folders via command line. For our purposes however, this is as far as we need to go.

Next we will clone the GitHub project to our desktop. Simply enter this command into the terminal and hit enter:

git clone https://github.com/ibebeebz/google-pagespeed-api-script.git

You should see some activity and a success message in your terminal. It worked! Now if you look at your machine’s desktop you should see a folder called ‘google-pagespeed-api-script’ containing the Python file.

Installing the ‘Requests’ Module

This is a quick one. In order for the Python script to run, all dependencies need to be installed. This script leverages the requests module, which doesn’t come with the standard library, so it needs to be installed separately.

In your command terminal enter this command and hit enter:

pip install requests

You should see activity and a success message. The requests dependency should now be installed on your machine.
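
If you want to double-check the install before running anything, a quick sanity check along these lines can help. This is only a sketch; the helper name `is_installed` is made up for illustration.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if the named top-level module can be imported."""
    return find_spec(module_name) is not None


print("requests installed:", is_installed("requests"))
```

If this prints `False`, the module was probably installed for a different Python version than the one running the check.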

Getting Google Cloud API Key (Optional)

This step is optional, since you only need an API key for the Google PageSpeed Insights API if you’re making requests more often than once every two seconds. Since this script runs through the list sequentially, you shouldn’t hit this threshold.

Regardless, if you for some reason encounter a threshold, here is a link to setting up a Google Cloud account, as well as how to append the API key to the request URLs.

**NOTE** Make sure to keep your API key safe. Never publish it anywhere public where it can be exploited.
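
One common way to keep a key out of the script itself is to read it from an environment variable and add it to the query only when it is present. In this sketch the variable name `PSI_API_KEY`, the helper name, and the v5 endpoint are assumptions for illustration; `key` is the query parameter Google APIs use for API keys.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint: version 5 of the PageSpeed Insights API.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_request_url(page_url, api_key=None):
    """Build the request URL, appending `key` only if an API key is given."""
    params = {"url": page_url, "strategy": "mobile"}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


# PSI_API_KEY is a hypothetical variable name; the lookup returns None when unset.
api_key = os.environ.get("PSI_API_KEY")
request_url = build_request_url("https://www.example.com/", api_key)
```

Because the key never appears in the source file, the script can be shared or committed without exposing it.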

Using the Python PageSpeed Insights API Script

Now that setup is complete, we can move on to actually using the script.

Follow these steps:

  1. Open Visual Studio Code / PyCharm (Community Edition)

2. In the top left click File > Open File…

3. Navigate to the cloned folder and select the pagespeed python script.

4. You should now see the file open in Visual Studio Code.

5. Next you need to fill the “pagespeed.txt” file with the URLs you want to test against the API. Make sure you include the protocol in the address. This TXT file will be present in the same cloned folder as the python file.

6. Run the script.

7. The results will be printed out to a newly created CSV called “pagespeed-results.csv”, which is created by the script automatically.

The next section covers steps 5 and 6 in more detail.

Running the Script

Once all of the steps above have been completed, you first need to add the URLs you want to test.

1. Open ‘pagespeed.txt’ and paste in the URLs, making sure there’s only one URL per line. The script reads line by line, so any other format will break it.

2. Once the URLs have been loaded, you’re ready to run the script. Open the Python file in Visual Studio Code and right-click anywhere in the editor. Click on “Run Python File in Terminal”.

This will start the script and you should see activity in the terminal at the bottom of the screen, which is where the script prints its output. It will probably take 7–8 seconds per URL, so keep this in mind if you don’t see any results immediately. It’s not necessary to show the output in the terminal (as the results are being saved locally), but it’s useful for monitoring the script’s progress.
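
Because the script depends on the file having exactly one URL per line with the protocol included, it can be worth checking the file before a long run. The sketch below is one way to do that; the helper name `load_urls` and the decision to raise an error on a bad line are my own assumptions, not behavior of the script.

```python
from urllib.parse import urlparse


def load_urls(lines):
    """Return cleaned URLs from an iterable of lines, one URL per line."""
    urls = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Missing protocol or host: {url!r}")
        urls.append(url)
    return urls


# Reading the real file would look like this:
#   with open("pagespeed.txt", encoding="utf-8") as handle:
#       urls = load_urls(handle)
print(load_urls(["https://www.example.com/\n", "\n", "http://example.org/page\n"]))
```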

How does the script work?

1. The script reads each URL from the .txt file.

2. The URL is appended to the API call and requested.

3. JSON results are returned and filtered to only include the metrics mentioned earlier.

4. If a URL in the list throws an error (typically it’ll be because it’s a 404), the program will catch the error and display:

<KeyError> & <NameError> Failing because of nonexistent Key — {url}

This is done so the program won’t fail and so you can see exactly which URLs failed. You can then go back and check them again in the saved CSV.

5. Chief metrics are saved to the local .csv file for further analysis.
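
Steps 3 to 5 can be sketched roughly as follows. This is a simplified illustration and not the code from the repository: the key paths are assumed from the v5 response format, the responses are made-up dictionaries, the function name is hypothetical, and only `KeyError` is caught here.

```python
import csv
import io


def write_results(responses, out_file):
    """Write one CSV row per URL; return the URLs whose data was incomplete."""
    writer = csv.writer(out_file)
    writer.writerow(["url", "first_contentful_paint", "time_to_interactive"])
    failed = []
    for url, data in responses:
        try:
            audits = data["lighthouseResult"]["audits"]
            fcp = audits["first-contentful-paint"]["displayValue"]
            tti = audits["interactive"]["displayValue"]
        except KeyError:
            # An error response lacks these keys; record the URL and keep going.
            failed.append(url)
            writer.writerow([url, "", ""])
            continue
        writer.writerow([url, fcp, tti])
    return failed


# Made-up responses: one complete, one shaped like an error payload.
responses = [
    ("https://www.example.com/", {"lighthouseResult": {"audits": {
        "first-contentful-paint": {"displayValue": "1.8 s"},
        "interactive": {"displayValue": "4.2 s"}}}}),
    ("https://www.example.com/missing", {"error": {"message": "lookup failed"}}),
]
buffer = io.StringIO()  # stand-in for open("pagespeed-results.csv", "w", newline="")
print(write_results(responses, buffer))
```

Writing an empty row for a failed URL keeps it visible in the CSV, so it is easy to find and retest later.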

Conclusion

Hopefully you were able to follow this setup successfully and will find this script useful for your own pagespeed analyses.

If you have any questions, feel free to reach out. I’ll be making updates to this script to add new features, as well as to ensure everything is running smoothly. If you don’t use the script for a while and then come back, you can simply clone a new copy of the repository to your desktop to ensure you have the latest version.

Written by Benjamin Burkholder, Digital Integration Specialist | Python Developer in Cleveland, OH.
