Analysis of Google Political Ads using BigQuery

Ernesto Cuadra Foy
Nov 9 · 6 min read

Hello everyone, this is my first article on Medium. I have been interested in data science and analytics while working on my Masters project. I have tried my hand with different beginner datasets to learn some of the basics of Python, SQL, and other languages. However, I felt that repeating the same exercises got boring after a while, and I started losing interest in the subject.

Then I got a hold of Google Cloud Services and the BigQuery platform. While cloud development is not one of my main interests, BigQuery was able to provide me with what I needed to keep me entertained, which is some more interesting data. Now, while data isn’t the hardest thing to find on the internet, BigQuery also provides a very helpful IDE that can guide you as a beginner who is learning to write more complex SQL queries. I decided to try my hand at one of these datasets BigQuery provides, and that’s where I found the Google Political Ads (GPA) database.

The BigQuery interface provides a helpful IDE for checking SQL queries before running them.

The GPA database contains all sorts of political ads paid for by advertisers that appear on platforms run by Google online. They include not only the United States but several countries in the European Union and some Asian countries as well. Overall there are 1,537 unique advertisers registered to the platform. Of all these advertisers, 75% of them are declared to be operating in the US region. With such a large sample of American political advertisers, I decided to focus only on US data. The data contains information from August 2018 to November 8th, 2019 and is constantly updated.

The overall architecture of the GPA database. A lot of tables can be joined by using the unique advertiser_id attribute.

As you can see from the image above, most of the tables can be traced back to either advertiser_id or advertiser_name. I was not sure if ID or Name were expected to be unique, so I opted to choose ID as the primary key since it looked like the most logical option. The first step was to extract the data from BigQuery into my favourite platform. I chose to run this project on python.

How to get the Bigquery data into python

The next step was to write a SQL query that will extract only the tables I was interested in looking at. Originally I was interested in looking at how much money was being spent over time, so I could have chosen the table weekly_spend only, but this table did not include the column region, thus making it impossible to filter out only US advertisers. Finally, the query was saved into a Pandas DataFrame.

Calling a query

For my first observation, I wanted to look at the top 10 political advertisers putting money in the GPA database.

For the observed period, Trump MAGA committee has spent about $8 million dollars in political ads on Google, making it the highest spender of the bunch. In terms of political affiliation, Senate Leadership Fund, Congress Leadership Funds, NRCC, NRSC, and Donald J. Trump for President (obviously) are all republican-based advertised according to the Federal Election Commission (FEC). In total, 6 out of 10 advertisers are the top spenders in GPAs, accounting for a total of $23 million dollars. Some of the other spenders are self-explanatory. But it was not clear what DedicatedEmails.com was supposed to represent since I could not find them on the FEC website. I looked at a few of their ads individually under the creative_stats table, and then I found that some of them are related to a relatively unknown website called Conservative Buzz. It’s interesting to see how a small website can afford to spend $3 million dollars in political ads during a year. However, it is likely that is not just one website, but many separate ones that operate through DedicatedEmails.com

Dedicatedemails.com ad

Next, I decided to extract the DataFrames from Python and opted to use Tableau in the next part of my project. I also set up a query to obtain geographical information about GPA spending by state.

Unfortunately, Medium does not allow for an easy way to embed Tableau visualizations, but you can access it here.

Timeline of the highest spenders 2018–2019

The top 10 advertisers represent about 34% of the total $113 million dollars spent during the observed period. At first, I wanted to plot the amount spent on a specific date by each advertiser, but the visualization looked pretty bad and confusing so I opted for the total spent overall. Senate Leadership Fund went really hard on the spending around October 2018, possibly related to the Senate Elections that took place on November 6 of that year. A lot of money was put in motion during that time for the midterm elections, as noted by the peaks of NRSC, NRCC, and Congressional Leadership Fund. As we move towards the election in 2020, more advertisers with candidate's names start to appear, including one with the incumbent president. Ironically enough, two advertisers have been somewhat consistent during this period, the MAGA committee and Need to Impeach.

Spending by state related to population (left), Total spending by state (right). To look at the maps more in-detail, click on the hyperlinks below.

Finally, I was interested in looking at how much money was put into each state during this time. This information was found under the geo_spend table. Unfortunately, it does not provide information on the advertisers that spent money on them individually. Two maps were created, one related to money spent in total related to each individual state population and another for the total amount of money put into each state.

At first, when looking at the money put individually into each state (right figure), it makes sense that more money is spent in more highly populated areas. California, Texas and Florida are the three most populated states according to the 2017 US census state estimates. While California is first in population, it came second on spending when compared to Florida, where $12 million dollars were spent on GPAs during the year. I was wondering if the amount spent would also reflect the density for each individual state, which is what the left figure represents. Surprisingly, Montana had a density of 1.5 dollars/inhabitants spent on GPAs, compared to 0.28 and 0.57 for California and Florida, respectively. Looking back at the senate elections of 2018 could explain why.

Senate Elections Results 2018. Darker colours represent gains for a party. Red for Republicans and Blue for Democrats. (Source Wikipedia)

It appears that the states with the highest density were areas where the Republican party was expecting to keep or make some significant gains. They were not able to get Montana and lost Nevada and Arizona in spite of being the areas with the most spending density. However, they were successful in breaking through North Dakota and Missouri.

Digging into this database was an exciting opportunity of looking at how GPAs are being used in the United States. We noticed that of the top 10 spenders most are Republican-affiliated advertisers. As we get closer to the presidential election in 2020, it is very likely that we will see more money put into GPAs and more aggressive targetting will be performed in the upcoming months.

The Startup

Medium's largest active publication, followed by +537K people. Follow to join our community.

Ernesto Cuadra Foy

Written by

HIV researcher specializing in the study of transmission networks. Looking to learn more about Data Science and Analytics.

The Startup

Medium's largest active publication, followed by +537K people. Follow to join our community.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade