Hello everyone, this is my first article on Medium. I have been interested in data science and analytics while working on my Masters project. I have tried my hand with different beginner datasets to learn some of the basics of Python, SQL, and other languages. However, I felt that repeating the same exercises got boring after a while, and I started losing interest in the subject.
Then I got a hold of Google Cloud Services and the BigQuery platform. While cloud development is not one of my main interests, BigQuery was able to provide me with what I needed to keep me entertained, which is some more interesting data. Now, while data isn’t the hardest thing to find on the internet, BigQuery also provides a very helpful IDE that can guide you as a beginner who is learning to write more complex SQL queries. I decided to try my hand at one of these datasets BigQuery provides, and that’s where I found the Google Political Ads (GPA) database.
The GPA database contains all sorts of political ads paid for by advertisers that appear on platforms run by Google online. They include not only the United States but several countries in the European Union and some Asian countries as well. Overall there are 1,537 unique advertisers registered to the platform. Of all these advertisers, 75% of them are declared to be operating in the US region. With such a large sample of American political advertisers, I decided to focus only on US data. The data contains information from August 2018 to November 8th, 2019 and is constantly updated.
As you can see from the image above, most of the tables can be traced back to either advertiser_id or advertiser_name. I was not sure if ID or Name were expected to be unique, so I opted to choose ID as the primary key since it looked like the most logical option. The first step was to extract the data from BigQuery into my favourite platform. I chose to run this project on python.
The next step was to write a SQL query that will extract only the tables I was interested in looking at. Originally I was interested in looking at how much money was being spent over time, so I could have chosen the table weekly_spend only, but this table did not include the column region, thus making it impossible to filter out only US advertisers. Finally, the query was saved into a Pandas DataFrame.
For my first observation, I wanted to look at the top 10 political advertisers putting money in the GPA database.
For the observed period, Trump MAGA committee has spent about $8 million dollars in political ads on Google, making it the highest spender of the bunch. In terms of political affiliation, Senate Leadership Fund, Congress Leadership Funds, NRCC, NRSC, and Donald J. Trump for President (obviously) are all republican-based advertised according to the Federal Election Commission (FEC). In total, 6 out of 10 advertisers are the top spenders in GPAs, accounting for a total of $23 million dollars. Some of the other spenders are self-explanatory. But it was not clear what DedicatedEmails.com was supposed to represent since I could not find them on the FEC website. I looked at a few of their ads individually under the creative_stats table, and then I found that some of them are related to a relatively unknown website called Conservative Buzz. It’s interesting to see how a small website can afford to spend $3 million dollars in political ads during a year. However, it is likely that is not just one website, but many separate ones that operate through DedicatedEmails.com
Next, I decided to extract the DataFrames from Python and opted to use Tableau in the next part of my project. I also set up a query to obtain geographical information about GPA spending by state.
Unfortunately, Medium does not allow for an easy way to embed Tableau visualizations, but you can access it here.
The top 10 advertisers represent about 34% of the total $113 million dollars spent during the observed period. At first, I wanted to plot the amount spent on a specific date by each advertiser, but the visualization looked pretty bad and confusing so I opted for the total spent overall. Senate Leadership Fund went really hard on the spending around October 2018, possibly related to the Senate Elections that took place on November 6 of that year. A lot of money was put in motion during that time for the midterm elections, as noted by the peaks of NRSC, NRCC, and Congressional Leadership Fund. As we move towards the election in 2020, more advertisers with candidate's names start to appear, including one with the incumbent president. Ironically enough, two advertisers have been somewhat consistent during this period, the MAGA committee and Need to Impeach.
Finally, I was interested in looking at how much money was put into each state during this time. This information was found under the geo_spend table. Unfortunately, it does not provide information on the advertisers that spent money on them individually. Two maps were created, one related to money spent in total related to each individual state population and another for the total amount of money put into each state.
At first, when looking at the money put individually into each state (right figure), it makes sense that more money is spent in more highly populated areas. California, Texas and Florida are the three most populated states according to the 2017 US census state estimates. While California is first in population, it came second on spending when compared to Florida, where $12 million dollars were spent on GPAs during the year. I was wondering if the amount spent would also reflect the density for each individual state, which is what the left figure represents. Surprisingly, Montana had a density of 1.5 dollars/inhabitants spent on GPAs, compared to 0.28 and 0.57 for California and Florida, respectively. Looking back at the senate elections of 2018 could explain why.
It appears that the states with the highest density were areas where the Republican party was expecting to keep or make some significant gains. They were not able to get Montana and lost Nevada and Arizona in spite of being the areas with the most spending density. However, they were successful in breaking through North Dakota and Missouri.
Digging into this database was an exciting opportunity of looking at how GPAs are being used in the United States. We noticed that of the top 10 spenders most are Republican-affiliated advertisers. As we get closer to the presidential election in 2020, it is very likely that we will see more money put into GPAs and more aggressive targetting will be performed in the upcoming months.