Compare more than 5 keywords in Google Trends Search using pytrends

Akanksha
Published in Analytics Vidhya
5 min read · May 27, 2020

Do you want to compare more than 5 keywords with pytrends, but the Google Trends Search limit won't let you? Then you have come to the right place. I took inspiration from this article and implemented the same operations with pytrends and matplotlib for automation purposes. First, go through the article to get the basics right, and then we will jump to the Python implementation using pytrends and matplotlib: linkToArticle

Consider a scenario where I want to compare the trends of the following TV series:

Stranger Things, Friends, Riverdale, Game of Thrones, The office, Breaking Bad, Money Heist, Peaky Blinders and Black Mirror

Google Trends Search allows a maximum of 5 keywords per comparison, so let's divide this list into two halves such that both halves contain a common TV series (a common keyword). This common keyword is our trump card: it lets us put both results on the same scale and compare all the keywords together. The following image shows how I divided the list into two halves with “The office” as the common TV series (keyword). (Any TV series could serve as the common keyword; I picked “The office”.)

Dividing the list of TV series into two separate lists with a common TV series, “The office”
Trend Graph for List 1
Trend Graph for List 2
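The split itself can be automated. Here is a minimal sketch using a hypothetical helper, split_with_anchor (my own name, not part of pytrends), that puts the common keyword at index 0 of every chunk and fills the rest with up to 4 other keywords, so each chunk stays within the 5-keyword limit:

```python
def split_with_anchor(keywords, anchor, max_per_request=5):
    """Split keywords into request-sized chunks that all start with the anchor.

    Hypothetical helper: each chunk holds the anchor plus up to
    max_per_request - 1 other keywords.
    """
    if max_per_request < 2:
        raise ValueError("max_per_request must leave room for the anchor")
    # Every keyword except the anchor must be spread across the chunks.
    others = [kw for kw in keywords if kw != anchor]
    step = max_per_request - 1
    return [[anchor] + others[i:i + step] for i in range(0, len(others), step)]


series = ['Stranger Things', 'Friends', 'Riverdale', 'Game of Thrones',
          'The office', 'Breaking Bad', 'Money Heist', 'Peaky Blinders',
          'Black Mirror']
for chunk in split_with_anchor(series, anchor='The office'):
    print(chunk)
# ['The office', 'Stranger Things', 'Friends', 'Riverdale', 'Game of Thrones']
# ['The office', 'Breaking Bad', 'Money Heist', 'Peaky Blinders', 'Black Mirror']
```

With 9 series this produces exactly the two lists used in this article; a longer keyword list simply yields more chunks, all sharing the same common keyword.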

How do we combine these two results? As mentioned earlier, the common keyword in both lists helps us combine them. We first need to manipulate the data, because the trend graphs for the two lists are scaled differently: each request is scaled on its own, relative to the highest value within that request.

Let’s get started with the implementation and use pytrends to achieve the desired result.

```python
# Import TrendReq to connect to Google
from pytrends.request import TrendReq
```

Create two separate lists out of the list of TV series above. I prefer storing the common keyword at index 0 so that it is easy to retrieve during data manipulation.

```python
list1 = ['The office', 'Stranger Things', 'Friends', 'Riverdale', 'Game of Thrones']
list2 = ['The office', 'Breaking Bad', 'Money Heist', 'Peaky Blinders', 'Black Mirror']
```

Let’s create TrendReq objects to connect to Google, one per list:

```python
pytrends1 = TrendReq()
pytrends2 = TrendReq()
```

Now we need to build a payload for each list and pass parameters that filter the data according to our requirements. Here, I am using the US as the target geolocation and a timeframe of one year, hence “today 12-m” (yes, m is for months).

```python
pytrends1.build_payload(list1, geo='US', timeframe='today 12-m')
pytrends2.build_payload(list2, geo='US', timeframe='today 12-m')
```

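The interest_over_time() method returns a pandas DataFrame with the trend values of each TV series in the list over the given timespan (one year, in this case). The dates form the index, and pytrends also adds an isPartial column that flags periods whose data is still incomplete.

```python
df1 = pytrends1.interest_over_time()
df2 = pytrends2.interest_over_time()
```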
Top 5 rows in both the data frames: df1 and df2
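The two build_payload()/interest_over_time() round trips can also be written as one loop over any number of keyword lists. The sketch below assumes only that the client object offers those two TrendReq methods, so it can be tried offline with a stand-in object. The function name fetch_chunks and the pause between requests are my own hypothetical additions, not part of pytrends.

```python
import time


def fetch_chunks(client, chunks, geo='US', timeframe='today 12-m', pause=1.0):
    """Fetch one interest-over-time DataFrame per keyword chunk.

    Assumes `client` behaves like pytrends' TrendReq: build_payload(kw_list,
    geo=..., timeframe=...) followed by interest_over_time().
    """
    frames = []
    for i, kw_list in enumerate(chunks):
        if i > 0 and pause:
            # Assumption: pausing between requests may lower the chance
            # of being rate-limited by Google.
            time.sleep(pause)
        # Each build_payload call replaces the previous payload,
        # so a single client can be reused for every chunk.
        client.build_payload(kw_list, geo=geo, timeframe=timeframe)
        frames.append(client.interest_over_time())
    return frames
```

With pytrends installed, fetch_chunks(TrendReq(), [list1, list2]) would return the same two frames as df1 and df2. Every chunk uses the same geo and timeframe, which matters: the common keyword can only link two results that cover the same region and period.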

As you can see, the values lie in the range 0 to 100. They denote relative search interest (the Google Trends index): 100 marks the highest point on the chart across all the keywords in the request for the chosen region and timespan, and 50 means a keyword was half as popular as that peak.

Now look closely at the column of our common TV series, “The office”, in both data frames df1 and df2. The values are not the same. Strange, right? This is because each list is scaled against its own peak. We therefore need to normalize one list based on the other. Let’s normalize list 2 using list 1 as the reference.
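To see why the common column disagrees, here is a deliberately simplified toy model. It is an assumption, not Google's documented procedure: each request divides every value by the single highest value across all of its keywords and maps the result onto 0 to 100. The function trends_scale and the weekly counts are hypothetical, made up for illustration only.

```python
def trends_scale(raw):
    """Scale raw counts onto 0-100 relative to the request's overall peak.

    Simplified model: one peak is shared by all keywords in the request.
    """
    peak = max(max(values) for values in raw.values())
    return {kw: [round(100 * v / peak) for v in values] for kw, values in raw.items()}


# Made-up weekly search counts, for illustration only (not real Google data).
request1 = {'The office': [200, 300], 'Friends': [100, 150]}
request2 = {'The office': [200, 300], 'Money Heist': [600, 400]}

print(trends_scale(request1))  # {'The office': [67, 100], 'Friends': [33, 50]}
print(trends_scale(request2))  # {'The office': [33, 50], 'Money Heist': [100, 67]}
```

The same raw counts for “The office” come out roughly twice as large in the first request, because Money Heist sets a higher peak in the second one. That ratio is what the normalization factor in the next steps recovers.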

We need to calculate the average of each keyword column in both data frames. Let’s do that first.

```python
averageList1 = []
averageList2 = []
# Average each keyword column over the whole timespan, rounded to a whole number
for item in list1:
    averageList1.append(df1[item].mean().round(0))
for item in list2:
    averageList2.append(df2[item].mean().round(0))
```
Lists of averages for list1 and list2, respectively
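If you prefer pandas over a loop, the same averages come from selecting the keyword columns and calling mean() once. The result is a Series labelled by keyword, so nothing depends on list positions. A sketch on made-up toy frames shaped like the interest_over_time() output (dates as the index, one column per keyword, plus isPartial); the values are illustrative, not real Trends data:

```python
import pandas as pd

# Toy frames shaped like pytrends output. Values are made up.
dates = pd.date_range('2020-01-05', periods=3, freq='W')
df1 = pd.DataFrame({'The office': [60, 70, 80], 'Friends': [30, 40, 50],
                    'isPartial': [False, False, True]}, index=dates)
df2 = pd.DataFrame({'The office': [30, 35, 40], 'Money Heist': [90, 100, 80],
                    'isPartial': [False, False, True]}, index=dates)

list1 = ['The office', 'Friends']
list2 = ['The office', 'Money Heist']

# Selecting only the keyword columns skips 'isPartial'; mean() averages each column.
averages1 = df1[list1].mean().round(0)
averages2 = df2[list2].mean().round(0)
print(averages1)
print(averages2)
```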

Now that we have the average of each column for both lists, let’s calculate the normalization factor that rescales list 2 onto the scale of list 1.

Normalization factor = (average of common TV series in list 1) / (average of common TV series in list 2)
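A quick worked example with made-up averages (not real Trends values) makes the formula concrete. If “The office” averages 70 in list 1 but only 35 in list 2, list 2 sits on a scale half as large, so every list-2 average gets doubled:

```python
# Hypothetical averages for "The office" (the common keyword) in each list.
office_avg_list1 = 70.0
office_avg_list2 = 35.0

normalization_factor = office_avg_list1 / office_avg_list2  # 70 / 35 = 2.0

# A made-up list-2 average for another series, rescaled onto list 1's scale.
money_heist_avg_list2 = 90.0
money_heist_on_list1_scale = normalization_factor * money_heist_avg_list2
print(normalization_factor, money_heist_on_list1_scale)  # 2.0 180.0
```

Rescaled values can exceed 100. Under this scaling that is expected: they are now expressed on list 1's scale, and a value above 100 simply means the series was, on average, more popular than the single highest point in list 1.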

Since “The office” (the common TV series) is stored at index 0 in both list1 and list2, I retrieve the value at index 0 from both lists of averages.

averageList1[0] holds the average of “The office” in list1, and averageList2[0] holds the average of “The office” in list2.

```python
normalizationFactor = averageList1[0] / averageList2[0]
```

To bring the averages in list 2 onto the same scale as list 1, multiply every entry in averageList2 by normalizationFactor:

```python
for i in range(len(averageList2)):
    # Rescale each list-2 average onto list 1's scale
    normalisedVal = normalizationFactor * averageList2[i]
    averageList2[i] = normalisedVal.round(0)
```
Lists of averages for list1 and list2, respectively (after normalization)

Notice that the average for “The office” is now the same in both lists. This means the averages in list2 are now on the same scale as those in list 1, so we can use this data to plot all 9 TV series in the same chart and compare them.
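The same factor is not limited to the averages. Because a mean is linear, scaling a column and then averaging gives the same result as averaging and then scaling, so you can also rescale the full list-2 time series and compare all the series over time on one chart. A minimal pandas sketch on made-up toy frames (with real pytrends output, drop the isPartial column first). Unlike the loop above, it does not round the averages before dividing, so the numbers can differ slightly from the rounded version:

```python
import pandas as pd

# Made-up toy frames standing in for df1 and df2 (keyword columns only).
df1 = pd.DataFrame({'The office': [60, 70, 80], 'Friends': [30, 40, 50]})
df2 = pd.DataFrame({'The office': [30, 35, 40], 'Money Heist': [90, 100, 80]})

anchor = 'The office'
factor = df1[anchor].mean() / df2[anchor].mean()  # 70 / 35 = 2.0

# Multiply every column of df2 by the factor (not just its averages),
# then drop the duplicate anchor column and join the frames side by side.
df2_scaled = df2 * factor
combined = pd.concat([df1, df2_scaled.drop(columns=anchor)], axis=1)
print(combined)
```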

I will now remove “The office” (the common TV series) from list2, because I don’t want it to be plotted twice. Its average has to be removed from averageList2 as well, so that the names and averages stay aligned.

```python
# Remove the common keyword and its average (both stored at index 0)
averageList2.pop(0)
list2.pop(0)
```

Now that the repeated entry for the common TV series “The office” has been removed from list2, the two lists can be combined for further comparison.

Combine the lists of TV series, list1 and list2:

```python
TVSeriesList = list1 + list2
```

Combine the lists of averages for list1 and list2, i.e. averageList1 and averageList2:

```python
finalAverageList = averageList1 + averageList2
```
Snippet of TVSeriesList and finalAverageList
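Keeping names and averages in two parallel Python lists works, but the pairing relies on both lists keeping the same order, which is why the common keyword has to be popped from both. A minimal alternative sketch, assuming the already normalized averages are held in a pandas Series indexed by keyword, drops the common keyword by name instead of by position. The numbers are made up:

```python
import pandas as pd

# Made-up, already normalized averages standing in for averageList1/averageList2.
averages1 = pd.Series({'The office': 70.0, 'Friends': 40.0})
averages2 = pd.Series({'The office': 70.0, 'Money Heist': 180.0})

# Drop the common keyword from the second Series by label, then stack both.
combined = pd.concat([averages1, averages2.drop('The office')])
print(combined.sort_values(ascending=False))
```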

Here comes the fun part: plotting the average popularity of all the listed TV series in one chart.

```python
import numpy as np
import matplotlib.pyplot as plt

# One horizontal bar per TV series, labelled with its name
y_pos = np.arange(len(TVSeriesList))
plt.barh(y_pos, finalAverageList, align='center', alpha=0.5)
plt.yticks(y_pos, TVSeriesList)
plt.xlabel('Average popularity')
plt.show()
```
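One optional tweak: sorting the bars by value makes nine series much easier to rank at a glance. The sketch below wraps this in a hypothetical helper, plot_sorted_averages, and uses matplotlib's object-oriented interface (fig, ax) rather than plt.* calls. The Agg backend is selected only so that it runs without a display, and the numbers in the usage lines are made up:

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_sorted_averages(names, averages):
    """Horizontal bar chart with the most popular series at the top."""
    # Sort ascending: barh draws from the bottom up, so the largest ends on top.
    pairs = sorted(zip(averages, names))
    values = [value for value, _ in pairs]
    labels = [name for _, name in pairs]
    y_pos = np.arange(len(labels))
    fig, ax = plt.subplots()
    ax.barh(y_pos, values, align='center', alpha=0.5)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(labels)
    ax.set_xlabel('Average popularity')
    fig.tight_layout()  # keep long series names from being clipped
    return fig, ax


# Made-up numbers for illustration only.
fig, ax = plot_sorted_averages(['The office', 'Friends', 'Money Heist'], [70.0, 40.0, 180.0])
buffer = io.BytesIO()
fig.savefig(buffer, format='png')  # or fig.savefig('averages.png') to write a file
```

With the article's variables, plot_sorted_averages(TVSeriesList, finalAverageList) followed by plt.show() in an interactive session (without the Agg line) draws the same chart as above, ranked from most to least popular.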
Trend comparison for all the listed TV series

I hope this article helped you understand how to compare a large set of keywords using pytrends. Feel free to share your own approach and suggestions. Let’s help each other get a better understanding of simple yet important concepts.

Update:

You can compare more than 2 lists using this approach. I’ll implement it and update this article soon, but for now, here is the thought process. :)

Compare multiple keywords using this approach
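One way to extend the two-list approach to any number of lists is sketched below. It assumes every chunk was fetched with the same common keyword, geo and timeframe. The function name combine_chunks is hypothetical, it uses the first chunk as the reference scale, and it rounds only at the end (the two-list code above rounds each average before dividing, so results can differ slightly):

```python
import pandas as pd


def combine_chunks(frames, anchor):
    """Put the averages of any number of Trends chunks on the first chunk's scale.

    Assumes every DataFrame in `frames` has the `anchor` column (the common
    keyword) and that the anchor's average is non-zero in every chunk.
    Returns one Series of averages, indexed by keyword.
    """
    reference = frames[0].drop(columns='isPartial', errors='ignore')
    combined = reference.mean()
    anchor_ref = combined[anchor]
    for frame in frames[1:]:
        averages = frame.drop(columns='isPartial', errors='ignore').mean()
        factor = anchor_ref / averages[anchor]
        # Rescale onto the reference scale and drop the duplicate anchor.
        combined = pd.concat([combined, (averages * factor).drop(anchor)])
    return combined.round(0)
```

For the two lists in this article, combine_chunks([df1, df2], anchor='The office') would return all 9 averages in one Series. Because Trends values are whole numbers, a fairly popular common keyword is probably the safer choice: a very small anchor average lets rounding noise dominate the factor.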


Analysing the drama hidden underneath the layers of data!! You guessed it right — Data analyst.