How to create your own actionable SEO dashboard with Python

Updated version, November 2021

The most important changes are:

  • Selenium automation (download the latest CSV with all rankings per competitor)
  • Cronjobs for daily automation (run the script, adjust the data, and send it to Google Drive)
  • PyDrive and Google Colab (useful for sharing with stakeholders)
  • No more pivot table with rankings per keyword (the main focus here is category performance instead of keyword performance)
  • Script optimization

With these changes, monitoring SEO performance costs zero manual effort per day.

For most SEO specialists it is mandatory to keep track of search visibility and keyword rankings in search engines. There are a lot of tools that provide this, such as AWR Cloud, STAT, Ahrefs, SEMrush and many more. This data can be overwhelming when you track a lot of keywords and want to compare current rankings with past rankings, for both you and your main competitors.

Also, this amount of information (together with server speed) can take the fun out of analyzing when the information you want does not load instantly.

Python and Selenium are a good way to analyze a lot of data, run some custom calculations on it and plot it just the way you want!

In this article I will describe a step-by-step process for creating your own dashboard using Python's Pandas library and (in my case) SEMrush exports. As said, there are a lot of keyword tracking tools, so any other export works as well but will need some customization tweaks.

General idea and workings of the dashboard

The dashboard is set up in 3 main steps:

  1. Collect and clean the exports from SEMrush (or any other ranking export file) with Selenium, create a useful work file to continue with, and send it to Google Drive
  2. Create a cronjob for daily rank tracking
  3. Add a visibility calculation to the data and plot it over categories

The result is a good understanding of ranking and visibility fluctuations, and the ability to find issues per category and even per keyword group. This can give you a very specific idea of where to optimize.

Let’s get started

For my company (www.bcc.nl) we track 1705 keywords in SEMrush across 88 categories (tags). For this article I have collected SEMrush exports of mobile rankings on a weekly basis. I also track the rankings of the 4 main industry competitors.

1: Collect and clean the exports from SEMrush (or any other ranking export file) with Selenium, create a useful work file to continue with, and send it to Google Drive

Now that we have some background information, create a new Python 3 file and install the required libraries and dependencies, assuming Python 3 is installed on your machine:

  • Selenium (pip install selenium)
  • Webdriver Manager (pip install webdriver-manager)
  • Numpy (pip install numpy)
  • PyDrive (pip install pydrive)
  • Pandas (pip install pandas)
  • Datetime (part of the Python standard library, no install needed)

Because we need a browser to download the latest CSV from SEMrush, the script also spins up a temporary browser instance (webdriver-manager downloads a matching ChromeDriver for us).

Notice that we use sleep(X) in the script to give the browser a bit of time to do its job and not rush things.

#!/usr/bin/env python
# coding: utf-8
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from datetime import date, timedelta
from time import sleep
import os
import numpy as np
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import pandas as pd
from webdriver_manager.chrome import ChromeDriverManager

# LOG IN TO SEMRUSH
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.semrush.com/login/')
sleep(1)

username = driver.find_element_by_id('email')
password = driver.find_element_by_id('password')
username.send_keys('USERNAME_SEMRUSH')
password.send_keys('MyUlt1m@t3N3v3rT0H@ckPa$$$$$W0rd')
password.send_keys(Keys.ENTER)
sleep(2)

# DATE FORMAT STUFF:
_date = date.today() - timedelta(days=1)
yesterday = _date.strftime('%Y%m%d')
today = date.today().strftime('%Y%m%d')

project_id = 'PUT_HERE_YOUR_PROJECT_ID'

# THE PLACE WITH YESTERDAYS RANKINGS
starting_page = driver.get(f'https://www.semrush.com/tracking/overview/{project_id}.html?sort=0_pos_asc&date_begin={yesterday}&date_end={yesterday}&domain_1=bcc.nl&domain_2=expert.nl&domain_3=coolblue.nl&domain_4=mediamarkt.nl&domain_5=bol.com&page=1')
sleep(2)

# DOWNLOAD CSV
driver.find_element_by_xpath('//*[@id="ptr-header"]/div[1]/div[2]/div[2]/div[3]/button').click()
sleep(1)
driver.find_element_by_xpath('/html/body/div[2]/main/div/div/div[2]/div[1]/div[2]/div[2]/div[6]/div/div/div/div/div/div[5]').click()
sleep(5)

driver.close()
driver.quit()

# RENAME FILE TO A MORE READABLE AND WORKABLE NAME
download_location = 'PUT_HERE_YOUR_DOWNLOAD_LOCATION_ON_YOUR_LOCAL_MACHINE'
old_file = f'{download_location}/{project_id}_position_tracking_rankings_overview_{today}.csv'
new_file = f'{download_location}/_{today}.csv'
os.rename(old_file, new_file)
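A side note on those sleep(X) calls: they work fine, but if the page is slow they can be replaced with Selenium's explicit waits, which continue as soon as an element appears. A minimal sketch on my part, assuming the same 'email' field id as in the script above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the login field instead of sleeping a fixed time
wait = WebDriverWait(driver, 10)
username = wait.until(EC.presence_of_element_located((By.ID, 'email')))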

Now it is time to ask Pandas for help and make some fine tweaks to the data. Notice the 'header=5' in the script below: the first five rows of the SEMrush export contain metadata, so the real column names only start on row six. This looks weird, and I think it is weird. But that is how SEMrush creates the exports. Perfectly fine. But weird 😊.

today = date.today().strftime('%Y%m%d')
_date = date.today() - timedelta(days=1)
yesterday = _date.strftime('%Y%m%d')
new_file = f'{download_location}/_{today}.csv'

df = pd.read_csv(new_file, header=5)
# replace() is not in-place when chained, so assign the result back
df = df.replace(r'\s+', np.nan, regex=True).replace('', np.nan)
df['Datum'] = _date.strftime('%Y-%m-%d')
pd.options.mode.chained_assignment = None
# Datum was just written as YYYY-MM-DD, so parse it with the matching format
df['Datum'] = pd.to_datetime(df['Datum'], format='%Y-%m-%d')

_yesterday = date.today() - timedelta(days=1)
_yesterday = _yesterday.strftime('%Y%m%d')

df.rename(columns={
    'tags': 'Tags',
    f'*.expert.nl/*_{_yesterday}': 'Rankings_Expert',
    f'*.expert.nl/*_{_yesterday}_landing': 'Expert_url',
    f'*.bcc.nl/*_{_yesterday}': 'Rankings_BCC',
    f'*.bcc.nl/*_{_yesterday}_landing': 'BCC_url',
    f'*.coolblue.nl/*_{_yesterday}': 'Rankings_Coolblue',
    f'*.coolblue.nl/*_{_yesterday}_landing': 'Coolblue_url',
    f'*.mediamarkt.nl/*_{_yesterday}': 'Rankings_Mediamarkt',
    f'*.mediamarkt.nl/*_{_yesterday}_landing': 'Mediamarkt_url',
    f'*.bol.com/*_{_yesterday}': 'Rankings_Bol.com',
    f'*.bol.com/*_{_yesterday}_landing': 'Bol.com_url',
    'Search Volume': 'Search_Volume'
}, inplace=True)

columns_to_use = df[['Datum', 'Keyword', 'Tags', 'Rankings_Expert', 'Expert_url',
                     'Rankings_BCC', 'BCC_url', 'Rankings_Coolblue', 'Coolblue_url',
                     'Rankings_Mediamarkt', 'Mediamarkt_url', 'Rankings_Bol.com',
                     'Bol.com_url', 'CPC', 'Search_Volume']]
columns_to_use.to_csv(f'{_date}.csv')
file = pd.read_csv(f'{_date}.csv')
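If you want to see for yourself why header=5 is needed, a quick optional check is to print the first lines of the raw export:

# Optional sanity check: the first rows of the export hold SEMrush metadata;
# the real column names only start on row 6 (hence header=5 above)
with open(new_file) as f:
    for _ in range(7):
        print(f.readline().rstrip())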

Next, we can upload our file to a Google Drive location. I have created a folder ('daily_rankings') inside an SEO reporting folder in Google Drive. Daily files will be uploaded to Drive automatically and cleaned up locally afterwards. The PyDrive library is actually pretty easy to configure. For more information, check the PyDrive documentation or use Google.

gauth = GoogleAuth()
gauth.LoadCredentialsFile('mycreds.txt')
if gauth.credentials is None:
    gauth.GetFlow()
    gauth.flow.params.update({'access_type': 'offline'})
    gauth.flow.params.update({'approval_prompt': 'force'})
    gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
    gauth.Refresh()
else:
    gauth.Authorize()
gauth.SaveCredentialsFile('mycreds.txt')

drive = GoogleDrive(gauth)
gfile = drive.CreateFile({"parents": [{"kind": "drive#fileLink", "id": 'GOOGLE_DRIVE_FOLDER_ID'}]})
# Read file and set it as the content of this instance.
gfile.SetContentFile(f'{_date}.csv')
gfile['title'] = f'{today}.csv'
gfile.Upload()

# WAIT 5 SECONDS FOR THE UPLOAD TO DRIVE, THEN REMOVE THE LOCAL FILES
sleep(5)
os.remove(f'{_date}.csv')
os.remove(new_file)
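The snippet above assumes credentials were already saved to mycreds.txt once. For the very first run, a minimal one-time setup could look like this sketch, assuming you downloaded OAuth client credentials as client_secrets.json from the Google API Console:

from pydrive.auth import GoogleAuth

gauth = GoogleAuth()          # looks for client_secrets.json in the working directory by default
gauth.LocalWebserverAuth()    # opens a browser once so you can grant access
gauth.SaveCredentialsFile('mycreds.txt')  # reused by the daily script afterwards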

From now on, a new file will be uploaded to Google Drive every day (as long as your local machine is active).

Daily SEO rankings from SEMrush to Google Drive

2: Create a cronjob for daily rank tracking

We now have a fully functional script that gets the data from SEMrush, adjusts the downloaded CSV, uploads it to Google Drive and keeps your local machine clean.

To avoid daily manual actions, we can instruct our machine to run the script every single day (when your machine is active / on). This is easily managed with cronjobs. If you are not familiar with cronjobs, I can recommend this video, which explains them quite nicely and clearly.

To create a cronjob on your local machine, open a terminal or command line and type:

$ crontab -e

Now you can create the cronjob. In this case I am running the script every day at 11:20. We need to specify the Python 3 location and the script location, like so:

20 11 * * * /Users/USERNAME/opt/anaconda3/bin/python /Users/USERNAME/docs/python/seo/semrush_rankings.py >> ~/cron.log 2>&1
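For reference, the five scheduling fields at the start of that line break down as follows:

# 20 11 * * *
# │  │  │ │ └── day of week  (* = every day of the week)
# │  │  │ └──── month        (* = every month)
# │  │  └────── day of month (* = every day)
# │  └───────── hour   (11)
# └──────────── minute (20)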

When done, type ':wq' (write and quit) to leave the editor and save your cronjob. To see all the cronjobs on your machine, type:

$ crontab -l

One last preparation for the dashboard: categorize all keywords with a tags file and merge it onto the work file, which is what the snippet below does.

# categorize all keywords from txt file
tags = pd.read_csv('./workfile/tags.txt', sep=';')

# merge tags on keywords, check and export to new .csv (which will be a work file)
output = workfile.merge(tags, on='Tags')
output.head()
output.to_csv('./workfile/output.csv', index=False)

3: Add a visibility calculation to the data and plot this over categories

Finally, it is time to see the data in action. Go to https://colab.research.google.com/ and create a new notebook.

In the first cell, gather all the libraries.

import os
import re
import warnings

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

import ipywidgets as widgets
from ipywidgets import interactive, interact

warnings.simplefilter('ignore')

from google.colab import drive
drive.mount('/content/drive')

Grant access to Google Drive by following the link and copying/pasting the code. Next, we combine all the daily rankings into one big DataFrame.

files = [file for file in os.listdir('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/daily_rankings/')]

workfile = pd.DataFrame()
for file in files:
    df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/daily_rankings/' + file, engine='python')
    workfile = pd.concat([workfile, df])  # append each daily file

workfile.dropna(subset=['Keyword'], inplace=True)
workfile.replace('-', np.nan, inplace=True)
workfile['Rankings_Expert'] = workfile['Rankings_Expert'].astype(float)
workfile['Rankings_BCC'] = workfile['Rankings_BCC'].astype(float)
workfile['Rankings_Bol.com'] = workfile['Rankings_Bol.com'].astype(float)
workfile['Rankings_Mediamarkt'] = workfile['Rankings_Mediamarkt'].astype(float)
workfile['Rankings_Coolblue'] = workfile['Rankings_Coolblue'].astype(float)
workfile.drop('Unnamed: 0', axis=1, inplace=True)
# replace() is not in-place when chained, so assign the result back
workfile = workfile.replace(r'\s+', np.nan, regex=True).replace('', np.nan)
print(workfile.info())
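A quick way to check that all daily files made it into the combined DataFrame is to count the distinct dates (the Datum column we created in step 1):

# One distinct Datum value per daily file is expected
print(f"{workfile['Datum'].nunique()} days of rankings, {len(workfile)} rows in total")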

Now a new value should be added: visibility! In this case we assume that a number 1 position in Google has 100% visibility, decreasing linearly down to 0% at position 31 and beyond. This is saved in a folder in Google Drive named VISIBILITY_COMPETITORS, as one file per competitor:

Rankings_{COMPETITOR};Visibility
1;100,00%
2;96,67%
3;93,33%
4;90,00%
5;86,67%
6;83,33%
7;80,00%
8;76,67%
9;73,33%
10;70,00%
11;66,67%
12;63,33%
13;60,00%
14;56,67%
15;53,33%
16;50,00%
17;46,67%
18;43,33%
19;40,00%
20;36,67%
21;33,33%
22;30,00%
23;26,67%
24;23,33%
25;20,00%
26;16,67%
27;13,33%
28;10,00%
29;6,67%
30;3,33%
31;0,00%
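If you prefer not to maintain these files by hand, the same linear curve can be generated with a few lines of pandas. This is just a sketch on my part (visibility_table is a hypothetical helper), assuming the 100%-to-0% scale over positions 1 to 31 shown above, where each position costs 100/30 ≈ 3.33 percentage points:

import numpy as np
import pandas as pd

def visibility_table(competitor):
    # Position 1 -> 100.0, position 31 -> 0.0, in steps of 100/30
    ranks = np.arange(1, 32)
    return pd.DataFrame({
        f'Rankings_{competitor}': ranks.astype(float),
        f'Visibility_{competitor}': (31 - ranks) * 100 / 30,
    })

print(visibility_table('Expert').head(3))  # 1 -> 100.00, 2 -> 96.67, 3 -> 93.33

If you go this route, the visibility columns are already floats, so the string-to-float conversion in the merge code below can be skipped.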

This might look a bit hacky, but it works:

# The visibility files are semicolon-separated CSVs, so read them with
# pd.read_csv(..., sep=';') — pd.read_excel would fail on a .csv file.
# The visibility values use comma decimals and a percent sign (e.g. '96,67%'),
# so they are converted to plain floats after each merge.
df_vis_expert = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/visibility_competitors/vis_expert.csv', sep=';')
df_vis_expert['Rankings_Expert'] = df_vis_expert['Rankings_Expert'].astype(float)  # match dtype of workfile
df_vis_calc = workfile.merge(df_vis_expert, on='Rankings_Expert', how='outer')
df_vis_calc['Visibility_Expert'] = df_vis_calc['Visibility_Expert'].str.rstrip('%').str.replace(',', '.').astype(float)

df_vis_bcc = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/visibility_competitors/vis_bcc.csv', sep=';')
df_vis_bcc['Rankings_BCC'] = df_vis_bcc['Rankings_BCC'].astype(float)
df_vis_calc2 = df_vis_calc.merge(df_vis_bcc, on='Rankings_BCC', how='outer')
df_vis_calc2['Visibility_BCC'] = df_vis_calc2['Visibility_BCC'].str.rstrip('%').str.replace(',', '.').astype(float)

df_vis_coolblue = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/visibility_competitors/vis_coolblue.csv', sep=';')
df_vis_coolblue['Rankings_Coolblue'] = df_vis_coolblue['Rankings_Coolblue'].astype(float)
df_vis_calc3 = df_vis_calc2.merge(df_vis_coolblue, on='Rankings_Coolblue', how='outer')
df_vis_calc3['Visibility_Coolblue'] = df_vis_calc3['Visibility_Coolblue'].str.rstrip('%').str.replace(',', '.').astype(float)

df_vis_mediamarkt = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/visibility_competitors/vis_mediamarkt.csv', sep=';')
df_vis_mediamarkt['Rankings_Mediamarkt'] = df_vis_mediamarkt['Rankings_Mediamarkt'].astype(float)
df_vis_calc4 = df_vis_calc3.merge(df_vis_mediamarkt, on='Rankings_Mediamarkt', how='outer')
df_vis_calc4['Visibility_Mediamarkt'] = df_vis_calc4['Visibility_Mediamarkt'].str.rstrip('%').str.replace(',', '.').astype(float)

df_vis_bol = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/visibility_competitors/vis_bol.csv', sep=';')
df_vis_bol['Rankings_Bol.com'] = df_vis_bol['Rankings_Bol.com'].astype(float)
df_vis_calc5 = df_vis_calc4.merge(df_vis_bol, on='Rankings_Bol.com', how='outer')
df_vis_calc5['Visibility_Bol.com'] = df_vis_calc5['Visibility_Bol.com'].str.rstrip('%').str.replace(',', '.').astype(float)

Save the merged result to Google Drive and read it back in as a fresh Pandas DataFrame.

# Save the merged visibility data as the work file for plotting
df_vis_calc5.to_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/workfile/visibility.csv', index=False)

dataset = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/SEO_reporting/workfile/visibility.csv')
dataset.dropna(subset=['Datum'], inplace=True)  # drop lookup-only rows left over from the outer merges
dataset.fillna(0, inplace=True)
# Datum was written as YYYY-MM-DD in step 1, so parse it with the matching format
dataset['Datum'] = pd.to_datetime(dataset['Datum'], format='%Y-%m-%d')

Everything is now in place for a nice, interactive widget that shows the SEO performance of all tracked keywords per category. This is very useful for deep diving and keeping track of changes or improvements made per category.

tags = widgets.Dropdown(
    options=['All'] + list(dataset['Tags'].unique()),
    value='All',
    description='Category:',
)

def plotit(tags):
    dataset2 = dataset.copy()
    if tags != 'All':
        dataset2 = dataset2[dataset2.Tags == tags]
    if len(dataset2) > 0:
        dataset2.groupby(['Datum']).mean().plot(kind='line', figsize=(18, 10))
        plt.xlabel('Period', fontsize=24)
        plt.ylabel('Visibility in Google Organic Search', fontsize=24)
        plt.title('Search Visibility in Google for {}'.format(tags), fontsize=30)
        plt.legend(fontsize=18)
        plt.show()
    else:
        print('Nothing to show')

interactive(plotit, tags=tags)
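One thing to be aware of: groupby().mean() averages every numeric column, so if your work file still contains the raw ranking, CPC and search volume columns, you may want to restrict the plot to the visibility columns. A possible tweak for the groupby line, shown here on the full dataset:

# Restrict the averaged plot to the visibility columns only,
# in case the work file still contains raw rankings, CPC and search volume
vis_cols = [c for c in dataset.columns if c.startswith('Visibility_')]
dataset.groupby('Datum')[vis_cols].mean().plot(kind='line', figsize=(18, 10))
plt.show()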

Let me add some screenshots of the output.

SEO Visibility all Categories
SEO Visibility Espresso Machine

To wrap up

If you are using any kind of rank tracker to monitor keyword rankings in Google, it can be very useful to get instant, actionable information and keep track of the effect of changes that have been made. Python, in combination with the Pandas and Matplotlib libraries, can help with this. A lot of rank trackers can already show this kind of information, but I think it is more fun to create it with Python. The most important reasons for me:

  1. Understand the data / rankings better
  2. Automation with Selenium and cronjobs
  3. Get used to working with Python and visualization
  4. Customize visualization
  5. Instant speed
  6. Make a shareable dashboard for stakeholders

The last point is an interesting one, because not everyone in your company has access to third-party SEO tools (in my case SEMrush). A Colab notebook is very easy to modify to keep it simple for stakeholders, and easy to share. And because SEO can be difficult for some people to understand, sharing performance and visualizing the effects of optimizations helps create support among different stakeholders and the board.

The full script can be downloaded from my GitHub
>> https://github.com/marcvh80/seo-dashboard << Note that this is still an outdated script; I recommend following this article instead of the GitHub version. I will update it later.
