Testing Adobe Analytics with Python

John Simmons
7 min read · Feb 1, 2017


This is a demonstration of how to use Python to QA Adobe DTM rules.

Setup

I have configured Adobe Analytics to fire an event on every successful page load on my site. In this case, it is event10. I set up my rule in DTM, then spent a few minutes clicking around the site. I think I have everything configured correctly. I click a link, check Omnibug for event10, and repeat. While this technically works, it is tedious and prone to error. I think I can optimize and automate this workflow with Python. Ideally, I would like to see a table that contains all of the relative links on my homepage, the events that fire on those pages, and an “everything is ok” indicator. This can be accomplished with some Python automation. Let's get started.

As with all Python projects, I start by creating and activating a virtual environment in my project folder. I always name mine venv.

$ virtualenv venv
$ . venv/bin/activate
(venv) $

OK, now that I have my environment set up, it's time to install the packages. I'll use Selenium to crawl the site and BeautifulSoup to parse the page source for anchor elements to visit. I will also need an HTML parser to use with BeautifulSoup; I like html5lib for no specific reason. Finally, I will grab Requests to make some HTTP requests without using a full browser. You can install multiple packages at the same time with pip by separating them with a space.

$ pip install selenium beautifulsoup4 html5lib requests

Finally, since the Selenium package does not come with a Chrome driver, I need to download the Selenium Chrome Driver here. If you are on a Mac, you can place the executable in the Selenium folder in your virtualenv. I imagine the process on Windows is similar.

I prefer using Chrome because it allows me to use the DTM Switch and Omnibug extensions should I need them. Note that extensions must be specifically loaded into the Selenium instance of Chrome. More on that later.

Driver

Now that I have my Chrome Selenium driver in the right place, I need to configure it. I prefer to do this as a function in a second file called browser.py and import it into my main file. A sketch of that file follows the list below.

  • webdriver is the Selenium module used to create the browser object. This object will be used to traverse my list of links. I customize this object in the function, then return it.
  • Chrome Options allows me to add extensions. Note that Chrome extensions must be packaged before they can be used with Selenium. More on that here.
  • Desired Capabilities is a dict of Chrome preferences. I update it here to allow access to the browser's JavaScript console output.
  • I use a .env file that is not in source control to store sensitive information. The os module is needed to access it. For example:
Example of sensitive values in my .env file
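Something like this, with hypothetical keys:

DTM_SWITCH_CRX=/path/to/dtm_switch.crx
OMNIBUG_CRX=/path/to/omnibug.crx

And here is a minimal sketch of what browser.py could look like. The extension paths and environment variable names above are my placeholders, and I am assuming the .env values have already been loaded into the environment:

# browser.py
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


def chromeDriver():
    options = Options()
    # Extensions must be packaged as .crx files before Selenium can load them
    options.add_extension(os.environ['DTM_SWITCH_CRX'])
    options.add_extension(os.environ['OMNIBUG_CRX'])

    # Expose the browser console so driver.get_log('browser') works later
    capabilities = DesiredCapabilities.CHROME.copy()
    capabilities['loggingPrefs'] = {'browser': 'ALL'}

    return webdriver.Chrome(chrome_options=options,
                            desired_capabilities=capabilities)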

Main Script

I have my script set up as a few variables and three functions. I will go through it in chunks, then show how it all works together. First, the imports:

import os
import requests
from browser import chromeDriver
from bs4 import BeautifulSoup as bs4
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from time import sleep
  • os is imported here to get the name of my current directory. As I visit the relative link pages I will be saving screenshots, and I use os to tell Selenium where to save them.
  • I will use requests to get the source code of the homepage in order to parse the relative links from it. The homepage will still be checked for events since it contains a relative link to itself.
  • chromeDriver is the configured Chrome instance I set up in browser.py.
  • The bs4 module creates BeautifulSoup objects, which can then be parsed. I use the as alias since I always misspell BeautifulSoup one way or another.
  • WebDriverException is needed to catch any errors that might occur while crawling the relative links. For example, if a bad URL gets in there somehow…
  • Finally, when using Selenium, I like to use sleep occasionally to build in some extra loading time for pages before trying to manipulate them.

Next, I have declared a few variables to make things a little more readable and explicit.

stagingScript = "localStorage.setItem('sdsat_stagingLibrary',true);"
debugScript = "_satellite.setDebug(true);"
pageNameScript = "_satellite.getVar('Global - Pagename')"
target_url = "https://www.homepage.com"
HTML_string = "<!DOCTYPE html>..."
  • stagingScript & debugScript are lines of JavaScript that can be executed in the browser console to turn on DTM staging and debug modes, respectively. I won't be using them here specifically, but they are nice to have if I need them.
  • pageNameScript is JavaScript that returns a specific analytics variable we use for naming pages. This will be the first column of the table I will output.
  • target_url is the homepage I want to extract relative anchor links from.
  • Finally, HTML_string (cut off here to save space) contains the beginning of a full HTML document. As I iterate over the links, I will build additional elements and append them to this string. I will then append closing tags and save the entire string as a .html file.

My first function returns a set of relative anchor tags on a specific page. I am specifically using a set here to cut down on duplicate pages. However, I might still visit the same page twice, since I am building the set from bs4 tag objects rather than comparing the anchor text. I am keeping the items as bs4 tags so that I can still parse the tag objects later if I want to.

Requests sends a GET request to the URL, and I store the output (status code, headers, page source, etc.) of that request in a variable called r. I assign the parseable BeautifulSoup object to a variable called soup. I then search soup for anchor tags whose href starts with a “/” and does not end with a common domain name. These are pretty good indicators that the href is a relative link. I recently found out that I can pass multiple values to startswith/endswith using a tuple.
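A sketch of that function; the name and the specific endswith exclusions are my stand-ins, not necessarily what your site would need:

def getRelativeAnchors(url):
    r = requests.get(url)
    soup = bs4(r.text, 'html5lib')
    # A set of bs4 tag objects; startswith/endswith both accept tuples
    return {a for a in soup.find_all('a', href=True)
            if a['href'].startswith('/')
            and not a['href'].endswith(('.com', '.net', '.org'))}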

OK, it's finally time to fire up Selenium. My next function will visit a page, execute the JavaScript necessary to find the events and pagename, take a screenshot of that page, and append the necessary markup to HTML_string. I should note that Selenium is far more powerful than what I am using it for here. It is possible to fill out forms, click specific elements, etc. If I wanted any complex behaviors, I would build them into this function and execute them conditionally depending on the page. In this case, for the sake of simplicity, I am just visiting the page and moving on.

The function takes a URL and a relative URL as parameters in order to build a complete URL. Think http://www.homepage.com and /about-us. Selenium then opens an instance of Chrome and goes to that page.

Once on the page, it's time to execute my JavaScript using driver.execute_script(). I am wrapping this entire section in a try/except block for easier debugging and because I expect it to work most of the time. driver.execute_script() runs the JavaScript passed to it, but a console.log call does not hand anything back to Python. To get console output I will access the driver log. Remember that I set the preferences in my driver to allow access to the browser's console output. I use Python triple-quoted strings for the JavaScript snippets to set them apart.

In the JavaScript console I log the page name and the events that fired on that page, separated by the “|” character for easy splitting. Once the console.log runs, I use driver.get_log() to parse the contents of the JavaScript log. I am not entirely familiar with the nomenclature of its output (level, INFO, message, etc.), but I know those fields are where I can find my pagename + events log. I use a list comprehension to find the log entry and use it to assign pageName and events as Python variables. I then add them to my HTML string and take a screenshot.
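A sketch of the whole function, with my assumptions called out in comments. In particular, I am guessing that the fired events can be read from s.events, and the exact format of the console log message can vary by Chrome version:

driver = chromeDriver()  # module-level browser instance


def visitPage(url, relative_url):
    global HTML_string
    try:
        driver.get(url + relative_url)
        sleep(2)  # give DTM a moment to finish loading

        # Log "pagename|events" to the browser console; s.events is my
        # assumption about where the fired events live
        driver.execute_script(
            "console.log(" + pageNameScript + " + '|' + s.events);")

        # Pull that entry back out of the browser log
        entry = [log['message'] for log in driver.get_log('browser')
                 if log['level'] == 'INFO' and '|' in log['message']][0]
        # Console messages may arrive as 'source line "text"'; keep the text
        message = entry.split('"')[1] if '"' in entry else entry
        pageName, events = message.split('|')

        # Save a screenshot named after the relative path
        name = relative_url.strip('/').replace('/', '-') or 'home'
        shot = os.path.join(os.getcwd(), name + '.png')
        driver.save_screenshot(shot)

        # Append a table row: the page name links to its screenshot
        HTML_string += ('<tr><td><a href="' + shot + '">' + pageName +
                        '</a></td><td>' + events + '</td></tr>')
    except WebDriverException:
        print('Could not check ' + url + relative_url)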

Finally, I run these two functions inside a main function. After pulling in my global variables, I create the collection of relative anchors to work from. I loop over it with my visitPage function, which gets the events and takes a screenshot. After that, the only thing that's left to do is close the HTML tags and save HTML_string to a file. I can then open this file to check my events.
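Roughly, under the same assumptions as above (the report filename is another placeholder):

def main():
    global HTML_string
    for anchor in getRelativeAnchors(target_url):
        visitPage(target_url, anchor['href'])

    # Close the open tags and write the report to disk
    HTML_string += '</table></body></html>'
    with open('report.html', 'w') as f:
        f.write(HTML_string)
    driver.quit()


if __name__ == '__main__':
    main()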

That's it. My script has output my HTML document. I now have a table of all the relative links on my homepage and the events that fire on those pages. I can also click a page name to see a screenshot of that page. It looks like my rule to fire event10 on each page load is working correctly. This is not a perfect QA, but it is much less tedious than doing it all manually.

So where do we go from here? This script works fine on its own, but I think it would translate nicely to a Flask app or a cron job that runs periodically. It all depends on who wants to consume the data and how. I would also like to reiterate that I have barely scratched the surface of what Selenium is capable of. Provided you have the motivation, it would only be a little more work to check events, props, and eVars on every page of entire user flows full of complex interactions.
