Automating the boring “Martech solution(s) testing” with Selenium

An approach

Keerat Sachdeva
Engineered @ Publicis Sapient
14 min read · Jan 19, 2024

--

An AI-generated image of a guy bored of Martech solutions validation

The need

Recently, I was part of a big and complex project👷🏻‍♂️. My client was migrating all their 1000+ websites to a new platform with a different data layer. My task as a Data Analyst-cum-Web Analytics developer was to update all the existing Adobe and Google Analytics tracking implementations for these websites according to the new data layer. Anyone implementing such Martech solutions via a tag manager, or performing such a migration, knows that thorough testing is required on the lower environments before going live on production to ensure a consistent data collection setup.

Honestly, I have always found testing these Analytics solutions tiring and effort-intensive. One must complete multiple user journeys (click here, click there), open the browser console and network tabs on every web page to validate the data layer and the Analytics HTTP requests, and check the console tab for JS errors, day in and day out. The process is laborious, time-consuming and monotonous. Moreover, it consumes a lot of an Analyst’s time, leaving less time to focus on and analyze the collected data.

Even after all this tiring validation effort, inconsistencies and errors in the tracking setup can still exist, impacting the overall data collection and the business negatively 📉. Poor data quality can lead to poor decision-making, business inefficiencies, mistrust, missed opportunities, and revenue losses. Hence, it is vital to ensure the quality of the captured data before relying on it for critical business decisions, and there is no way we can compromise on validating these Analytics solutions. Yet the testing involves several repetitive, time-consuming and monotonous tasks.

So, why not try an automated testing tool, which would be faster and more efficient than manual testing and provide better results with enhanced productivity? As John Ruskin beautifully put it,

“Quality is never an accident. It is always the result of intelligent effort.”

Let us try putting in this intelligent effort to get quality results. While looking for automation inspiration, I stumbled upon a great quote from Bill Gates.

“The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.”

Selenium, the savior!

So, I started looking at various automated testing tools for web applications and came across several options like Puppeteer, Katalon Studio and Selenium. Among them, Selenium stands out for its versatility and extensive community support: it works with multiple languages, browsers, and operating systems, and there is a wealth of online resources available. Hence, I decided to continue with Selenium.

Selenium logo

Selenium is a popular open-source tool that automates web browsers. It automates browsers. That’s it! What we do with that power is entirely up to us. Primarily, it is used for automating web applications for testing purposes, but it is certainly not limited to that. It can (and should) be used to automate monotonous web-based administrative tasks as well.

At the core of Selenium is the Selenium WebDriver, an interface for writing instructions that work interchangeably across browsers. It is implemented and supported in JavaScript (Node.js), Python, Ruby, Java, Kotlin and C#. It allows us to open a web page by its URL, interact with any of the HTML elements, and execute arbitrary JS code on the page. This ability to run any JS code on a web page will come in quite handy in our approach.

The snippet below shows how to open a web page with the WebDriver. I am accessing google.com with the Chrome WebDriver object in Python here.
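A minimal sketch of that snippet, assuming Selenium 4+ (where Selenium Manager resolves the ChromeDriver binary automatically):

```python
from selenium import webdriver

# Create a Chrome WebDriver object; Selenium 4+ locates a matching
# ChromeDriver binary automatically via Selenium Manager.
driver = webdriver.Chrome()

# Open a web page by its URL.
driver.get("https://www.google.com")
print(driver.title)  # "Google"

driver.quit()  # close the browser when done
```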

A gist with all the source code is present at the end of the article.

The approach

Most Analytics tools, like Adobe and Google Analytics, are JS-based solutions that require a piece of code (a tag/pixel) to execute on the web page. This code can be added to the web page HTML directly by the developers or dynamically (and more effectively) using a Tag Management Solution (TMS) like Adobe Launch or Google Tag Manager. A data layer is a conceptual layer between the web application and the TMS: a JS object containing all the relevant data about the visitor and the web page, serving as the repository our Analytics solutions draw on to report on page, product, transactional and user data. The tags capture the relevant data about the visitor and web page and send it to the respective processing servers via HTTP requests. This data is then processed into reports and dashboards, which help the stakeholders analyze and optimize their applications with data-driven decisions.

The following diagram explains how the overall Web Analytics solutions validation works:

The role of a Web Analytics developer here is to complete different user journeys on the website and validate the data layer and all the Analytics HTTP requests on the client side (Browser side). A client-side Web Analytics implementation is one where all the tags are deployed directly on the web page, and the visitor (the client) sends all the Analytics HTTP requests directly to the different Analytics solutions processing servers.

The server-side Web Analytics implementation is relatively newer and better. It introduces an intermediary server between the visitor (the client) and the Analytics processing servers. Here, the client sends all the HTTP requests to this server, which forwards them to the different Analytics solutions’ servers. Our approach can be extended to the server-side implementation, but this article focuses on the client-side implementation.

The idea is to replace the Web Analytics developer here with Selenium & Python, and let them complete the different user journeys and validate the overall tracking setup for us. Here is the updated diagram with Selenium and Python in the picture:

Now that we understand where Selenium and Python fit in the validation process, let us move ahead with our approach to automating the boring validation of the Analytics solutions.

I have created a simple website to showcase the approach. It has a home page with a link to a dummy web page and a dummy web page with a link back to the home page. Both pages have a data layer object (digitalData) with two values.

  1. Page name (A user-friendly name for every web page)
  2. Page type (The type of the web page: Home, Product listing, Product details, Cart overview, et cetera)

Currently, we have static dummy values in these variables; in an actual implementation, such variables are populated dynamically on all the web pages. I have set up Adobe Analytics on the website using Adobe Launch and am tracking the page views and link clicks. We will see the Adobe Analytics HTTP requests if we load a web page or click on any of the links.

Here is a screenshot of the browser network tab:

Note: b/ss is the identifier to filter the Adobe Analytics requests.

Let us consider a real scenario where such a data layer is available on many web pages, not just two, and we need to validate the page name variable on all of them. Typing every web page URL into the browser address bar and validating the value manually is something we shouldn’t even have to think about. Thanks to Selenium!

The web driver allows us to execute any JS code on a web page, which is like running it manually in the browser console. Here is a JS function to fetch and validate the page name data layer variable. The function takes the expected page name as an argument and returns true or false after comparing it with the actual page name on the web page.

We have manually executed this validation code in the browser console here. Let us see how we can achieve the same programmatically.

Here, I open the home page of our website and execute JS code on it using the execute_script method. It expects JS code (a string) as input, and we can get an output from the method using the JS return statement. I have defined an expected page name variable and compared it with the page name present in the data layer on the web page (digitalData.page.name). Notice how I got the same output as above, programmatically, using Selenium and Python: it returned true for the home page name and false for any other page name.
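Here is a rough sketch of that step; in this version the expected value is passed in through execute_script’s arguments rather than hard-coded (the URL and page names are illustrative):

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://keeratsachdeva.github.io/")  # the demo home page

# The same validation logic we ran in the console: compare the expected
# page name (received via arguments[0]) with the actual data layer value.
validate_page_name = """
    var expectedPageName = arguments[0];
    return digitalData.page.name === expectedPageName;
"""

print(driver.execute_script(validate_page_name, "home page"))  # True
print(driver.execute_script(validate_page_name, "cart page"))  # False

driver.quit()
```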

I can even utilize this method to get the name of the Adobe Launch property on the web page using the JS _satellite object.
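A sketch of that lookup, assuming the standard _satellite global that Launch injects on the page:

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://keeratsachdeva.github.io/")  # the demo home page

# _satellite is the global object Adobe Launch adds to every page that
# loads a Launch property; property.name holds the property's name.
print(driver.execute_script(
    "return typeof _satellite !== 'undefined' ? _satellite.property.name : null;"
))

driver.quit()
```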

Let us see the Adobe Launch property present on the home page of our website using the browser console.

Brilliant, they match! ATBSWS (Automating The Boring Stuff With Selenium)! We have now started utilizing the power of Selenium.

So, we can simply iterate over all the web pages and run such JS code with our web driver to fetch and validate any data layer variable. All we need is a list of all the URLs (a sitemap) and the expected data layer variable values for each web page. Hence, we can automate the validation of a data layer variable, or any JS variable, using Selenium.
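A sketch of that loop, with a hypothetical sitemap of expected values:

```python
from selenium import webdriver

# Hypothetical sitemap: page URL -> expected data layer page name.
expected_page_names = {
    "https://example.com/": "home page",
    "https://example.com/products": "product listing page",
}

driver = webdriver.Chrome()
for url, expected in expected_page_names.items():
    driver.get(url)
    actual = driver.execute_script("return digitalData.page.name;")
    status = "OK" if actual == expected else f"MISMATCH (got {actual!r})"
    print(f"{url}: {status}")
driver.quit()
```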

But what about the HTTP requests?

As explained above, Analytics solutions send the captured data to the processing servers via HTTP requests. We can inspect all these requests by filtering them by their server domains in the browser network tab. If we could fetch all the network traffic programmatically using JS, we could automate the validation of these requests too. We know the web driver allows us to execute any JS code on a web page. So, if we have JS code that extracts all the Adobe Analytics HTTP requests on a web page, we can use our web driver to execute it on every web page and fetch the Analytics HTTP requests for all these pages.

So, I started googling ways to extract all the network traffic programmatically using JS and came across the browser Performance API. Here is a detailed explanation of it. We can use its performance interface to get the URLs of the various HTTP requests on a web page. The screenshot below shows a JS function to fetch the Adobe Analytics requests on a web page using the performance object.

The above function call returns an array of length one, meaning we currently have only one Adobe Analytics HTTP request on the web page (the page view beacon). I have used the getEntriesByType method here, which returns an array of the PerformanceEntry objects currently present in the performance timeline for a given type. The logic fetches all the entries of type “resource” and applies the JS filter function to keep only the entries whose name attribute contains b/ss.

The name attribute of the PerformanceEntry object gives us the URL of the requested resource.

Since this Adobe Analytics beacon/HTTP request is a GET request, we get all our captured data (eVars, props, events, and out-of-the-box variables) in the request URL as query string parameters.

In the above screenshot, we can see the various data points captured in the request.

  1. Character-encoding is UTF-8 (ce = UTF-8)
  2. The page name is “home page” (pageName = home page)
  3. The currency code is INR (cc = INR)
  4. Hostname is keeratsachdeva.github.io (server = keeratsachdeva.github.io)
  5. prop1 is referencing eVar1 (c1 = D=v1)
  6. eVar1 is home page (v1 = home page)
  7. The screen resolution is 1280 X 800 (s = 1280 X 800)

Here is how we can execute the above JS code programmatically using Selenium and Python:
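A sketch of it, returning the filtered PerformanceEntry objects straight to Python:

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://keeratsachdeva.github.io/")  # the demo home page

# Fetch all "resource" entries from the performance timeline and keep
# only those whose name (the request URL) contains b/ss, the Adobe
# Analytics identifier.
aa_requests = driver.execute_script("""
    return performance.getEntriesByType('resource')
        .filter(function (entry) { return entry.name.indexOf('b/ss') > -1; });
""")

print(len(aa_requests))        # 1 -> only the page view beacon so far
print(aa_requests[0]['name'])  # the full Adobe Analytics request URL
```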

We can obtain all these PerformanceEntry objects as a list of dictionaries in Python. Now, we can directly use the name key of these dictionaries to get the Adobe Analytics request URLs as Python strings. Isn’t this amazing?

We can now use basic string manipulation to extract the different query string parameters (QSPs) from this URL. Here, I have decoded the request URL and used multiple split statements to get a list of the different QSPs in the first Adobe Analytics request (list index 0).
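Continuing from the performance snippet above, a sketch of that string manipulation (urllib handles the URL decoding):

```python
from urllib.parse import unquote

# Decode the first Adobe Analytics request URL (list index 0).
request_url = unquote(aa_requests[0]['name'])

# Everything after the '?' is the query string; each '&' separates one
# query string parameter from the next.
qsp_list = request_url.split('?', 1)[1].split('&')
print(qsp_list[:4])  # e.g. ['ce=UTF-8', 'pageName=home page', ...]
```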

We can access these data points more conveniently using a dictionary. Here, I store them in a dictionary so that we can easily look up any variable.

And see how easily we can validate any variable with the help of this dictionary. Here, I am validating eVar1 by comparing its expected value with the v1 parameter in the Adobe Analytics request.
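Continuing, here is a sketch covering both steps, the dictionary and the eVar1 check (the expected value is illustrative):

```python
# Build a {parameter: value} dictionary from the 'key=value' strings.
qsp_dict = {}
for param in qsp_list:
    key, _, value = param.partition('=')
    qsp_dict[key] = value

# Validate eVar1: compare the expected value with the v1 parameter.
expected_evar1 = "home page"
print(qsp_dict.get('v1') == expected_evar1)  # True if they match

driver.quit()
```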

We can extend this to any other eVar, prop, event, or out-of-the-box variable. So, we can simply iterate over all the web pages and run such JS code with our web driver to fetch and validate the Analytics HTTP requests. We just require a list of all the URLs (a sitemap) and the expected Analytics variable values for each web page. Hence, we can automate the validation of the Analytics HTTP requests using Selenium.

One thing to note here is that this approach does not work for POST requests, which carry all these variables in the request body and not in the request URL as query string parameters. The browser Performance API does not provide any details about the other request components (like headers, body, et cetera) or about the response. So, when the request body and response need to be validated (for example, with POST requests), we can use Selenium Wire to capture the web page HTTP requests and automate the network request validation.

Selenium Wire extends the Selenium Python bindings to give us access to the underlying requests made by the browser. We author our code the same way as with Selenium, but we get extra APIs for inspecting requests and responses and making changes to them on the fly.

I am using the Flipkart website here to demonstrate how we can use Selenium Wire to validate POST Adobe Analytics HTTP requests.

We can use the requests attribute of the web driver to get a list of all the HTTP requests on a web page. The length of the requests list is 165, meaning we had 165 requests on the web page the moment we executed the code. I have used list slicing to see only the first five.
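A minimal Selenium Wire sketch of those steps (note the import comes from seleniumwire, not selenium):

```python
# pip install selenium-wire
from seleniumwire import webdriver  # drop-in replacement for Selenium's webdriver

driver = webdriver.Chrome()
driver.get("https://www.flipkart.com/")

# driver.requests holds every HTTP request the browser has made so far.
print(len(driver.requests))          # e.g. 165 at the moment of execution
for request in driver.requests[:5]:  # list slicing: first five requests
    print(request.url)
```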

We can look for b/ss in the request URLs to filter out all the Adobe Analytics requests. The output is a list of length three, meaning we had three Adobe Analytics requests on the web page the moment we executed the code.

We can get the method and body of every HTTP request on the web page with the method and body attributes of the request object. Note that the 2nd Adobe Analytics request (list index 1) is a POST request, and its body is a byte string.

So, we need to use the decode method to convert this byte string to a string, and then we can perform the same operations as shown above for the GET requests to get all the data points in the request body as a list.

The next steps are similar to what we have seen for the GET requests. We can convert this to a dictionary and validate any Analytics variable in the request body.
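Continuing from the Selenium Wire snippet above, a sketch of the POST beacon handling:

```python
from urllib.parse import unquote

# Filter the Adobe Analytics beacons by the b/ss identifier.
aa_requests = [r for r in driver.requests if 'b/ss' in r.url]

post_beacon = aa_requests[1]  # the 2nd Adobe Analytics request (index 1)
print(post_beacon.method)     # 'POST'

# The body is a byte string: decode it, then split it into the
# individual key=value data points, exactly as with the GET requests.
body = unquote(post_beacon.body.decode('utf-8'))
qsp_dict = {}
for param in body.split('&'):
    key, _, value = param.partition('=')
    qsp_dict[key] = value

print(qsp_dict.get('v1'))  # e.g. the eVar1 value captured in the beacon

driver.quit()
```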

So, this is how Selenium and Python can help us validate the data layer and Analytics HTTP requests on different web pages. For simplicity, I have only shown how to load a web page and validate the Adobe Analytics page-load-based beacons/requests, but we can easily extend this approach to validate the click and scroll-based requests. Selenium allows us to interact with the different HTML elements on the web page and complete different user journeys on the website quite easily.
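As a sketch of a click-based journey (using Selenium Wire so the beacons fired before navigation are still captured; the link selector and the wait are illustrative):

```python
import time

from selenium.webdriver.common.by import By
from seleniumwire import webdriver

driver = webdriver.Chrome()
driver.get("https://keeratsachdeva.github.io/")  # the demo home page

# Complete a small user journey: click the first link on the page.
driver.find_element(By.TAG_NAME, "a").click()
time.sleep(3)  # illustrative wait for the click beacon to fire

# Selenium Wire keeps accumulating requests across navigations, so both
# the page view and the link click beacons show up here.
print([r.url for r in driver.requests if 'b/ss' in r.url])

driver.quit()
```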

We can utilize the same approach to validate the requests for other Martech (Marketing Technologies) solutions like Google Analytics, Marketing pixels (Facebook, Twitter), Hotjar, et cetera.

How did it help me?

This automation approach has helped my client move into a better state of data collection quickly and confidently. I created several Python scripts based on it and solved multiple problems for my client. I have since used this logic for many tasks: it has helped me identify web pages with wrong data layer variables, incorrect or missing TMS scripts, and much more. Recently, I had to find the count of my client’s websites with no GA4 implementation. GA4 is the latest version of Google Analytics, which replaced the older Universal Analytics (UA) in July 2023.

The task was to identify such websites and take further action to implement GA4 on them. I ran a script over 1100 of my client’s websites to identify those with no GA4 implementation. I utilized the performance object to filter the GA4 requests on these websites (using the filter collect?v=2) and found all such websites in just 84 minutes.
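A sketch of that GA4 check (the site list and the wait are illustrative):

```python
import time

from selenium import webdriver

# Hypothetical list of site URLs pulled from the internal records.
sites = ["https://example-site-1.com", "https://example-site-2.com"]

driver = webdriver.Chrome()
sites_without_ga4 = []

for url in sites:
    driver.get(url)
    time.sleep(5)  # give the GA4 beacons time to fire
    # GA4 hits go to .../g/collect?v=2..., so collect?v=2 is the filter.
    ga4_hits = driver.execute_script("""
        return performance.getEntriesByType('resource')
            .filter(function (e) { return e.name.indexOf('collect?v=2') > -1; })
            .length;
    """)
    if ga4_hits == 0:
        sites_without_ga4.append(url)

driver.quit()
print(sites_without_ga4)
```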

Let us compare this with doing it manually. Here are the steps we would have performed for each website:

  1. Copy the URL for each website (from the internal records) and paste it into the browser search bar.
  2. Wait for the GA4 Analytics requests to trigger on the web page.
  3. Update the records with our findings (GA4 implemented or not).

Let us assume the time involved for each step is 4, 5, and 5 seconds, respectively. The total time involved manually would then have been (14 * 1100)/3600 ≈ 4.28 hours. Note that these 4.28 hours do not include any of our tea, coffee, or power nap breaks, or the time spent cursing our friendly marketer for this mammoth task.

The automation approach helped me complete this task in 84 minutes, with around 67.27% less effort, and since it is just a Python script running on my machine, I am free to continue with my other tasks in the meantime (maybe even getting in a power nap).

I hope this article explains the problems involved in validating the various Martech solution implementations and shows how we can leverage the power of Selenium to automate this boring stuff. Here is the gist I talked about above. Do mention in the comments if you would like me to share more articles with code snippets explaining the approach in more detail.

--