Automating Analytics Testing

Aaron Wolford
USA TODAY NETWORK
Published in
5 min read · Sep 12, 2019

How USA TODAY’s Quality Engineering team uses HAR files to test network calls.

Testing analytics providers in production is essential for companies that want to ensure their data is reported accurately. Without it, data may be missing or incorrect, and relying on bad data leads to poor business decisions. Testing analytics manually is a time-consuming process, so implementing automated tests is an important step in ensuring data integrity.

What are we solving?

Traditionally, manual testing was a very time-consuming process. Business requirements include a list of analytics providers, each with multiple values that must populate within that provider's analytics call. Sample values include page type (article, gallery, video), page URL, section (e.g., sports) and market. Testing this required looking through the network calls in the browser's developer tools, finding the call and validating the values within it. Validating one Adobe value for a sports section front looks like this:

In the image above, we first search for the network call for Adobe (srepdata.usatoday.com). Once we find the call, we locate the post data and search through it for each Adobe analytics value. This is difficult to test manually because the post data can be thousands of characters long, filled with special characters that are hard to search through and extra data we do not need. For most of our page types, we are looking to validate 10–20 values within that post data.
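To make this concrete, the requirements for a single page type boil down to a set of expected key/value pairs, along the lines of the hypothetical example below (the keys and values are illustrative, not our actual Adobe variables):

```
# Hypothetical expected Adobe values for a sports section front.
# Real requirements list 10-20 such keys per page type and market.
EXPECTED_ADOBE_VALUES = {
    "pageType": "section front",
    "section": "sports",
    "market": "usatoday.com",
}
```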

Not only is it difficult to search for each of these values, but at a company as large as USA TODAY, we could come up with more than 40,000 test cases once we map out all the analytics providers, page types and markets. Additionally, multiple development teams within the company can impact analytics reporting, thus raising the risk something could change. We needed to find a better solution to test analytics faster and more efficiently.

What is our solution?

After researching a few options, we selected Sauce Labs’ Extended Debugging tool to automate our tests. Sauce Labs provides us with a HAR file we’re able to parse. A HAR (HTTP Archive) file contains all the network traffic for the URL being tested. We first parse the file for the network call and then for the specific values within the call. Using the example above, to test Adobe, we search the HAR file for srepdata.usatoday.com. Once we find it, we pull in the post data and match it against a dictionary of key-value pairs. If all the expected values are present and the data matches what we expect, the test case passes. If any values are missing or contain data we do not expect, the test case fails.
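Here is a minimal sketch of that parsing step in Python, assuming the standard HAR JSON layout (log → entries → request). The function names and payload handling are illustrative rather than our production code:

```
import json
from urllib.parse import unquote


def find_analytics_call(har_path, provider_host):
    """Return the decoded request payload for the first call to provider_host, or None."""
    with open(har_path) as f:
        har = json.load(f)
    for entry in har["log"]["entries"]:
        request = entry["request"]
        if provider_host in request["url"]:
            # Depending on the provider, the payload may be a POST body or a query string.
            payload = request.get("postData", {}).get("text", "") or request["url"]
            return unquote(payload)
    return None


def validate_values(payload, expected):
    """Return the expected key/value pairs that are missing from the decoded payload."""
    return {k: v for k, v in expected.items() if f"{k}={v}" not in payload}
```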

The tests are written in Python and run via pytest. The job is kicked off by a Jenkinsfile, with built-in notifications that post pass/fail results to Slack (a sketch of this test layer appears below). The HAR files are available to download via Sauce’s UI for research and troubleshooting, making it easy to answer questions like these:

· Are the calls missing?

· Are values within the call missing?

· Is there bad data inside a field?

The time needed for root cause analysis decreases. But in a perfect world where all tests pass, no one needs to look at a network call or HAR file again.
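For a rough idea of what that pytest layer can look like, here is a sketch built on the helpers above; the provider table and the har_path fixture (which would fetch the HAR file from the Sauce Labs session) are hypothetical:

```
import pytest

# Hypothetical module containing the helpers from the sketch above.
from har_checks import find_analytics_call, validate_values

# Hypothetical table of (provider host, expected key/value pairs) for one page type.
PROVIDERS = [
    ("srepdata.usatoday.com", {"pageType": "section front", "section": "sports"}),
]


@pytest.mark.parametrize("host,expected", PROVIDERS)
def test_analytics_call(har_path, host, expected):
    # har_path is a hypothetical fixture that downloads the HAR file for the session under test.
    payload = find_analytics_call(har_path, host)
    assert payload is not None, f"No network call found for {host}"
    missing = validate_values(payload, expected)
    assert not missing, f"Missing or incorrect values: {missing}"
```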

What have we been able to accomplish?

We currently have more than 270 tests running on a cron job twice a day. These tests cover a dozen analytics providers across multiple page types, assessed against six of our 100+ markets. The test results are sent via Slack notifications and are also imported into an internal analytics dashboard, where stakeholders can review results and our Quality Engineering team can watch for failures and flaky tests. This has led to a 92% reduction in test time. We estimate it would cost us $40,000 per month to perform these same tests manually.

Perhaps the most material business benefit is our newfound ability to immediately detect and address bugs within our analytics calls. Previously this could take weeks: our Business Intelligence team would spot anomalies in month-over-month trends and ask developers to investigate. When we fail to capture traffic and audience data, there is no way to recreate it.

What’s next?

While we’ve made great strides in automating our analytics testing, there’s still more we’d like to accomplish:

· Packaging the tests so that teams can import the code and run their specific tests as part of their pipeline process, helping reduce the risk of bugs reaching production.

· Testing native app analytics, which is more complicated because the network calls are not as easy to find as they are in a browser, increasing the time it takes to run a test case manually.

· Setting up the tests to run headless. Headless tests run without the GUI, which speeds up the run time: there is no waiting for images, videos, etc. to load onto the page before grabbing the HAR file (see the sketch after this list).

· Rotating the markets in our test cycles to expand our coverage without slowing down the total test run time.
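For the headless item, a minimal local sketch with Selenium and Chrome looks like the following; the flags are standard Chrome options, and this is not our actual Sauce Labs configuration:

```
from selenium import webdriver

# Run Chrome without a GUI so nothing is rendered visually.
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
driver.get("https://www.usatoday.com/sports/")
# ... wait for the page to finish loading, then grab the HAR / network data as usual ...
driver.quit()
```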

Give it a try!

HAR files are great for more than analytics testing. We’ve found them to be a valuable testing tool for easily getting at the information needed within network calls. There is a lot of other useful information in HAR files, from page load errors to network performance. Getting all of this data from one place makes debugging issues quick and easy.
