How to analyze raw Adobe Analytics data in BigQuery

Jiri Stepan
Etnetera Activate
Oct 15, 2018 · 4 min read

Photo: “turned on monitoring screen” by Stephen Dawson on Unsplash

Adobe Analytics is a great tool and we love to implement and use it. You can collect consistent data from all relevant digital channels and then use it to distribute awesome reports. But sometimes you need to take a deep dive into the raw data: you may want to examine a particular e-commerce transaction, or you may need to build a customized statistical model.

This article shows how to automate the data load from Adobe Analytics to BigQuery using a Linux shell script.

Adobe Analytics gives you two ways to export data:

  1. The first one is the Data Warehouse. You select from the available metrics and dimensions, and what you receive is a big CSV file with all the resulting combinations of those dimensions and metrics for the chosen date range. But this data has already been preprocessed and aggregated by Adobe Analytics.
  2. The second way is the Data Feeds export. In this case you get the partially processed hit data as it was sent to Adobe Analytics (via the JavaScript beacon or the Data Insertion API). And that’s what you need for granular data crunching.

You can configure a new Data Feed export in the Admin > Data Feeds menu. You may select individual columns (if you know which ones you need for your analysis), but for convenience there are also a couple of templates available (including several “All Columns” options, depending on which version of Adobe Analytics and related modules you are currently using).

Data Feed export configuration in Adobe Analytics

When your Data Feed gets exported, depending on the size of the exported data you probably won’t get a single CSV, JSON, or XML file, but a whole bunch of files, and they can be pretty hard to use as they are. The following example shows what the exported Data Feed content might look like on the destination SFTP server.

Result of Adobe Data Feed export for one day
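If the screenshot doesn’t tell you much, the set of files for one day follows a pattern along these lines (the report suite name and date here are placeholders, not real output):

    01-mysuite_2018-10-01.tsv.gz
    02-mysuite_2018-10-01.tsv.gz
    03-mysuite_2018-10-01.tsv.gz
    mysuite_2018-10-01-lookup_data.tar.gz
    mysuite_2018-10-01.txt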

The data itself is chunked into the *.tsv.gz files. Every single one of them is a gzipped TSV file with no header row. Every row represents one recorded hit, broken down into columns by all the selected export variables captured from the collected HTTP request (query parameters containing props, eVars and events, HTTP headers, etc.). Some of this data is normalized into lookup tables for browsers, connection types, countries etc., mainly to save storage space. This means that you won’t find “Chrome v51” in the browser column; there is an ID (like “12345”) instead, and you have to translate this ID to a browser name using the proper lookup table.

All those lookup tables are in the *lookup_data.tar.gz archive. There is also one important file called column_headers.tsv in this archive. Yes, you’re right: those are the missing column headers for the granular TSV data.
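Because the TSV chunks carry no header row, column_headers.tsv is also the easiest way to build a BigQuery schema for the load. A minimal sketch, assuming you have extracted the archive into the current directory (every column is treated as STRING, which is a safe default for raw hit data):

    # Peek at the column names, one per line:
    tr '\t' '\n' < column_headers.tsv

    # Turn them into an inline BigQuery schema string like "col1:STRING,col2:STRING,...":
    SCHEMA="$(tr '\t' '\n' < column_headers.tsv | sed 's/$/:STRING/' | paste -sd, -)"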

The last file is a small .txt manifest with information about the exported data. You can ignore it, or use it to check data consistency.
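If you want to use it for that, the manifest lists every exported file together with its MD5 digest, so a quick integrity check is easy to script. A sketch, assuming the manifest follows Adobe’s documented Data-File / Lookup-File / MD5-Digest layout (verify against your own feed before relying on it):

    # Recompute each file's checksum and compare it with the digest from the manifest.
    MANIFEST="mysuite_2018-10-01.txt"   # placeholder file name
    awk '/^(Data|Lookup)-File:/ {f=$2} /^MD5-Digest:/ {print $2 "  " f}' "$MANIFEST" \
      | md5sum -c -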

Loading this mess to BigQuery

If you want to work with this export, you have to import everything into a single database; you can then perform joins, selects, and other usual operations. We love to use Google’s BigQuery for this. It’s pretty cheap, and you can easily manage access to the data and its expiration. You can run any query you want using its SQL syntax, and you may even use Google Data Studio and Data Explorer for simple visualizations.
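To give you a taste: once the data is loaded, translating the browser IDs mentioned above back to readable names is a single join. The dataset and table names here are placeholders and match the loading script further below:

    bq query --use_legacy_sql=false '
      SELECT b.value AS browser, COUNT(*) AS hits
      FROM `adobe_feed.hit_data` h
      JOIN `adobe_feed.lookup_browser` b ON b.id = h.browser
      GROUP BY browser
      ORDER BY hits DESC
      LIMIT 10'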

And on top of that, the whole process of loading this data can be fully automated using the Google Cloud SDK command-line tools (gcloud and bq).

The process is quick and easy

  1. Create a new project at https://console.cloud.google.com/. You will have to enable billing to get access to BigQuery, but don’t panic, it’s cheap. Compared to Adobe Analytics licences, it’s almost free.
  2. You will need an SFTP server where Adobe can export the data.
  3. Download, install and configure the Google Cloud command-line tools.
  4. Configure and run the following code.
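A minimal sketch of such a script follows. The report suite name, feed date, SFTP host and BigQuery dataset are placeholders, and it assumes key-based SSH access to the SFTP drop and an authenticated Google Cloud SDK (gcloud auth login, gcloud config set project ...):

    #!/usr/bin/env bash
    set -euo pipefail

    FEED_DATE="2018-10-01"            # day to load (placeholder)
    SUITE="mysuite"                   # Adobe report suite id (placeholder)
    SFTP="feeds@sftp.example.com"     # SFTP drop where Adobe delivers the feed
    DATASET="adobe_feed"              # target BigQuery dataset
    WORKDIR="$(mktemp -d)"

    # 1. Fetch everything the feed produced for the given day.
    scp "${SFTP}:/feeds/*${SUITE}_${FEED_DATE}*" "${WORKDIR}/"
    cd "${WORKDIR}"

    # 2. Unpack the lookup archive; it also contains column_headers.tsv.
    tar -xzf ./*lookup_data.tar.gz

    # 3. Build the schema from column_headers.tsv (everything as STRING).
    SCHEMA="$(tr '\t' '\n' < column_headers.tsv | sed 's/$/:STRING/' | paste -sd, -)"

    # 4. Create the target dataset if it does not exist yet.
    bq mk --dataset "${DATASET}" || true

    # 5. Load every gzipped TSV chunk into one hit_data table.
    #    The chunks have no header row and no quoting, hence --quote=''.
    for chunk in ./*.tsv.gz; do
      bq load --source_format=CSV --field_delimiter='\t' --quote='' \
        "${DATASET}.hit_data" "${chunk}" "${SCHEMA}"
    done

    # 6. Load every lookup table (browser.tsv, country.tsv, ...) as id/value pairs.
    for lookup in ./*.tsv; do
      [ "$(basename "${lookup}")" = "column_headers.tsv" ] && continue
      table="lookup_$(basename "${lookup}" .tsv)"
      bq load --source_format=CSV --field_delimiter='\t' --quote='' \
        "${DATASET}.${table}" "${lookup}" "id:STRING,value:STRING"
    done

    echo "Done. Tables are in the ${DATASET} dataset."

Scheduling this with cron for yesterday’s date gives you a fully automated daily load; bq load appends by default, so re-running it for a new day just adds the new hits to the same table.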

Then wait a few minutes and whoaaa, you can analyze Adobe Analytics data in Google BigQuery.

Happy data mining!
