What is a Data Layer? Tips and Best Practices

Websites dealing with any Business Intelligence need a data layer! Here are a few ideas for getting the most out of yours.

8 min read · Nov 14, 2019

There’s lots of information out there on data layers, including what they are, and different ways to construct them. I’ll take a look at some common ones we see on most websites, then how to construct the best data layer for your needs. But first…

What is a Data Layer?

Every piece of software ever created has multiple layers. The larger the application, the more layers it has. Kind of like a sandwich.

A sandwich has many layers that all support each other to make a delicious combination of perfection. Each layer by itself can serve useful benefits (certain flavors, textures, vitamins, etc.), but the sandwich as a whole can be an entire meal. The same goes for software: each layer builds on the ones below it until you get a useful application or product.

The data layer is one of those layers. It might not be necessary in all cases, but when you have it, it can make the whole sandwich that much better. When the data layer is implemented incorrectly, it can just as easily spoil that delicious sandwich.

So what is it? A data layer is essentially a specific layer in your application used for collecting and reporting data. That data is analyzed later to help make business decisions. The content of the data layer is usually per user and may describe certain details about the user for analytics, marketing, and similar purposes.

While a data layer can be used in just about every software application, it is by far the most common on the web. If you have ever seen an ad on a website, or analytics, or campaigns, or those chat bubbles, or…the list goes on — then you’ve seen some form of a data layer. Nearly every third party vendor has some sort of information tracking on a user so they know how to interact with that user.

What might a data layer contain?

Anything you want! There’s a fine balance in what to include in a data layer. You want to include enough information about the user to make decisions, but not so much that it overwhelms your developer resources. Let’s look at an ecommerce website, since those are heavily influenced by user decisions.

product information: id, name, price, sale price, category, size, color, …
order information: id, subtotal, tax, total, shipping, discounts, …
user information: id, city, state, country, preferences, first/last, …
page information: timing, promotions visible, products visible, category, region, currency, …
search information: term, num results, suggested terms, …
event information: event name, label of button clicked, …
lots and lots more…

Each of these will be reported on any applicable pages where you need that information. So something like a page name and page type would appear on every page, but order information would only appear on the order confirmation page.
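
As a rough sketch of that idea, the snippet below builds a flat data layer where page fields are always present and order fields are only added on the order confirmation page. The function name `buildDataLayer`, the `order_confirmation` page type, and every field name are hypothetical choices for illustration, not part of any vendor's API.

```javascript
// Hypothetical helper: every page gets page_name and page_type;
// order fields are added only on the order confirmation page.
function buildDataLayer(page, order) {
  const layer = {
    page_name: page.name,
    page_type: page.type,
  };

  if (page.type === 'order_confirmation' && order) {
    layer.order_id = order.id;
    layer.order_subtotal = order.subtotal;
    layer.order_tax = order.tax;
    layer.order_total = order.total;
  }

  return layer;
}

const homeLayer = buildDataLayer({ name: 'homepage', type: 'home' });
const confirmationLayer = buildDataLayer(
  { name: 'thank you', type: 'order_confirmation' },
  { id: 'A1001', subtotal: 50, tax: 4, total: 54 }
);

console.log(homeLayer);
console.log(confirmationLayer);
```

Keeping the page fields unconditional means any vendor can rely on them being there, while the order fields stay off pages where they would be meaningless.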

The data layer should contain, where possible, all the data you need to provide to your other vendors. So, this might include all your custom dimensions for Google Analytics, all your recommendation information for PowerReviews, and just about everything else.

It will also contain events! For example, you might trigger an event when a user clicks a certain button, and another when they complete a certain flow. The idea of an event is to indicate that something happened and that a counter needs to be incremented.
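
To make the "increment a counter" idea concrete, here is a minimal sketch of an event tracker. The names `trackEvent`, `eventLog`, and `eventCounts` are made up for this example; a real vendor would send the event to its own servers and do the counting there.

```javascript
// Hypothetical in-memory event log and counters.
const eventLog = [];
const eventCounts = {};

// Record that something happened and bump the counter for that event name.
function trackEvent(name, data = {}) {
  eventLog.push({ event_name: name, ...data });
  eventCounts[name] = (eventCounts[name] || 0) + 1;
}

trackEvent('button_click', { event_label: 'Add to Cart' });
trackEvent('button_click', { event_label: 'Checkout' });
trackEvent('newsletter_submit');

console.log(eventCounts);
```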

So, what does it look like?

Each vendor will have a different data layer format, so there are a lot of ways you can approach this. Let’s look at a couple of common vendor-specific layers, followed by some common tag management ones.

Google Analytics (GA)

gtag('config', 'GA_MEASUREMENT_ID', {
  'page_title' : 'homepage',
  'page_path': '/home'
});
gtag('event', 'timing_complete', {
  'name' : 'load',
  'value' : 3549,
  'event_category' : 'JS Dependencies',
  'dimension1': 'foo'
});

You can see GA sends each data point through a function call with attached data. Most params are tied either to configuration for that page or to an event (like purchase). Because everything is configured like this, each page is essentially an event. GA’s data layer is very specific to their tag, so using that data with other vendors requires either a separate data layer or integrations. You are also required to use their outlined variables for proper tracking.
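
In the standard gtag.js snippet, `gtag` is a thin function that pushes its arguments onto a `dataLayer` array, which the library then processes. The sketch below reproduces only that queue, using a plain local array so it runs outside a browser (on a real page the array lives on `window.dataLayer`). `GA_MEASUREMENT_ID` is the same placeholder used above.

```javascript
// Stand-in for window.dataLayer so the sketch runs outside a browser.
const dataLayer = [];

// Same shape as the gtag() function in the standard snippet:
// each call is queued as one entry.
function gtag() {
  dataLayer.push(arguments);
}

gtag('config', 'GA_MEASUREMENT_ID', { page_title: 'homepage', page_path: '/home' });
gtag('event', 'timing_complete', { name: 'load', value: 3549 });

// Every queued entry starts with a command name.
const commands = dataLayer.map((args) => args[0]);
console.log(commands); // [ 'config', 'event' ]
```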

Adobe Analytics (AA)

s.pageName = "homepage";
s.prop5 = "homepage";
s.eVar5 = "homepage";
s.events = "event5,event5";
s.t(); //or s.tl() for events

Unlike GA, which mostly uses named params, Adobe mainly uses numbered variables called props, eVars, and events. There are some reserved names as well, but most are enumerated, then named within the AA product interface. Page views and events are essentially set up the same way, but the final call is distinguished by s.t() for page views and s.tl() for event calls. AA’s data layer is very specific to their tag, so using that data with other vendors requires either a separate data layer or integrations. However, since all the data is attached to a global s variable, any other vendor could potentially access it, as long as s isn’t cleared out. For AA, you are required to use their outlined variables for tracking (unless they are context variables, but that’s another topic).
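
Here is a small sketch of the page view versus event distinction. The `s` object below is a stub that only records calls so the example runs on its own; on a real page `s` comes from Adobe's AppMeasurement library. The `linkTrackVars` and `linkTrackEvents` settings and the `s.tl(true, 'o', ...)` arguments follow the usual AppMeasurement pattern, but the specific variable numbers and link name are assumptions for illustration.

```javascript
// Stub of the global `s` object; it records what each call would send.
const calls = [];
const s = {
  t() {
    calls.push({ type: 'page view', pageName: this.pageName });
  },
  tl(linkObject, linkType, linkName) {
    calls.push({ type: 'link', linkType, linkName, events: this.events });
  },
};

// Page view: set variables, then call s.t().
s.pageName = 'homepage';
s.prop5 = 'homepage';
s.eVar5 = 'homepage';
s.t();

// Event: name the variables and events to include, then call s.tl().
s.linkTrackVars = 'events,eVar5';
s.linkTrackEvents = 'event5';
s.events = 'event5';
s.tl(true, 'o', 'newsletter submit');

console.log(calls);
```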

Tealium

utag_data = {
  page_name: 'homepage',
  page_type: 'home',
  currency: 'USD',
  promo_impressions: ['123', '234']
};

// for events
utag.link({
  // ...some data like above
});

Tealium’s tag manager is built for many vendors, so the data layer isn’t specific to any one vendor. They provide two separate flat tracking layers, one for the page view and one for events. Any vendor can access the utag_data object on the page, but most vendors will be implemented through the tag manager, where the data layer can be supplemented for each specific vendor as needed. The variables have an implied standard, but they can be named whatever you want, making this very flexible.

Google Tag Manager (GTM)

dataLayer = [{
  pageName: 'homepage',
  pageType: 'home',
  currency: 'USD',
  promoImpressions: ['123','234']
}];

// for events
dataLayer.push({
  // ...some data like above
});

Similar to Tealium, the variables can be named whatever you want, and page views and events are set up differently. However, similar to GA, GTM’s data layer is entirely event based, so you don’t need any page view code and could implement everything with dataLayer.push(). For certain vendors (like GA), this data layer will have specific variable names and multiple levels of nesting, unlike the completely flat and open data layer Tealium utilizes.
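
A quick sketch of the all-push approach: GTM treats the `event` key as reserved and uses it to fire triggers, while every other key is yours to name. The local `dataLayer` array stands in for `window.dataLayer` so this runs outside a browser, and `newsletterSubmit` and `formLocation` are made-up names. The final merge is only a rough illustration of how a tag manager could read the latest value of each variable, not GTM's actual implementation.

```javascript
// Stand-in for window.dataLayer so the sketch runs outside a browser.
const dataLayer = [];

// Page-level data, pushed before the tag manager would load.
dataLayer.push({ pageName: 'homepage', pageType: 'home', currency: 'USD' });

// An event push: the reserved `event` key names what happened.
dataLayer.push({ event: 'newsletterSubmit', formLocation: 'footer' });

// Hypothetical shallow merge of all pushes into one current state.
const state = Object.assign({}, ...dataLayer);
console.log(state);
```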

W3C

digitalData = {
 pageInstanceID: "MyHomePage-Production",
 page:{
   pageInfo: {
     pageID: "Home Page",
     destinationURL: "http://mysite.com/index.html"
   },
   category:{
     primaryCategory: "FAQ Pages",
     subCategory1: "ProductInfo",
     pageType: "FAQ"
   },
   attributes:{
     country: "US",
     language: "en-US"
   }
 }
};

Similar to Tealium and GTM, the W3C data layer can use your own custom variable names and generally isn’t vendor specific. However, this standard is much better defined, and each section of the data layer is grouped by type. This leads to a fairly nested object with specific variable names. How page views and events are named and used is specified less precisely than how the data is structured.
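
If you like the W3C grouping but want flat keys to work with, a nested layer can be flattened mechanically. This is a minimal sketch: the `flatten` helper and the underscore-joined key names are my own convention, not something either specification defines, and the `digitalData` object is a trimmed copy of the example above.

```javascript
// Recursively flatten a nested object into single-level keys joined by "_".
function flatten(obj, prefix = '') {
  const flat = {};
  for (const [key, value] of Object.entries(obj)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(flat, flatten(value, name));
    } else {
      flat[name] = value;
    }
  }
  return flat;
}

const digitalData = {
  pageInstanceID: 'MyHomePage-Production',
  page: {
    pageInfo: { pageID: 'Home Page' },
    category: { primaryCategory: 'FAQ Pages', pageType: 'FAQ' },
  },
};

const flat = flatten(digitalData);
console.log(flat.page_category_pageType); // FAQ
```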

So many options!

Brain…hurts!

I’ve only given you 5 examples. Imagine literally thousands of vendors, all with their own data layer formats! This is why it is so important to plan out your data layer instead of just tacking on bits and pieces as they come. This is also why tag managers have become so much more prevalent over the past decade or so.

So how do I plan this out?

I’ve known people who have whole consultancies built on this very question. I spent about 5 years doing it as well! Generally, here’s the process:

1. Identify your vendors. This will be things like your analytics, marketing, support, etc. I find it’s useful to compile a big spreadsheet of each vendor you plan to use, including the code snippets and tracking requirements for each one.
2. For complex vendors (like analytics), identify your tracking needs. This includes figuring out the questions you need answered, then mapping those answers to the appropriate variables. Put those variables in another sheet for each complex vendor.
3. Start building a list of vendor-neutral variables. On another blank sheet, list all the variables that appear for each vendor and assign each one a vendor-neutral variable. For example, maybe I have a page type variable in AA as prop1, GA as pageType, and some other vendor as page. These are all the same page type, so maybe my vendor-neutral variable becomes page_type. You’ll also want to repeat this step for events (checkout, newsletter submit, etc.).
4. Once you have these lists, decide how to organize them. I personally like the flat approach Tealium uses in their utag_data object. As a developer, I find this one easiest to reference compared to nested data layers. However you decide to organize it, make sure it is readable, makes sense, and is flexible.
5. Implement it! Get it coded on the pages so you can start using it.
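
The vendor-neutral sheet from the process above translates fairly directly into code. In this sketch the `page_type` row uses the names from the example (prop1, pageType, page); the `page_name` row, the `other` vendor, and the `toVendor` helper are hypothetical additions for illustration.

```javascript
// Hypothetical mapping sheet: vendor-neutral name -> each vendor's variable.
const variableMap = {
  page_type: { adobe: 'prop1', ga: 'pageType', other: 'page' },
  page_name: { adobe: 'pageName', ga: 'page_title', other: 'name' },
};

// Translate a vendor-neutral data layer into one vendor's variable names.
// Variables without a mapping for that vendor are left out.
function toVendor(dataLayer, vendor) {
  const payload = {};
  for (const [neutralName, value] of Object.entries(dataLayer)) {
    const mapping = variableMap[neutralName];
    if (mapping && mapping[vendor]) {
      payload[mapping[vendor]] = value;
    }
  }
  return payload;
}

const neutral = { page_type: 'home', page_name: 'homepage' };
console.log(toVendor(neutral, 'adobe')); // { prop1: 'home', pageName: 'homepage' }
console.log(toVendor(neutral, 'ga'));    // { pageType: 'home', page_title: 'homepage' }
```

The pages only ever set the neutral names, so adding or swapping a vendor means editing the map rather than touching every page.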

What else should I know?

You have to remember that the data layer is a core piece of your software that will help drive decisions for your business. The more planning up front you can do before implementing, the more time you’ll save down the road. It is important to get it right so your data tracks correctly, so having a round of audits during the implementation is also crucial.

You also need to be consistent. Pick a variable naming scheme and stick with it. This goes for the values in the variables as well. This is by far where the most errors come from in implementations I’ve seen, sometimes to the point of completely breaking pages since developers weren’t given enough instruction. If you need hints on this, take a look at Tealium, GTM, or W3C examples, as those are very widely used and have defined standards.
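
A naming scheme is easy to check automatically during audits. The sketch below assumes you picked lowercase snake_case; the regular expression and the `findInconsistentKeys` helper are illustrative, so swap in whatever convention you actually chose.

```javascript
// Matches lowercase snake_case names such as page_name or promo_impressions.
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

// Return the keys that do not follow the chosen naming scheme.
function findInconsistentKeys(dataLayer) {
  return Object.keys(dataLayer).filter((key) => !SNAKE_CASE.test(key));
}

const layer = {
  page_name: 'homepage',
  pageType: 'home',
  'Promo-Impressions': ['123'],
};

console.log(findInconsistentKeys(layer)); // [ 'pageType', 'Promo-Impressions' ]
```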

Lastly, think smart about what you’re putting in the data layer. I’ve seen some outlandish requests for tracking that ended up never being used in any capacity, but took tons of dev hours to implement. Your data layer should 1) implement what is needed for your vendors, and 2) implement what is needed to answer your questions. It doesn’t need more than this, but both of these absolutely need up-front planning. If you don’t plan up front, it will cost a lot more later to add and maintain things that could easily have been handled in the initial implementation.

Final Thoughts

Hopefully this article gives some insight on what the data layer is and how to use one. It doesn’t have to be hard, though it can be overwhelming up front. Just remember to plan up front and make it useful for you!