Building a Better Scroll Depth Tracking Plugin for Google Analytics
This post was first published May 10, 2016 on SERPs.com — some of the content may now be outdated. I’ve since moved to primarily GTM-based analytics tagging. I hope to post an update on my updated scroll tracking approach using GTM in the future.
Many websites use custom scripts to record user interactions over and above what analytics software tracks out of the box. These scripts can track information like clicks on outbound links, AJAX form submissions (where there isn’t a “thank you” page to base a goal on), and how far a user scrolls down a page.
Tracking how deep a user scrolls down a page, in particular, can be extremely useful in measuring user engagement with content — especially for content-centric sites, where you might see high bounce rates (users reading a single article and moving on), but need to differentiate between what I call a “true bounce”, or as Avinash says “I came, I puked, I left”, and a user who was actually engaged with the content (maybe spent several minutes reading it, but simply didn’t click further into the site afterwards).
Distinguishing between these two types of bounces is important because the single-page-view/engaged-visit is still valuable. Users may have multiple interactions with your site’s content before taking a desired conversion action, and tracking how deep a user scrolls is a great way of measuring that engagement.
Currently, there are plenty of scroll tracking plugins available out there, but, in my experience, they leave much to be desired, and few have been updated to take advantage of the newest features of Google Analytics.
In this post we’re going to look at how scroll tracking works, and where existing plugins are coming up short. I’ll then cover the process of building a better scroll depth tracking plugin. After using different iterations of this code for over 3 years, I’ve learned the hard way where the pitfalls of this kind of tracking are, and what actually works — and I’m excited to get to share all of this with you now!
How Scroll Tracking Works
Basic scroll tracking has two components — calculating scroll depth (how far a user has scrolled down the page), and transmitting that data to the analytics server.
Typically when we track scroll depth, we want to know the farthest point down the page the user went, even if they scroll back up. So we’ll want to store the user’s maximum depth, and should only update this if they scroll beyond it.
The second component, transmitting that data to the analytics server, is a little more complex, and where most of the existing plugins could use some improvement.
Where a lot of scroll tracking plugins fall short
Over the years I’ve played around with a lot of scroll tracking plugins, but I’ve never found anything that worked exactly the way I wanted.
Some of the most common issues I’ve come across:
Further, jQuery isn’t included in every website, and because of the size of the jQuery library, it can take time to load. Depending on where your tracking plugin is loaded, you could end up missing data, or causing an error.
I once saw a terrible case of this while consulting for a large corporate client (who really should have known better). The client’s analytics team had put a custom tracking script on every single page of their site (regardless of whether or not the page already had jQuery on it), and the script was loading an extremely old version of jQuery, just to use the getScript method (which simply loads a script from a URL). That’s 20kb (actually 120kb in this case — they weren’t using gzip) for something that could otherwise be achieved with about 2–3 lines of code. HEADDESK
Sending way too many events: Scroll tracking makes use of an event that is fired again and again as a user scrolls, as often as every 10–30 milliseconds. If you call ga('send', 'event', …) on every one of those, you’ll blow through GA’s limits (200,000 hits per user per day, 500 hits per session) in minutes… if you’re lucky.
This also slows the page down significantly. Even plugins that were using a throttling function, reducing event frequency to somewhere in the range of every few hundred milliseconds to every few seconds, still ended up sending a lot of wasteful events.
Only sending scroll thresholds: This was a common solution to the problem of sending too many events. In this method, the plugin would only send an event if the user passed certain percent thresholds (e.g., 20%, 40%, 60%… etc.). While this does cut down on the number of events, I was convinced that there should be a way to communicate this data in a single event. Also, scroll depth is a number and should be communicated as one. Communicating it as a number would allow us to average it out for different segments on the other side, providing a more accurate picture of engagement.
Events not tied to a page view: I wanted to see different scroll depths for different sections of the site, but when I’d go to the GA Events report and add Page as a secondary dimension, I’d get nothing. The problem is that the page view hit is sent as soon as GA loads, but none of the events that follow are actually associated with the URL of the page by default.
When to Send Data
Let’s take a step back for a minute. Thinking about this conceptually, in an ideal world, where would we want to see the scroll depth metric?
Wouldn’t that be nice? You could see bounce rate and scroll depth together!
Unfortunately, this isn’t really possible (even with GA’s custom metrics). Once data has been sent to Google Analytics’ servers it can’t be changed, and while the page view hit is sent right as the page loads, we can’t know the maximum scroll depth until right before the user leaves the page.
Turns out there is a way to fire an event right before the user leaves. It’s called the beforeunload event, and it’s fired right before the browser unloads the page, after a user clicks away or closes a tab (there is also an unload event, but beforeunload is better for triggering functions). It’s this beforeunload event that allows some web apps to prompt you to save your work before you leave, or more annoyingly, allows other web pages to use those “are you sure you want to leave?” alerts.
There are a couple of issues with using the beforeunload event. First, it occurs so close to the time the window closes that sometimes Google Analytics doesn’t have enough time to send all of the beforeunload data back to the server. Second, beforeunload isn’t supported on all mobile browsers.
Fortunately, recent browsers have adopted a function called sendBeacon, which allows for small packets of data to be sent asynchronously to a server while the page is unloading. Google Analytics actually now defaults to this method, depending on the browser’s capabilities.
We’ll look at how to deal with older browsers, as well as mobile browsers that don’t support beforeunload, later in the post, but for now let’s focus on how to use this when it does work.
Basic Scroll Tracker
Here’s a basic example of how to track scroll depth using the beforeunload event and a sampled scroll event.
Note: this will only work for newer browsers like Chrome. I’m omitting all the cross-browser stuff for brevity (we’ll get to that later). We’re also assuming that this code will come after the GA snippet on the page.
Our throttleScroll function prevents us from overdoing it on the scroll event (the scroll callback is only responsible for setting a variable to true), and our beforeUnload function takes the depth variable and sends it as a properly formatted event to GA.
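Here’s a minimal sketch of that pattern. It’s an illustration rather than the original listing: the “Scroll Depth” category, the 100 ms sampling interval, and the helper names percentScrolled and updateDepth are assumptions, and it expects the standard GA snippet to have already defined ga.

```javascript
// Sketch only: modern browsers, no cross-browser fallbacks.
var depth = 0;         // deepest point reached, as a percentage of page height
var didScroll = true;  // start true so the first sample records the initial viewport

// Pure helper: percentage of the page above the bottom edge of the viewport.
function percentScrolled(scrollTop, viewportHeight, pageHeight) {
  if (!pageHeight) { return 0; }
  var pct = Math.round(((scrollTop + viewportHeight) / pageHeight) * 100);
  return Math.min(100, Math.max(0, pct));
}

// Only ever move the stored maximum upward.
function updateDepth(currentMax, latest) {
  return latest > currentMax ? latest : currentMax;
}

if (typeof window !== 'undefined') {
  // The scroll callback does as little as possible: it only raises a flag.
  window.addEventListener('scroll', function () { didScroll = true; });

  // throttleScroll: sample the flag on a timer instead of on every scroll event.
  setInterval(function throttleScroll() {
    if (!didScroll) { return; }
    didScroll = false;
    depth = updateDepth(depth, percentScrolled(
      window.pageYOffset, window.innerHeight, document.body.scrollHeight));
  }, 100);

  // beforeUnload: send the maximum depth as the event value in a single hit.
  window.addEventListener('beforeunload', function beforeUnload() {
    if (typeof ga === 'function') {
      ga('send', {
        hitType: 'event',
        eventCategory: 'Scroll Depth', // assumed name
        eventAction: 'Pageview End',
        eventValue: depth,
        transport: 'beacon'            // ask analytics.js to use sendBeacon
      });
    }
  });
}
```

Keeping the percentage math in a pure function makes it easy to check outside the browser, and the scroll listener itself stays cheap.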
Scroll Tracking and Bounce Rate
Several years ago, when I first implemented scroll tracking, the most immediate side effect I saw was a massive reduction in bounce rate. In retrospect, the cause was clear — because I had an event firing at the end of every page view, no level of interaction with the page could be considered a bounce. What’s more, the fact that the reported bounce rate was reduced, but not completely to 0%, actually indicated an error in the scroll tracking code — because if the scroll event was firing correctly, there would be no bounces.
Obviously, having a 0% bounce rate renders that metric essentially useless. The simplest solution to this would have been to change the event firing at the end of the page view to a non-interactive hit, meaning that the events sent wouldn’t be counted as part of the bounce rate or time-on-page calculation. For analytics users who prefer the traditional bounce rate calculation (where a single-page view visit is counted as a bounce), this would be the desired solution.
However, I’ve always felt that this default bounce rate calculation obfuscates a lot of information. A visitor who arrives on the page and immediately hits the back button isn’t the same as someone who spends time reading and engaging with content, but just doesn’t navigate beyond that page.
The scroll tracking event gave us an opportunity to define what we wanted a bounce to be. If we defined our own thresholds for interaction, visits that fell under these thresholds would send non-interactive scroll hits, and count as bounces, while visits that exceeded these thresholds would send normal events.
The two thresholds I used were time on page and scroll depth. The idea being that if a person were on the page for more than, say, 15 seconds, and if they scrolled more than 10% of the way down the page, we would treat that user as “engaged” and not add the non-interactive flag to the event.
This practice produces what’s called an “adjusted bounce rate”. There are plugins that will do this as a stand-alone function, but because it’s an inherent part of the scroll tracking code, it really makes more sense to just include it as part of another custom tracking library. Further, many of these stand-alone adjusted bounce rate plugins send events continuously every 10–15 seconds for a more accurate time-on-page calculation, which leads to the same overuse of events problem that many scroll tracking plugins suffer from.
This code is very similar to our first example, but we’ve added a couple of settings variables at the top, as well as a “nonInteractive” variable that we set to true by default. Our two thresholds, timeThreshold and scrollThreshold, are set in seconds and percentage depth. One of these thresholds must be satisfied if the scroll tracking hit is to be counted as interactive.
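The threshold check can be sketched as a small pure function. This assumes that meeting either threshold is enough on its own (swap the || for && if you want both to be required); the “Scroll Depth” category and the function names are illustrative.

```javascript
var settings = {
  timeThreshold: 15,   // seconds on page
  scrollThreshold: 10  // percent of page depth
};

// Returns the value for GA's nonInteraction field.
// true  -> the hit does not affect bounce rate (visit can still be a bounce)
// false -> the hit counts as an interaction (visit is "engaged")
function isNonInteractive(secondsOnPage, depthPercent, opts) {
  var engaged = secondsOnPage > opts.timeThreshold ||
                depthPercent > opts.scrollThreshold;
  return !engaged;
}

// Build the fields object for the end-of-page scroll hit.
function buildScrollHit(secondsOnPage, depthPercent, opts) {
  return {
    hitType: 'event',
    eventCategory: 'Scroll Depth', // assumed name
    eventAction: 'Pageview End',
    eventValue: depthPercent,
    nonInteraction: isNonInteractive(secondsOnPage, depthPercent, opts)
  };
}
```

In the unload handler, the result would be passed straight to GA, for example ga('send', buildScrollHit(seconds, depth, settings)).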
Using an adjusted bounce rate also affects the time on page metric. Typically, time on page is calculated as the time between two hits: the first being the page view hit, and, absent any other event tracking, the second being a subsequent page view. If there is no subsequent page view, the time on page will count as “0”. An interactive scroll hit sent at the end of the page view gives GA that second hit, so the time spent on the final page of a session is now measured, and overall session duration increases as a result.
Associating Scroll Depth with a Page
Another issue I mentioned having found with many scroll tracking plugins was that the scroll events weren’t associated with page views, because the page views were sent separately from the scroll event.
A workaround to this can be found in the “set” method provided by GA, which allows you to set or change parameters — parameters are then sent on every subsequent hit during the lifecycle of the page, or until the parameter is changed or deleted. You’ve seen this if you’ve ever used a custom dimension:
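As a small sketch of the “set” command with a custom dimension: the slot number and the value are placeholders, and the wrapper function exists only so the call is easy to exercise outside the browser.

```javascript
// Sets a custom dimension on the default tracker so it rides along on later hits.
// gaFn is the ga command function; index is the dimension slot configured in GA.
function setCustomDimension(gaFn, index, value) {
  gaFn('set', 'dimension' + index, value);
}

// In the page, after the GA snippet (slot 1 and the value are placeholders):
if (typeof ga === 'function') {
  setCustomDimension(ga, 1, 'logged-in');
}
```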
You can actually use this to set many different fields in GA.
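For scroll tracking, the field we care about is “page”. A minimal sketch, assuming the GA snippet is already on the page; buildPagePath and its includeQuery flag are illustrative helpers, not part of GA.

```javascript
// Build the value for GA's "page" field from a Location-like object.
function buildPagePath(loc, includeQuery) {
  return includeQuery ? loc.pathname + loc.search : loc.pathname;
}

// Set it once, before any events are sent, so every later hit carries it.
if (typeof window !== 'undefined' && typeof ga === 'function') {
  ga('set', 'page', buildPagePath(window.location, false));
}
```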
Now all of the hits we send will have the page parameter set, and we’ll be able to see them in our event report by going to the “Pages” tab under Events or using Page as a secondary dimension:
Important Note: by using window.location.pathname, you will lose any parameters after the path in the URL. If you need to track parameters, you will want to use different code here, for instance: window.location.pathname + window.location.search. It’s probably a good idea to get some help from your developer if you’re not sure.
Google Analytics Plugin API
Of all the changes that came with Universal Analytics, one of the coolest (yet underappreciated) was the Plugin API. You may already be familiar with GA plugins without even realizing it. The cross-domain linker plugin and enhanced ecommerce, for example, are both plugins used for adding data to analytics hits.
There are two main advantages to using the plugin API, as opposed to just writing your own custom tracking script and sending hits through the standard ga function.
First, because the Google Analytics tracking snippet loads the analytics.js library asynchronously, it can be a challenge to know when the plugins are fully loaded and ready to use. Plugins take care of this for you by delaying any hits being sent until all plugins are loaded and ready to use.
Second, it provides far more powerful methods for interacting with Google Analytics than simply sending additional hits. A GA plugin can intercept and, if desired, modify a hit before it’s sent to Google Analytics. It can also allow hit data to be sent to another server in addition to Google Analytics (how to do that is beyond the scope of this article, but you can read more about it here).
As a plugin is loaded into a GA tracker instance, it’s possible to have multiple trackers on a page, but only use the plugin on one. This can be useful if you want to keep the accuracy of historical data when you add new tracking code that will change how metrics are measured (like our adjusted bounce rate). Now we can set up a separate tracking profile and run the scroll tracking and adjusted bounce rate on the new profile. Or perhaps we could configure the plugin differently for each instance.
Using GA Plugins
This example, originally from the Google Analytics developer documentation, shows a basic implementation of a plugin using the “require” command.
One of the great things about building plugins this way is it gives a simple and consistent place to configure your plugins. The second argument for the “require” command can be a configuration object that your plugin can use to change internal settings. For example:
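A rough sketch of how a plugin might read that configuration object. The plugin name scrollLogger and its two options are made up for illustration; the 'provide' and 'require' commands are the ones analytics.js uses to register and load plugins.

```javascript
// Hypothetical plugin constructor. analytics.js calls it with the tracker
// and whatever object was passed as the second argument to 'require'.
function ScrollLogger(tracker, config) {
  config = config || {};
  this.tracker = tracker;
  this.sampleRate = config.sampleRate || 100;        // ms between scroll checks
  this.category = config.category || 'Scroll Depth'; // event category
}

if (typeof ga === 'function') {
  // Register the constructor under a name...
  ga('provide', 'scrollLogger', ScrollLogger);
  // ...and load it on the default tracker with custom settings.
  ga('require', 'scrollLogger', { sampleRate: 250 });
}
```

Anything left out of the object falls back to the defaults inside the constructor, so the page only needs to mention the settings it changes.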
Note: the ga function is used by analytics.js to call internal analytics functions indirectly, so that function calls can be cached until the library is fully loaded. The first argument in the ga function is always the function or method name. This is helpful to keep in mind both when writing plugins and debugging your code:
There’s also an interesting hidden feature within the plugin API that lets GA load your plugin files for you. Now, this is an undocumented feature, so there’s no guarantee that it will remain in place, but it seems as though it’s being used internally by GA’s official plugins, and is reported when using analytics.js in debug mode.
Instead of passing in an object as the second argument after the method name, we can pass a string with the path to the plugin file. The configuration object then moves to the third argument after the method name (the fourth position overall).
This is great if you’ve had your plugins in separate files to begin with, but if you’ve combined them with the rest of your page’s scripts then it may actually make more sense to keep them there to consolidate HTTP requests for the sake of load time.
Remember, the plugin API is built for asynchronous loading. The plugin file and analytics.js can load in any order and the plugin will still work. However, it’s important that the tracking snippet itself comes first. The snippet does two things: it loads the analytics.js file asynchronously, and it creates a placeholder ga function that caches all calls, to be processed when analytics.js is loaded. If the snippet isn’t in place first (or at least the placeholder function), then you may call the ga function before it exists.
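To make the placeholder idea concrete, here is a sketch of what that part of the snippet does, written as a function so it can be run against any object. The function name installGaPlaceholder is an assumption; the queueing behavior mirrors the standard snippet.

```javascript
// Creates a stand-in ga() that stores every call in ga.q until
// analytics.js loads and replays the queue. An existing ga is left alone.
function installGaPlaceholder(root) {
  root.ga = root.ga || function () {
    (root.ga.q = root.ga.q || []).push(arguments);
  };
  root.ga.l = +new Date(); // load timestamp, as in the standard snippet
  return root.ga;
}

if (typeof window !== 'undefined') {
  installGaPlaceholder(window);
}
```

With the placeholder in place, a plugin file can safely call ga('provide', …) even if analytics.js itself hasn’t finished loading.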
You can read more about how tracking plugins are constructed in the GA documentation, or jump to Building a Scroll Tracking Plugin with the Plugin API to see how we use it in our scroll tracking plugin.
So far I’ve been using code that assumes the user is using a modern browser, like Chrome. Unfortunately, however, as anyone who’s ever done any web development knows, this isn’t always the case. And even though modern browser usage is much more common now than in the past, when it comes to analytics, we should make sure our tracking scripts work across all browsers. Progressive enhancement (that is, adding more advanced features for browsers that support them) is great for interactivity, but for analytics code, we want it to work for everyone. And while this used to mean simply ensuring things work in Internet Explorer, it has since expanded to include a variety of mobile browsers as well, which have opened up a whole new batch of cross-browser compatibility issues. Yay.
Page and Scroll Dimensions
Scroll tracking is built on accurate measurements of the document and viewport height, as well as the scroll height to the top of the viewport. Unfortunately, older browsers support and measure these parameters a little differently. So instead of using our window.pageYOffset + window.innerHeight and document.body.scrollHeight parameters, we’re going to replace those with functions that will always return the right measurement regardless of the browser. They look like this:
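A sketch of two such helpers, written from scratch for illustration. The names getDepth and getPageHeight, and passing window and document in as arguments, are assumptions made so the functions can be exercised with stand-in objects.

```javascript
// Distance from the top of the document to the bottom of the viewport.
// Each measurement uses the first property that returns a truthy value.
function getDepth(win, doc) {
  var el = doc.documentElement || {};
  var body = doc.body || {};
  var scrollTop = win.pageYOffset || el.scrollTop || body.scrollTop || 0;
  var viewport = win.innerHeight || el.clientHeight || body.clientHeight || 0;
  return scrollTop + viewport;
}

// Full document height: the largest of the candidate measurements.
function getPageHeight(doc) {
  var el = doc.documentElement || {};
  var body = doc.body || {};
  return Math.max(
    body.scrollHeight || 0, el.scrollHeight || 0,
    body.offsetHeight || 0, el.offsetHeight || 0,
    body.clientHeight || 0, el.clientHeight || 0
  );
}
```

In the page these would be called as getDepth(window, document) and getPageHeight(document).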
Now, to be honest, I’ve been using this code for so long I can’t remember where I originally got it from. The functions are pretty simple though: the depth function finds the first parameter for vertical scroll depth and viewport height that returns “truthy”, meaning it exists. For pageHeight, it returns the highest number it finds from a list of possible parameters.
Event listeners are functions that we “subscribe” to a particular event happening — like a click or a scroll. While all modern browsers use the same .addEventListener method, older versions of Internet Explorer use a different method.
Old IE usage isn’t much of an issue now, but I originally developed this code when it was, so I keep it in just in case. This is a universal event listener that uses a closure design pattern. If you’ve never heard of closures, they can take a while to wrap your head around, but they’re awesomely powerful once you do. Basically, a closure is a function that remembers the variables from the scope it was created in; here, that means a function that returns another function.
Cross-browser event listener code:
Here are some basic examples:
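A sketch of a closure-based universal listener, followed by a couple of usage examples. The names createAddListener and addListener are illustrative; the feature test runs once, and the returned function remembers which method to use.

```javascript
// Feature-test once against a probe object, then return the matching binder.
function createAddListener(probe) {
  if (probe && probe.addEventListener) {
    return function (el, type, handler) {
      el.addEventListener(type, handler, false); // standard browsers
    };
  }
  if (probe && probe.attachEvent) {
    return function (el, type, handler) {
      el.attachEvent('on' + type, handler);      // old Internet Explorer
    };
  }
  return function (el, type, handler) {
    el['on' + type] = handler;                   // last resort
  };
}

var addListener = createAddListener(typeof window !== 'undefined' ? window : null);

// Usage examples (browser only):
if (typeof window !== 'undefined') {
  addListener(window, 'scroll', function () { /* flag that a scroll happened */ });
  addListener(window, 'beforeunload', function () { /* send the final hit */ });
}
```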
Before Unload Event
When we discussed the beforeunload event earlier in the post I mentioned that there was some inconsistent behavior on certain browsers. In this case, old browsers aren’t the issue — the real problem is mobile. Mobile browsers don’t use the unload event in the same way as desktop because mobile browsers keep pages in a “background state” when you move to another tab or app.
Fortunately, there’s another way to detect people switching away — it’s called the Page Visibility API. Here’s a great blog post on how to use it.
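As a rough sketch of the idea, using the unprefixed visibilitychange event and visibilityState property (older prefixed versions are ignored here). The function name watchVisibility is an assumption.

```javascript
// Calls onHidden each time the document becomes hidden (tab switch, app switch).
// Returns false when the Page Visibility API isn't available.
function watchVisibility(doc, onHidden) {
  if (typeof doc.visibilityState === 'undefined' || !doc.addEventListener) {
    return false;
  }
  doc.addEventListener('visibilitychange', function () {
    if (doc.visibilityState === 'hidden') {
      onHidden();
    }
  }, false);
  return true;
}

if (typeof document !== 'undefined') {
  watchVisibility(document, function () { /* send the scroll depth hit */ });
}
```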
The real trick is only using this method on mobile — as beforeunload still works great on desktop, and detecting devices can be tricky. While browsers do pass a user agent that identifies them, these can be a bit of a nightmare to keep track of — plus, they’re always changing. Usually it’s not a good idea to check for specific browsers or devices, but to instead check for features. The trouble is, these browsers do support beforeunload — just not how we want them to. So that leaves user agents.
There are a lot of browser detection libraries out there, but we’re trying to avoid dependencies. What we need is a “best guess” solution that may not be 100% accurate, but will work most of the time.
Fortunately, I have a great data set to use! For a while I’ve been recording the exact user agent as a custom dimension in serps.com’s analytics, so we have a list of thousands of user agents with Google Analytics’ best guess as to whether they’re mobile, desktop, or tablet.
With that, I developed a RegEx that’s 98–99% accurate based on GA’s classification.
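To show the general shape of such a check, here is a stand-in pattern. It is a generic example built from commonly seen mobile tokens, not the expression described above, and it hasn’t been measured against GA’s classification.

```javascript
// Illustrative pattern only: common mobile and tablet tokens, case-insensitive.
var MOBILE_UA = /Android|webOS|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini|Mobi/i;

// Best-guess check; a missing user agent is treated as "not mobile".
function looksMobile(userAgent) {
  return MOBILE_UA.test(userAgent || '');
}

var isMobile = typeof navigator !== 'undefined' && looksMobile(navigator.userAgent);
```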
Pretty simple, right? Again, this isn’t a great way to detect devices, but 99% accuracy is pretty good — and certainly good enough for an analytics plugin. Further, by also checking for support of the Page Visibility API, and only using one or the other, we should have most browsers pretty well covered with only minor variability.
One downside, however, is that if people come back to the page after switching away, we could end up sending multiple hits at multiple scroll depths — so we add a “hit count” variable that tracks the number of times the scroll hit has been sent, and if it’s more than once, we append the number in brackets after the event action — e.g. Pageview End (2).
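The hit count logic is small enough to sketch in a few lines. The function and variable names are illustrative.

```javascript
var hitCount = 0; // how many times the scroll hit has been sent on this page

// Append the count in brackets only from the second hit onward.
function actionWithCount(baseAction, count) {
  return count > 1 ? baseAction + ' (' + count + ')' : baseAction;
}

// Call once per send to get the action text for that hit.
function nextAction(baseAction) {
  hitCount += 1;
  return actionWithCount(baseAction, hitCount);
}
```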
Fixing Landing Page (not set)
In three years of using this code, there’s one particular issue that has been plaguing me: Google Analytics showing a bunch of events that have the landing page as “(not set)”.
The landing page: (not set) issue is caused by an event being sent to GA without a preceding page view. I knew this error was a result of the scroll tracking plugin, but I was convinced it was either loading incorrectly, or preventing the page view from being sent — but in all my attempts I could never resolve it… until now!
In testing the plugin for this blog post I, again, quickly saw the (not set) issue — but recently, Google Analytics added the User Explorer tab, which lets you look at a single user’s interaction with the site. By looking at all of the hits associated with a single user that had a session without a landing page, the cause was suddenly infuriatingly obvious. Because the script fires on beforeunload (or similar), if a user leaves a tab open for longer than the maximum session time (default is 30 minutes), and then subsequently closes the tab, a new session is created by the event — a session without a page view.
Resolving this is actually quite easy. All we have to do is stop the script from sending a hit after a certain period of time has elapsed (30 minutes). We already have a timestamp generated on load for the interaction timeout, so now we just need to add a small check in the final onUnload event that checks the time elapsed since the plugin loaded, and aborts the operation if it’s over the session limit — what I’ve called “maximum time on page”.
This is a solution to the (not set) problem, but it still means we won’t be getting data on users who don’t close their tabs. For that, I added another function that will check the time every 5 seconds, and if it’s within 30 seconds of the maximum time on page, it will fire the scroll depth event with the scroll depth tracked thus far. This automatic event is configured to only fire once.
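Both checks can be sketched as pure functions over timestamps. The constant and function names are assumptions; the 30 minute limit, 5 second check, and 30 second window come from the description above.

```javascript
var MAX_TIME_ON_PAGE = 30 * 60 * 1000; // default GA session timeout, in ms
var CHECK_EVERY = 5 * 1000;            // how often the timer runs
var SEND_WINDOW = 30 * 1000;           // auto-send when this close to the limit

// Unload check: only send if the session can't have expired yet.
function canSendOnUnload(loadedAt, now) {
  return now - loadedAt < MAX_TIME_ON_PAGE;
}

// Timer check: fire once, shortly before the limit is reached.
function shouldAutoSend(loadedAt, now, alreadySent) {
  var remaining = MAX_TIME_ON_PAGE - (now - loadedAt);
  return !alreadySent && remaining > 0 && remaining <= SEND_WINDOW;
}

if (typeof window !== 'undefined') {
  var loadedAt = new Date().getTime();
  var autoSent = false;
  var timer = setInterval(function () {
    if (shouldAutoSend(loadedAt, new Date().getTime(), autoSent)) {
      autoSent = true;
      clearInterval(timer);
      // send the scroll depth event with the depth tracked so far
    }
  }, CHECK_EVERY);
}
```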
When navigator.sendBeacon isn’t available
Our last issue for browser compatibility is for old browsers that don’t have the sendBeacon method, which lets you send analytics hits at the end of a page view without it being cancelled by the closing page. Now that Google Analytics automatically uses the best transport method based on the browser, we don’t have to worry about handling that ourselves — but if we want to make sure that our end-of-page view hits are sent on old browsers, we’re going to have to do something a little bit… evil.
As we discussed earlier, the beforeunload event can be used to prompt the user to confirm that they want to leave (for instance, if they have unsaved work), but we can also use it to hijack the page for a small amount of time, keeping it from closing, so that the analytics hit has enough time to send.
To do this, we can’t use setTimeout, because the page will close without waiting for the timer. Instead, we create a while loop that will run until either a maximum allowed hold time is passed, or the hit completes (fortunately, GA makes this part easy with its hitCallback parameter).
This code will only run if the sendBeacon method isn’t available. The code loops, setting run to the number of seconds since start until either it’s been longer than the timeout, or the skip variable is set to true by the hit callback.
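A cautious sketch of such a loop. The function name holdPage, the injectable clock, and the 300 ms hold time are assumptions, and the elapsed time is tracked in milliseconds here. Because the loop blocks the main thread, treat the timeout as the limit you actually rely on.

```javascript
function currentTime() { return new Date().getTime(); }

// Busy-wait until isDone() returns true or maxHoldMs has elapsed.
// clock is injectable so the loop can be tested without real waiting.
function holdPage(maxHoldMs, isDone, clock) {
  clock = clock || currentTime;
  var start = clock();
  var run = 0; // milliseconds since start
  while (!isDone() && run < maxHoldMs) {
    run = clock() - start;
  }
  return run;
}

if (typeof window !== 'undefined' && window.addEventListener &&
    !(window.navigator && window.navigator.sendBeacon)) {
  window.addEventListener('beforeunload', function () {
    var skip = false;
    if (typeof ga === 'function') {
      ga('send', {
        hitType: 'event',
        eventCategory: 'Scroll Depth', // assumed name
        eventAction: 'Pageview End',
        hitCallback: function () { skip = true; }
      });
    }
    // In most browsers callbacks can't run while this loop is spinning,
    // so the 300 ms cap (an assumed value) is what normally ends it.
    holdPage(300, function () { return skip; });
  });
}
```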
This code is dangerous — if it doesn’t run right, it can hang the user’s page and overwhelm their CPU. I’ve used this code before, and it works, but I definitely recommend using it with caution and testing with an old browser or testing platform if you can.
Building a Scroll Tracking Plugin with the Plugin API
The GA plugin API is pretty sweet, so let’s use it to build our scroll tracking plugin!
We already have the boilerplate functionality for how scroll tracking works, but before we jump into code, let’s think about how we want to configure our plugin. There are the configuration variables we’ve already established: scrollThreshold and timeThreshold, but we want to make this plugin flexible enough that you’ll never have to change the code itself.
Here’s what we want to have in the configuration object:
- Event Parameters: we should be able to change the category and action text (we’re actually going to use label for something else).
- Sample Rate: this is the frequency that we check whether the user has scrolled — we were using 100ms, but if page performance is slow we can make this time longer.
- Scroll Threshold and Time Threshold: the configuration variables we’ve already discussed.
- Set Page: this lets us toggle on and off setting the “page” parameter to location.pathname. We could also extend this option to accept a different page value (for instance, one with the query string).
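The list above might translate into a defaults object plus a small merge step, sketched below. Only scrollThreshold and timeThreshold are names taken from earlier in the post; the other keys and the default values are assumptions for illustration.

```javascript
var defaults = {
  category: 'Scroll Depth', // event category (assumed name)
  action: 'Pageview End',   // event action
  sampleRate: 100,          // ms between scroll checks
  scrollThreshold: 10,      // percent depth that counts as engaged
  timeThreshold: 15,        // seconds on page that count as engaged
  setPage: true             // set the "page" field to location.pathname
};

// Copy the defaults, then let anything passed to 'require' override them.
function mergeConfig(base, overrides) {
  var out = {};
  var key;
  overrides = overrides || {};
  for (key in base) {
    if (Object.prototype.hasOwnProperty.call(base, key)) { out[key] = base[key]; }
  }
  for (key in overrides) {
    if (Object.prototype.hasOwnProperty.call(overrides, key)) { out[key] = overrides[key]; }
  }
  return out;
}
```

Merging into a fresh object keeps the defaults untouched, which matters if the plugin is loaded on more than one tracker with different settings.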
Scroll vs No Scroll
Our adjusted bounce rate does allow for someone to open the page, spend some time reading it without scrolling, leave, and still have their scroll-depth measured. This depth will be the height of their viewport as a percentage of the document height. This information is still useful, as it tells how much of the page the user could have viewed, but, ideally, we would also be able to differentiate between users who actually scrolled versus those who are just on the page for a while.
This is where the eventLabel comes in. Based on whether the user actually scrolls or not, we can set a label to “Did Scroll” or “Did Not Scroll”, which will allow us to differentiate between these two behaviors if we want. And because we’re using the avg. event value as our metric, we can see an average that isn’t weighed down by users who didn’t scroll.
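Putting the label and value together, the event fields might be built like this. The function name and the “Scroll Depth” category are assumptions; the two label strings are the ones described above.

```javascript
// Build the fields for the scroll hit: depth goes in the value,
// and the label records whether the user actually scrolled.
function buildScrollEvent(depthPercent, didScroll) {
  return {
    hitType: 'event',
    eventCategory: 'Scroll Depth', // assumed name
    eventAction: 'Pageview End',
    eventLabel: didScroll ? 'Did Scroll' : 'Did Not Scroll',
    eventValue: Math.round(depthPercent) // GA event values must be integers
  };
}
```

Filtering the Events report to the “Did Scroll” label then gives an average value that only includes users who moved down the page.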
Ok, let’s get to the main event (ha… analytics pun!).
Here’s all the code that makes up the plugin. We use different files and Node-style require statements to organize our code, but the full compiled library is available in the implementation section.
The coders reading this might notice a few additional features that we didn’t directly cover in this article. One is that we wait to initiate the plugin until the “domReady” event has fired. This is the same as wrapping your jQuery code in jQuery(document).ready();. This is because some pages will have code that modifies the height of the page before the page is done loading, and waiting until everything is loaded can help us get a more accurate measurement.
Custom Scroll Metric
Another feature is the option to set a custom metric slot to the scroll position. This can be useful for pulling data into different kinds of reports, though it requires some processing within GA (for example, setting up a calculated metric) — otherwise, by default, the metric will be the sum of all of the percentages.
The custom metric is also useful because it’s set using the “set” method as the user scrolls, so any other events that occur will contain it as well. For instance, if you have a form that pops up for email signup, looking at the scroll custom metric alongside your form event would show you the average position where people use the form.
Implementing the Final Plugin
Step 1: Download from GitHub
Step 2: Upload Plugin
You’ll want to grab the scroll-depth-tracker.js file and place it somewhere publicly accessible on your web server. Take note of the file URL. For example, if you add this to your WordPress theme, it would be something like
Go to that URL and check that the file is accessible.
Step 3: Modify your tracking code
Implementing the scroll tracking plugin only requires one line of code if you’re using the default settings. In our example I’ve shown an empty configuration object to demonstrate how you could pass custom settings to the plugin, but feel free to omit the final parameter and just use it with the same settings as I do.
I should point out that I’m intentionally loading the plugin after the page view event has been sent. I don’t actually need to do this — I could load the plugin prior to the page view, and analytics.js would wait until my plugin is loaded before doing anything else. But because our scroll tracking plugin doesn’t affect the page view hit at all, implementing it this way makes it less likely that any errors will prevent the page view hit from being sent, which I’ve seen happen in the past.
Also, as our plugin waits until DOM Ready, if a user leaves very quickly, the page view might not have time to fire at all.
My hope had been to have the plugin up with a full set of unit tests, but unfortunately I’ve run into a snag in testing this plugin’s functions using Selenium (I’m new to automated testing!). I’ll update this post and the GitHub repo when those tests are added in.
2018 Note: Ha! This was very optimistic!
Google Tag Manager
One other todo item on this plugin is integrating it with Google Tag Manager. Unfortunately, GTM doesn’t easily support Google Analytics Plugins right now. I have used this plugin in tag manager, but it involved writing custom tags instead of using the built-in GA ones. We may do a follow-up on using Google Analytics plugins in Tag Manager.
Help us make this plugin better! Your feedback and suggestions would be very much appreciated. Comment on this post or create an issue on the GitHub repo.