Why Your Marketing Attribution Data Is Wrong

John Pauler
Learning Data
7 min read · May 25, 2023

If you’re a marketer or analytics pro who works with attribution data and marketing channels, you’re probably used to attribution software.

The problem? Marketing attribution software in general is pretty weak, and leaves you with major holes in your data.

What’s wrong with traditional click-based attribution software?

Typical click attribution tends to over-attribute to sources like direct type-in, plus organic and paid branded search.

The reality is, these inbound channels only represent the entry point.

What you really care about is the marketing activity that happened somewhere else, which got your customer to come to your site via type-in or a branded search. That underlying marketing activity is the true driver, and that’s what a good marketing team needs to figure out.

Attribution that is exclusively click-based misses this entirely.

Where does click-based attribution software get it wrong?

In general, there are two major problems with click-based attribution software:

  1. Some channels are less likely to have a trackable click
  2. Even in channels with a direct click, the data can be hidden from you

To talk about problem #1, we can point to Refine Labs’ study where they looked at over 600 declared-intent leads and found that the attribution software got it wrong 90% of the time.

They talk about this as the “attribution mirage”, where attribution software over-attributes to easily trackable channels like organic and direct, and completely misses the underlying driver.

In their example shown here, the business was actually driven by social media, podcasting, and word of mouth referral, but the attribution software doesn’t have a clue because those channels don’t have a direct click.

When you invest successfully in those channels, someone gets to know your brand and offering, and then visits your homepage or searches for you on Google. Looking at clicks alone, there’s no way to see what the true driver of the business is.

Refine Labs’ “Attribution Mirage”

SparkToro recently did a great study diving into Problem #2, where they looked at different social channels to see which ones hid referring information.

The experiment they designed was pretty cool itself (read their article!) and the findings were super interesting.

As illustrated below, the channels to the left hid referring information from direct clicks, so the traffic was attributed to ‘Direct’.

So we’ve got these two problems… sometimes there is no click, and sometimes the click’s data gets masked. Yet we’re still really reliant on click-based attribution. Crazy, right?

How did we become so reliant on click-based attribution?

Over the past decade or so, we became obsessed with measurement, with very good intentions.

We preached accountability in performance marketing. We needed to show ROI for our marketing dollars.

So we favored channels that were the most easily measurable, where we could show some concrete numbers without much effort.

The problem is, there are a lot of channels that can’t be measured the same way. So if you’re over-reliant on the click-based attribution method, your marketing team won’t be able to make these work.

Where does click-based attribution break down?

If you run TV spots, radio ads, or podcasts, none of these things have trackable clicks. Attribution software will credit the traffic they drive to direct, organic, and paid branded search.

Same story if you run a marketing program that’s heavy on content creation and building community in places like LinkedIn, TikTok, Twitter, etc.

Folks get to know you in these places, and then when they want your services, they just type your website into their browser or do a Google Search. If you’re looking at clicks alone, all you’ll see is how they came to you. You’ll miss out on the most important info… why they came to you.

How can you fill in the gaps left by click-based attribution?

It’s easier than you think… just ask your customers!

Yup, that’s how we do it. When a new user signs up, we ask them how they found us.

You would be surprised how willing people are to share this information, and how detailed and valuable it can be at times.

We hear people found us on YouTube, or were referred by Annie Nelson (thanks Annie!), or followed our TikTok for months and finally decided to up their data game, so they came and signed up.

What do all these people have in common? They came to our site via a direct type-in or a branded search. So relying on click-based attribution would under-report our valuable marketing efforts.

There are some problems associated with self-reported user attribution.

  1. Some people don’t give you useful data. Most do, but you’ll get some people just smashing the keyboard or writing a nasty comment. Small percentages here. But this creates gaps in the data.
  2. Even your well-intentioned data is very dirty. People say “LinkedIn”, or “Linkdn”, or “I follow you on LinkedIn”. You need to clean up this mess and assign a long tail of responses to a clean value of “LinkedIn”.
  3. Self-reported data in general is not perfect. There are lots of reasons someone will give you bad information. Still, it’s useful to ask.
  4. Self-reported Attribution Data lacks granularity. When you buy paid clicks you can track that specific click. You know the ad that drove it, the time of day, the targeting method, etc. You’ve got amazingly granular data. When someone just tells you they came because of “Twitter”, you’ve got no clue which Tweet drove them. You never will. In reality it’s probably lots of different Tweets over time.

How should you handle the dirty data?

You’ll need to automate some logic to scrub the raw results and make them useful in aggregate. All those misspellings and different ways of saying the same thing need to be accounted for.

We handle this with a SQL script. It’s sort of spaghetti code at this point.

Below are parts of our actual code. The full code is of course much longer. This is just to illustrate the concept.

As you do more and more in marketing, your logic will get long and your code won’t perform well. Be smart about how you use it.

I would recommend keeping this cleanup exercise out of analyses that you’ll run over and over. Instead, do it one time per user, as part of your data engineering workflow.
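To make the idea concrete, here is a minimal sketch of this kind of cleanup in Python. It is not our production SQL script. The channel names, the regex patterns, and the `clean_channel` function are all hypothetical, chosen only to illustrate mapping a long tail of messy answers to one clean value per user.

```python
import re

# Hypothetical rules: (clean channel name, pattern to look for in the raw answer).
# Order matters: the first matching rule wins.
CHANNEL_RULES = [
    ("LinkedIn", re.compile(r"linked\s*in|linkdn|linkedn")),
    ("YouTube", re.compile(r"you\s*tube|\byt\b")),
    ("TikTok", re.compile(r"tik\s*tok")),
    ("Twitter", re.compile(r"twitter|tweet")),
    ("Google Search", re.compile(r"google|search")),
    ("Referral", re.compile(r"friend|colleague|coworker|referred|recommend")),
]


def clean_channel(raw):
    """Map a free-text 'how did you find us?' answer to a clean channel name."""
    if raw is None:
        return "Unknown"
    text = raw.strip().lower()
    if not text:
        return "Unknown"
    for channel, pattern in CHANNEL_RULES:
        if pattern.search(text):
            return channel
    # Keyboard smashing, nasty comments, and sources we have no rule for yet.
    return "Unknown"


for answer in ["LinkedIn", "Linkdn", "I follow you on LinkedIn", "asdfghjkl"]:
    print(repr(answer), "->", clean_channel(answer))
```

Run once per new user and store the result, so reports read the clean value instead of re-running the matching logic. When the "Unknown" bucket grows, that is the signal to review the raw answers and add rules.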

How should you handle missing data?

This one is relatively straightforward. We make the assumption that the folks we lack data on look similar to the people we have data on.

Is that a perfect assumption? Of course not. But we think it’s good enough to get us to rough volumes and it’s better than not making the attempt.

So if we know where 80% of our users came from, we make the assumption that the other 20% came from the same channels, and we assign them proportionately. We do this in aggregate only, since we can’t assign an individual unknown user to a channel.
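Here is a small sketch of that proportional assignment in Python. The numbers are made up for illustration (800 known users and 200 unknown, matching the 80/20 split above), and `allocate_unknown` is a hypothetical helper name, not part of any library.

```python
def allocate_unknown(known_counts, unknown_count):
    """Spread users with no usable answer across channels,
    in proportion to the mix among users we do have data on."""
    total_known = sum(known_counts.values())
    if total_known == 0:
        raise ValueError("need at least one known user to estimate the channel mix")
    return {
        channel: count + unknown_count * count / total_known
        for channel, count in known_counts.items()
    }


# Made-up example: 1,000 sign-ups, 800 with a usable answer, 200 without.
known = {"YouTube": 400, "LinkedIn": 240, "Referral": 160}
estimated = allocate_unknown(known, unknown_count=200)
print(estimated)
# {'YouTube': 500.0, 'LinkedIn': 300.0, 'Referral': 200.0}
```

Each channel keeps its share of the total (YouTube is 50% before and after), and the estimated volumes add back up to all 1,000 users.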

How should you handle the lack of granularity in the data?

Self-reported data for dark social initiatives or TV, podcasts, and radio will never have the granularity you want or that you are used to seeing from paid channels where you are buying and tracking clicks.

The best you’ll be able to say is that a user found you on YouTube. You will almost never know which video sent them to you, so don’t bother trying to figure that out. Instead, make judgments and investments based on user acquisition at the overall channel level (“how many users is YouTube driving?”), and focus on other metrics to understand which specific videos or posts are working within those channels.

To understand the impact of individual posts, videos, and other initiatives, the in-channel metrics we look at are things like reach (impressions/views), engagement/clicks, comments, shares, etc. If a post or video gets a ton of impressions and interactions, it’s good. If we posted something and got an outsized number of new followers that day, we won the day. Pretty simple.

If your content is getting tons of impressions and engagement, you’re on the right track and the business will likely follow. We’ve certainly seen a strong correlation where in-channel metrics boom and then user acquisition follows.

What are your keys to success with attribution?

  1. Use Self-Reported Data: your customers will tell you what drove them to you. It’s super valuable and not hard to implement. Ask them at the point they sign up.
  2. Don’t Ignore Click Data: when you have valuable data from clicks, use it WITH self-reported data. When you combine them, you’re using all your tools to make the best decisions.
  3. Automate. Review Periodically: create a script to clean and analyze the data. Then do manual reviews every once in a while to see where it needs updating. If your “unknown” bucket is creeping up in volume, you may have some new sources that need to be added to your logic.
  4. Embrace The Lack of Granularity: you won’t get click-level details for individual initiatives in dark social. Learn to use directional signals and in-channel metrics to rock your channels, and the customer acquisition will follow.
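For key #2, here is one possible way to combine the two sources into a single channel per user, sketched in Python. The rule itself is an assumption for illustration, not a description of our homegrown setup: keep the click channel when it is a trackable source, and fall back to the self-reported answer when the click only shows an entry point. The channel labels and the `blended_channel` function are hypothetical.

```python
# Click channels that usually describe only the entry point (assumed labels).
ENTRY_POINT_CHANNELS = {"direct", "organic search", "paid branded search"}


def blended_channel(click_channel, self_reported):
    """Pick one channel per user from click data plus the cleaned survey answer."""
    click = (click_channel or "").strip().lower()
    reported = (self_reported or "").strip()
    has_report = reported != "" and reported.lower() != "unknown"

    if click and click not in ENTRY_POINT_CHANNELS:
        # A trackable click carries granular detail, so keep it.
        return click_channel.strip()
    if has_report:
        # The click only shows how they arrived; the user told us why.
        return reported
    # Nothing better available: keep the entry-point channel, or Unknown.
    return click_channel.strip() if click else "Unknown"


print(blended_channel("direct", "YouTube"))       # YouTube
print(blended_channel("paid social", "YouTube"))  # paid social
print(blended_channel("Direct", "Unknown"))       # Direct
```

You may prefer a different priority, such as always trusting the self-reported answer. The point is to write the rule down once and apply it consistently.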

Wrapping up

No, I don’t have an attribution solution to sell you. Our solution is homegrown, and that’s not the business we’re in.

Just sharing my experience here because attribution is a place I’ve seen a lot of companies go wrong, and where the better solution isn’t too complicated.

Hope you found this helpful. I would love to hear what you think in the comments.

Is this resonating? Or are you getting everything you need from a strictly click-based attribution strategy? Let me know!

John Pauler
Learning Data

Editor of the Learning Data publication. Lead SQL instructor at Maven Analytics.