The engagement trap
To measure how an app or service is doing, we often set our sights on “engagement.” This term is rarely carefully defined, but even when it is, the effects of reductionism are already present. People aren’t just clicks or ‘time spent.’ “Engagement” could result from enjoyment, but it can also result, as it often does, from cheap or underhanded manipulation of our baser instincts.
Skinner boxes, mice pushing levers to get cocaine: we have all sorts of ready-made “research” that we reference in handwavy fashion when we talk about topics like engagement, attention, and activity. This might seem more relevant to purely online services, but everything in the software world is affected by this way of thinking.
As I’ve mentioned before, as far as software is concerned, software agents are ‘users’ too. This level of abstraction makes it easier to stop thinking of ‘users’ as people. Bots are also users, which only becomes a problem if someone chooses to care. Maybe mice could be users too. You can get them to click a button all day long any number of ways, as in the lever-pushing experiments in which a press delivers cocaine. The mice will be really “engaged.” The charts are going to look great… for the most part.
Abstractions are useful and undeniably necessary. Overusing them, however, is dangerous. Using them incorrectly, without context, is even more so. (This is along the lines of “the only thing worse than no documentation is bad documentation.”) It’s common to talk about “engagement” when what we’re really talking about is getting people to spend as much time as possible, click on as many links as possible, or post as many things as possible.
The example of using mice may sound like a cheap analogy, but I’m serious. I’m not focusing on whether triggering addictive behaviors is a good idea, or arguing about dopamine rushes or the like. That’s a worthy discussion to have, but I’m focusing on a different aspect.
Think of it as a Turing test for sites: if you can inflate your stats (in this case, “engagement”) by having small rodents perform a task in a box, then you are doing it wrong.
That is, if you made a button do something on a website, and then had that button and its copies pushed over and over by mice being rewarded with food, or sweets, or whatever, and that activity increased your site/application/service engagement, you’re in trouble.
Integers need not apply
Along the same line of simple ways to tell when something’s wrong: positive integers are also a pretty significant warning sign. If that’s the space in which you’re measuring (it happens more often than you’d think), then something is really wrong. One example: you report that user ‘engagement,’ which you define as the average number of full page loads per user per month, is 10. Great. But that could be achieved any number of ways: MySpace’s signup flow was at one point a crazy number of pages, which was either the result of bad design/engineering or something done to artificially inflate its pageview numbers. So maybe the user signs up, loads 10 pages in the process, and never returns. Or maybe they sign up and then return every 3 days and read a new page. “Engagement” is the same, but the second example shows a qualitatively different result.

Oh, but that’s why we measure “user retention” and “return visits”! someone may say. All too frequently, though, these three metrics aren’t cross-referenced, which again makes them meaningless, since the ‘users’ that dominate each area may be different sets. Averages are used without looking at the standard deviation, which also makes them close to meaningless. We separate users into ‘cohorts’ that differ across different stat sets. Since a ‘user’ is at best an account, and we have only soft ways of extrapolating when multiple accounts are really one person, we don’t usually look at that. Bots are users too, unless they’re so well known or so high in traffic that you can’t get away with ignoring them.
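To make the “average of 10 page loads” point concrete, here is a minimal sketch with invented numbers: two hypothetical cohorts of five people, both averaging exactly 10 page loads per month, that look identical if you report only the mean.

```python
import statistics

# Hypothetical monthly page loads for two cohorts of five people each.
# Both average exactly 10 page loads, but the behavior is very different.
signup_burst = [50, 0, 0, 0, 0]       # one person loaded 50 pages signing up; everyone else left
steady_return = [10, 10, 10, 10, 10]  # everyone comes back and reads regularly

for name, loads in [("signup burst", signup_burst), ("steady return", steady_return)]:
    mean = statistics.mean(loads)
    stdev = statistics.pstdev(loads)  # population std dev over the whole cohort
    print(f"{name}: mean={mean}, stdev={stdev:.1f}")

# "Average engagement: 10" is true of both cohorts; the standard
# deviations (20.0 vs 0.0) are what reveal the difference.
```

Even this one extra number would flag the signup-burst case; cross-referencing with retention per person, rather than in aggregate, would make it unambiguous.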
But there’s more!
When you use an abstraction like “user” it’s also easier to go down the wrong path. Getting a “user” to “discover content” by inserting “targeted paid results” sounds much better than describing how you’re getting your grandmother to unwittingly click on a piece of advertising that looks much like the real content she wanted to look at, but says “advertisement” in 5-point font. While you may or may not think (as I do) that this is morally wrong, my larger point is that it is the wrong thing to do for the business too.
You’re essentially depending on people not understanding what they’re doing, or on manipulating them, and therefore you’re kidding yourself. When you start thinking about motivation, you may also realize that as long as your company lacks the equivalent of the card “As CEO of Trinket Software I want to keep shareholders happy by delivering X product with Y revenue and Z profits,” you’re kidding yourself there too. Making goals explicit, from the small and tactical to the large and strategic, is critical.
This is frequently why even the best companies sometimes take years to refine how they analyze their business: massive amounts of work to patch together a set of abstractions that start to reflect what the business is really like.
What’s the alternative?
“No credit for predicting rain” is always present in my mind. Ben is talking about some specific situations, and he is not saying that you always have to know the answer before you can ask the question or criticize/point out flaws. I have, however, adopted this mode of thinking when I’m going after something specific or whenever I’m questioning anything longstanding. I always try to come up with alternatives; even if I can’t lay out an alternative in detail right here, I can point in a direction. If I’m saying that X is wrong, I generally try to have at least one alternative in mind.
So in this case, among other things, I’m calling bullshit on the vast majority of simple metrics used with abandon, like “user engagement,” “time spent,” “user retention,” “churn.” These measures require careful definition, proper parameters and boundaries, and accurate correlation to high-level goals. They require cross-referencing. They should always be labeled “handle with care: numbers in stats may be more meaningless than they appear.”
So what, then, is a possible alternative? What is a better way? For example, while Amazon may measure pageviews or time spent, what it really cares about is sales of products. Typical metrics may be looked at in the service of that (e.g., we just did a release and average pageviews are down with high standard deviation; did we screw up the release in some geographic region?). I’m sure that if they could retain the same level of sales by serving a single page a day in some magical way, they’d take it.
In being able to point clearly at product sales, Amazon is in the minority, but my argument is that every product and service has something equivalent. Even if it is less tangible and harder to define, it can be defined and quantified in one or more ways.
If you are a communications app, you may want to know if people really ‘use’ your app. But you shouldn’t care about the number of messages sent: that invents causality where there is none. Just because a message was sent doesn’t mean it was read, and it doesn’t mean it was, um, communication. Even if it was read and replied to, what qualifies as the type of “use” you envision? 5 messages per thread? 4? 100? Over what timeframe?
Is this harder to model and measure? You bet. But it’s necessary, and letting go of abstractions helps.
When you think of people and not users it’s easier to see why pageviews, clicks, “time spent” and many other commonly discussed metrics are pretty much smoke and mirrors. Most of us already know this, and we keep using them not because we think they’re great but because they’re readily accessible and our over-reliance on abstractions lets us get away with it.
Suppose the goal of your service is enabling group communication. The number of messages sent, something frequently touted in press releases, shouldn’t be what you care about; it invents causality where there is none.
Regardless of number of messages, or pageviews, or clicks, or signups or any of this other stuff that is more readily measurable, what really matters is whether people are communicating or not, right?
So we can say that, for this service, ‘engagement’ = ‘frequent group communication.’ A definition of ‘person engagement’ (which would be different from ‘group engagement’) in this context could be a combination of a) frequency of participation in threads with a group of at least 2 other people (meaning, for example, correlated reply-response sequences of at least 5 messages that involve at least 3 people, including the originator) and b) frequency of thread generation, i.e., starting a conversation that calls out to others and then tapers off. If you’re looking for growth-related metrics, you could look at things like how often inviting others to join actually results in someone creating an account. This could be further enhanced by measuring whether the conversation more closely matches real communication patterns: recurrent use of the names of the other people involved in the group, variance in vocabulary between participants, and many others.
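Part (a) of that definition is mechanical enough to sketch in code. This is a minimal illustration with invented thread data and the thresholds suggested above (at least 5 messages, at least 3 distinct people); a real implementation would also need the reply-response correlation and timeframe handling.

```python
# Hypothetical message log: (thread_id, sender) pairs, in order sent.
messages = [
    ("t1", "ana"), ("t1", "ben"), ("t1", "ana"), ("t1", "caro"),
    ("t1", "ben"), ("t1", "ana"),                # real back-and-forth among 3 people
    ("t2", "ana"), ("t2", "ana"), ("t2", "ana"), # a monologue: nobody replied
    ("t3", "ben"), ("t3", "caro"),               # too short to count
]

def engaged_threads(messages, min_messages=5, min_people=3):
    """Return thread ids that qualify as 'group communication'
    under the (assumed) thresholds from the definition above."""
    by_thread = {}
    for thread_id, sender in messages:
        by_thread.setdefault(thread_id, []).append(sender)
    return [
        tid for tid, senders in by_thread.items()
        if len(senders) >= min_messages and len(set(senders)) >= min_people
    ]

print(engaged_threads(messages))  # only "t1" qualifies
```

Note that a raw message count would score the monologue in `t2` higher than the short exchange in `t3`, while this definition correctly counts neither.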
Again, people not users: they don’t just “click” or “post”, they have intention, motivation. They may be showing off, they may be trying to share their happiness or grief. They may be just shallow and obsessed and repeating whatever they think is popular. They may be trying to help. And so on. And in their motivation and intent lies a key element that will either help the product or hurt it.
One person blindly posts photos of celebrities for two hours a day, and they have 5 “followers” and two “friends.” Another person has just two “friends” and no “followers,” and sends one joke every day to the two friends, one of whom is in the hospital; they exchange a couple of LULz and check in briefly. When you think of “users,” the first person could easily look “better” on paper than the other. But when you think of people, you realize that the second person and their friends are the ones really engaging in something meaningful enabled by your service. At least for me, the first case (which would rank higher on nearly any of the common metrics) would not be nearly as important and valuable as the second group. The second group will be more loyal, their interactions with the product more meaningful and lasting, and they will be better champions for your product.
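The two people above can be used to show how the choice of metric flips the ranking. Everything here is invented for illustration: the field names, the numbers, and the crude “meaningful interaction” proxy.

```python
# Two hypothetical people, per the comparison above.
people = {
    "celebrity_reposter": {"posts_per_day": 120, "replies_received": 0,
                           "two_way_threads": 0},
    "hospital_friend":    {"posts_per_day": 1, "replies_received": 2,
                           "two_way_threads": 1},
}

def raw_activity(p):
    # The kind of metric that "looks great on paper".
    return p["posts_per_day"]

def meaningful_interaction(p):
    # A crude, assumed proxy for actual back-and-forth communication.
    return p["two_way_threads"] + p["replies_received"]

for metric in (raw_activity, meaningful_interaction):
    ranked = sorted(people, key=lambda name: metric(people[name]), reverse=True)
    print(metric.__name__, "->", ranked)
```

The raw-activity metric puts the celebrity reposter first; the interaction proxy puts the two friends first. Neither proxy is “right,” but only the second one even attempts to measure what the service is for.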
Thinking about what can be measured that would tell you the true strength of what you’re building is at least as hard as, if not harder than, building the thing itself.
These more meaningful metrics must also enable us to focus on what matters. Do you want to help group 1 or group 2? They have different costs associated with them, and different growth dynamics. Common reductionist abstractions would either not give you enough information, or would mislead you.
And that’s something we should all want to avoid. 🙂
(revised & expanded from an earlier post in my weblog)