How to Get More Out of Your Observational Data

Sofia Quintero
May 23 · 7 min read

For product managers, getting the right data can give your team a critical edge, but thinking about that data in the right way can be at least as important. That is especially true of observational data.

The term observational data refers to any information product managers gather without the subjects’ active participation. GPS data from mobile phones and video view rates, for example, are both cases of observational data, while survey responses are not. That means any time your team uses observational data, you will be measuring what people do, not what they think.

This sort of information has its advantages; recording where people choose to spend their time and money can sometimes give product managers a more honest picture of their habits than asking them to self-report. Used carelessly, though, it can miss key details of the relationships it appears to show. Building a solid approach requires the team to understand both what the data can tell them and what limitations it has.

1. Back Up Quantitative Findings with Qualitative Studies

Quantitative data can be effective for figuring out the “what” of a relationship. It can tell product managers how often people are visiting a particular web page, or how many of the people who buy one product also buy another.

What it often will not tell you is the “why”; getting at that part of the question requires a more in-depth analysis. You might, for example, interview customers to get a clear idea of why the new ad they saw motivated them to visit your website so much more frequently.

Adding this element to your research can save you from several pitfalls.

First, your quantitative data might show a correlation where there is no causal relationship, or might mask the real driver. Ice cream sales and shark attacks are positively correlated, but not because eating ice cream makes a person more likely to be attacked by a shark. Rather, people tend to eat ice cream during the summer and also tend to go to the beach during the summer.
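To see how a hidden driver can manufacture a correlation, here is a minimal simulation sketch. Every number in it is invented, and beach_visits stands in for exposure to sharks; the only assumption that matters is that the season drives both series and neither drives the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily records: (is_summer, ice_cream_sales, beach_visits).
days = []
for _ in range(2000):
    summer = rng.random() < 0.5
    ice_cream_sales = (200 if summer else 50) + rng.gauss(0, 30)
    beach_visits = (100 if summer else 20) + rng.gauss(0, 30)
    days.append((summer, ice_cream_sales, beach_visits))

# Correlation across the whole year, ignoring the season.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Correlation within summer only, which holds the hidden driver fixed.
summer_days = [d for d in days if d[0]]
within_summer = pearson([d[1] for d in summer_days], [d[2] for d in summer_days])

print(f"whole year:  {overall:.2f}")
print(f"summer only: {within_summer:.2f}")
```

Across the whole year the two series move together strongly; once you look inside a single season, the relationship all but disappears. Splitting your data by a suspected common cause is a quick first check, though it only helps with drivers you have thought of and measured.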

Second, your data might show a causal relationship that is pointing in the wrong direction.

It might be the case that your Google ads are persuading people to buy your product, or people who already want to buy your product might be searching for it on Google and clicking on your ads for convenience. If the connection between clicking the ad and buying the product is all your product management team knows, determining which way the causal arrow points is impossible.

In these sorts of situations, different sources of information can complement each other. Your product management team’s large-scale, quantitative data on customers’ habits can temper the biases that might twist their responses to surveys and interviews, while the surveys and interviews can tell you why you are seeing certain trends in your quantitative data.

2. Beware of Measurement Effects

In some cases, the act of gathering the data itself will affect the results your product management team gets. People who know they are being watched might be less likely to buy a product that has a social stigma attached to it, or to express their true preferences. For example, people tend to over-report their income and how often they exercise.

Where possible, your product management team should consider using covert rather than overt observation, removing any sign that you are recording your subjects’ actions. The less invasive the observer’s presence is, the less likely the subjects are to change their behavior.

Your product management team should also consider using indirect observation, measuring the results of your subjects’ behavior rather than the behavior itself. For instance, examining food waste in a school cafeteria might be a more reliable way of determining students’ eating habits than parking a researcher in the cafeteria during lunch hour.

Of course, sometimes altering your methods to counter measurement effects will not be possible. Ethical concerns might prevent your product management team from gathering data covertly.

Observing your subjects indirectly might not get you the information you need. If you want to know not what the students in that school cafeteria eat, but whom they eat lunch with, analyzing food waste will not help you.

For this reason, it is critical that you “sign” your bias. That is, as you draw your conclusions, identify all the ways your data could be off the mark and determine whether each source of bias is working in favor of your conclusion or against it. Very rarely will your team be able to find entirely unbiased data, but your data will still be valuable if you can make an educated guess about how exactly it might be biased.

3. Start With a Falsifiable Claim

It can be tempting to avoid prejudging your data by trying to begin the process without any expectations for what that data will show.

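In fact, this approach can make bias in your interpretation worse. Without a stated expectation, it is easy to find a pattern after the fact and convince yourself it is what you were looking for all along.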

Instead, your product management team should develop a falsifiable claim about the data before you see it. To say that your prediction should be falsifiable is to say that it should be specific enough that once you have seen the data, you can say definitively whether you were right or wrong.

Starting with a prediction will also help your product management team think about the counterfactual, because you can then work through all of the ways the prediction might be wrong. For example, if you begin by positing “our new ad campaign has raised sales by X percent relative to this time last year,” you are setting yourself up to consider where your sales would be if you had not launched the campaign. Sales might be seasonal, or might be rising over time anyway, even if the campaign is ineffective.
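The arithmetic behind that counterfactual can be sketched in a few lines. All of the figures below are hypothetical, and the baseline growth rate is an assumption you would have to estimate yourself, for example from a region or product line where the campaign did not run.

```python
def naive_lift(sales_before, sales_after):
    """Year-over-year change, with no adjustment for anything else."""
    return sales_after / sales_before - 1


def adjusted_lift(sales_before, sales_after, baseline_growth):
    """Change relative to where sales would be without the campaign.

    baseline_growth is the assumed growth rate with no campaign.
    """
    counterfactual_sales = sales_before * (1 + baseline_growth)
    return sales_after / counterfactual_sales - 1


# Hypothetical numbers, chosen only to illustrate the calculation.
sales_last_year = 1000.0   # same period last year, before the campaign
sales_this_year = 1200.0   # same period this year, with the campaign
baseline_growth = 0.12     # assumed growth had the campaign never launched
predicted_lift = 0.15      # the falsifiable claim, written down in advance

naive = naive_lift(sales_last_year, sales_this_year)
adjusted = adjusted_lift(sales_last_year, sales_this_year, baseline_growth)
claim_holds = adjusted >= predicted_lift

print(f"naive lift:    {naive:.1%}")
print(f"adjusted lift: {adjusted:.1%}")
print(f"claim holds:   {claim_holds}")
```

Because the prediction was fixed before looking at the data, the last line gives a definite answer. With these made-up numbers, the naive comparison would have supported the claim and the adjusted one does not.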

Structuring your research this way will help you work systematically through the dozens of relationships that might be present in your data.

Furthermore, gathering data has a cost, be it staff time, licensing for software, or just the goodwill of your customers. If your team begins without a clear idea of what question you want to answer, you could end up spending time and money gathering information that does not tell you something you really need to know.

4. Make Sure Your Model is Not Too Smart For Its Own Good

In general, the simpler your data, the less complex your model needs to be. Where the sample is large and where the relationships of cause and effect are straightforward, linear models will do the trick. When this is not the case, your product management team will need to move to more complicated tools.

Relationships in your data could be complex in many ways.

The outcome you want to measure could have many significant inputs, or inputs that interact with each other. Maybe your product appeals in a unique way to young people living in big cities. If you measure only the effect of age, or only the effect of living in a big city, you might miss the broader point.
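A small worked example shows how looking at one input at a time can hide an interaction. The conversion rates are invented for illustration, and the calculation assumes the four segments are the same size.

```python
# Hypothetical conversion rates by segment.
conversion = {
    ("young", "big_city"): 0.10,
    ("young", "elsewhere"): 0.02,
    ("older", "big_city"): 0.02,
    ("older", "elsewhere"): 0.02,
}


def average(rates):
    return sum(rates) / len(rates)


# Effect of age alone: average over location, then compare age groups.
age_effect = (
    average([conversion[("young", "big_city")], conversion[("young", "elsewhere")]])
    - average([conversion[("older", "big_city")], conversion[("older", "elsewhere")]])
)

# Effect of location alone: average over age, then compare locations.
city_effect = (
    average([conversion[("young", "big_city")], conversion[("older", "big_city")]])
    - average([conversion[("young", "elsewhere")], conversion[("older", "elsewhere")]])
)

# Interaction: how much more a big city matters for young people than for older people.
interaction = (
    (conversion[("young", "big_city")] - conversion[("young", "elsewhere")])
    - (conversion[("older", "big_city")] - conversion[("older", "elsewhere")])
)

print(f"age alone:   {age_effect:+.2f}")
print(f"city alone:  {city_effect:+.2f}")
print(f"interaction: {interaction:+.2f}")
```

Each input on its own looks like a modest effect spread across everyone. The interaction term shows that, in this made-up data, the entire lift comes from a single segment.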

The relationship you are analyzing might not be linear. For instance, the change in traffic you get from investing your first dollar in content marketing might not be the same as the change you get from investing your one-thousandth dollar. Depending on the shape of that relationship, your team will need a different model to describe it.
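As a rough sketch of what a non-linear relationship looks like in numbers, suppose traffic grows with the logarithm of spend. That curve and its scale are purely assumptions made for illustration; your own data may follow a very different shape.

```python
import math


def traffic(spend, scale=500.0):
    """Hypothetical response curve: visits as a function of dollars spent."""
    return scale * math.log1p(spend)


def marginal_gain(dollar):
    """Extra visits contributed by the Nth dollar under the curve above."""
    return traffic(dollar) - traffic(dollar - 1)


first_dollar = marginal_gain(1)
thousandth_dollar = marginal_gain(1000)

print(f"visits from dollar 1:    {first_dollar:.2f}")
print(f"visits from dollar 1000: {thousandth_dollar:.2f}")
```

A straight-line model would assign every dollar the same payoff. Under a curve like this one, it would understate the early dollars and overstate the later ones.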

Of course, your team should not default to the most complex model without taking a look at your data first. Upon considering all the ways that simple models can fail you, it can be tempting to dump as many parameters into your model as possible. Using a model that is too complex for your data, however, can lead to “over-fitting.” That is, the model finds relationships in your sample that turn out not to exist in the wider population.

Every sample will have some unique quirks. Part of the reason for a product manager to use a statistical model is to sift through those quirks and find trends that matter outside the sample. If you build a model that can explain every tiny detail of the sample, the model will treat those quirks as if they held in the rest of the world and wind up leading you astray.
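One way to see this is to compare a simple model with one that memorizes the sample, then score both on data they have not seen. The sketch below uses simulated data in which the true relationship is a straight line plus noise; the "memorizer" (which predicts the outcome of the closest point it has seen) is a deliberately extreme stand-in for an overly complex model, not a recommendation.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable


def make_sample(n):
    """Hypothetical data: y depends linearly on x, plus noise (the 'quirks')."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]


def fit_line(data):
    """Ordinary least-squares straight line through the sample."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in data)
        / sum((x - mean_x) ** 2 for x, _ in data)
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(data):
    """Predict the outcome of the closest point in the sample."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


sample = make_sample(200)    # the data you collected
holdout = make_sample(200)   # fresh data from the same process

line = fit_line(sample)
memorizer = fit_memorizer(sample)

print(f"line:      sample error {mse(line, sample):.1f}, holdout error {mse(line, holdout):.1f}")
print(f"memorizer: sample error {mse(memorizer, sample):.1f}, holdout error {mse(memorizer, holdout):.1f}")
```

The memorizer explains the sample perfectly and does worse than the straight line on the holdout data, because the noise it memorized does not repeat. Keeping some data aside and checking your model against it is a common guard against this.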

You can run into over-fitting problems even if you have data on an entire population, as your data will only represent that population at a particular point in time, and some of its quirks will not be present before or after the time your data covers.

Data is Not Neutral

With observational data getting ever more plentiful and reliable, it is critical that product managers put systems in place to distinguish signal from noise. The bigger and more complex your data set, the more important those systems are.

At every stage in the process, bear in mind that neither your model nor the data themselves are neutral. They can both be influenced by your team’s assumptions and biases. Only by thinking critically about what questions to ask of your data can you get the answers you really need.


Originally published at https://blog.getenjoyhq.com on May 23, 2019.

Hungry for Insight

A community for product people leading change

Written by Sofia Quintero, Founder and CEO at https://getenjoyhq.com/