Is being too data-driven dangerous?
The data-driven mindset has become a necessity in modern companies, both because of the amount of data generated by digitalization and because of the amazing success achieved by data-driven pioneers (GAFAs). People everywhere have frantically embraced a data-driven mindset (ManoMano included). In this article, we question to what extent an excessively data-driven approach could turn out to be toxic in the end. Our point is that relying exclusively on data might not help your business reach the next level, because data are usually more complex to use than anticipated (quality, biases…) and will never capture our world’s complexity. Combined wisely with more traditional approaches (qualitative research, business sense…), however, data can be a powerful tool to inform decision making and provide a precious feedback loop.
Risk 1: your data might be wrong (more often than you think)
In data-driven companies, decisions are based on data rather than opinion. But how many times did you realize a posteriori that the data you based your decision upon turned out to be wrong? Wrong data are worse than opinions… And most companies focus way more often on analyzing their data than cleaning them.
We are no better than others at this at ManoMano. Let’s have a look at our bidding algorithms: we spend a great deal of energy optimizing them because marketing is our biggest variable cost. Our data scientists made great improvements (according to an ex-Googler now working at ManoMano, ManoMano was said to be the only company whose bidding algorithms were better than Google’s own). But when we migrated our website to Symfony in 2017, the conversion pixel was slightly modified and no longer provided the right information, fooling our (still great) bidding algorithm. Garbage in, garbage out… It took us weeks to realize it! We had no data quality check on these data, even though they are very sensitive.
Strengthen your efforts on data quality by setting up a data governance policy from the very beginning (data definitions and ownership, indicator calculation, golden source of trust, data quality tools, and developer education, since developers are responsible for feeding the data layer) and by hiring enough data engineers to properly clean your data before storing them. Set up the right tools to monitor data quality. We are looking at solutions such as Seenaptic (a data layer quality tool) and Looker (which lets you refactor your KPI calculation methodology and propagate the changes to all related indicators).
Analyzing wrong data won’t bring any value, so clean your data first.
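To make the idea of a data quality check more concrete, here is a minimal sketch of a monitor that would have raised a flag when a tracking event suddenly changed volume. It is not the check we run at ManoMano: the function name, the 7-day window, the 3-sigma threshold and the daily counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, window=7, threshold=3.0):
    """Return the indices of days whose count deviates from the mean of the
    previous `window` days by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is suspicious.
            if daily_counts[i] != mu:
                flagged.append(i)
            continue
        if abs(daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily counts of conversion-pixel events. The drop on the last
# day mimics a pixel that stopped firing correctly after a site migration.
counts = [1020, 980, 1005, 995, 1010, 990, 1000, 1015, 600]
print(flag_anomalies(counts))
```

A check this simple only catches abrupt breaks. A slow drift, or a pixel that still fires but sends the wrong value, would need checks on the content of the events as well as on their volume.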
Risk 2: your indicators might not be the right ones
OKRs have become very trendy since Google made them popular (even if they were born at Intel 40 years ago). I do think they bring value, especially by enabling individuals to focus on a limited range of 3 to 4 topics. But they also give excessive importance to reaching Key Results (indicators) at any cost. It really becomes a problem if your indicator is the wrong one, which again happens more often than you think. You might then spend your energy optimizing an irrelevant metric for 3 to 6 months…
I will illustrate my case with an example taken from ManoMano. Each Feature Team (FT) has its North Star Metric (NSM). The “Relevance” team’s NSM is the “share of business volume going through the search”. We had a lot of discussions about this KPI. In my opinion, it is not the right one. First, because you could improve this KPI without solving any of our users’ problems. For instance, we could easily double the size of the search bar on our site. It would for sure attract more queries through the bar. Too many factors may impact the use of search on our website. An argument in favor of this KPI was that if our site were more relevant, people would use the search more and the KPI would go up. How long would it take? We don’t know. So it is not very motivating for the team. Besides, the link between the features the team ships and the impact on the KPI would be very loose. If you go back to the problems we are trying to solve (by doing User Research), we noticed that when people used our search bar and did not find what they were looking for, they would open a new tab and type the same query on Google. Best case scenario, they would come back to ManoMano after clicking on a Google ad (costing us money); worst case scenario, they would go to a competitor’s website. Using our analytics tool, we were able to quantify the first behavior. After a first cycle of improvements to our search (mainly reranking), the share of BV going through search had stayed flat, while the share of searches leaking to Google had been divided by two! So, looking at the NSM, the team would have failed. Nevertheless, they were very successful because they managed to drastically limit the “relevance leak” to Google!
Spend time selecting indicators related to the problem you are trying to solve and use them as a feedback loop rather than your ultimate goal.
Risk 3: your data results might often be biased
Now that you have good data quality and the correct KPI, should everything be fine? Not yet! You will run experiments, probably through AB testing because you are data-driven. What now threatens your decision making is bias in your interpretation of the results.
First, the bias can be human. Teams working on a project will always find good reasons not to kill a feature even if the AB test result is negative (and they might be right, see next section!). They will say that the experiment setup was not the right one, that a key feature was missing and that we should rerun it, etc.
Second, the bias can be technical. A big French e-commerce player suffered from a bad AB testing setup. Long story short, population A was exposed beforehand as part of a quality check before the actual AB test. As a consequence, population A always had better results: 10% of today’s visitors are yesterday’s visitors, and their CVR is twice as high, so this population bias creates the AB bias. And this bias went unnoticed for a long time!
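A back-of-the-envelope calculation shows how much such a population bias can distort a result. The sketch below reuses the two figures quoted above (10% returning visitors, CVR twice as high) and adds simplifying assumptions of its own: a hypothetical 2% CVR for first-time visitors, a population B with no returning visitors at all, and no real difference between the two variants.

```python
def blended_cvr(base_cvr, returning_share, returning_multiplier=2.0):
    """Conversion rate of a population mixing new and returning visitors,
    where returning visitors convert `returning_multiplier` times better."""
    new_share = 1.0 - returning_share
    return base_cvr * (new_share + returning_share * returning_multiplier)


base = 0.02  # hypothetical CVR of first-time visitors

# Population A was pre-exposed the day before, so it contains returning visitors.
cvr_a = blended_cvr(base, returning_share=0.10)
# Population B only contains fresh visitors (assumption for this sketch).
cvr_b = blended_cvr(base, returning_share=0.00)

# Neither variant changed anything, yet A looks better.
phantom_uplift = cvr_a / cvr_b - 1.0
print(f"A: {cvr_a:.4f}  B: {cvr_b:.4f}  apparent uplift: {phantom_uplift:.1%}")
```

Under these assumptions, A shows an uplift of about 10% that is caused by the mix of visitors alone, which is larger than the effect many real features produce.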
Third, the bias can be methodological. The protocol is not well framed, or some effects are not taken into account (what about the repurchase rate, if a given feature satisfies users to the point that not only do they convert better but they also come back more often?). Not to mention basic errors such as an insufficient sample size or test duration.
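Sample size is the easiest of these errors to check before launching a test. The sketch below uses the standard normal-approximation formula for comparing two proportions; the 5% significance level, the 80% power and the example conversion rates are conventional or hypothetical values, not figures from any of the tests described in this article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate number of visitors needed in each variant to detect the
    difference between two conversion rates with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical case: detecting a move from a 2.0% to a 2.2% conversion rate.
print(sample_size_per_variant(0.020, 0.022))
```

With these inputs the answer is on the order of 80,000 visitors per variant, which explains why small uplifts on low-traffic pages cannot be measured in a few days.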
I will give you one last example. I remember a situation at ManoMano where we faced this bias issue for an important business decision. We have dozens of ManoAdvisors, a community that helps our customers find the right products and get relevant advice. They are rewarded through commissions. Even if the metrics are very positive (conversion rate, average basket…), do we still make a profit once we pay that commission? And knowing that, to what level can we raise the commission for our community? For instance, do we provide advice to customers who would have bought anyway (without advice)? We had to conduct an AB test to assess the uplift value of advice, with an important decision relying on it: should we keep developing our community? We went through 3 steps. The first was to use the results of our chat tool’s AB test. But we realized they were biased and too opaque to be rigorously checked. We then conducted our own AB test, and realized that the protocol was not rigorous enough (some effects, like the impact of retention on lifetime value, were not taken into account, the criteria for population B were wrong…). We finally conducted a third AB test with a lot of support from the data science team. (Results were positive :))
Spend time educating your team on how to set up tests and analyse results correctly, potentially with stronger support from data scientists (e.g. for the most critical analyses). And always keep the business context in mind.
Risk 4: data often lead to excessive local optimization
Having data allows you to optimize. It is pretty comfortable as a situation. You may quickly enter a test-and-learn routine. And that is ok, at least at the beginning. But concentrating only on optimizing your current data won’t take your business to the next level. You can even miss some big opportunities. Let’s talk again about AB testing that may quickly become the dictator of local optimization (if you disagree, feel free to comment!).
I will give you a first example taken from a discussion I had with a Product guy from LinkedIn. He told me the story of their pricing page revamp. This page was complicated and the PM thought he could do way better. After some research, they launched a new pricing page. The AB test results were really bad. But they were so convinced that the new page was better that they kept it anyway. And after 6 months, the new page was performing way better than the previous one…
Another example comes from a French booking website. They migrated their website to a new technology and changed the design at the same time. When they released the new website, they experienced a severe drop in their conversion rate. A drop so severe that you would immediately stop your AB test (once statistical significance is reached, of course) and roll back to the A version. But there was no A version anymore, so they had to stay with the B version. And guess what? They gradually recovered and probably outperformed their former level of conversion.
At ManoMano, most of our traffic goes through the listing page. We made some local optimizations such as the add-to-cart button’s color, wording, position, etc. They led to some improvements. But at some point, we took a step back and went to talk to our users to understand which problems they really experienced on this listing page. DIY is very technical, and our PM Clément Caillol realized that users were going back and forth between listing and product pages to check technical product information. On desktop, users opened one tab per product page. He named this behaviour pogo-sticking (from this great Baymard blog post). This behaviour, already far from ideal on desktop, is much harder to perform on mobile… Considering that more than half of our traffic is on mobile, this was really an issue. He then worked with the product designer (Gary Delporte) on how a revamped listing could solve this problem. Along with all the company’s stakeholders (category managers, B2B marketing, sourcing, etc.), they came up with the great solution of a “technical product card”, which provides the key technical attributes of the product right from the listing page.
Don’t rely on your existing dataset to find breakthrough ideas; rely on field research instead.
Risk 5: data are not good for discovery
When data spread across all industries, people were probably overoptimistic about what they would get from them, especially with regard to discovery. Big datasets were supposed to let behaviors surface that we were not aware of… But they didn’t. Most of the time, data without business sense lead to stupid or obvious conclusions. Data can, however, support traditional discovery techniques (especially User Research): they validate what might otherwise have remained assumptions, quantify their impact, measure their reach, etc.
We had several examples where qualitative insights from User Research became quantified problems thanks to our data. One was users searching on our site, not finding what they wanted, then going to Google and finally coming back to ManoMano (see risk 2). We also discovered, by watching our users, that they did not trust our filters and would not use them because they were afraid of losing relevant results. Looking only at our data wouldn’t have revealed that precious insight. You can nevertheless use data to find correlations between the percentage of qualified products and filter usage. Another example of how qualitative and quantitative approaches complement each other is about ratings: users wanted reviews. Through partial dependency analyses, data told us to what extent reviews impacted our CVR, above which number of reviews their impact vanished, etc. In some cases, data allowed us to carry out our qualitative research, through text analysis and sentiment analysis of thousands of monthly NPS (Net Promoter Score) verbatims. The dataset had become way too big for manual analysis.
Use data as a complement to more qualitative discovery to validate and measure the behaviors you found.
Risk 6: trying to explain everything only using data
Some teams spend an insane amount of time trying to explain:
1/ Absolute value of a key metric of their business (vs. variations)
2/ Discrepancies when more than one source provides the same metric
I think this is useless except in some specific cases (see example below). Explaining the absolute value is useless because the metric in itself does not mean anything, and usually no benchmark can be applied since many external factors impact it. Explaining discrepancies is useless because you have to accept that some differences will always exist. Why? Because at some point in the data lifecycle, some calculations won’t be executed the same way, even though neither of them is wrong.
I will take a very concrete example with a controversial metric: the conversion rate (CVR). Most people agree this is a vanity metric. Because it depends on your acquisition mix (paid vs. organic), your brand awareness and your share of repeat customers, this metric is very hard to compare between two companies within the same sector. “Apps have a better CVR than mobile web”: for sure, apps are intensively used by existing customers! Even for a given company, this metric highly depends on external events like a marketing campaign, the weather (in DIY it impacts sales a lot), etc. So in a perfect world I would recommend not caring about this metric. Yet almost all investors will ask you to justify its absolute value because it is a standard in the e-commerce industry: why are you below or above your industry standards?
I think they should focus on the variations. Why did it go up? Which improvements to your funnel increased CVR? If it went down, is it because of the acquisition mix? Increased traffic? Price competitiveness?
We added one layer of complexity by measuring this metric with two different tools: our product analytics tool (Amplitude), which computes a CVR by itself, and our data warehouse (DWH) metrics. The two metrics were different, of course… Sometimes the variations even went in opposite directions! When digging in, we found out that it was mainly due to the way bot traffic was counted. In other words, it was a data quality issue (see risk 1).
The Checkout team at ManoMano spent numerous hours trying to answer these questions. In the end, here are our learnings. First, when you have two sources, name one of the two the golden source of trust. In our case it was the DWH source, since we owned the data and the way they were calculated. Then, in case of huge differences (e.g. opposite trends), check that you don’t have any data quality issues. We had one related to bot traffic. Finally, focus your energy on explaining the variations of your metrics rather than their absolute value. In the end, one of our most data-skilled Product Directors, Emmanuel Hosanski, created a dependency analysis chart that allowed us to explain where the major differences between two periods came from. And this was an acceptable answer for our board.
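One common way to explain a variation of the CVR between two periods is to split it into a mix effect (the traffic moved between channels) and a rate effect (the channels themselves converted differently). The sketch below illustrates that general technique; it is not a reconstruction of the chart mentioned above, and the channel names and figures are hypothetical. It assumes both periods contain the same channels, each with at least one session.

```python
def decompose_cvr_change(before, after):
    """Split the change in overall CVR between two periods into a mix effect
    and a rate effect. Each period maps a channel to (sessions, orders)."""

    def shares_and_rates(period):
        total_sessions = sum(sessions for sessions, _ in period.values())
        return {
            channel: (sessions / total_sessions, orders / sessions)
            for channel, (sessions, orders) in period.items()
        }

    b, a = shares_and_rates(before), shares_and_rates(after)
    # Mix effect: change in traffic share, valued at the old conversion rate.
    mix = sum((a[ch][0] - b[ch][0]) * b[ch][1] for ch in b)
    # Rate effect: change in conversion rate, weighted by the new traffic share.
    rate = sum(a[ch][0] * (a[ch][1] - b[ch][1]) for ch in b)
    return mix, rate


# Hypothetical periods: paid traffic grows, per-channel CVRs do not move.
before = {"paid": (6000, 120), "organic": (4000, 160)}
after = {"paid": (9000, 180), "organic": (4000, 160)}

mix_effect, rate_effect = decompose_cvr_change(before, after)
print(f"mix effect: {mix_effect:+.4%}  rate effect: {rate_effect:+.4%}")
```

In this example the overall CVR goes down although no channel converts worse: the whole drop comes from the acquisition mix, which is exactly the kind of answer that focusing on variations makes possible.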
Don’t waste your energy trying to explain the absolute value of some metrics, concentrate your efforts on understanding their variations.
Risk 7: your data will never capture the whole business complexity
If every decision could be based on data alone, our companies would be ruled by algorithms (roadmaps, budgeting…). Unfortunately, all the attempts I have witnessed to base business decisions only on data were failures. Take product roadmapping: if you try to set up an objective, data-based process, you will probably end up adding more and more criteria columns (first cost and estimated ROI, then strategic priority, then level of dependencies, then proximity to company OKRs, etc.). In the end you will have to compute a score. How? By setting arbitrary weights for each criterion…
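A tiny example shows how fragile such a score is. The projects, criteria, ratings and weights below are all made up; the only point is that two equally defensible sets of weights produce opposite rankings.

```python
def rank_projects(projects, weights):
    """Rank project names by weighted score, highest first."""

    def score(criteria):
        return sum(weights[name] * value for name, value in criteria.items())

    return sorted(projects, key=lambda name: score(projects[name]), reverse=True)


# Hypothetical projects rated from 0 to 10 on each criterion.
projects = {
    "checkout revamp": {"roi": 8, "strategic_fit": 3, "low_dependencies": 6},
    "new search engine": {"roi": 5, "strategic_fit": 9, "low_dependencies": 2},
}

# Two arbitrary but plausible weightings.
roi_first = {"roi": 0.6, "strategic_fit": 0.2, "low_dependencies": 0.2}
strategy_first = {"roi": 0.3, "strategic_fit": 0.5, "low_dependencies": 0.2}

print(rank_projects(projects, roi_first))
print(rank_projects(projects, strategy_first))
```

The ranking flips as soon as the weights change, so the "objective" score mostly reflects the opinion of whoever chose the weights.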
Even in our data-rich world, one has to accept that not everything can be reduced to a set of data and that business sense matters (a more polite way of saying “gut feeling”).
Data should be one part of the decision criteria, not the only one. Adopt a “data informed” attitude rather than a data-driven one (I find this term very relevant; it is taken from this great article. Thanks to Bryce Tichit for sharing it when reviewing this post, I was not aware of it!).
Key takeaways to make the most of your data without being infatuated by them:
- Be tough on data quality by setting up strong data governance and staffing your data team
- Select the KPI that measures the highest priority problem you are trying to solve
- Empower your team to analyse results by training them in statistical methodology and cognitive biases
- Resist the temptation to over-optimize existing solutions, and explore new ones
- Focus on qualitative methods to discover radically new insights
- Concentrate on the variations of your key metrics not on the absolute value
- Rely on business sense to make decisions, and use data to inform them
Thanks to Clément Caillol, Etienne Desbrières, Yohan Grember, Jeremie Jakubowicz and Sergio Rodriguez for their precious feedback!