Let’s say you manage a product that helps small businesses support their customers. You’re looking to improve customer engagement and retention. There are two ideas on the table:
- A dashboard that lets the business owner monitor engagement statistics and trends.
- A chatbot to help the business owner automate communication with their customers.
The dashboard idea came up a few times in talks with customers and you feel it has good potential, but there’s also a risk that only power users will use it. The chatbot is something that the entire company likes and management is quite bullish about — it feels like a big win for customers, it’s a cool project, and, yeah, chatbots are all the rage now.
Which one would you choose?
Such prioritization questions are at the heart of product management. The penalty for choosing wrong can be quite high: cost of development + cost of deployment + cost of maintenance + opportunity cost + other residual costs. We are often tempted to make a decision based on weak signals: majority votes, the highest-paid person’s opinion (HiPPO), industry trends, etc., but those have been shown time and again to be bad heuristics, no better than putting chips on a roulette table (hence the term “Big Bet”).
In this post I’ll demonstrate what I consider to be the best way to find winning ideas. It consists of three parts:
- ICE Scores
- Confidence levels (with a free tool!)
- Incremental validation
The ICE score is a prioritization method invented by Sean Ellis, famous for helping grow companies such as Dropbox and Eventbrite and for coining the term Growth Hacking. ICE scores were originally intended to prioritize growth experiments, but can also be used for regular project ideas.
You calculate the score, per idea, this way:
ICE score = Impact * Confidence * Ease
- Impact is an estimate of how much the idea will positively affect the key metric you’re trying to improve.
- Ease (of implementation) is an estimate of how much effort and resources the idea will require. It is typically the inverse of effort (measured in person-weeks): lower effort means higher ease.
- Confidence indicates how sure we are about Impact, and to some degree also about ease of implementation. I’ve written a whole article to explain why prioritization by impact and effort is not enough. The short version is that we’re very bad at estimating both and we’re blissfully unaware of this. Confidence values are the antidote — they keep us honest about our assumptions.
The three values are rated on a relative scale of 1–10 so as not to over-weight any of them. Each team can choose what 1–10 means, as long as the rating stays consistent. Ultimately the goal is to have your idea bank look like this:
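A ranked idea bank can be sketched in a few lines of code. The idea names and numbers below are made up for illustration, and a plain spreadsheet works just as well:

```python
from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    impact: float      # 1-10: estimated effect on the key metric
    confidence: float  # how sure we are about Impact (and Ease)
    ease: float        # 1-10: inverse of effort

    @property
    def ice(self) -> float:
        return self.impact * self.confidence * self.ease

# Hypothetical idea bank, printed with the highest ICE score first
ideas = [
    Idea("Idea A", impact=7, confidence=3, ease=5),
    Idea("Idea B", impact=9, confidence=0.5, ease=3),
]
for idea in sorted(ideas, key=lambda i: i.ice, reverse=True):
    print(f"{idea.name}: {idea.ice}")
```

Note that a low confidence value drags the whole score down, no matter how high the impact estimate is.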
Let’s use an example to see this at work.
You decide to calculate ICE scores for the two ideas, dashboard and chatbot. At this early stage you use rough values solely based on your own intuition.
- Impact — you’re guessing that the dashboard will increase retention significantly but only for a subset of your customers — you give it 5 out of 10. The chatbot, on the other hand, can be a game-changer for a lot of customers, so you give it an 8 out of 10 (I’ll try to do a separate post on how to assess impact).
- Ease — You guesstimate the dashboard will take 10 person-weeks to build and the chatbot 20 person-weeks, solely by thinking of similar projects in the past. You will get better estimates from the team later. You use this simple table (chosen by you and your team) to convert your estimate to Ease:
So the dashboard gets an Ease value of 4 out of 10 and the chatbot a 2.
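The conversion can be expressed as a simple lookup. The bucket thresholds below are hypothetical, chosen only to be consistent with the numbers in this example; every team should define its own:

```python
# Hypothetical effort-to-ease buckets. These thresholds are a
# reconstruction, picked only to match the values in this example
# (10 person-weeks gives Ease 4, 20 person-weeks gives Ease 2).
EASE_BUCKETS = [  # (max person-weeks, ease score)
    (1, 10), (2, 9), (3, 8), (4, 7), (6, 6),
    (8, 5), (12, 4), (16, 3), (25, 2),
]

def ease(person_weeks: float) -> int:
    for max_weeks, score in EASE_BUCKETS:
        if person_weeks <= max_weeks:
            return score
    return 1  # anything larger is a major project

print(ease(10), ease(20))  # 4 2
```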
There is only one way to calculate confidence: looking for supporting evidence. For this purpose I created the tool shown below. It lists common types of tests and evidence you may have, and the confidence level each provides. When using it, consider: what indicators do we already have, how many of them, and what do we need next to gain more confidence?
Sidenote: if your product or industry has other evidence tests, feel free to create your own version of this tool; just be mindful of what represents strong or weak confidence. For more background on confidence scores see this earlier post.
Let’s go back to the example to see the tool in action.
- Supporting evidence for the chatbot: Self conviction (you think it’s a good idea), Thematic support (the industry thinks it’s a good idea) and Others’ opinions (your managers and coworkers think it’s a good idea). That gives it a total confidence value of 0.1 out of 10, or Near Zero Confidence. The tool clearly doesn’t consider opinions a reliable indicator. Interesting.
- The dashboard has this going for it: Self conviction (you think it’s a good idea), and Anecdotal support (a handful of customers asked for it). That actually bumps its confidence value to a whopping 0.5 out of 10, which is Low Confidence. Unfortunately customers are bad at predicting their own future behavior.
The ICE scores:
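Plugging in the values gathered so far (a quick sketch; the formula is just the product of the three ratings):

```python
def ice(impact: float, confidence: float, ease: float) -> float:
    return impact * confidence * ease

print(f"Dashboard: {ice(5, 0.5, 4):.1f}")  # prints Dashboard: 10.0
print(f"Chatbot:   {ice(8, 0.1, 2):.1f}")  # prints Chatbot:   1.6
```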
The dashboard looks like the better idea at the moment, but the tool shows you haven’t gone beyond low confidence. There’s simply not enough information to make a decision yet.
Estimations and feasibility checks
Next you meet your counterparts in engineering and UX and together you scope out both ideas. Both projects seem feasible at first look. The engineering lead comes back with rough effort estimates: the dashboard will take 12 person-weeks to launch and the chatbot 16 person-weeks. According to your Ease scale this gives Ease scores of 4 and 3 respectively.
In parallel you do some back of the envelope calculations. With a closer look the dashboard looks slightly less promising and gets a 3. The chatbot still looks like a solid 8.
Using the confidence tool shows that both ideas now pass the Estimates & Plans test and gain some confidence. The dashboard moves to 0.8 and the chatbot to 0.4.
The chatbot has closed the gap. Still, confidence levels are low, and for a good reason — these are mostly numbers pulled out of thin air, and you know you need to collect more evidence.
You send existing customers a survey asking them to pick one of 5 potential new features, including the chatbot and the dashboard. You get back hundreds of responses. The results are very positive for the chatbot — it is the #1 feature in the survey with 38% of respondents picking it. The dashboard comes in 3rd with 17% of the votes.
This gives both features some supporting market data, but the chatbot scores higher at 1.5. The dashboard also gets a confidence boost, but just up to 1.
The chatbot has moved strongly into the lead. Your co-workers and the industry seem to have been proven right. Should you pull the trigger now? Probably not — the project is quite costly and we only have medium-low confidence. Unfortunately, survey results don’t generate a very strong signal. Keep working!
To learn more you run a user study with 10 existing customers, showing them interactive prototypes of both features. In parallel you conduct phone interviews with 20 survey participants who chose one of the two candidate features.
The research reveals a more nuanced picture:
- 8/10 of study participants found the dashboard useful and said they would use it at least once a week. Their understanding of the feature correlated well with what you have in mind and they had no issues using it. The phone interviews confirmed understanding and desire to use on average once a week.
- 9/10 of study participants said they would use the chatbot. Their level of enthusiasm was very high — everyone immediately saw why this can help them and many asked to have it as soon as possible. However there were tough usability issues, plus some customers voiced concerns about offending their own customers with canned bot responses.
This qualitative research gives you some food for thought. The dashboard seems to be more popular than you expected. The chatbot now sounds more like a high-risk/high-reward project. Looking at the confidence tool you give the dashboard and the chatbot confidence values of 3 and 2.5 respectively. You adjust impact to 6 for the dashboard and 9 for the chatbot. Finally based on the usability study you realize getting chatbot UI right will require more work — you reduce Ease to 2.
The tables have turned yet again and now the dashboard is in the lead. You bring the results to your team and to your managers. Strictly based on ICE scores the dashboard should be declared the winner; on the other hand, the confidence scores of both are far from high. Reluctant to let go of a potentially good feature, the team decides to keep testing both.
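To see how the lead has shifted, here is a quick recomputation of the scores at each step so far, using the values from the example (the stage labels are informal, not the confidence tool’s terms):

```python
# (impact, confidence, ease) per idea at each validation stage
stages = {
    "gut feel":   {"dashboard": (5, 0.5, 4), "chatbot": (8, 0.1, 2)},
    "estimates":  {"dashboard": (3, 0.8, 4), "chatbot": (8, 0.4, 3)},
    "survey":     {"dashboard": (3, 1.0, 4), "chatbot": (8, 1.5, 3)},
    "user study": {"dashboard": (6, 3.0, 4), "chatbot": (9, 2.5, 2)},
}
for stage, ideas in stages.items():
    scores = ", ".join(
        f"{name}={impact * conf, * ease:.1f}".replace(",", "")
        if False else f"{name}={impact * conf * ease:.1f}"
        for name, (impact, conf, ease) in ideas.items()
    )
    print(f"{stage}: {scores}")
```

The estimates stage is a dead heat (9.6 each), the survey puts the chatbot ahead, and the user study flips the lead back to the dashboard.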
Final tests and a winner!
You decide to start by building a minimum viable product (MVP) version of the chatbot. Development takes 6 weeks and you launch it to 200 survey respondents who indicated willingness to test. 167 enable the feature, but usage drops dramatically day by day, and by the end of two weeks you have only 24 active users. In follow-up surveys and calls a clear picture emerges: the chatbot is harder to use and far less useful than the participants had expected, and worse, it antagonizes their customers, who seem to value the personal touch. The feature actually causes the business owners to work harder.

Analyzing the results, you and the team conclude that launching a version of the chatbot that meets customers’ expectations will require at least 40–50 additional person-weeks (Ease of 1) and carries high risk. It’s also clear that far fewer customers will find it useful than first expected, so you reduce Impact to 2. These changes are so fundamental that you can no longer rely on the results of the user study, and with the help of the confidence tool you reduce Confidence to 0.5.
The dashboard MVP launches within 5 weeks to another 200 customers. The results are very good. 87% of participants use the feature, many of them daily with little drop off. The feedback is overwhelmingly positive, mostly asking for more. You realize the impact is higher than you expected — an 8. The engineering team estimates it will take another 10 weeks to launch the dashboard in full, so Ease of 4. According to the confidence tool you feel comfortable setting the confidence value to 6.5 out of 10.
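A final sanity check of the scores, using the values above:

```python
# Final ICE scores after the MVP tests (impact * confidence * ease)
dashboard = 8 * 6.5 * 4
chatbot = 2 * 0.5 * 1
print(dashboard, chatbot)  # 208.0 1.0
```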
At this point the prioritization is very easy indeed. No one disputes that the dashboard is the right feature to pursue next. You keep the chatbot in your idea bank in order to record the findings, but it naturally gets sorted to the bottom given its low ICE score.
1. Stop investing in bad ideas
This example illustrates how risky it is to bet on high-effort features based on gut feelings, opinions, themes, market trends, etc. Most ideas are more like the chatbot than the dashboard — they underdeliver on impact and cost much more than we think. The only real way to find winning ideas is to put them to the test and reduce the level of uncertainty.
2. Worry about outcomes, not output
This may seem like a laborious and slow way to build products, but it’s actually much more efficient than the alternatives. Not only does confidence testing eliminate most of the wasted effort spent on bad ideas, it also focuses the team on short, tangible learning milestones with immediate, measurable results, which improves focus and velocity. Through the process we learn a lot about the product, users, and market, and end up with a better end product that has already been tested by users. We are therefore rarely surprised on launch day and need to make far fewer fixes post-launch.
3. Let a thousand flowers bloom
In reality we often need to choose not between two ideas, but between dozens. By limiting the effort we put into each idea based on our level of confidence in it, we allow ourselves to test many ideas in parallel, avoiding the pitfalls of traditional big-bet development — see my post on GIST planning for more on this.
4. Get your managers and stakeholders on board
Here’s what worries people most when I explain this topic: how do you get managers and stakeholders to buy in? Can we really get them to limit their god-like powers over the product? Well, you’d be surprised. I hear a lot from managers that they would prefer not to be the deciders on product matters, but they feel compelled to get involved because the team is presenting them with weak options. What is weak or strong is of course a matter of opinion, unless you show up to the review not just with a polished pitch deck, but with real evidence and clear confidence levels. You might be surprised how much easier the conversation gets. On the flip side, the next time your CEO surprises you with a new must-have pet idea, try to show her how the idea is evaluated: how much impact, effort and confidence we can give it, how its ICE score stacks up against other ideas, and how we can test it to gain more confidence. Most reasonable folks will agree that that’s a good way to go about it. If they’re still not convinced, send them this blog post, ask them to leave a comment (or message me @ItamarGilad), and I promise to fight the good fight on your behalf.