Analytic Thinking Leads to Powerful Product Decisions
This post is part of a series put out by my firm, KOM Partners, where we educate the C-suite, product leaders, and control investors on finding the intersection of data and instinct to become “distinct” leaders.
Now that we are a few years removed from this tactic, I want to share a story from my time at Amazon. In my role, I was responsible for Product Management for all Kindle Reader software. That meant my team was doing most of the hard work (i.e. designing features) while I provided the leadership and air cover for what we wanted to do. It also meant that I had to deal with the inevitable Jeff Bezos “?” email.
If you are not familiar with these, I would encourage you to read Brad Stone’s excellent book “The Everything Store.” In it, he offers the following description of these emails:
When Amazon employees get a Bezos question mark e-mail, they react as though they’ve discovered a ticking bomb. They’ve typically got a few hours to solve whatever issue the CEO has flagged and prepare a thorough explanation for how it occurred, a response that will be reviewed by a succession of managers before the answer is presented to Bezos himself. Such escalations, as these e-mails are known, are Bezos’s way of ensuring that the customer’s voice is constantly heard inside the company.
I have been on the receiving end of several of these emails, and they are never fun. If you are a senior leader, you know that each must be dealt with urgently, thoroughly, and in a customer-centric way.
The problem on which I want to focus today arose during an earnings conference call in February 2013 in which the Barnes & Noble CEO made the following statement:
“…our mobile Apps for IOS and Android which are the highest rated reading Apps in both iTunes and Google Play.”
This one sentence stopped my workflow for a few days. The “?” arrived.
The very first thing we needed to do was verify the claim. Based on data pulled from the iOS store, we found the following:
- Kindle for iOS (v3.6.2) — 294 ratings, 2.6 stars
- Nook for iOS — 6,846 ratings, 4.5 stars
What Was Going On?
From my point of view, the data did not seem right. The source of confusion for us was that the Nook app had traditionally ranked behind Kindle for iOS in the App Store. At the time (algorithms change all the time, and I no longer follow the intricacies of the iOS App Store ranking algorithms), rankings were driven by factors that included downloads and user ratings. Nook had far more ratings, and far more positive ratings, yet still ranked lower, and it was difficult to reconcile those two realities.
Get Some Data
Obviously we needed to dive a bit deeper. There were some services available at the time to pull data from the App Store, but their limitations forced us to think of alternatives. Ultimately, I decided to write some Python code to hit the App Store endpoint, pull down the overall ratings data for multiple apps, and pull down the review text for any ratings that included a user-submitted review. At the time, Pandas was not in my toolbox, but Excel is a champ and a favorite go-to once you have some data.
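For the curious, here is a minimal sketch of the kind of pull we did, written against Apple’s public customer-reviews RSS feed. The endpoint shape, the field names, and the app id shown are from memory and strictly illustrative; they may not match what we actually hit in 2013.

```python
# Sketch only: pulls recent reviews for an app from Apple's public
# customer-reviews RSS feed. Endpoint and field names are from memory
# and may have changed; the app id is illustrative.
import json
import urllib.request

APP_ID = "302584613"  # illustrative: believed to be the Kindle for iOS app id

def fetch_reviews(app_id, pages=10, country="us"):
    reviews = []
    for page in range(1, pages + 1):
        url = (f"https://itunes.apple.com/{country}/rss/customerreviews/"
               f"page={page}/id={app_id}/sortby=mostrecent/json")
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                feed = json.load(resp).get("feed", {})
        except Exception:
            break  # stop on missing pages or transient errors
        entries = feed.get("entry", [])
        if isinstance(entries, dict):  # a single review comes back as a dict
            entries = [entries]
        for e in entries:
            reviews.append({
                "date": e.get("updated", {}).get("label", ""),
                "version": e.get("im:version", {}).get("label", ""),
                "rating": int(e.get("im:rating", {}).get("label", 0)),
                "title": e.get("title", {}).get("label", ""),
                "text": e.get("content", {}).get("label", ""),
            })
    return reviews

if __name__ == "__main__":
    rows = fetch_reviews(APP_ID, pages=2)
    print(f"pulled {len(rows)} reviews")
```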
Whenever you have data, step one is to explore it. It’s always easier to start with data you already know well. In this case, we plotted the number of reviews by day for the Kindle app.
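We did this roll-up in Excel at the time; for readers who prefer code, a pandas equivalent looks roughly like the sketch below. The column names are carried over from the fetch sketch above, so they are assumptions, not a record of the original spreadsheet.

```python
# Sketch: daily review counts, the chart we built in Excel, expressed in pandas.
import pandas as pd
import matplotlib.pyplot as plt

reviews = pd.DataFrame(rows)  # `rows` from the fetch sketch above
reviews["date"] = pd.to_datetime(reviews["date"], utc=True).dt.date

daily = reviews.groupby("date").size().rename("review_count")
daily.plot(kind="line", title="Reviews per day")
plt.show()
```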
What jumps out are the clear peaks in the number of daily reviews. This was a clear pattern, which we verified by cross-referencing the peak dates with our release dates.
To further verify this pattern, we pulled the data for another competitor, the iBooks app.
To ensure that our analysis was complete, we considered data from multiple app types and categories, pulling review data from a varied set of popular and lesser-known apps. App after app exhibited this same pattern of peaks in daily review count, most likely tied to releases. In general there was a peak around the initial release.
Beyond that, the review count would taper down to a much lower steady state, and then a new peak would appear, likely due to a new version. To verify this, we used pivot tables to compare the date of each peak against the date of the first written review associated with a given release. The two generally lined up.
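A rough pandas equivalent of that pivot-table check appears below. The peak heuristic is my own crude stand-in for illustration, not the rule we actually used.

```python
# Approximate each version's release date as the date of its first written
# review, then compare against unusually high days in the daily series.
first_seen = (reviews.groupby("version")["date"].min()
                      .rename("first_review_date")
                      .sort_values())

# Crude peak heuristic (illustrative only): a day counts as a peak if it is
# more than 3x the trailing two-week median of daily review counts.
peaks = daily[daily > 3 * daily.rolling(14, min_periods=1).median()]

print(first_seen)
print(peaks)
```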
When a Pattern Doesn’t Match
Using this pattern of peaks around release dates, the chart for the first half of the time period for Nook looked normal. There was a sharp peak in Jul/Aug 2011, which mapped to their release of v3.0: 100 total reviews for that version, a peak of 20 in a single day, and a steady-state review count of 1.9 per day.
The next uptick was in Apr 2012, when Nook released v3.0.1 (143 total reviews), 3.1.1 (111 total reviews), and 3.1.2 (80 total reviews). All of this matched the observed pattern.
That held until v3.1.3 in Mar 2012. During the first 14 days, when we would have expected the daily review count to be highest, the Nook app averaged 8.8 reviews per day. Over the life of that release, however, the average jumped to 15.2, twice nudging above 30 reviews per day, each time well after the initial release window.
We wondered if this was tied to the feature set, which included updates to support the Retina display, easier highlighting, and two-column support for the iPad. Maybe the Retina feature could support rolling-thunder reviews, but it seemed unlikely.
By way of comparison, the Kindle for iOS release that added Retina support (v3.0.1) garnered just 845 reviews in the 82 days it was available, with a peak of 37 and an average of 8.7 reviews per day. Remember, given how important App Store rankings were for discovery, the fact that Kindle was still ranked above Nook made all of this confusing.
Logic Busting
Up to this point, we had seen some plausible aberrations, but nothing that truly stood out. Release v3.1.4 in May 2012 shattered our logic tree by garnering a total of 2,715 reviews. Across the 114 days customers could download that version, July 8th had the highest single-day count (45), and the rate of reviews also increased starting around the seventh day post-release.
For those reading this far who already know the punchline, in 2012 there were no examples of popular apps showing this sort of pattern for adding ratings. None.
The highest total review count for any version of Kindle for iOS was 845. No single version of iBooks had more than 1,610 reviews. Both apps, one of them first-party to the platform, were ranked above Nook.
Without a frame of reference for what could be happening, we had to start testing other questions.
What Patterns Do We See in the Review Text?
The first pattern we investigated was the word count per review over time.
In looking at these numbers, we felt comfortable that, depending on the customer base and the app, there was a fairly stable baseline for how many words a review would contain on average.
Do you see what happened there? Right around the time that the review counts started going up by day, the average word count took a dive.
So the review count went way up, and the average words per review reached a new, lower, steady state.
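For completeness, here is a sketch of that words-per-review roll-up, including the 5-star slice discussed in the next section. It again uses the assumed columns from the earlier sketches.

```python
# Average words per review by day, plus the 5-star slice discussed below.
reviews["word_count"] = reviews["text"].str.split().str.len()

words_per_day = reviews.groupby("date")["word_count"].mean()
five_star_per_day = (reviews[reviews["rating"] == 5]
                     .groupby("date")["word_count"].mean())

print(words_per_day.tail())
print(five_star_per_day.tail())
```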
Did the Quality of the App Increase?
People who are happy tend to complain less. We found that the average word count of the 5-star reviews dropped dramatically in line with the v3.1.3 release, and the number of 5-star reviews shot up.
Despite having never ranked higher than Kindle or iBooks, the Nook app had gathered more total reviews, and more 5-star reviews, than both Kindle and iBooks. Looking at the data for all three apps from the date of the v3.1.3 release onward, Nook had pulled in almost as many total reviews as Kindle and iBooks combined.
With this lens, we were left with two possibilities: either their app had become so good from that version onward that they had raving fans, or something else was going on.
Know Your Competition
With this data in mind, I asked everyone on my team to download the Nook app and use it every day as their primary reader. We would reconvene in two weeks’ time to see whether the quality really was that high and we had somehow lost track of their progress.
The response to the Jeff email included a form of the analysis above, with many more charts and tables of data. We couldn’t ask him to wait two weeks, so we gave him what we had, along with our hypothesis, and told him we would follow up.
Two weeks later, we had our answer.
Keep It Simple
I will be the first to admit I was wrong: my hypothesis was that Nook had engaged in astroturfing their reviews, either through a hired firm or via their own employees.
One of the hardest parts of working at companies like Amazon, Microsoft, Google, Facebook, etc is that you are surrounded by incredibly smart people. Clever people tend to come up with clever ideas. Clever people also come up with overly complicated ideas. Really clever people come up with simple ideas.
Nook wasn’t paying a firm, nor were they asking their employees to rate the app. They asked their customers.
Within the two week review period, all of us using the Nook app with high regularity noted that at some point we saw a pop-up asking us to rate the app. There it was.
More importantly, Nook understood better than we did that your best customers are often your earliest customers, and that those customers had already rated the app. Most of them probably did not know they could rate every new version of the app. Over time, as an app grows popular, you get fewer and fewer engaged fans and more and more laggards who use the app purely for utility. They also bitch quite a lot more. These net new customers are not your raving fans. The customers who use the app many times in a week are.
The initial irregular pattern in review count maps perfectly to July 4th, traditionally a reading-heavy long weekend. The uptick we noted starting seven days after that release would also map to a solid base of consistent users tripping the ask-for-review prompt.
Customers who have opened your app more than three times are your engaged customers, and potentially your biggest fans. We don’t know exactly what Nook’s trigger was, but this was clearly the smoking gun.
Invent and Simplify
Amazon has 14 leadership principles. One of the core principles for a product manager is Invent and Simplify. In Amazon’s words:
Leaders expect and require innovation and invention from their teams and always find ways to simplify. They are externally aware, look for new ideas from everywhere, and are not limited by “not invented here”. As we do new things, we accept that we may be misunderstood for long periods of time.
With the insight that asking customers for a review was a friction-free way to supercharge the review count, and that you shouldn’t ask all of them, just some of them, we looked for a way to leverage this knowledge and build a feature that would increase our ratings in an honest way: by asking our biggest fans to continue to support us in a low-commitment way. Higher review counts improved store rankings. We set out to improve on what we saw from Nook.
When Product Can Act As Marketing
Circa 2012, anyone who had to drive app installs would have told you how hard it was to spend marketing dollars in a trackable way with good results. Even for the Kindle marketing team, generating app installs was not easy; they had a tough job.
The Kindle team was spending millions of dollars on marketing to generate installs of the iOS and Android app. Happy customers buy books, and we were in the business of selling books.
We set out to design a feature that would be very customer-friendly, yet generate high review counts from our most engaged customers. We created a list of requirements (a sketch of the resulting gating logic follows the list), which included:
- Customers would only be asked for the review if they used the app a certain number of times within a certain number of days
- Customers would only be asked for the review when they were in the library, and not if the app opened into reading mode
- Customers would only be asked for the review once for a version, regardless of how many iOS devices they were using
- Customers would not be asked for reviews again if they tapped “No Thanks”
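Here is that sketch of the gating logic the requirements imply. The thresholds, field names, and the AppState structure are all invented for illustration; the real logic lived inside the iOS client.

```python
# Illustrative only: the gating rules above as a single decision function.
# Thresholds and field names are invented; the real logic was in the iOS app.
from dataclasses import dataclass, field

MIN_OPENS_IN_WINDOW = 5   # invented threshold: "a certain number of times..."
WINDOW_DAYS = 30          # invented window: "...within a certain number of days"

@dataclass
class AppState:
    opens_in_window: int        # app opens within the trailing WINDOW_DAYS
    opened_into_library: bool   # False if the app resumed straight into a book
    declined_forever: bool      # customer tapped "No Thanks" at some point
    prompted_versions: set = field(default_factory=set)  # versions already asked, account-wide

def should_ask_for_rating(state: AppState, current_version: str) -> bool:
    if state.declined_forever:                       # never ask again after "No Thanks"
        return False
    if current_version in state.prompted_versions:   # once per version, across all devices
        return False
    if not state.opened_into_library:                # only prompt in the library, never in reading mode
        return False
    return state.opens_in_window >= MIN_OPENS_IN_WINDOW

# Example: an engaged customer sitting in the library gets the prompt.
state = AppState(opens_in_window=7, opened_into_library=True, declined_forever=False)
print(should_ask_for_rating(state, "4.2"))  # True
```

The key design choice is that every rule errs on the side of not asking: the prompt should only ever reach customers who are already demonstrating that they like the app.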
These all made sense, and were easy to implement. However, we took it one step further. All application developers should be logging activity. If you aren’t, you should be. This information helps you understand who your best customers are.
Since we already had a service that communicated with the app while it was open (sending downloads, uploading notes/highlights, handling rights management, and so on), we created a call that let us poke any app in the field into the state that would show this “Ask for Rating” dialog on the next app start.
The analytics at HQ allowed us to segment our customer list. We ran tests by sending out small batches of pokes to figure out which classes of customers were a) most likely to tap “Yes”, and b) most likely to rate us highly. We simply pulled review data in the week following each batch-ask to determine whether we trended up or down from the baseline review scores.
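A hypothetical sketch of that batch-test loop follows. The poke call, the segment query, and every name in it are invented for illustration; the real system was internal Kindle infrastructure, not a ten-line function.

```python
# Hypothetical sketch: poke one customer segment, then compare average ratings
# before and after. send_rating_poke() and fetch_ratings_between() are invented
# stand-ins for internal services.
from statistics import mean

def run_batch_test(segment_customer_ids, send_rating_poke, fetch_ratings_between,
                   baseline_window, test_window):
    for customer_id in segment_customer_ids:
        send_rating_poke(customer_id)  # app shows the dialog on its next start

    baseline = fetch_ratings_between(*baseline_window)  # e.g. the prior week
    observed = fetch_ratings_between(*test_window)      # the week after the batch-ask
    if not baseline or not observed:
        return None
    return {
        "baseline_avg": mean(baseline),
        "test_avg": mean(observed),
        "delta": mean(observed) - mean(baseline),
    }
```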
By doing this one thing, we were able to generate app store velocity numbers which clearly confused the App Store algorithms, because we started getting more free promotion from within the App Store. Our download rate from new customers started increasing. And increasing. And increasing. Our marketing team jokingly referred to this feature as the “million dollar feature” because of our ability to generate a torrent of positive reviews any time we wanted, which drove spikes in our net new user counts.
To sum up — when you are presented with a problem, figure out if you can gain access to or generate data which will help you investigate the claim, as well as interrogate theories as to what is happening.
In the case of product managers, it’s OK not to focus too much on your competitors (ALWAYS focus on your customers), but when you see something out of band, it’s time to get smart on their current version and marketing tactics. It’s also OK to borrow good ideas. Facebook has demonstrated this over and over again lately, liberally borrowing from Snapchat.
If you are going to borrow, put your own twist on it and make it yours, in line with your company and its values.
Lastly, always keep in mind that a great product, and reviews from happy customers, can greatly increase your sales. If you can figure out a way to harness those customers through a feature in your app, I would encourage you to explore that. Social proof is a key part of the customer value journey. Asking your best customers for reviews is not slimy. It makes sense when done with constraints.