How Hypothesis Testing Killed the “Best Practice”

Free your mind — challenge product “best practices”

I’ve always felt a little uncomfortable when people use terms like “best practices” or “industry standards”, and this post is about a real-world example of why everyone should be cautious of such claims.

One of the tools often mentioned in the context of growth hacking is life-cycle emails (aka “drip emails”). These are usually employed for user on-boarding and aim to raise users’ awareness of your product and potentially introduce them to features they might not know about yet. Life-cycle messages are mentioned in Dave McClure’s popular “Startup Metrics for Pirates” presentation, which you should read if you haven’t already.

Here, for example, is how the life-cycle emails are set up for ft.com:

Recently we decided to try these out for InfoQ, and although at the beginning the plan was to just launch them, in the end we decided to approach this as an experiment and see if they were truly valuable for our case.

The experiment was set up in the following way:

  • We (*) altered the registration process, so that half of the new registrants would receive life-cycle emails after their registration.
  • The other half would not receive any (the control group).
  • The system would inject a Google Analytics (GA) custom variable in both cases, so we could later on segment any reports and see differences in usage, retention, etc.

While the experiment ran, around 10k registrants took part and were randomly distributed between the two buckets. The bucket that received the emails got a total of 5+1 messages over a period of several weeks.
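
As a rough illustration of the 50/50 split described above, here is a minimal sketch of one common way to assign new registrants to buckets. This is not how our system did it: the post only says the distribution was random, and the hashing approach, the function names and the "lifecycle-emails" experiment label are all assumptions made for the example. Hashing the user id keeps the assignment stable, so a registrant always lands in the same bucket.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str = "lifecycle-emails") -> str:
    """Return 'treatment' or 'control' for a user (hypothetical helper).

    The experiment name is mixed into the hash so that a different
    experiment would split the same users differently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer; even -> treatment, odd -> control.
    value = int.from_bytes(digest[:8], "big")
    return "treatment" if value % 2 == 0 else "control"


def split_counts(user_ids):
    """Count how many of the given user ids fall into each bucket."""
    counts = {"treatment": 0, "control": 0}
    for uid in user_ids:
        counts[assign_bucket(uid)] += 1
    return counts


if __name__ == "__main__":
    # Made-up user ids, only to show that the split is roughly even.
    print(split_counts(f"user-{i}" for i in range(10_000)))
```

The returned bucket name is the kind of value that could be stored in an analytics custom variable for later segmentation. Note that the two buckets will rarely be exactly equal in size, which matters when comparing raw totals.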

The result

Although the users who got the life-cycle messages generated some additional traffic from the links they clicked inside those messages, overall the total traffic and the number of sessions were (statistically) equivalent between the two groups.

Note that Group A ended up having slightly more users than Group B, which could account for the 1.4% difference.
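
To make “statistically equivalent” a bit more concrete, here is a minimal sketch of a two-proportion z-test, which compares a rate (for example, the share of users who came back) rather than raw totals, so that unequal group sizes do not distort the picture. This is not the analysis we actually ran, and the numbers in the example are made up purely for illustration; they are not our experiment’s data.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). A large p_value means the observed difference
    is compatible with there being no real difference between groups.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of "no difference".
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: returning users out of all users in each group.
    z, p = two_proportion_z_test(1521, 5070, 1460, 4930)
    print(f"z = {z:.2f}, p = {p:.2f}")
```

A p-value well above the usual 0.05 threshold would mean there is no evidence that the emails changed the rate, which is the kind of outcome described above.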

In simple terms, even if we launched this feature for all our users it wouldn’t make any difference to our bottom line.

If the numbers were even just a little better we’d probably try to optimise and try out different message formats, but they were not. And I guess this is in line with my personal experience with the messages from ft.com: I got them, but I still won’t visit their site unless someone tweets about them.

So are “best practices” wrong?

Of course not! But they are worth nothing without context.

In 2009 I was lucky to attend Stefan Tilkov’s “Thoughts on the Generic vs. Specific Tradeoff” presentation at QCon London. In a room packed with alpha geeks, Stefan compared XML vs HTML, SOAP vs REST, etc., outlining the advantages and disadvantages of each solution and showing that there is no certain answer to an architect’s quest without context.

One of the phrases Stefan used a lot during that presentation was “it depends”.

Is X better than Y? Is practice Z a best practice? Well, “it depends”.

At best one can find indications of “smart” practices that promise solutions that may or may not work for a given situation.

Stay skeptical, stay doubtful

In your product’s lifetime, especially if it’s a mature one, there will be very few things that will move the needle significantly: you might launch a mobile site that increases the number of mobile users, you might stumble upon a single killer feature that your users will be crazy about, or one of your competitors shutting down might even send you lots of potential new users. But unlike what some growth-hacking bloggers suggest, most of the time your online product will grow slowly, because the growth initiatives you take have a moderate impact. Considering this, and also how many different ideas you might have, it’s essential to be disciplined in testing your hypotheses and pursuing only what works for you.

Specifically you should:

  • Not take “best practices” for granted and test if they work for your audience.
  • Evaluate your alternatives with as much data as possible.
  • Remember that your “gut feeling” is good, but data is better!
  • Be ruthless with the features you build; if they’re not of real value, put them out of their misery — fast.

(*) Kudos to Anca from our marketing dept for coordinating, and Mircea from our dev team for the implementation of the experiment.

Disclaimer: The views expressed on this site are my own and do not necessarily reflect those of my employer.