The tyranny of testing over design

You sometimes hear people celebrate continuous testing as being essential and tremendously effective, but meanwhile, we seem to forget about the good old principles of design. Continuous experimentation can sometimes hold back not just design, but also common sense.

Experiments at Airbnb… what’s the point?

In May 2014, Airbnb’s tech blog published an article called Experiments at Airbnb, which provides useful examples of how to run split testing, also known as A/B testing. They run controlled experiments which, they say, are very important in shaping the user experience on their site. Their argument is not convincing, though. For example, the post illustrates two tested variations of the price filter:

A feature that was rejected.

Two variations that were split tested.

Let’s now look at what has been implemented on the current site, and try to book a place for 2 nights:

As you can see, not only is the currency not there, but even worse, the label on the price filter does not communicate whether the price is per night or per number of nights booked. That would be a legitimate question, wouldn’t it? The obscure label “Price Range” certainly doesn’t help. You’ll have to select one of the properties to find out that the price displayed in the filter is per night. One might argue that during some (badly run?) usability testing nobody raised this issue. Or that most people would understand the meaning, or that it’s still better than calling it “price”, but why can’t they just put a clear label? That’s what they do on, or in a different way, on Way to stay:

That would be a bullet-proof way of making it work for everybody, without the need for any A/B testing or usability testing whatsoever.

The point here is, the price filter on Airbnb contains a fundamental design flaw, as a very established and quite obvious design principle is to label things properly in order to avoid misunderstandings.

This specific example clearly shows how the outcome of the split test can get invalidated by the fact that the design solution does not meet the design heuristics in the first place.

As a side note, there was no need for A/B testing to come up with very obvious conclusions such as:

  1. Why would people ever prefer a generic quantitative indicator (the dollar signs repeated over and over) instead of actual figures?
  2. There is no point in showing the highest price on the slider. A plus sign is enough to let people know that some prices are higher than the displayed value, as long as the underlying algorithm is accurate enough to include only a range of prices that is statistically significant.
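To illustrate the second point, here is a minimal sketch of how a slider maximum with a plus sign could be derived. It assumes that "statistically significant" can be approximated by a simple percentile cutoff; the function names, the `coverage` and `step` parameters, and the sample prices are all hypothetical and have nothing to do with how Airbnb actually does it.

```python
import math


def slider_cap(prices, coverage=0.95, step=10):
    """Return a slider maximum that covers roughly `coverage` of the listings.

    Assumption: a nearest-rank percentile stands in for a "statistically
    significant" price range. The result is rounded up to a multiple of `step`.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    ordered = sorted(prices)
    # Nearest-rank percentile: smallest price with at least `coverage`
    # of the listings at or below it.
    rank = max(1, math.ceil(coverage * len(ordered)))
    cutoff = ordered[rank - 1]
    return int(math.ceil(cutoff / step) * step)


def max_label(prices, currency="$", coverage=0.95, step=10):
    """Build the label for the right end of the slider, unit included."""
    cap = slider_cap(prices, coverage, step)
    # The plus sign appears only when some listings cost more than the cap.
    suffix = "+" if max(prices) > cap else ""
    return f"{currency}{cap}{suffix} per night"


if __name__ == "__main__":
    nightly_prices = [40, 55, 60, 75, 80, 90, 95, 110, 120, 2500]
    print(max_label(nightly_prices, coverage=0.85))  # $120+ per night
```

A single outlier at 2500 no longer stretches the slider, and the label states both the currency and the unit.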

Could testing turn into a hurdle to design?

The second example described in the post is about the fully revamped interface that was released in July 2014:

The new design is a neat improvement, as users can now see images of the properties without loading a new page, and see the location of the properties on an interactive map. It took a long time to get there, but it looks nice. Was there any need to carry out A/B testing to confirm which design was better? It certainly cannot hurt. There is a problem, though. Let’s say that the A/B test results pointed to a drop in KPIs on the redesigned version: what would the next steps be? There are so many differences between the new and the old version that identifying the culprit would be utterly impossible.

This reminds me of another similar example. In his 2012 presentation called Design for continuous experimentation, Dan McKinley, engineer at Etsy, shares the story of how continuous testing led the design team to abandon certain design proposals, such as opening an item in a new tab or adopting endless pagination. The problem is that unlike usability testing, which should also be treated with great caution, an A/B test does not give much insight into what exactly made users prefer one version over the other. Maybe the idea was good, but the implementation was not?
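To make the limitation concrete, here is a minimal sketch of what a split test typically boils down to, assuming a standard two-proportion z-test on conversion counts. The function name and the figures are made up for illustration; they are not taken from Airbnb or Etsy.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    Returns (z, p_value) for a two-sided test with a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: old design (A) versus redesign (B).
    z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=160, n_b=2000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

The whole redesign collapses into two numbers. They can say that B converted worse than A, but nothing in them says whether the map, the photos, the filters or a slow script was responsible.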

We observe a trend where quantitative methods take over design, claiming an authority that they should not have. Are we dropping design thinking in favour of the cult of statistics?

In my experience as a designer performing plenty of usability testing sessions, I can often foresee what the outcome of the testing will be. Of course testing always provides useful insights on underrated issues, but you need to know what the heuristics and principles are in the first place, and never lose sight of them. In many of the design-related discussions that I’ve been involved in over the years, these principles were not even taken into consideration.

Let’s go back to Airbnb. Here is the search widget on the homepage, after the major redesign they carried out recently:

On Firefox (Mac OS X)
On Chrome (Mac OS X)

Not sure why they decided to cram all controls together like that, but let’s not focus on the small details and move on to the search results page. Here is where things get worse. This is what it looks like on my high-res monitor, after clicking on “More filters”:

It certainly doesn’t look great.

  • Under ‘Room type’, the three check boxes are too far from the text and icons they apply to, to the point of sitting almost halfway between one item and the next. Is it because everything has to be flat now? Because Apple did it?
  • When moving the slider on the price filter, the maximum amount updates on the right-hand side instead of following the pin (where I would expect it to be), and the maximum price is not visible anymore. Small details, maybe, but still…
  • Large amounts of white space between filter labels and filter values. A boldface could be used as a better way to differentiate between headers and values.
  • The ‘Show listings’ button is so massive that it doesn’t even look like a button, it actually scares me a bit.

Here is another screenshot that includes the map, taken during a different session on a different day.

  • The buttons to zoom in/zoom out on the map are microscopic.
  • No currency displayed on price range, and again, unclear “Price range” label.
  • The ‘Options’ section features a really flashy indent overseeing a variety of picturesque font colours, and post-atomic, genetically-modified check boxes.

A small and cute help overlay opens when you click on the small question mark, but the pointer is displaced quite a bit from the icon:

By the way, you might have noticed that the currency was displayed in one of the two versions and not in the other. I wonder if this is because they were playing the multivariate testing game to decide whether designers should include the currency?

Price filter showing currency (and displacement of pin and number).

Despite the fact that they have done a very nice job with rebranding and emotional design, there are so many issues that could have been addressed by just paying attention to detail, as you would expect from a site with millions of visitors.

Finally, we come to the serious bit, and this is quite annoying. It’s hard to believe, but there is no way to sort the results. There are good reasons to suspect that this is driven by motives we don’t even want to think about, because from a design perspective it doesn’t make any sense. Thousands if not millions of users must have been swearing out loud.

The old-style pagination is one more obstacle to exploring the results and getting what you want. Despite the fact that there’s plenty of space available, at least on my screen, if I want to jump to page number 5 I can’t, but hey, I can jump directly to page 56! And why would I ever do that, considering the listing seems to be totally random, with prices shuffling up and down without any criterion? Are they trying to prevent users from making informed decisions and seeing which listings are the cheapest?
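For what it’s worth, showing the nearby pages takes very little. Below is a minimal sketch of a pagination window that always lists the first page, the last page and the pages around the current one. The function name and the `radius` parameter are my own, chosen for illustration only.

```python
def page_window(current, total, radius=2):
    """Return the page links to display, with "..." marking skipped ranges.

    Always includes page 1, the last page, and `radius` pages on each side
    of the current page.
    """
    if total < 1 or not 1 <= current <= total:
        raise ValueError("current must be between 1 and total")
    pages = {1, total}
    pages.update(range(max(1, current - radius), min(total, current + radius) + 1))

    links = []
    previous = 0
    for page in sorted(pages):
        # Insert a gap marker whenever one or more pages are skipped.
        if page - previous > 1:
            links.append("...")
        links.append(page)
        previous = page
    return links


if __name__ == "__main__":
    print(page_window(current=1, total=56, radius=4))
    # [1, 2, 3, 4, 5, '...', 56]
```

With a wider `radius` on large screens, page 5 is one click away from page 1, and page 56 is still there for whoever wants it.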

Conclusions, and a few notes

Testing methods are invaluable and should be used whenever possible, but not thoughtlessly. Testing should support design, not suppress it. Designers should have the authority to make decisions about how an interface should work and what it should look like, as long as they stick to established principles. Testing should be really solid before it can challenge design decisions. The fact that there may be disagreement in the team doesn’t mean that any available number should be the deciding factor. Designers are supposed to have enough experience and knowledge to make the right choices. Usability metrics, multivariate testing and usability testing should all be adopted with great care. Continuous testing can be useful in many ways, but it should not replace informed design decisions.

Follow Luca Benazzi on twitter:


Airbnb is a very innovative company compared to the average, and they have successfully devoted a large amount of time to fine-tuning seamless transitions and smooth interactions. The way they redesigned their ‘List your place’ section, as a single page application, is brilliant. At a quick glance, the page describing properties seems very well-considered, too. Unfortunately, while the interaction is neat and accurate, the same can’t be said of the information architecture, wording, front-end implementation, and visual design. Maybe they should A/B test their internal processes?


Some possible reasons why endless pagination did not work at Etsy have been explored in an interesting post called Why did infinite scroll fail at Etsy? My personal opinion is that standard pagination is just lazy design based on a page model that does not have any reason to exist nowadays, and companies have not yet come up with enough valid alternatives; for example, incremental scrolling can be triggered by a mouse click and still offer the advantages of standard pagination, while paving the way to a new paradigm. But that’s a long discussion.


Usability testing poses similar issues that were not described in this article, especially when carried out by inexperienced testers. And in several situations I could see it in action: the tyranny of usability testing against design.


I mentioned Google and Apple in my post. It’s surprising how many people still believe that these companies design impeccable, user-friendly interfaces.
