Experiments at Scale

Brett Davis
Product & Engineering at Tophatter
6 min read · Sep 22, 2017

At Tophatter, we launch a lot of experiments, which means we need to understand the impact of our changes on our business metrics. Consequently, we have invested a lot of time and energy in building tools and processes that augment our decision making with the power of statistics. This allows us to say, with a high degree of confidence, whether our experiments are having their intended effects or adverse ones.

But how do we decide what to build in the first place?

As it turns out, it is possible to apply the same data-driven approach to choosing which features to experiment with. In fact, Qubit has already done the legwork for us and published a white paper analyzing the results of hundreds of experiments across multiple e-commerce platforms.

It’s data all the way down!

What’s old is new again

According to the paper’s analysis, features emphasizing Social Proof, Scarcity, and Urgency are the most likely to have a positive impact:

  • Social Proof: treatments that leverage the behaviour of other users to provide information about trending products and currently popular items.
  • Scarcity: treatments that highlight items that are low in stock, almost always by using ‘stock pointers’.
  • Urgency: treatments that use a time limit to promote urgency to complete an action before a deadline, almost always implemented using a countdown timer.

To understand the experiment I’ll describe in a bit, it helps to know our history. I invite you to join me on this delightful walk down memory lane.

Old Tophatter desktop experience circa 2012. What happens when commerce meets turntable.fm?
Tophatter desktop experience in 2014. This was the Immersive Experiment, where we changed the auction room to use the full browser viewport. This experiment was a #greatsuccess #verynice.
Tophatter desktop experience in 2015. This was our live utility auction experiment. We launched this and hoped for the best as there was significant risk of alienating our existing user base. In the end, this experiment lifted several key metrics. #keygains
Current Tophatter desktop experience. The idea of displaying all auctions on a single page came out of a hackathon; this was the “God Mode” experiment. You retain a game-like feel, but you can see everything that’s happening across all auctions. This is arguably our most successful experiment to date.

In Tophatter’s early days, we had a feature called Bid-O-Meter (Ragnar’s magnum opus!) that utilized Social Proof by displaying avatars from linked social media accounts alongside the auction. This helped replicate a live, in-person auction experience, with bidders responding to social cues from those around them.

As you can see, the platform has evolved significantly since then: from avatars and dedicated auction rooms to an experience emphasizing discovery, with every item currently up for auction displayed on the main landing page.

So the old Bid-O-Meter’s UX needed a refresh, but given that it had been successful in the past and was grounded in Social Proof, we felt it was a good candidate to revive.

Old Bid-O-Meter circa 2013. Ragnar and Jared built v1 in a few days. #hacktheplanet
Redesigned Bid-O-Meter today. I built this one. It took me longer than a few days. #scale

Tech Stack

Tophatter clients come in three flavors:

  • A mobile-first responsive web client written in CoffeeScript/JavaScript with Bootstrap.
  • A native iOS client written in Objective-C/Swift.
  • A native Android client written in Java/Kotlin.

Our strengths are in Ruby and JavaScript (Android and iOS devs: we’re hiring!). Combined with the longer release cycles that App Store review imposes on native apps, this means we often find it more convenient to build and test features on the web first, then implement them on the native clients once they’re proven successful.

Running the experiment

We have developed an in-house framework for tracking users across all the different experiments we may be running at any given time.

Administrative UI for Tophatter’s custom experiment framework

An experiment consists of multiple treatments (including a control group), each of which is associated with a specific experimental behavior.

As users are exposed to new features in the app, each user is assigned one of the treatments according to pre-determined percentages.
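
For illustration, here’s a minimal Ruby sketch of how deterministic, percentage-based assignment can work. The treatment names, splits, and hashing scheme are hypothetical, not a description of our production framework:

    require 'digest'

    # Hypothetical treatment table for one experiment; percentages sum to 100.
    TREATMENTS = [
      { name: 'control',     percent: 50 },
      { name: 'bid_o_meter', percent: 50 }
    ].freeze

    # Hash the experiment name and user id into a stable bucket in [0, 100),
    # then walk the cumulative percentages. The same user always lands in the
    # same treatment, with no per-user state needed up front.
    def assign_treatment(user_id, experiment)
      bucket = Digest::MD5.hexdigest("#{experiment}:#{user_id}").to_i(16) % 100
      cumulative = 0
      TREATMENTS.each do |t|
        cumulative += t[:percent]
        return t[:name] if bucket < cumulative
      end
    end

    assign_treatment(42, 'bid_o_meter') # => same treatment on every call

One nice property of hashing rather than storing a random draw is that the assignment is reproducible anywhere the user id and experiment name are available.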

With users appropriately tagged, we can run database queries to compare key metrics between the treatment groups; a sketch of one such query follows the list below.

For example:

  • View Rate — How likely is a user to examine an item’s details?
  • Bid Count — How many times do they bid?
  • Bid Rate — How likely are they to bid?
  • Win Rate — How likely are they to win?
  • Pay Rate — How likely are they to pay?
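
Here’s a rough sketch of computing one of these metrics with ActiveRecord, assuming a hypothetical Rails-style schema (an Assignment model with experiment and treatment columns, belonging to a User who has many Bids); our real schema and queries are more involved:

    # Bid Rate per treatment = distinct bidders / users exposed.
    exposed = Assignment.where(experiment: 'bid_o_meter')
                        .group(:treatment)
                        .count

    bidders = Assignment.where(experiment: 'bid_o_meter')
                        .joins(user: :bids)
                        .group(:treatment)
                        .distinct
                        .count('users.id')

    exposed.each do |treatment, n|
      rate = bidders.fetch(treatment, 0).to_f / n
      puts format('%-12s exposed=%d bid_rate=%.3f', treatment, n, rate)
    end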

As the paper suggested it would, the new Bid-O-Meter had a definite impact on buyer behavior. Specifically, we saw a large and statistically significant (p < 0.05) increase in total Bid Count across all users.

Interestingly, while the feature drove total Bid Count way up, it had a negative impact on the overall Bid Rate.
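
For a rate metric like Bid Rate, one standard way to check significance is a two-proportion z-test between control and treatment. A minimal sketch in plain Ruby, with placeholder counts rather than our actual numbers:

    # Two-proportion z-test: is the difference in Bid Rate between two
    # groups larger than chance would explain?
    def two_proportion_z(successes_a, n_a, successes_b, n_b)
      p_a    = successes_a.to_f / n_a
      p_b    = successes_b.to_f / n_b
      pooled = (successes_a + successes_b).to_f / (n_a + n_b)
      se     = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
      (p_b - p_a) / se
    end

    # Placeholder counts: distinct bidders out of users exposed per group.
    z = two_proportion_z(4_200, 50_000, 3_900, 50_000)
    # |z| > 1.96 corresponds to p < 0.05 on a two-sided test.
    puts "z = #{z.round(2)}, significant? #{z.abs > 1.96}"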

User Anecdotes

Along with the aggregate experiment data, we collected feedback from individual users with the help of our friends at UserTesting. We make our decisions based on the aggregated data, but individual user feedback does help guide our interpretation of the results.

Some sample feedback from real users:

  • It wasn’t clear to me what these circular items were, and without an explanation of how the site works, I would be very hesitant to bid on an item.

  • Not confusing, but also not persuasive.

  • It seems like they max out at 4 icons, even if the item has more than 4 bids, so that part was a little confusing to me.

Conclusions

Our interpretation of the data is that the feature does drive more engagement (i.e., more bidding) from users who do decide to bid, but it reduces the likelihood that any given user bids in the first place.

The user feedback also suggests that some users didn’t understand what the feature was trying to convey, and that there may be some low-hanging fruit in our onboarding flow to more clearly explain how the site works.

In the end, the feature was a bit too polarizing. The trade-off between converting fewer but more engaged users and converting more users overall was not one we wanted to make, so we turned off the experiment and put it back on the shelf for future consideration. For example, by displaying the Bid-O-Meter to users only after they bid on an item, we may be able to reap some of the gains in total Bid Count without sacrificing the initial Bid Rate.

We’ve got tons of experiments in the pipeline and are always looking for the next big thing to move the needle on our key metrics. If this sounds interesting, take a look at our careers page.

About Tophatter

Tophatter Inc was founded by Ashvin Kumar (CEO) and Chris Estreich (CTO), and launched in January 2012. The company has raised $35M to date from Goodwater Capital, CRV, and August Capital. The company has 75 employees globally, and is actively hiring at its offices in Silicon Valley and Shanghai. For more information about Tophatter, please visit: http://www.tophatter.com/about.
