Experimentation at Carsales

William Jiang
Published in carsales-dev · Jan 18, 2021

carsales.com Ltd is Australia’s largest online automotive classifieds business, connecting buyers and sellers of vehicles. Millions of customers use our network of sites and tools to help them make more informed decisions. As a result, we have many data points on how customers use our sites, which can then be used to optimize their onsite experience.

One method of optimization is through the use of AB testing.

What is AB Testing?

AB Testing is a way of pitting two or more variations of an experience against each other to determine whether one is better. This involves segmenting a portion of your audience to experiment on and dividing them into control and test groups. We also need to define what success looks like — metrics such as click-through rate, transactions completed, and pageviews are a great start.
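To make the bucketing idea concrete, here’s a minimal sketch (not our production code) of how a user might be deterministically assigned to a control or test group. The experiment name and the 50/50 split are purely illustrative; tools like Optimizely handle this assignment for you.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, control_share: float = 0.5) -> str:
    """Deterministically bucket a user into 'control' or 'test'.

    Hashing user_id together with the experiment name means the same user
    always sees the same variation, and different experiments bucket
    independently of each other.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # uniform value in [0, 1)
    return "control" if bucket < control_share else "test"

# The same user always lands in the same group for a given experiment
print(assign_variant("user-123", "filter-bar-experiment"))
```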

Much like you would in a science experiment! — see Figure 1.

Then we’d run the experiment for an estimated amount of time and see if there are any winners.

Figure 1 — AB Testing

So you get 1% vs. 1.1% in your test: how do you know if that’s enough to determine a winner? The winner is determined through statistical significance, which tells us how confident we can be that the result is attributable to the change rather than to random variation (which real-world data is full of). If there’s no statistical significance, we cannot say that the change had any effect on our metrics.
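As a rough illustration (not our internal tooling, and the visitor counts are made up), a two-proportion z-test is one common way to check whether a 1% vs. 1.1% conversion difference is statistically significant:

```python
from math import sqrt

from scipy.stats import norm

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: 1.0% vs 1.1% conversion with 50,000 visitors per variant
z, p = two_proportion_z_test(500, 50_000, 550, 50_000)
print(f"z = {z:.2f}, p-value = {p:.3f}")  # p is about 0.12 here, so not significant at 95%
```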

(For the mathematically inclined) This phenomenon of more data = more confidence in the result is due to the Central Limit Theorem. It tells us that the greater the number of samples in your dataset, the lower the variability of your sample mean, and that the sample mean will follow a normal distribution. This gives you a more precise confidence range for the true population mean, which roughly translates to the figure below: to tell that there’s a difference between the two, you want as little variability as possible.

Figure 2 — To tell if blue is different from green, you’d want less variability and thus more data.
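This is also why small uplifts need a lot of traffic. A back-of-the-envelope sample size calculation (the baseline rate, uplift, significance level, and power below are assumptions for illustration) makes the point:

```python
from math import ceil, sqrt

from scipy.stats import norm

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect a shift from
    conversion rate p1 to p2 with a two-sided test at the given
    significance level (alpha) and statistical power."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting a lift from 1.0% to 1.1% needs roughly 160,000 visitors per variant
print(sample_size_per_variant(0.010, 0.011))
```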

Chances are, if you’ve used anything on the internet, whether it’s Google, YouTube, Facebook, or Netflix, you’ve probably been part of an experiment used to improve KPIs and customer satisfaction.

Such companies benefit from having huge traffic as it allows them to reach statistical significance quicker on their AB Tests.

Why do we experiment?

We’ve found AB testing is a much more rigorous way of measuring the success of a feature release. It’s not just one person saying “my way or the highway” without the data to prove it. Conducting these tests gives us real data and an opportunity to put an idea to the test. We can then discuss and debate why it is or isn’t working. The numbers don’t lie.

It also allows us to trial more ideas without the risk of providing a worse experience to 100% of our customer base.

All of this provides us with a systematic process for incrementally improving the customer experience.

The alternative to AB Testing is doing Pre vs. Post analysis, which is fraught with assumptions to the point where the data becomes unusable for decision making. Take a look at Figure 3 (shout out to Airbnb’s post for the graphic). How do you know if the increase is due to the product launch or due to external factors like time, seasonality, or other feature rollouts? You won’t be able to do proper attribution without AB Testing.

Figure 3 — Why Pre vs. Post analysis has its flaws.

Where do we experiment?

Anywhere! Well, wherever we feel we can provide the most value to customers. We’ve run tests on different recommendation algorithms, UX/UI, and calls to action.

A great example of our testing was with the Carsales iOS app. We wanted to know if a filter bar design would help customers get to their desired stock more easily. After running the experiment we found that it hindered users’ ability to get to their desired stock and was therefore a worse experience than the standard search.

We’re now beginning to iterate on this design by taking the elements that did work well while keeping the original refinements functionality our customers are used to.

Figure 4 — Filter bar design AB Test

Another test was around onboarding new users to the saved-search button, which allows you to save a configured search and receive notifications about it. New users often don’t know what the Carsales app can do, so guiding them to certain bits of functionality can improve the experience dramatically.

We found that it led to a whopping +80% increase in engagement on that feature. We have since deployed this feature.

Figure 5 — Saved search AB Test

How do we experiment?

At Carsales we use a tool called Optimizely Full Stack, which gives us the ability to AB test across all platforms and devices. It enables us to set up, deploy, and tear down experiments quickly, as well as do some fancy targeting with segments and make dynamic CTA changes on the fly without code redeploys.
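As a rough sketch of what this looks like in code (the experiment key, variation key, event key, and helper functions below are made up for illustration, and the exact calls depend on your SDK version), the Optimizely Python SDK lets you activate an experiment and track a conversion like so:

```python
import json

from optimizely import optimizely

def show_filter_bar():        # hypothetical app code
    print("Rendering filter bar search")

def show_standard_search():   # hypothetical app code
    print("Rendering standard search")

# In practice the datafile is fetched from Optimizely for your project;
# an empty placeholder is used here just to keep the sketch self-contained.
optimizely_client = optimizely.Optimizely(datafile=json.dumps({}))

# Bucket the user and get the variation key they should see
variation = optimizely_client.activate('filter_bar_experiment', 'user-123')

if variation == 'filter_bar_variation':
    show_filter_bar()
else:
    show_standard_search()

# Record a conversion event so results can be attributed to the variation
optimizely_client.track('stock_item_viewed', 'user-123')
```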

If your company is serious about testing, you should either spend time building out a scalable AB testing tool or get an off-the-shelf tool like Optimizely. I can attest that manually sifting through data in source databases, filtering outliers, and calculating statistical significance is not the most fun thing to do after you’ve done it numerous times. Getting a tool means freeing up time to craft more precise experiments.

Tightly integrating AB Testing as part of development and operations will allow for your company to scale up testing efforts.

Embedding a culture of experimentation

Whilst it’s nice having the fancy tools and understanding the statistical importance of testing, it’s nothing without quality ideas flowing in abundance. A culture of experimentation is important so that team members and leaders can understand why we test things.

This usually starts with education. Setting up workshops to help team members understand the basics of hypothesis testing has been helpful in boosting overall data literacy. It encourages people to think about the product in terms of key metrics and impact on the business. Showcasing successful tests can also inspire other teams to think of innovative ways to change things up and have those ideas tested. Having a forum for people to submit ideas is also crucial.

Another piece of the puzzle is making experimentation data accessible and having a place to log results, so that people can look back and learn from what did and didn’t work. We send out weekly experimentation status updates and we log each experiment in a logbook. The logbook contains the results of all our previous experiments as well as how each one contributed to revenue gained or saved. It’s important to have estimates for revenue as it helps your business understand the value of experimentation. Hard to argue with dollars and cents!
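As one way of structuring such a logbook (purely illustrative; these aren’t our actual fields), each entry can be a simple structured record:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExperimentLogEntry:
    """One row in an experiment logbook (illustrative fields only)."""
    name: str
    hypothesis: str
    start: date
    end: date
    primary_metric: str
    relative_lift: float             # e.g. 0.80 for a +80% uplift
    statistically_significant: bool
    estimated_annual_revenue: float  # dollars gained or saved
    decision: str                    # "rolled out", "iterate", or "abandoned"

# Hypothetical entry for the saved-search onboarding test described above
entry = ExperimentLogEntry(
    name="Saved search onboarding",
    hypothesis="Prompting new users to save a search increases engagement",
    start=date(2020, 9, 1),          # dates are placeholders
    end=date(2020, 10, 1),
    primary_metric="saved_search_engagement",
    relative_lift=0.80,
    statistically_significant=True,
    estimated_annual_revenue=0.0,    # placeholder, filled in by the analyst
    decision="rolled out",
)
```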

Experimentation is like going to the gym

Getting our experimentation process right wasn’t an overnight success. It took months of education and working with developers to get the tools set up correctly, as well as getting buy-in from the business and working out which metrics to track.

What helped us was doing many small tests that were quick wins for the business. This flexed our AB testing muscle and ensured that we were experimenting with proper form. It gave us confidence in the results, which helped us go for the more complex tests. This methodology took us from doing very little testing to now tripling both the number of experiments we run every month and the number of AB testing wins we get.

Also, don’t expect every test to yield a statistically significant result. Most of the time your tests will yield no improvement, especially the more mature your product is.

For every experiment that succeeds, nearly 10 don’t — and in the eyes of many organizations that emphasize efficiency, predictability, and “winning,” those failures are wasteful. — HBR’s Article on Experimentation

However, there are times when one test yields a result that improves metrics drastically, and those are the experiments that might never have been implemented without trialing a whole bunch of failed ideas.

So trust in the process, keep on testing!
