A simple example of how A/B testing can help creators reach more readers

Let’s introduce a change and use some (fun!) statistics to understand whether your audience likes it more or not

Vladimir Kalmykov · Published in The Startup · Nov 28, 2023

Context

Imagine we’re trying to grow our blog or newsletter. We publish a post every week to around 20,000 subscribers, of whom about 3,000 actually open our emails.

So, you woke up with a fantastic idea: you want to try a new style for your post titles, something more intimate and personal (whatever that means for your audience). And here is the question: is it a good move for your business? Will readers open your posts more often after the change?

Of course, you can say: “I want it the new way. Period.” That can be fine, but the bigger your audience, the riskier it becomes to assume that everyone likes what you like. Some people prefer precision, others prefer emotion, and you have no idea who wants what.

The challenge

Can we get at least some confidence that this change will not hurt our small business? Can we learn from IT companies? IT giants are giants partly because they follow a scientific approach when making decisions.

For this case, we define success as a reader opening the email, which suggests they find it interesting. We can later tackle more complex cases, like optimizing how many people read to the end or take an action there (subscribe, start a free trial, etc.), but let’s start simple.

Basics of A/B testing: complex theory in a fun way

First, I need to explain a little about statistical hypotheses. No worries, it is easier than you think.

Imagine you get on a bus and see many very tall people. You ignored the sign on the bus (because you were staring at your smartphone, he-he). There are two options:

  • It is a regular city bus, and it just happened that some passengers are tall
  • Or it is a special bus for basketball players and their crew (you know, there is a World Basketball Championship in your town this week).

Question: without talking to anyone, can you figure out whether you are on a “normal” bus or a “basketball” bus? Let’s use science!

First, we know the height distribution of all citizens in our country (we can Google it on our smartphone). Then we look at the heights of all the people on the bus (let’s assume we can estimate them reasonably well by eye).

And now, magic! Statistics lets us compare the heights we see against the national distribution and decide whether we are observing something special (a bus full of basketball players) or just a city bus with a few tall people on it. Long story short, we put everything we know (the height distribution we googled and the passengers’ heights) into a boring formula and get a statement like: “If these were ordinary citizens, the chance of seeing heights like these would be p=X.”

This is a compelling statement! For example, suppose p=2%: there is only a 2% chance of seeing a group this tall on a “normal” bus; that is, only a 2% chance that the abundance of tall people around you is pure coincidence. So we reject the boring explanation and go with our hypothesis: they are a basketball team indeed!

Note that there is a 2% chance that we are wrong, and we cannot do anything about it. Practically, it means that if we board 50 different buses on the same day, there is a good chance that at some point our math will tell us, “This is the basketball team,” when it is not. That’s life; we need to deal with it.

Let’s now imagine the formula said p=66%. It means that heights like these would be nothing unusual on a regular bus (you would see them about 66% of the time!), so we have to reject our hypothesis about the basketball team. Nope, this is just a bus, and it happens to have tall people on it! Here in the Netherlands, for example, this is quite a common situation :)
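If you prefer code to formulas, here is a minimal Python sketch of the bus test. All the numbers are made up for illustration: I assume the national height distribution is roughly normal with a mean of 175 cm and a standard deviation of 7 cm, and that we eyeballed 15 passengers.

```python
# A toy version of the "bus" test: a one-sample z-test against the
# (hypothetical) national height distribution.
from scipy import stats

POP_MEAN = 175.0  # assumed national average height, cm
POP_STD = 7.0     # assumed national standard deviation, cm

bus_heights = [196, 189, 201, 193, 185, 198, 190, 187, 195, 192,
               188, 199, 186, 194, 191]  # eyeballed passenger heights, cm

n = len(bus_heights)
sample_mean = sum(bus_heights) / n

# How surprising is this average if the passengers were ordinary citizens?
z = (sample_mean - POP_MEAN) / (POP_STD / n ** 0.5)
p_value = stats.norm.sf(z)  # chance of an average this high by pure luck

print(f"average height = {sample_mean:.1f} cm, p = {p_value:.2g}")
# A tiny p-value means "almost impossible on a regular city bus",
# so we conclude we are riding with the basketball team.
```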

How do we choose the threshold that defines when p is “small enough” to call our idea right or wrong? It depends on the domain. In medicine, for placebo vs. new pill trials, they use a 1% or 5% threshold (because you don’t want a pill that doesn’t work, right?).

For e-commerce and other non-critical cases, 10% is good enough. Yes, we will make a mistake every tenth time, but 9 times out of 10 we will move in the right direction. And this is why IT giants are giants.
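As a tiny sketch of how the threshold works in practice (the function and the name alpha are mine, not a standard API):

```python
def verdict(p_value: float, alpha: float = 0.10) -> str:
    """Compare a test's p-value against the domain's chosen threshold."""
    if p_value < alpha:
        return "Unlikely to be chance: ship the change."
    return "Could easily be chance: keep the old version."

print(verdict(0.02, alpha=0.05))  # medicine-style strictness -> ship
print(verdict(0.24, alpha=0.10))  # e-commerce threshold -> keep the old one
```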

OK, smarty pants, what about my newsletter?

Now that you know the basics, I can reveal the magic of A/B testing. The idea is simple: you split your audience into two halves. The 50% in group A see the old experience (old-style title), and the 50% in group B see the new one (new-style title). With our 20,000 readers, that is 10000 who will see the old-style title and 10000 who will see the more personal one (the email content is the same; only the titles differ!). You must make the A/B split randomly, with no assumptions and no logic; otherwise, you mess up the science.
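Here is a minimal sketch of such a random split, assuming your audience is just a Python list of email addresses (all the names here are mine):

```python
import random

subscribers = [f"reader{i}@example.com" for i in range(20_000)]  # placeholder list

random.seed(42)           # fixed seed only so the example is reproducible
shuffled = subscribers[:] # copy, so the original list stays untouched
random.shuffle(shuffled)  # randomness is the whole point: no logic, no assumptions

half = len(shuffled) // 2
group_a = shuffled[:half]  # these 10,000 get the old-style title
group_b = shuffled[half:]  # these 10,000 get the new-style title
```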

Okay, we sent the emails with the old title style to people in set A and the same emails with the new titles to people in set B. Let’s count email opens: say it was 1401 in A and 1459 in B. So, is the new title better? It looks like it, right? 1459 is definitely bigger than 1401!

Well, science disagrees. If you put the numbers into an A/B test calculator (you can google a free one, like this; the link also contains the numbers we need), you will see that the p-value is actually 0.24. That means that even if the new title were no better at all, pure randomness would produce a gap this big about 24% of the time. Sorry, a one-in-four chance of fooling ourselves is too high. We must decline our hypothesis: we cannot claim the new title style is better.

Imagine that A stays the same, but instead of 1459 in B, you see 1498 (new link). The graph turns green, and the p-value is now 0.05, meaning a gap this big would show up by pure randomness only about 5% of the time. It looks like we actually improved email opens with our change, and science agrees. Cool, right?
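If you are curious what such calculators do under the hood, here is a minimal sketch of the usual math, a two-proportion z-test (the helper name ab_p_value is mine); it reproduces both p-values above:

```python
from scipy.stats import norm

def ab_p_value(opens_a: int, opens_b: int,
               n_a: int = 10_000, n_b: int = 10_000) -> float:
    """Two-sided p-value for 'groups A and B have different open rates'."""
    rate_a, rate_b = opens_a / n_a, opens_b / n_b
    pooled = (opens_a + opens_b) / (n_a + n_b)  # open rate if A and B were identical
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (rate_b - rate_a) / se
    return 2 * norm.sf(abs(z))

print(f"1401 vs 1459 opens: p = {ab_p_value(1401, 1459):.2f}")  # ~0.24, not convincing
print(f"1401 vs 1498 opens: p = {ab_p_value(1401, 1498):.2f}")  # ~0.05, looks real
```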

You can try a few posts with old/new titles, and if all of them confirm your hypothesis, then congrats: you have just used A/B testing to take a tiny step closer to your audience!

Conclusion

I hope I have accomplished my goal of explaining a complex concept with simple examples. Of course, the email/article title is just a toy case, but if you want me to decompose another one for you, ask in the comments, and we will explore other examples.

It is essential to mention that A/B testing is not a hammer for everything. If you want to say something in your own unique way, or you want your website to be green and blinking, go for it. It is your brand identity. Some things must be non-negotiable. But for the rest of the ideas you want to test against your diverse audience, you might consider getting some help from science.

Finally, I omitted quite a few steps that are crucial for making the magic work consistently: how to properly set the hypothesis confidence level, how to choose the right metric, how to decide on the test, and how to iterate. Check the practical assignments here (free). Again, if I see your interest in the likes/comments on this post, I can continue with more examples and potential pitfalls.
