Let’s talk about testing.

Amanda Soehnlen
Jan 18, 2018

(This is Part 1 of my A/B Testing series. See Part 2 here.)

“A/B Testing is the sign of a good email marketer!”

“ABT — Always Be Testing!”

“If you’re not testing, you’re doing it wrong.”

Here’s the deal:

Testing, as a concept, is good.

Testing, in reality, is something that should be handled with care. This is not to say that you shouldn’t test; it’s to say that you should put thought into your tests.

Testing: The Basics (AKA let’s go back to 7th-grade science)

(A 7th-grade-appropriate chart of the scientific method, from carsondellosa.com.)

Question 1: What are you testing? (What’s your purpose?)

This seems self-explanatory, but it’s actually the most important (and most often skipped) question.

If your answer is ‘Subject Lines’ or ‘Image Size’, you may be setting yourself up to run a useless test. What you want to test are concepts. Subject lines (the go-to ‘Always Be Testing’ staple) are a means of figuring out something about your audience, not the actual test itself.

An example:

I’m testing two subject lines: “Come in to Whizzlebangs! Free 50% Off Coupon!” vs. “Half Price Coupon at Whizzlebangs!”

What is this person actually testing? You could say that it’s the use of numbers instead of words; it could be front-loading the deal; it could be shorter subjects vs. longer subjects.

If they find out that the first subject line far outperforms the second, what information do they take from it? Reusing that subject line verbatim won’t work, and there’s no way to pin down what the ‘special sauce’ was or why it resonated with users.

In this example, they decide it’s because of the use of numbers instead of writing ‘Half’, so every subject line they write for the next quarter includes a ‘##% off’. And since they’ve stopped testing (because they got their answer, of course), they just assume that they’re running optimized content.

Now, let’s try this following the scientific method:

  • I want to know what my users think about numerals vs. written-out numbers, since we currently use both in our copy.
  • I want to know what length of content and subject lines my users prefer, since we have no current guidelines.

These are two very different purposes, and yet the first example’s test could be read as testing either one.

Following the first ‘statement of purpose’, we chug along through the steps of the scientific method: take an hour and look up past user preferences, scientific papers, blog posts, or whatever works well for your audience. In this case, there are formal grammatical rules on numbers vs. words, advice for print marketers, tips for using numbers in copy, and discussions of numbers in headlines.

After reading all of this and looking at our hypothetical audience, we conclude that our users would probably rather we use numerals than write numbers out.

That’s our hypothesis.
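One way to pin this hypothesis down before any email goes out is to write it as a pair of statistical hypotheses. In this sketch, p_num and p_word are labels I’m introducing (the post doesn’t use them) for the true click-through rates of the all-numerals and all-written-out versions.

```latex
H_0:\; p_{\text{num}} = p_{\text{word}} \quad \text{(no preference)}
\qquad
H_1:\; p_{\text{num}} \neq p_{\text{word}}
```

Our hypothesis is the direction p_num > p_word. Testing against the two-sided alternative leaves room for all three possible outcomes: numerals win (hypothesis true), written-out numbers win (hypothesis false), or no detectable difference (inconclusive).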

After we have our hypothesis, we come up with a testing plan, which needs to be broad enough to help rule out false positives.

  • Have three branches of testing
    - control (copy as written in email body/subject)
    - A (all numerical)
    - B (all written)
  • Run the test multiple (at least 3) times. This reduces the chance that a fluke caused by circumstances outside of the test drives your conclusion.
  • Have a plan for how you will analyze the data once all three runs are done.
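Here’s a rough Python sketch of how the three branches might be split for each run, assuming you can export a list of subscriber IDs and send to each branch separately. The branch names, the round-robin split, and using a fresh seed per run are my own assumptions for illustration, not features of any particular email platform.

```python
import random

# Hypothetical branch names: control, A (all numerals), B (all written out).
BRANCHES = ("control", "A_numerals", "B_words")


def assign_branches(subscriber_ids, seed):
    """Shuffle subscribers, then deal them round-robin into three near-equal branches."""
    rng = random.Random(seed)  # seeded so a run's split can be reproduced later
    ids = list(subscriber_ids)
    rng.shuffle(ids)
    groups = {name: [] for name in BRANCHES}
    for i, sid in enumerate(ids):
        groups[BRANCHES[i % len(BRANCHES)]].append(sid)
    return groups


# Three separate runs, each with a fresh random split of a hypothetical 9,000-person list.
subscribers = range(9000)
runs = [assign_branches(subscribers, seed=run) for run in range(3)]
for run_number, groups in enumerate(runs, start=1):
    print(run_number, {name: len(ids) for name, ids in groups.items()})
```

Shuffling before splitting means each branch is a random slice of the list, so who landed in which branch shouldn’t explain the result; re-shuffling for every run helps keep one lucky split from carrying all three tests.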

Execute the plan and analyze the results. In this case, we could use email click-through rate, since we only changed the email body and subject.
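For the analysis step, one common option (my assumption; the post doesn’t prescribe a specific statistical test) is to compare click-through rates between branches with a two-proportion z-test. This sketch defines click-through rate as clicks divided by delivered emails, and the counts in it are made-up placeholders, not real results.

```python
import math


def click_through_rate(clicks, delivered):
    """Clicks divided by delivered emails (one of several CTR definitions)."""
    return clicks / delivered


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two click-through rates.

    Uses the pooled-proportion normal approximation, which is reasonable
    when each branch has plenty of sends and clicks.
    """
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in clicks; the test is undefined")
    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts from a single run, for illustration only.
delivered = {"control": 3000, "A_numerals": 3000, "B_words": 3000}
clicks = {"control": 132, "A_numerals": 150, "B_words": 120}

for branch in delivered:
    rate = click_through_rate(clicks[branch], delivered[branch])
    print(f"{branch}: {rate:.2%}")

z, p = two_proportion_z_test(
    clicks["A_numerals"], delivered["A_numerals"],
    clicks["B_words"], delivered["B_words"],
)
print(f"A vs. B: z = {z:.2f}, p = {p:.3f}")
```

A p-value under your chosen cutoff (0.05 is conventional) suggests the difference in that run is unlikely to be chance alone; the same comparison works between either branch and the control.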

When you run this test, you’ll come up with one of three conclusions:

  • Your hypothesis is true. Congratulations! Now you should package up your learnings and share them with your organization, and you’ll have the data to back up a policy of using numerals whenever possible. You’ve also learned something about how your users behave, and that they follow the research you found, which should lead to future insights.
  • Your hypothesis is false. Congratulations! Now you should package up your learnings and share them with your organization. You’ll have the data to justify looking further into how your users interact with numbers and, in the meantime, a policy of writing out numbers whenever possible.
  • Your results are inconclusive. Congratulations! Now you should package up your learnings, knowing that whether you use numerals or write out numbers makes little measurable difference to your users. In the future, if anyone wonders whether it matters, you’ll have the data (and be able to repeat your tests if a significant amount of time has passed) to show that it doesn’t change things.
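If you want to agree on how the three runs map to those three conclusions before you see any data, a decision rule like the one below is one option. The rule itself (a version only “wins” if it beats the other significantly, in the same direction, in every run), the function name, and the sample numbers are all hypothetical choices for illustration.

```python
def classify_runs(run_results, alpha=0.05):
    """Map per-run (z, p_value) results for numerals (A) vs. written-out (B) to a conclusion.

    Hypothetical rule: a direction counts only if it is significant in every run.
    A positive z means the numerals branch had the higher click-through rate.
    """
    if not run_results:
        raise ValueError("need at least one run")
    if all(p < alpha and z > 0 for z, p in run_results):
        return "true"          # numerals won every run
    if all(p < alpha and z < 0 for z, p in run_results):
        return "false"         # written-out numbers won every run
    return "inconclusive"      # mixed or non-significant results


# Made-up (z, p_value) pairs from three runs, for illustration only.
print(classify_runs([(2.4, 0.016), (2.1, 0.036), (2.9, 0.004)]))   # → true
print(classify_runs([(2.4, 0.016), (0.8, 0.424), (-1.1, 0.271)]))  # → inconclusive
```

Requiring the same significant direction in every run is deliberately strict; it trades some sensitivity for protection against the one-off fluke the repeated runs are meant to catch.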

Compare that to the first example.

They would know that one subject line was better, but not why, and they would have no plan to back up their conclusion. That makes their foundation of knowledge more precarious: instead of being seen as the subject-matter expert, they are left guessing at the reasoning behind their own test.

If you have any questions, or would like help figuring out what to test, feel free to comment or reach out to me on Twitter (@asoehnlen). This is going to be part of a series. Next week? Proper testing procedures and common pitfalls.


Stuffing marketing into analytics, design, coding, & psychology. Data nerd. Live in PGH, work at Highmark Health. All views expressed are my own.