The Most Common Threats to Your A/B Test’s Validity
Remember, just because your test reached statistical significance doesn’t mean that it measured what you thought it did. Significance only tells you the result is reliable, meaning that if you tested those two pages against each other again, chances are you’d come up with the same result.
That is, of course, unless one of the following validity threats poisoned your data. Here’s what you need to watch out for.
The instrumentation effect
If you fall victim to the instrumentation effect, it means that somewhere along the line, the tools you used to conduct your test failed you, or you failed them.
That’s why it’s important not to skip step 6. Have you checked and double-checked that your experiment is set up correctly? Are all your pixels firing? Is your data being passed to your CRM system?
After you’ve confirmed that everything is set up the way it should be, keep a close eye on your tools’ feedback throughout the test. If you see anything that looks out of the ordinary, check to see if other users of your software have had similar problems.
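One thing that can look "out of the ordinary" is a traffic split that drifts far from what you configured. The sketch below is only an illustration, not a feature of any particular testing tool: it assumes an intended 50/50 split, and the function name `split_looks_off` and the visitor counts are made up. It uses a chi-square goodness-of-fit statistic with one degree of freedom and a strict cutoff (10.828, the critical value for p = 0.001).

```python
def split_looks_off(visitors_a, visitors_b, critical_value=10.828):
    """Flag a 50/50 test whose traffic split deviates more than chance would explain.

    Hypothetical helper: uses a chi-square goodness-of-fit statistic with one
    degree of freedom. 10.828 is the critical value for p = 0.001.
    """
    total = visitors_a + visitors_b
    if total == 0:
        raise ValueError("no visitors recorded yet")
    expected = total / 2  # assumption: the test was configured as a 50/50 split
    chi_square = ((visitors_a - expected) ** 2 + (visitors_b - expected) ** 2) / expected
    return chi_square > critical_value


print(split_looks_off(5000, 5100))  # False: a small imbalance is plausible by chance
print(split_looks_off(5000, 5600))  # True: worth checking the setup
```

A `True` result doesn’t say what went wrong, only that the setup deserves a second look before you trust the numbers.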
Regression to the mean
Saying that your test “regressed to the mean” is just a fancy way of saying “the data evened out over time.”
Imagine that you develop a new variation and your first 8 post-click landing page visitors all convert on your offer, giving the page an astonishing 100% conversion rate. Does that mean that you’ve become the first one to create a perfect post-click landing page?
No. It means that you need to run your test for longer. When you do, you’ll find that the conversion rate regresses to the “mean,” settling within the “average” range after a while.
Keep in mind, this regression to the mean can happen at any time. That’s why it’s important to run your test for as long as is practical. Digital marketer Chase Dumont found that it occurred six months after he began testing:
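To see how little 8 visitors can tell you, here is a minimal sketch that computes an approximate 95% Wilson score interval for a conversion rate. The function name `wilson_interval` is our own, and the Wilson interval is just one common way to put a range around a small-sample rate; your testing tool may use a different method.

```python
import math


def wilson_interval(conversions, visitors, z=1.96):
    """Return an approximate 95% Wilson score interval for a conversion rate."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    rate = conversions / visitors
    denom = 1 + z ** 2 / visitors
    center = (rate + z ** 2 / (2 * visitors)) / denom
    half_width = z * math.sqrt(
        rate * (1 - rate) / visitors + z ** 2 / (4 * visitors ** 2)
    ) / denom
    # Clamp to the valid range for a proportion.
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(8, 8)
print(f"8 of 8 conversions: plausible rate between {low:.0%} and {high:.0%}")
```

With 8 conversions from 8 visitors, the lower end of the interval is about 68%. In other words, the data so far can’t rule out a true rate far below the 100% you observed.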
At first, the original version outperformed the variable. I was surprised by this, because I thought the variable was better and more tightly written and designed.
And, despite that big early lead in conversions to sales (as evidenced by the big blue spike up there – that’s the original version outperforming the variable), with time the variable eventually caught up and surpassed the original sales page’s numbers.
The longer your test runs, the more accurate it will be.
The novelty effect
This can be a confusing validity threat. Let’s use our button color example again to demonstrate.
Imagine you change your button color to green after 5 years of featuring a blue button on all your post-click landing pages.
When your variation goes live, there’s a chance that your visitors click the new green button not because it’s better, but because it’s novel. It’s new. They’re used to seeing blue but not green, so it stands out in their mind because it’s different from what they’re used to.
Combat the novelty effect by targeting first-time visitors, who aren’t used to seeing either color. If they click the green button more often than the blue one, you’ll have stronger evidence that green really is better and not just new.
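If your tool can export visit records, one way to apply this is to compare conversion rates separately for new and returning visitors. The sketch below is a hypothetical illustration: the record layout `(visitor_type, variant, converted)`, the function name, and the sample data are all made up.

```python
from collections import defaultdict


def conversion_by_segment(visits):
    """Compute the conversion rate per (visitor_type, variant) pair.

    `visits` is an iterable of (visitor_type, variant, converted) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [conversions, visitors]
    for visitor_type, variant, converted in visits:
        totals[(visitor_type, variant)][0] += int(converted)
        totals[(visitor_type, variant)][1] += 1
    return {key: conv / count for key, (conv, count) in totals.items()}


# Made-up records, purely for illustration.
sample = [
    ("new", "blue", True), ("new", "blue", False),
    ("new", "green", True), ("new", "green", False),
    ("returning", "blue", False), ("returning", "blue", False),
    ("returning", "green", True), ("returning", "green", True),
]
for (visitor_type, variant), rate in sorted(conversion_by_segment(sample).items()):
    print(f"{visitor_type:9} {variant:5} {rate:.0%}")
```

In this invented data, green only wins among returning visitors, which is the pattern you’d expect if novelty, not the color itself, were driving the clicks. Real samples would need to be far larger before drawing any conclusion.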
The history effect
Remember, factors completely out of your control can affect your A/B test’s validity. Testers fall victim to the history effect when they don’t keep an eye out for real-world events that can poison their data. These events include, but aren’t limited to:
- Server outages
- Natural disasters
If you’re running ads on Twitter and the social network goes down, it’ll affect the outcome of your test. If you’re testing a post-click landing page that offers a webinar scheduled on a holiday, it probably won’t generate as many conversions as one offering a webinar scheduled on a workday. Keep the history effect in mind when you test your own post-click landing page.
The selection effect
Testers whose data is ruined by the selection effect have accidentally chosen a sample that’s not representative of their target audience.
For example, if you’re a B2B software provider and you run ads on a website that’s largely visited by B2C marketers, this will skew your test’s results.
Make sure that when you’re building your test, you reach an audience that represents your target customer.