Have you been through this scenario? You set up an A/B test to discover whether your new button and headline combination will generate more conversions…
You drive equal traffic to both your post-click landing pages — the control and variation — and stop after a month, when your software declares your variation the winner with 99% confidence…
You roll out the new “winning” design, but several business cycles later, that 50% boost in conversion rate shows no impact on your bottom line. You’re confused. You’re annoyed…
And you’re probably the victim of a false positive test result.
What is a false positive test result?
Why isn’t that conversion rate boost of 50% translating to more sales? The reason, says Lance Jones of Copyhackers, is that it probably didn’t exist.
It’s entirely possible (even likely) that you do not see the lift in sales or revenue from your test because it was never there in the first place. You may have unknowingly received a “false positive” in your test – known as a Type I statistical error, otherwise known as an incorrect rejection of a true null hypothesis. That’s a mouthful, so I simply remember it as a false positive.
Mouthful or not, these Type I statistical errors are more common than you’d think. It’s estimated that around 80% of A/B test results are imaginary.
If you’re making key decisions based on false positives, at best, you’re leaving optimization to chance. At worst, you’re actually worsening the conversion rate of your post-click landing pages.
Luckily, there are some ways to combat poisonous data. One of them is similar to a testing method you’re probably already familiar with…
What is A/A testing?
A/B testing involves driving traffic to two different pages — an original (your control) and another version (your variation) — to see which performs better.
Similarly, A/A testing involves driving traffic to two pages to see which performs better. But unlike an A/B test, an A/A test pits two identical pages against each other. Instead of discovering a lift, its goal is to find no difference between your control and variation.
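To make “no difference” concrete, here’s a minimal sketch of the kind of check that could be run on an A/A test: a two-proportion z-test in plain Python. The function name and the visitor and conversion counts are hypothetical, and your own testing tool may well use a different statistical method.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the null hypothesis that both pages convert equally.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0  # zero (or 100%) conversions on both pages: nothing to compare
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical A/A result: 5,000 visitors per page, 250 vs. 262 conversions.
p = two_proportion_p_value(250, 5000, 262, 5000)
print(f"p-value: {p:.3f}")
```

With these made-up numbers the p-value lands well above 0.05, which is what you’d hope to see when both pages are identical. A small p-value on an A/A test is a cue to check your setup, not a reason to celebrate.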
Why would you A/A test?
We don’t blame you for scratching your head, wondering “What on Earth would testing two identical pages against each other accomplish?”
It may sound silly, but it’s a technique that some professional testers use to test their A/B test before they test. (Huh?)
Accurate test results require more than statistical significance
Anybody can run an A/B test, but few can run a valid A/B test (remember: Only around 20% of test results are actually legitimate).
Producing accurate test data involves more than reaching statistical significance with a large and representative sample size. To be confident about your results, you have to ensure that sample isn’t tainted by a number of validity threats.
One of those threats, the instrument effect, is what A/A tests are most helpful for combating.
What is the instrument effect?
Protecting against validity threats starts before you even begin A/B testing. The instrument effect, Peep Laja of CXL says, is what poisons most test results:
This is the most common issue. It’s when something happens with the testing tools (or instruments) that cause flawed data in the test. It’s often due to wrong code implementation on the website, and will skew all of the results.
That’s why, when setting up a test, it’s important to make sure your tools are configured correctly and working the way they should. If they’re not, these common issues can arise:
- Misreporting of key performance indicators. Just one error in one tool can jumble your data, which is why you should never rely on a single platform to track all your test information. At the very least, integrate with Google Analytics to double-check that the metrics you see in your testing software and website tracking are accurate. For even better results, triple-check with another tool. Be suspicious of any reports that don’t match up relatively closely.
- Post-click landing page display problems. Small coding mistakes can cause big validity threats, like display issues, during your A/B test. That’s why it’s crucial to make sure your post-click landing pages look the way they’re supposed to across all devices and browsers, and that your visitors aren’t affected by something called the “flicker effect,” which occurs when your control is momentarily displayed to your visitor just before the variation. A slow website is one of several possible causes.
- Stopping a test too early. Some testing software will declare a winning page prematurely — when a sample size isn’t large enough, or representative of your target customer. Remember: Reaching statistical significance does not mean it’s time to stop your test. The longer you run it, the more accurate your results will be.
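One way to guard against stopping too early is to estimate, before the test starts, roughly how many visitors each page needs. The sketch below uses a standard sample-size approximation for comparing two conversion rates. The function name, the 5% baseline, and the 20% relative lift are assumptions for illustration, not figures from any test mentioned in this article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per page to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical page: 5% baseline conversion rate, hoping to detect a 20% lift.
needed = visitors_per_variation(0.05, 0.20)
print(f"Visitors needed per page: {needed}")
```

The smaller the lift you want to detect, the more visitors you need. If your tool declares a winner long before each page has seen a sample of roughly this size, treat the result with suspicion.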
Any one of these issues (and more) can lead to a false positive at the conclusion of your test, which is why Peep warns testers to be vigilant:
When you set up a test, watch it like a hawk. Observe that every single goal and metric that you track is being recorded. If some metric is not sending data (e.g. add to cart click data), stop the test, find and fix the problem, and start over by resetting the data.
But not everyone feels comfortable immediately jumping into A/B testing with both feet, especially when using new software. So, as an added precaution, some practitioners A/A test to evaluate their tools before they begin A/B testing.
If your experiment is set up correctly, at the end of an A/A test, both pages should emerge with a similar conversion rate. As the following testers show, though, that doesn’t always happen.
A/A testing examples
Are false positives really that common? Can one page really outperform its clone? These guys used A/A testing to find out and revealed their findings in the following blog posts…
1. Home Page Split Test Reveals Major Shortcoming Of Popular Test Tools
On November 11, 2012, the Copyhackers team began an A/A split test on their homepage, pictured below:
On the 18th — 6 days later — their testing tool declared a winner with 95% confidence. For the sake of accuracy, though, the team decided to let the test run one more day — at which point their software declared a winner at a confidence level of 99.6%:
Their homepage was performing nearly 24% better than the exact same page, and there was only a 0.4% chance the result was a false positive, according to the software. Still, the team let the test run for about three more days, and the differences eventually evened out:
But that’s not the point. The point is: The testing tool declared a winner too early. If the Copyhackers team hadn’t kept it running, they would’ve incorrectly assumed there was an issue with their experiment. Read more about the test here.
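You can see how this happens with a small simulation. The sketch below runs imaginary A/A tests in which both pages convert at exactly the same rate, then compares checking for a winner every day against checking once at the end. The traffic numbers, the 5% conversion rate, and the function names are all assumptions. It is not a reconstruction of the Copyhackers test or of any particular tool’s math.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # threshold for two-sided 95% confidence


def is_significant(conv_a, conv_b, n):
    """Two-proportion z-test with n visitors on each page."""
    pooled = (conv_a + conv_b) / (2 * n)
    std_err = sqrt(pooled * (1 - pooled) * 2 / n)
    if std_err == 0:
        return False
    return abs(conv_a - conv_b) / n / std_err > Z_CRIT


def simulate_aa_tests(runs=500, days=14, visitors_per_day=200, rate=0.05, seed=1):
    """Return (share of runs flagged on any daily check, share flagged on the last day)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_on_some_day = False
        for day in range(1, days + 1):
            # Both pages are identical, so both convert at the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_day))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_day))
            significant = is_significant(conv_a, conv_b, day * visitors_per_day)
            flagged_on_some_day = flagged_on_some_day or significant
        peeking_hits += flagged_on_some_day
        final_hits += significant  # result of the final day's check only
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate_aa_tests()
print(f"Declared a 'winner' on at least one daily check: {peeking_rate:.0%}")
print(f"Declared a 'winner' on the final day only:       {final_rate:.0%}")
```

Because every run compares a page against itself, every “winner” here is a false positive. Checking daily and stopping at the first significant result flags noticeably more of them than waiting for a single check at the planned end date.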
2. A/A Testing: How I Increased Conversions 300% by Doing Absolutely Nothing
This sarcastic title comes from author and self-proclaimed “recovering wantrepreneur” David Kadavy, who ran a number of A/A tests over 8 months on 750,000 email subscribers. During that time, he generated statistically significant results, including:
- A 9% increase in email opens
- A 300% increase in clicks
- A 51% lower unsubscribe rate
To many wantrepreneurs (my former self included), this looks like “oh wow, you increased opens by 10%!” They may even punch it into Visual Website Optimizer’s significance calculator and see that p=.048. “It’s statistically significant!” they (or I) might exclaim.
The truth is, though, these were all A/A tests. The content tested against each other was identical. See more of his results here.
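Part of the explanation is simple arithmetic: the more comparisons you run, the more likely it is that at least one of them crosses the significance line by chance. Here’s a minimal sketch of that calculation. It assumes the comparisons are independent and each uses a 5% significance threshold; the function name is made up, and the counts are illustrative rather than the number of tests David actually ran.

```python
def chance_of_false_winner(comparisons, alpha=0.05):
    """Probability that at least one of several independent A/A comparisons looks significant."""
    return 1 - (1 - alpha) ** comparisons


for k in (1, 5, 14, 50):
    print(f"{k:>2} comparisons: {chance_of_false_winner(k):.0%} chance of at least one false winner")
```

Under these assumptions, a single comparison carries a 5% risk, five comparisons carry roughly a 23% risk, and fifty comparisons push it above 90%. Tracking several metrics across several tests adds up quickly.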
Should you run A/A tests?
The answer to this question depends on who you ask.
Neil Patel, who kept seeing big conversion lifts that didn’t equate to more revenue, says “It’s really important that you run an A/A test first as this will help ensure that you don’t waste time with inaccurate software.”
On the other hand, Peep Laja of CXL says A/A tests themselves are a waste of time. So who’s right?
The two major problems with A/A testing
From a theoretical standpoint, A/A testing makes a lot of sense. Above all, accuracy is most important when running an A/B test, and testing your test is just one of many ways to ensure it.
In real-world testing environments, though, A/A tests have the potential to do more harm than good. Craig Sullivan explains:
For me, the problem is always eating real traffic and test time, by having to preload the test run time with a period of A/A testing. If I’m trying to run 40 tests a month, this will cripple my ability to get stuff live. I’d rather have a half day of QA testing on the experiment than run 2-4 weeks of A/A testing to check it lines up.
That’s problem one. A/A tests cost real time and traffic that you could be using to learn more about your website visitors with A/B tests.
Problem two is exemplified in the case study from Copyhackers. Like A/B tests, A/A tests need to be designed and monitored carefully, because they’re susceptible to false positives too.
In other words, your A/A test might tell you that one page is performing better than the other when it’s not (that chance is much higher than you think: around 50%).
If the team at Copyhackers had listened to their testing tool and declared a winner just six days in, they would’ve spent even more time trying to figure out why their homepage was performing better than its identical twin (when it really wasn’t).
The major benefit of A/A testing
Despite these problems, A/A testing has the potential to help you catch even bigger issues during real tests. When the results of those tests are the ones you’re basing important business decisions on, that’s a powerful benefit to consider.
If you do decide to A/A test, there’s a potentially less wasteful way to do it, called A/A/B testing.
A/A/B testing vs. A/A testing
The traditional method of A/A testing wastes traffic because it doesn’t tell you anything about your visitors at its conclusion. But, if you add a “B” variation to that test, it could. Here’s the difference between the two:
- A/A test = 2 identical pages tested against each other
- A/A/B test = 2 identical pages and one variation tested against each other
An A/A/B test splits your traffic into three segments, which means it will take longer to reach statistical significance. But the upside is, once you do, you’ll have data about both your testing tool and your visitors.
Compare the results of A vs. A to determine whether you can trust your test. If they’re statistically similar, compare the results of A vs. B. If they’re not, though, you’ll have to throw out the results of the entire test (which took longer than a traditional A/A test to run since your traffic is segmented three ways).
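That two-step decision can be sketched in a few lines of Python. The function names, the return labels, and the example counts below are hypothetical. Pooling the two A segments before comparing them with B is an assumption of this sketch, not something the approach requires.

```python
from math import sqrt
from statistics import NormalDist


def p_value(conv_1, n_1, conv_2, n_2):
    """Two-sided p-value from a two-proportion z-test."""
    pooled = (conv_1 + conv_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if std_err == 0:
        return 1.0
    z = (conv_1 / n_1 - conv_2 / n_2) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def evaluate_aab(a1, a2, b, alpha=0.05):
    """Each argument is a (conversions, visitors) tuple for one traffic segment."""
    # Step 1: the two identical pages should not differ significantly.
    if p_value(*a1, *a2) < alpha:
        return "discard_test"
    # Step 2 (assumption): pool both A segments, then compare against B.
    pooled_a = (a1[0] + a2[0], a1[1] + a2[1])
    if p_value(*pooled_a, *b) < alpha:
        return "b_differs_from_a"
    return "no_detectable_difference"


# Hypothetical results with 3,000 visitors per segment.
print(evaluate_aab(a1=(150, 3000), a2=(158, 3000), b=(210, 3000)))
print(evaluate_aab(a1=(150, 3000), a2=(210, 3000), b=(210, 3000)))
```

In the first example the two A segments agree, so the A vs. B comparison is allowed to count. In the second, the identical pages disagree with each other, so the whole test gets thrown out no matter how good B looks.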
Do A/A testing benefits outweigh the cons?
Some experts say “yes,” while others say “no.” Andrew First of Leadplum seems to think the answer falls somewhere between:
A/A testing probably shouldn’t be a monthly affair, but when you’re setting up a new tool, it’s worth taking the time to test your data. If you intercept bad data now, you’ll be more confident in your testing results months down the line.
Ultimately, it’s up to you. If you’re using a new tool, it may be wise to take Andrew’s advice. If you’re not, though, it’s probably best to follow Craig Sullivan’s lead and instead set up a rigorous pre-test QA process. Save your time, resources, and traffic for A/B testing.
Get the most out of your testing efforts and digital ad campaigns: sign up for an Instapage Enterprise demo today.