A/B testing is fun. It’s popular. It’s getting easier to do.
However, if you’re doing A/B testing wrong, you still may be wasting a ton of time and resources.
Even as A/B testing becomes ubiquitous, myths about it persist, and some of them are quite common. To derive real value from any technique, you need to understand it for what it is, including where it’s powerful and where it’s limited.
This article will outline the top myths I’ve seen spouted time and time again in blogs and by consultants.
1. A/B testing and optimization are the same thing
This may seem a bit finicky, but A/B testing itself doesn’t increase conversions. Many articles say something to the effect of “do A/B testing to increase conversions,” but this is semantically inaccurate.
A/B testing, otherwise known as an “online controlled experiment,” is a summative research method that tells you, with hard data, how the changes you make to an interface are affecting key metrics.
What does that mean in non-academic terms? A/B testing is a part of optimization, but optimization encompasses a broader swath of techniques than just the experimentation aspect.
As Justin Rondeau, Director of Optimization at Digital Marketer, put it, “Conversion rate optimization is a process that uses data analysis and research to improve the customer experience and squeeze the most conversions out of your website.”
Optimization is really about validated learning. You’re balancing an exploration/exploitation problem (exploring to find what works, and exploiting it for profit when you do) as you seek the optimal path to profit growth.
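To make the exploration/exploitation trade-off concrete, here is a toy epsilon-greedy loop in Python. It is only a sketch: the function name, the `epsilon` value, and the conversion rates are illustrative assumptions, not how any particular tool works.

```python
import random

def epsilon_greedy(true_rates, visitors=10_000, epsilon=0.1, seed=7):
    """Toy explore/exploit loop. `true_rates` are hypothetical conversion
    rates that the loop cannot see directly; it only observes outcomes."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    conversions = [0] * len(true_rates)
    for _ in range(visitors):
        # Explore: with probability epsilon (or while some variant has
        # never been shown), pick a variant at random.
        if rng.random() < epsilon or 0 in shows:
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: show the variant with the best observed rate so far.
            observed = [c / s for c, s in zip(conversions, shows)]
            arm = observed.index(max(observed))
        shows[arm] += 1
        conversions[arm] += rng.random() < true_rates[arm]
    return shows, conversions

if __name__ == "__main__":
    shows, conversions = epsilon_greedy([0.030, 0.036])
    print("times shown:", shows, "conversions:", conversions)
```

A higher `epsilon` spends more traffic learning, and a lower one spends more traffic cashing in on what currently looks best.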
2. You should test everything
I was reading a forum on CRO where someone asked about a particular word choice in a headline (I think it was “awesome” or something), and they were wondering whether or not it was overused.
An “expert” chimed in with advice (paraphrasing here) that you can never know for sure until you test every other similar word (“fascinating,” “incredible,” “marvelous,” etc.)
This is silly advice for 99.95% of people.
Everyone has heard the story about how Google tested 41 shades of blue. Similarly, it’s quite clear that a site like Facebook or Amazon theoretically has the traffic to run tests like this.
But if you run a small to medium e-commerce site (or SaaS, or whatever), even if you’re part of a very large company, it’s almost always a waste of time, resources, and traffic to run tests like this.
Why, you may ask? Because prioritization is key.
Everyone can look at a site and see dozens of random things they could change if they wanted to (whether informed by data or not). But where’s the efficiency in that?
At best, you’re wasting traffic on things that don’t matter, and you’ll consistently get inconclusive results if you do this (good luck getting continued support from stakeholders if that’s the case).
Whatever the case, though, you’re faced with a tremendous opportunity cost: because you’re spending time and resources on things that don’t matter, you’re kept from implementing changes that fundamentally alter and improve the user experience, the things that make a real difference (and make real money).
3. Everybody should A/B test
A/B testing is incredibly powerful and useful. No one is going to (intelligently) argue against that.
But that doesn’t mean everyone should do it.
Roughly speaking, if you have fewer than 1,000 transactions (purchases, signups, leads, etc.) per month, you’re going to be better off putting your effort elsewhere. Maybe you could get away with running tests at around 500 transactions per month, but you’re going to need some big lifts to see an effect.
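To see why low volume hurts, here is a rough sketch of the standard sample-size calculation for comparing two conversion rates (a two-sided two-proportion z-test). The function name and the defaults (95% significance, 80% power) are assumptions for illustration. Your testing tool may use a different method.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect `relative_lift`
    over `baseline_rate` with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power threshold
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

if __name__ == "__main__":
    # Hypothetical 3% baseline conversion rate.
    for lift in (0.05, 0.10, 0.30):
        n = sample_size_per_variant(0.03, lift)
        print(f"{lift:.0%} lift: about {n:,} visitors per variant")
```

The smaller the lift you want to detect, the more traffic you need, which is why low-volume sites can only hope to detect big lifts.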
A lot of micro-businesses, startups, and small businesses just don’t have that transaction volume (yet).
You have to keep in mind costs, as well. All of them, not just the cost of optimization software like Optimizely. Things like:
- Conversion research. You have to figure out what to test (as mentioned above).
- Designing the treatment (wireframing, prototyping, etc.).
- Coding up the test.
- QAing the test.
Now, let’s say you get an 8% lift, and it’s a valid winner. You had 125 leads per week, and now you have 135 / week. Is the ROI there? Maybe — it depends on your lead value. But you have to account for time, resources, and most importantly, the opportunity costs of your actions.
So, when you calculate your needed sample sizes before you run the test, do the math on the ROI as well. What would be the value of X% lift in actual dollars?
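Here is a back-of-the-envelope version of that math in Python, using the 125 leads per week and 8% lift from the example above. The $50 lead value and the $10,000 total cost are made-up numbers, so plug in your own.

```python
def lift_value(baseline_per_week, relative_lift, value_per_lead, weeks=52):
    """Dollar value of the extra leads a lift produces over `weeks`."""
    extra_leads = baseline_per_week * relative_lift * weeks
    return extra_leads * value_per_lead

def roi(extra_value, total_cost):
    """Simple return on investment: net gain divided by cost."""
    return (extra_value - total_cost) / total_cost

if __name__ == "__main__":
    # Assumed for illustration: $50 per lead; $10,000 for research,
    # design, development, QA, and tooling combined.
    value = lift_value(baseline_per_week=125, relative_lift=0.08, value_per_lead=50)
    print(f"Extra value per year: ${value:,.0f}")
    print(f"ROI: {roi(value, total_cost=10_000):.0%}")
```

This assumes the lift holds for a full year, which is optimistic. It also ignores opportunity cost, which you still have to weigh yourself.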
Time is a precious resource. It might be better spent elsewhere than A/B testing when you’re still small — because of math.
4. Only change one element per A/B test
This is probably the most commonly passed myth out there. The intentions are good, but it’s a flawed premise.
Here’s the advice: Only make one change per test, so you know what is actually making a difference.
For example, if you change your headline, add some social proof, and change your call-to-action text and color, and you get a 25% lift, how can you tell what caused the change?
It’s true; you really can’t. But let me also ask (and this is especially pointed at those without the luxury of high traffic sites), do you really care?
In an ideal world, one made up of iterative changes that build on each other, yes: testing one thing at a time limits the noise in a test and lets you understand exactly what caused the change.
Also, you have to define your Smallest Meaningful Unit (SMU), and this is where things get a bit nitpicky. Matt Gershoff, CEO of Conductrics, put it well, telling me:
“To take the logic to an extreme, you could argue that changing a headline is making multiple changes since you are changing more than one word at a time.”
So it depends on what you want to do. Do you care about the wording of your CTA and really want to know whether it caused a change or not? Are you radically changing your page? Your site?
The SMU depends on your goals, and trust me, in the real world, no analyst or optimization specialist is shouting, “only one change per test!”
As Mr. Rondeau pointed out in this post, what one thing would you change on this site (pictured below – this is an old version of the site by the way)?
Let’s even assume this site has a ton of traffic, and you can run like eight valid tests per month. If you’re doing one element at a time, where do you start? It would take you forever to test the background image, the font color, the font size, the logo at the top, the navigation thumbnails, location, size, order, copy, the body copy, the moving salesmen, etc., etc.
My point here is this: Don’t be afraid to bundle multiple changes in the same test.
5. A/B tests are better (or worse) than bandits/MVT/etc.
You see articles pop up from time to time arguing that you should “avoid multivariate (MVT)” tests because they’re complicated and don’t produce wins, or that bandits are inefficient compared to A/B tests, or that they’re more efficient, or whatever.
A good rule of thumb in life is if you’re dealing with a dichotomy, a this vs. that situation, you’re probably being set up. It’s probably a false dichotomy.
Truth is, A/B testing is better in some situations, while MVT is the best choice in others. The same goes for bandits and adaptive algorithms.
6. Stop an A/B test when it reaches significance
While I won’t get too granular on the statistics (you can read everything you need to know in this post), saying “stop it at statistical significance” is wrong, mostly due to the nature of the online environment.
It’s a shame this myth is so widespread; statistical knowledge in the marketing world is surprisingly limited.
It’s also common for your testing tool to tell you you’ve reached significance too early. So don’t put all your faith in that 95% significance.
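You can watch this happen in a small simulation. The sketch below runs A/A tests (both variants share the same true conversion rate, so any “winner” is a false positive). It compares stopping at the first significant-looking peek with checking only once at the planned end. The traffic numbers, number of peeks, and function names are assumptions chosen for illustration.

```python
import random
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_b / n_b - conv_a / n_a) / se

def simulate_aa_tests(runs=300, visitors_per_arm=2000, rate=0.05,
                      looks=20, z_crit=1.96, seed=42):
    """Return (false-positive rate when peeking, rate with one final check)."""
    rng = random.Random(seed)
    step = max(1, visitors_per_arm // looks)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        crossed = False
        for n in range(1, visitors_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # A "peek": would we have stopped and declared a winner here?
            if n % step == 0 and abs(z_score(conv_a, n, conv_b, n)) > z_crit:
                crossed = True
        final = abs(z_score(conv_a, visitors_per_arm,
                            conv_b, visitors_per_arm)) > z_crit
        peeking_hits += crossed or final
        final_hits += final
    return peeking_hits / runs, final_hits / runs

if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"False positives when stopping at first significance: {peeking:.1%}")
    print(f"False positives when checking once at the end: {fixed:.1%}")
```

Because every peek is another chance to cross the threshold by luck, the peeking rate can never be lower than the single-check rate. In practice it tends to be several times higher.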
First, pre-calculate your sample size and test duration. Then run the test for that long. Also, test for full weeks (start on a Monday? End on a Monday). It’s also recommended to run the test through multiple business cycles to account for non-stationary data (data that doesn’t stay the same over time). For instance, a big sale one week or a PR spike could throw your data off by quite a bit. Conversion rates often differ by day of the week, too. Maybe you have a 3% conversion rate on Tuesdays but a 1.5% conversion rate on Saturdays, and that difference could throw off your post-test analysis.
So test for full weeks to account for these ebbs and flows. At CXL, we recommend running a test for 3-4 weeks.
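As a sketch of how you might turn a required sample size into a planned duration, the snippet below rounds up to whole weeks and enforces a minimum of three weeks, in line with the 3-4 week recommendation above. The function name and the traffic figures are hypothetical.

```python
from math import ceil

def duration_in_full_weeks(sample_per_variant, variants, weekly_visitors, min_weeks=3):
    """Planned test length in whole weeks, never shorter than `min_weeks`."""
    total_needed = sample_per_variant * variants
    weeks = ceil(total_needed / weekly_visitors)  # round up to full weeks
    return max(weeks, min_weeks)

if __name__ == "__main__":
    # Hypothetical: 53,000 visitors per variant, 2 variants, 30,000 visitors/week.
    print(duration_in_full_weeks(53_000, 2, 30_000), "weeks")
```

If the answer comes out at many months, that’s a sign the test probably isn’t worth running at your current traffic.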
Only then look for a statistical significance of at least 95%.
A/B testing is incredibly powerful. It’s a strong deterrent to gut-based decision making, and it shows you what the data says you should do instead.