Actor Maurice Chevalier once wisely said, “If you wait for the perfect moment when all is safe and assured, it may never arrive. Mountains will not be climbed, races won, or lasting happiness achieved.”
“...nor many A/B tests concluded,” we’d like to add.
When split-testing, there’s never a “perfect” time to declare a winning variation. Sure, you could argue that moment occurs when a test has run long enough to reach statistical significance — but even then, you can’t be confident that your results are entirely accurate.
So when even the big-timers, who have access to the massive amounts of traffic needed to reach statistical significance, can’t be 100% sure of their results, how are you supposed to gain relevant insight in a timely manner from your A/B tests?
In A/B testing, statistical significance is what tells you whether your variation is actually better or worse than the original. It’s measured in percentage.
At a 50% level of statistical significance, you can be just 50% sure that your test results are accurate. FYI, 50% isn’t an acceptable level of significance in any profession. Neither is 75%. Nor 85%. So what is?
“Not good enough. You’re performing a science experiment here. Yes, you want it to be true. You want that 90% to win, but more important than having a “declared winner” is getting to the truth.”
He goes on to add, “You should not call tests before you’ve reached 95% or higher. 95% means that there’s only a 5% chance that the results are a complete fluke.”
So, other than a 95% level of significance, what do you need to make sure you set up your test for accurate results?
For example, let’s say your control page’s baseline conversion rate is 5%. You and your team decide that you want to confidently be able to say that a 10+% adjustment in conversion rate is due to your change, and not chance. And you want there only to be a 5% chance that your test is a fluke, so you’re testing at a 95% level of significance.
If we plug these numbers into Optimizely’s sample size calculator, here’s what we get:
So, if you’re testing two pages against each other, each needs to have a sample size of 31,000 visitors. For most businesses, that’s a pretty big number.
A survey of HubSpot customers a few years ago found that the median number of weekly unique visitors for small businesses are as follows:
Even if you’re a part of the largest group on this graph, and you include ALL your daily visitors in your test, it’ll take you 85 days to reach statistical significance. If you’re the smallest, it’ll take you nearly 14 years. Not all that practical, is it?
So if you don’t generate a steady flow of visitors, does that mean you can’t A/B test at all?
Nope. You still can — you just need to do it a little differently. For your tests, you’ll have to focus less on quantity of feedback (the 31,000-visitor sample size it’ll take to reach statistical significance), and more on quality of feedback (figuring out more from each visitor).
Quantitative feedback is all about numbers. In an A/B test, when you have enough quantitative feedback, you can start to make inferences about visitor behavior.
For instance, let’s say you’re running an A/B test on your landing page using the numbers above. In it, your control features a red CTA button, and your variation features a blue one, and so far, more visitors are converting on your variation than the control. Until each page has generated 31,000 visitors, you can’t be confident that the reason they’re converting more on your variation is because of the change in button color.
But when you’re collecting qualitative feedback, you engage with each individual prospect more to determine why it is they made the decisions they did. In the old days, qualitative feedback was only collectible via focus groups behind one-way mirrors. Today, marketers have a number of tools techniques at their disposal to make the most of each visitor.
Survey tools allow you to ask your customers questions about their user experience to determine what might be the cause of their behavior:
“Let’s say you are selling a dog training course/eBook. You have a great sales page, plenty of testimonials, excellent design, and a bonus eBook to go along with the course. However, the conversions are dismal; on some days, you don’t even have any conversions.
What can you do?
If the answer you get is something like, “I don’t know where to submit my information,” or, “This button isn’t working,” you won’t need to hear it 31,000 to learn that you need to make a change on your landing page.
Just a few answers may reveal the reason behind your low conversion days. Use that visitor feedback to create a hypothesis about what you should change, and then run an A/B test to see if those changes make a difference.
Similar to the way survey tools can teach you more about your prospects, so can a live support person with chat. Live chat tools allow you to automatically deploy a small pop-up chat window in the visitor’s browser, introducing you as a helpful resource when the customer encounters an issue of friction.
If that happens, the idea is that they’ll ask you for assistance. If your customers are running into the same problems, asking questions like “Does your product come with a warranty?”, it’s probably time to test a version of the page that solves it. If you know exactly what your customers want from you, you can take the guesswork out of testing, and be confident that your new version will bring a conversion lift.
When you don’t have enough traffic, and you don’t know which changes will have the biggest impact on your conversion rate, the best thing to do is focus on the ones that do have a potential to bring a big lift. You have to make that traffic count.
What are the odds a slightly smaller logo changes your conversion rate? Incredibly low. How many more people will hit your CTA button if it’s red instead of orange? Likely very few.
Make the best of your traffic by only focusing on things that are likely to bring a lift.
Yes, it’s true, people prefer to be contacted via email over telephone — but that doesn't mean that you won’t be able to get them on the phone. Try taking the route that Alex Turnbull from Groove did.
In September of 2014, he sent this email out to 2,000+ customers:
“The response,” he says, “blew me away.” In fact, it was so overwhelming that Alex realized a simple back-and-forth via email and a phone call with each customer weren’t plausible. So, he signed up for some services to help him handle the vast amount of conversations he would need to have.
Armed with a pen and paper, a phone, and memberships to Join.me, Skype, and Doodle, he set out to talk to everyone who responded. In four weeks, he spent more than 100 hours speaking to customers about what they wanted to see from Groove. The results were:
Better understanding your customers means better understanding their wants, needs, and how to communicate those online. With insight like that, you won’t need to talk to 31,000 customers. In this case, 500 short conversations were easily more valuable than tens of thousands of clicks.
When you need to make the most of each visit, heat mapping software allows you to gain deeper insight into the behavior of your prospects. Hover maps, click maps, and scroll maps will show you what your visitors are paying attention to, where they’re clicking, and how far down your page they’re scrolling.
The only problem is, with software like this, you’ll need a few thousands visits per page before you can make accurate assumptions about the data. To get around this, Peep Laja recommends you use tools that operate algorithmically, like EyeQuant, Feng-GUI, and LookTracker.
With them, you can upload a screenshot, and the software will give you tips to improve the page. “It’s not your actual users, but it’s something,” Peep says, “and some people I trust swear on their accuracy.”
What methods of generating qualitative feedback have you found useful? Have you successfully run A/B tests with low amounts of traffic?
Let us know in the comments, then use Instapage to build and A/B test your landing pages without IT.