Will the real A/B testing success metrics please stand up?
It’s 2017, and most marketers understand the importance of A/B testing. The strategy of applying the scientific method to marketing to prove whether an idea will have a positive impact on your bottom line is no longer novel.
But, while the practice of A/B testing has become more and more common, too many marketers still buy into pervasive A/B testing myths. #AlternativeFacts.
As more A/B testing ‘experts’ pop up, A/B testing myths have become more specific. Driven by best practices and tips and tricks, these myths represent ideas about A/B testing that will derail your marketing optimization efforts if left unaddressed.
But never fear! With the help of WiderFunnel Optimization Strategist, Dennis Pavlina, I’m going to rebut four A/B testing myths that we hear over and over again. Because there is such a thing as a successful, sustainable A/B testing program…
Into the light, we go!
Myth #1: The more tests, the better!
A lot of marketers equate A/B testing success with A/B testing velocity. And I get it. The more tests you run, the faster you run them, the more likely you are to get a win, and prove the value of A/B testing in general…right?
Not so much. Obsessing over velocity is not going to get you the wins you’re hoping for in the long run.
The key to sustainable A/B testing output is to find a balance between the short term (maximum testing speed) and the long term (testing for data collection and insights).
When you focus solely on speed, you spend less time structuring your tests, and you will miss out on insights.
With every experiment, you must ensure that it directly addresses the hypothesis. You must track all of the most relevant goals to generate maximum insights, and QA all variations to ensure bugs won’t skew your data.
An emphasis on velocity can create mistakes that are easily avoided when you spend more time on preparation.
Another problem: If you decide to test many ideas, quickly, you are sacrificing your ability to really validate and leverage an idea. One winning A/B test may mean quick conversion rate lift, but it doesn’t mean you’ve explored the full potential of that idea.
You can often apply the insights gained from one experiment, when building out the strategy for another experiment. Plus, those insights provide additional evidence for testing a particular concept. Lining up a huge list of experiments at once without taking into account these past insights can result in your testing program being more scattershot than evidence-based.
While you can make some noise with an ‘as-many-tests-as-possible’ strategy, you won’t see the big business impact that comes from a properly structured A/B testing strategy.
Myth #2: Statistical significance is the end-all, be-all
A quick definition
Statistical significance: A measure of how unlikely an observed result would be if it were due to chance alone. At WiderFunnel, we use a 95% confidence level. In other words, we only call a result significant if a difference this large would show up less than 5% of the time when the variation has no real effect (that is, when the difference is due to…well…chance).
If a test has a confidence level of less than 95% (positive or negative), it is inconclusive and does not have our official recommendation. The insights are deemed directional and subject to change.
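To make the 95% threshold concrete, here is a minimal sketch of one common way a confidence number can be computed: a pooled two-proportion z-test, reporting one minus the two-sided p-value. This is an assumption for illustration only. Testing tools differ in the statistics they use, and the function names `confidence_level` and `is_conclusive` are hypothetical, not taken from any particular tool.

```python
from math import sqrt
from statistics import NormalDist


def confidence_level(conv_a, n_a, conv_b, n_b):
    """Return 1 - p for a two-sided, pooled two-proportion z-test.

    conv_a / n_a: conversions and visitors for the original.
    conv_b / n_b: conversions and visitors for the variation.
    Assumption: the tool reports "confidence" as 1 minus the p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the "no real difference" assumption.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no conversions (or all conversions): nothing to compare
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


def is_conclusive(conv_a, n_a, conv_b, n_b, threshold=0.95):
    """True when the confidence level meets the chosen threshold."""
    return confidence_level(conv_a, n_a, conv_b, n_b) >= threshold
```

With this sketch, a result such as 100 conversions from 1,000 visitors against 110 from 1,000 falls well short of the threshold, so it would be treated as directional rather than conclusive.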
Ok, here’s the thing about statistical significance: It is important, but marketers often talk about it as if it is the only determinant for completing an A/B test. In actuality, you cannot view it in a silo.
For example, a recent experiment we ran reached statistical significance three hours after it went live. Because statistical significance is viewed as the end-all, be-all, a result like this can be exciting! But, in three hours, we had not gathered a representative sample size.
You should not wait for a test to be significant (because it may never happen) or stop a test as soon as it is significant. Instead, you need to wait for the calculated sample size to be reached before stopping a test. Use a test duration calculator to understand better when to stop a test.
After 24 hours, the same experiment had dropped to a confidence level of 88%, below our 95% threshold. In other words, the result was no longer statistically significant.
Traffic behaves differently over time for all businesses, so you should always run a test for full business cycles, even if you have reached statistical significance. This way, your experiment has taken into account all of the regular fluctuations in traffic that impact your business.
For an e-commerce business, a full business cycle is typically a one-week period; for subscription-based businesses, this might be one month or longer.
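For a rough idea of what a test duration calculator does, here is a minimal sketch. It estimates the sample size per variation using the standard normal-approximation formula for comparing two proportions, then rounds the run time up to whole business cycles. The defaults (5% significance, 80% power, a 7-day cycle) and the function names are assumptions for illustration, not WiderFunnel's actual calculator.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_lift,
                              alpha=0.05, power=0.80):
    """Visitors needed in EACH variation to detect the given relative lift.

    baseline_rate: current conversion rate, e.g. 0.10 for 10%.
    relative_lift: smallest lift worth detecting, e.g. 0.10 for +10%.
    Assumptions: two-sided test, equal traffic split, normal approximation.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def duration_in_days(n_per_variation, variations, daily_visitors,
                     cycle_days=7):
    """Days to run, rounded UP to whole business cycles.

    variations counts every arm, including the original.
    """
    total_needed = n_per_variation * variations
    raw_days = ceil(total_needed / daily_visitors)
    cycles = max(1, ceil(raw_days / cycle_days))
    return cycles * cycle_days


# Hypothetical inputs: 10% baseline, +10% relative lift, 2 arms, 1,000 visitors/day.
n = sample_size_per_variation(0.10, 0.10)
print(n, duration_in_days(n, variations=2, daily_visitors=1000))
```

Rounding up to full cycles is the point of the second function: even if the raw arithmetic says 4 days is enough traffic, the sketch still returns 7 so that every day of the week is represented.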
Myth #2, Part II: You have to run a test until it reaches statistical significance
As Claire pointed out, this may never happen. And it doesn’t mean you should walk away from an A/B test, completely.
As I said above, anything below 95% confidence is deemed subject to change. But, with testing experience, an expert understanding of your testing tool, and attention to the factors I’m about to outline, you can discover actionable insights that are directional (directionally true or false).
Results stability: Is the conversion rate difference stable over time, or does it fluctuate? Stability is a positive indicator.
Experiment timeline: Did I run this experiment for at least a full business cycle? Did conversion rate stability last throughout that cycle?
Relativity: If my testing tool uses a t-test to determine significance, am I looking at the hard numbers of actual conversions in addition to conversion rate? Does the calculated lift make sense?
Lift & ROI: Is there still potential for the experiment to achieve X% lift? If so, you should let it run as long as it is viable, especially when considering the ROI.
Impact on other elements: If elements outside the experiment are unstable (social shares, average order value, etc.) the observed conversion rate may also be unstable.
You can use these factors to make the decision that makes the most sense for your business: implement the variation based on the observed trends, abandon the variation based on observed trends, and/or create a follow-up test!
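The first two factors (results stability and experiment timeline) can be sketched in code. This is one possible way to check them, not a standard method: track the cumulative lift day by day, then ask whether it kept the same sign and stayed within a narrow band over the last full business cycle. The 7-day cycle, the 0.05 tolerance, and the function names are all assumptions chosen for illustration.

```python
def cumulative_lift(daily_control, daily_variation):
    """Relative lift of the cumulative conversion rate, one value per day.

    Each input is a list of (conversions, visitors) tuples, one per day.
    Assumption: the control has at least one conversion on day one, so the
    control rate is never zero.
    """
    lifts = []
    c_conv = c_vis = v_conv = v_vis = 0
    for (cc, cv), (vc, vv) in zip(daily_control, daily_variation):
        c_conv += cc
        c_vis += cv
        v_conv += vc
        v_vis += vv
        control_rate = c_conv / c_vis
        variation_rate = v_conv / v_vis
        lifts.append((variation_rate - control_rate) / control_rate)
    return lifts


def is_stable(lifts, cycle_days=7, tolerance=0.05):
    """True if, over the last full cycle, the lift never changed sign and
    its spread (max minus min) stayed within the tolerance."""
    if len(lifts) < cycle_days:
        return False  # has not run for a full business cycle yet
    window = lifts[-cycle_days:]
    same_sign = all(x > 0 for x in window) or all(x < 0 for x in window)
    return same_sign and (max(window) - min(window)) <= tolerance
```

A check like this only supports a judgment call. It does not replace the other factors on the list, such as looking at the raw conversion counts or at metrics outside the experiment.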
Myth #3: An A/B test is only as good as its effect on conversion rates
Well, if conversion rate is the only success metric you are tracking, this may be true. But you’re underestimating the true growth potential of A/B testing if that’s how you structure your tests!
To clarify: Your main success metric should always be linked to your biggest revenue driver.
But, that doesn’t mean you shouldn’t track other relevant metrics! At WiderFunnel, we set up as many relevant secondary goals (clicks, visits, field completions, etc.) as possible for each experiment.
This ensures that we aren’t just gaining insights about the impact a variation has on conversion rate, but also the impact it’s having on visitor behavior.
– Dennis Pavlina
When you observe secondary goal metrics, your A/B testing becomes exponentially more valuable because every experiment generates a wide range of secondary insights. These can be used to create follow-up experiments, identify pain points, and create a better understanding of how visitors move through your site.
One of our clients provides an online consumer information service — users type in a question and get an Expert answer. This client has a 4-step funnel. With every test we run, we aim to increase transactions: the final, and most important conversion.
But, we also track secondary goals, like click-through-rates, and refunds/chargebacks, so that we can observe how a variation influences visitor behavior.
In one experiment, we made a change to step one of the funnel (the landing page). Our goal was to set clearer visitor expectations at the beginning of the purchasing experience. We tested 3 variations against the original, and all 3 resulted in increased transactions (hooray!).
The secondary goals revealed important insights about visitor behavior, though! Firstly, each variation resulted in substantial drop-offs from step 1 to step 2…fewer people were entering the funnel. But, from there, we saw gradual increases in clicks to steps 3 and 4.
Our variations seemed to be filtering out visitors without strong purchasing intent. We also saw an interesting pattern with one of our variations: It increased clicks from step 3 to step 4 by almost 12% (a huge increase), but decreased actual conversions by 1.6%. This result was evidence that the call-to-action on step 4 was extremely weak (which led to a follow-up test!).
We also saw large decreases in refunds and chargebacks for this client, which further supported the idea that the visitors who were dropping off were the right ones to lose (i.e. the wrong visitors for this service).
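Here is a minimal sketch of the kind of step-by-step comparison described in this example. Given the number of visitors who reached each step of a funnel, it computes the step-to-step rates and the relative change in each rate between the original and a variation. The function names and the counts in the usage example are hypothetical and are not the client's data.

```python
def step_rates(funnel_counts):
    """Step-to-step rates for an ordered list of counts.

    funnel_counts[i] is the number of visitors who reached step i; the last
    entry can be completed transactions. Assumes no count is zero except
    possibly the last.
    """
    return [nxt / cur for cur, nxt in zip(funnel_counts, funnel_counts[1:])]


def compare_funnels(control_counts, variation_counts):
    """Relative change in each step-to-step rate (variation vs. control)."""
    control = step_rates(control_counts)
    variation = step_rates(variation_counts)
    return [(v - c) / c for c, v in zip(control, variation)]


# Hypothetical counts: step 1, step 2, step 3, transactions.
control = [1000, 500, 250, 100]
variation = [1000, 400, 240, 120]
for step, change in enumerate(compare_funnels(control, variation), start=1):
    print(f"step {step} -> {step + 1}: {change:+.1%}")
```

In these made-up numbers, fewer visitors move from step 1 to step 2, but those who do are more likely to continue. That is the same filtering pattern described above, and it would be invisible if only the final transaction count were tracked.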
This is just a taste of what every A/B test could be worth to your business. The right goal tracking can unlock piles of insights about your target visitors.
Myth #4: A/B testing takes little to no thought or planning
Believe it or not, marketers still think this way. They still view A/B testing on a small scale, in simple terms.
But A/B testing is part of a greater whole—it’s one piece of your marketing optimization program—and you must build your tests accordingly. A one-off, ad-hoc test may yield short-term results, but the power of A/B testing lies in iteration, and in planning.
At WiderFunnel, a significant amount of research goes into developing ideas for a single A/B test. Even tests that may seem intuitive, or common-sensical, are the result of research.
Because, with any test, you want to make sure that you are addressing areas within your digital experiences that are the most in need of improvement. And you should always have evidence to support your use of resources when you decide to test an idea. Any idea.
So, what does a revenue-driving A/B testing program actually look like?
Today, tools and technology allow you to track almost any marketing metric. Meaning, you have an endless sea of evidence that you can use to generate ideas on how to improve your digital experiences.
Which makes A/B testing more important than ever.
An A/B test shows you, objectively, whether or not one of your many ideas will actually increase conversion rates and revenue. And, it shows you when an idea doesn’t align with your user expectations and will hurt your conversion rates.
And marketers recognize the value of A/B testing. We are firmly in the era of the data-driven CMO: Marketing ideas must be proven, and backed by sound data.
But results-driving A/B testing happens when you acknowledge that it is just one piece of a much larger puzzle.
One of our favorite A/B testing success stories is that of DMV.org, a non-government content website. If you want to see what a truly successful A/B testing strategy looks like, check out this case study. Here are the high-level details:
We’ve been testing with DMV.org for almost four years. In fact, we just launched our 100th test with them. For DMV.org, A/B testing is a step within their optimization program.
Continuous user research and data gathering inform hypotheses that are prioritized and turned into A/B tests (that are structured using proper Design of Experiments). Each A/B test delivers business growth and/or insights, and these insights are fed back into the data gathering. It’s a cycle of continuous improvement.
And here’s the kicker: Since DMV.org began A/B testing strategically, they have doubled their revenue year over year, and have seen an over 280% conversion rate increase. Those numbers kinda speak for themselves, huh?
What do you think?
Do you agree with the myths above? What are some misconceptions around A/B testing that you would like to see debunked? Let us know in the comments!