When was the last time you sat down to write an amazing piece of content, pulled out your mathematical matrix for determining the most valuable keyword phrases, and set to work with a smile on your face? Yeah, me neither. If SEO and content marketers were forced to use a mathematical model to discover valuable keywords, our jobs would be a hundred times harder than they are now. Thankfully, there’s a little thing called latent semantic indexing (LSI) which can take your SEO ranking game from “0” to “100” in a jiffy (more like “100” to “1” if we’re getting…
You post image after image on Instagram, but can’t seem to get more than 25 or 30 Likes on each. Your comments are as inactive as a car broken down on the side of the road. You’ve tried every “hack” in the book, but nothing seems to work. Don’t give up hope just yet. A lot of marketers look for the “secret” to get floods of engagement. Well, the secret has less to do with a specific Instagram filter or tactic, and more about understanding how we see and process visual information. Figuring out what attracts a person’s eye to…
A type of hypothesis testing where multiple variables are tested simultaneously to determine how the variables and combinations of variables influence the output. If several different variables (factors) are believed to influence the results or output of a test, multivariate testing can be used to test all of these factors at once. Using an organized matrix which includes each potential combination of factors, the effects of each factor on the result can be determined. Also known as Design of Experiments (DOE), the first multivariate testing was performed in 1747 by Scottish physician James Lind as a means of identifying (or…
Confidence Interval: A range of values calculated such that there is a known probability that the true mean of a parameter lies within it. The science of statistics is all about predicting results by sampling a portion of a population. Since you can never be 100% certain of that prediction, the result is often expressed as a possible range of values. This range is also known as the confidence interval. For example, you might estimate average body weight based on a random sample of 500 men and 500 women. Your sample results will vary, so you need to add a…
Sample Size: The number (n) of observations taken from a population through which statistical inferences for the whole population are made. The concept of sampling from a larger population to determine how that population behaves, or is likely to behave, is one of the basic premises behind the science of applied statistics. For example, if you have a population of 10 million adults, and you sample 10 of them to find out their favorite television show, intuitively you will realize the sample size you have chosen is not large enough to draw valid conclusions. Knowing just how large “n”…
Whether to A/A test is a question that invites conflicting opinions. When deciding whether to implement an A/B testing tool, enterprises often lack the context to judge whether they should A/A test. Knowing the benefits and pitfalls of A/A testing can help organizations make better decisions.
In this blog post we explore why some organizations practice A/A testing and the things they need to keep in mind while A/A testing. We also discuss other methods that can help enterprises decide whether or not to invest in a certain A/B testing tool.
Why Some Organizations Practice A/A Testing
A/A testing is typically done when an organization is newly implementing an A/B testing tool. Running an A/A test at that time can help with:
Checking the accuracy of an A/B testing tool
Setting a baseline conversion rate for future A/B tests
Deciding a minimum sample size
Checking the Accuracy of an A/B Testing Tool
Organizations that are about to purchase an A/B testing tool or want to switch to new testing software may run an A/A test to ensure that the new software works fine and that it has been set up properly.
Tomasz Mazur, an eCommerce Conversion Rate Optimization expert, explains further: “A/A testing is a good way to run a sanity check before you run an A/B test. This should be done whenever you start using a new tool or go for a new implementation. A/A testing in these cases will help check if there is any discrepancy in data, let’s say, between the number of visitors you see in your testing tool and the web analytics tool. Further, this helps ensure that your hypotheses are verified.”
In an A/A test, a web page is A/B tested against an identical variation. When there is absolutely no difference between the control and the variation, the result is expected to be inconclusive. However, in cases where an A/A test provides a winner between two identical variations, there is a problem. The reasons could be the following:
The tool has not been set up properly.
The test hasn’t been conducted correctly.
The testing tool is inefficient.
Here’s what Corte Swearingen, Director, A/B Testing and Optimization at American Eagle, has to say about A/A testing: “I typically will run an A/A test when a client seems uncertain about their testing platform, or needs/wants additional proof that the platform is operating correctly. There really is no better way to do this than to take the exact same page and test it against itself with no changes whatsoever. We’re essentially tricking the platform and seeing if it catches us! The bottom line is that while I don’t run A/A tests very often, I will occasionally use it as a proof of concept for a client, and to help give them confidence that the split testing platform they are using is working as it should.”
Determining the Baseline Conversion Rate
Before running any A/B test, you need to know the conversion rate that you will be benchmarking the performance results against. This benchmark is your baseline conversion rate.
An A/A test can help you set the baseline conversion rate for your website. Let’s explain this with the help of an example. Suppose you are running an A/A test where the control gives 303 conversions out of 10,000 visitors and the identical variation B gives 307 conversions out of 10,000 visitors. The conversion rate for A is 3.03%, and that for B is 3.07%, even though there is no difference between the two variations. Therefore, the conversion rate range of 3.03–3.07% can be set as a benchmark for future A/B tests. If you run an A/B test later and get a conversion rate within this range, the result might not be significant.
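To see why a gap this small should not be read as a real difference, the numbers from the example can be run through a significance test. The sketch below uses a pooled two-proportion z-test, which is one common frequentist check and an assumption here, not necessarily what any particular testing tool does. The function name is made up for illustration.

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (rate_a, rate_b, z, two-sided p-value) for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that A and B truly convert at the same rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Standard normal CDF expressed through the error function
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return rate_a, rate_b, z, p_value

# Numbers from the A/A example: 303 and 307 conversions out of 10,000 visitors each
rate_a, rate_b, z, p = two_proportion_z_test(303, 10_000, 307, 10_000)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p:.2f}")
```

With these inputs the z-score sits far below the usual 1.96 cutoff, which is consistent with treating the 3.03–3.07% band as noise around a single baseline.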
Deciding a Minimum Sample Size
A/A testing can also help you get an idea about the minimum sample size from your website traffic. A small sample size would not include sufficient traffic from multiple segments. You might miss out on a few segments which can potentially impact your test results. With a larger sample size, you have a greater chance of taking into account all segments that impact the test.
Corte says, “A/A testing can be used to make a client understand the importance of getting enough people through a test before assuming that a variation is outperforming the original.” He explains this with an A/A testing case study that was done for Sales Training Program landing pages for one of his clients, Dale Carnegie. The A/A test that was run on two identical landing pages got test results indicating that a variation was producing an 11.1% improvement over the control. The reason behind this was that the sample size being tested was too small.
After having run the A/A test for a period of 19 days and with over 22,000 visitors, the conversion rates between the two identical versions were the same.
Michal Parizek, Senior eCommerce & Optimization Specialist at Avast, shares similar thoughts. He says, “At Avast, we did a comprehensive A/A test last year. And it gave us some valuable insights and was worth doing it!” According to him, “It is always good to check the statistics before final evaluation.”
At Avast, they ran an A/A test on two main segments—customers using the free version of the product and customers using the paid version. They did so to get a comparison.
The A/A test had been live for 12 days, and they managed to get quite a lot of data. Altogether, the test involved more than 10 million users and more than 6,500 transactions.
In the “free” segment, they saw a 3% difference in the conversion rate and 4% difference in Average Order Value (AOV). In the “paid” segment, they saw a 2% difference in conversion and 1% difference in AOV.
“However, all uplifts were NOT statistically significant,” says Michal. He adds, “Particularly in the ‘free’ segment, the 7% difference in sales per user (combining the differences in the conversion rate and AOV) might look trustworthy enough to a lot of people. And that would be misleading. Given these results from the A/A test, we have decided to implement internal A/B testing guidelines/lift thresholds. For example, if the difference in the conversion rate or AOV is lower than 5%, be very suspicious that the potential lift is not driven by the difference in the design but by chance.”
Michal sums up his opinion by saying, “A/A testing helps discover how A/B testing could be misleading if they are not taken seriously. And it is also a great way to spot any bugs in the tracking and setup.”
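The 7% figure for sales per user can be reproduced with a little arithmetic. The sketch below assumes that sales per user is conversion rate multiplied by AOV and that both differences point in the same direction; the source states neither explicitly, and the function name is illustrative.

```python
def combined_lift(conversion_lift, aov_lift):
    """Relative change in sales per user, assuming sales per user = conversion rate x AOV."""
    return (1 + conversion_lift) * (1 + aov_lift) - 1

# Differences reported for the two Avast segments
free_segment = combined_lift(0.03, 0.04)
paid_segment = combined_lift(0.02, 0.01)

print(f"free: {free_segment:.1%}, paid: {paid_segment:.1%}")
```

Two small, non-significant differences compound into a number that looks persuasive on its own, which is the trap Michal describes.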
Problems with A/A Testing
In a nutshell, the two main problems inherent in A/A testing are:
Ever-present element of randomness in any experimental setup
Requirement of a large sample size
We will consider these one by one:
Element of Randomness
As pointed out earlier in the post, checking the accuracy of a testing tool is the main reason for running an A/A test. However, what if you find out a difference between conversions of control and an identical variation? Do you always point it out as a bug in the A/B testing tool?
The problem (for lack of a better word) with A/A testing is that there is always an element of randomness involved. In some cases, the experiment reaches statistical significance purely by chance, which means that the change in the conversion rate between A and its identical version is probabilistic and does not denote absolute certainty.
Tomasz Mazur explains randomness with a real-world example. “Suppose you set up two absolutely identical stores in the same vicinity. It is likely, purely by chance or randomness, that there is a difference in results reported by the two. And it doesn’t always mean that the A/B testing platform is inefficient.”
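A small simulation makes the role of chance concrete. The sketch below runs many A/A tests in which both arms share the same true conversion rate and counts how often a pooled two-proportion z-test still declares a winner at the 5% level. The traffic numbers, the 3% rate, and the choice of test are assumptions for illustration, not a description of any specific tool.

```python
import random
from math import sqrt, erf

def p_value(conv_a, conv_b, visitors):
    """Two-sided p-value of a pooled two-proportion z-test with equal arm sizes."""
    pooled = (conv_a + conv_b) / (2 * visitors)
    se = sqrt(pooled * (1 - pooled) * 2 / visitors)
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms: nothing to compare
    z = abs(conv_a - conv_b) / visitors / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def simulate_aa_tests(num_tests=1000, visitors=1000, rate=0.03, alpha=0.05, seed=42):
    """Share of A/A tests that report a 'winner' even though both arms are identical."""
    rng = random.Random(seed)
    false_winners = 0
    for _ in range(num_tests):
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if p_value(conv_a, conv_b, visitors) < alpha:
            false_winners += 1
    return false_winners / num_tests

share = simulate_aa_tests()
print(f"A/A tests with a false winner: {share:.1%}")
```

The share should land in the neighborhood of the chosen significance level, so a single A/A test that shows a winner is not, by itself, evidence of a broken platform.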
Requirement of a Large Sample Size
Following the example/case study provided by Corte above, one problem with A/A testing is that it can be time-consuming. When testing identical versions, you need a large sample size to find out if A is preferred to its identical version. This in turn will take too much time.
As explained in one of ConversionXL’s posts, “The amount of sample and data you need to prove that there is no significant bias is huge by comparison with an A/B test. How many people would you need in a blind taste testing of Coca-Cola (against Coca-Cola) to conclude that people liked both equally? 500 people, 5000 people?” Experts at ConversionXL explain that the entire purpose of an optimization program is to reduce wastage of time, resources, and money. They believe that even though running an A/A test is not wrong, there are better ways to use your time when testing. In the post they mention, “The volume of tests you start is important but even more so is how many you *finish* every month and from how many of those you *learn* something useful from. Running A/A tests can eat into the “real” testing time.”
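The sample-size concern can be illustrated with a standard approximation for comparing two conversion rates. The sketch below is a textbook frequentist formula with assumed defaults (5% significance, 80% power, 3% baseline); it is not the method used by ConversionXL or in Corte's case study, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm to detect a given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance cutoff
    z_beta = NormalDist().inv_cdf(power)           # cutoff for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# The smaller the difference you want to rule in or out, the more traffic you need
for lift in (0.20, 0.10, 0.05, 0.01):
    n = visitors_per_variation(0.03, lift)
    print(f"{lift:.0%} relative lift: about {n:,} visitors per variation")
```

Required traffic grows roughly with the inverse square of the difference, which is why showing that two identical pages differ by "almost nothing" is so expensive.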
VWO’s Bayesian Approach and A/A Testing
VWO uses a Bayesian statistical engine for A/B testing. This allows VWO to deliver smart decisions: it tells you which variation will minimize potential loss.
Chris Stucchio, Director of Data Science at VWO, shares his viewpoint on how A/A testing in VWO differs from A/A testing in typical frequentist A/B testing tools:
“Most A/B testing tools are seeking truth. When running an A/A test in a frequentist tool, an erroneous ‘winner’ should only be reported 5% of the time. In contrast, VWO’s SmartStats is attempting to make a smart business decision. We report a smart decision when we are confident that a particular variation is not worse than all the other variations, that is, we are saying ‘you’ll leave very little money on the table if you choose this variation now.’ In an A/A test, this condition is always satisfied: you’ve got nothing to lose by stopping the test now.
The correct way to evaluate a Bayesian test is to check whether the credible interval for lift contains 0% (the true value).”
He also says that the possible and simplest reason for A/A tests to provide a winner
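The credible-interval check described above can be sketched with a simple Beta-Binomial model. This is a minimal illustration under stated assumptions (uniform Beta(1, 1) priors, Monte Carlo sampling, relative lift of B over A); it is not VWO's SmartStats implementation, and the function name is hypothetical.

```python
import random

def lift_credible_interval(conv_a, n_a, conv_b, n_b, draws=20_000, level=0.95, seed=7):
    """Monte Carlo credible interval for the relative lift of B over A."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(draws):
        # Posterior of each conversion rate under a uniform Beta(1, 1) prior
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        lifts.append((rate_b - rate_a) / rate_a)
    lifts.sort()
    tail = (1 - level) / 2
    low = lifts[int(tail * draws)]
    high = lifts[int((1 - tail) * draws) - 1]
    return low, high

# A/A numbers from the baseline example earlier in the post
low, high = lift_credible_interval(303, 10_000, 307, 10_000)
contains_zero = low <= 0 <= high
print(f"95% credible interval for lift: [{low:.1%}, {high:.1%}], contains 0%: {contains_zero}")
```

For an A/A test the true lift is 0%, so an interval that comfortably straddles zero is the expected, healthy outcome.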
Other Methods and Alternatives to A/A Testing
A few experts believe that A/A testing is inefficient as it consumes a lot of time that could otherwise be used in running actual A/B tests. However, there are others who say that it is essential to run a health check on your A/B testing tool. That said, A/A testing alone is not sufficient to establish whether one testing tool should be preferred over another. When making a critical business decision such as buying a new tool/software application for A/B testing, there are a number of other things that should be considered.
Corte points out that though there is no replacement or alternative to A/A testing, there are other things that must be taken into account when a new tool is being implemented. These are listed as follows:
Will the testing platform integrate with my web analytics program so that I can further slice and dice the test data for additional insight?
Will the tool let me isolate specific audience segments that are important to my business and just test those audience segments?
Will the tool allow me to immediately allocate 100% of my traffic to a winning variation? This feature can be an important one for more complicated radical redesign tests where standardizing on the variation may take some time. If your testing tool allows immediate 100% allocation to the winning variation, you can reap the benefits of the improvement while the page is built permanently in your CMS.
Does the testing platform provide ways to collect both quantitative and qualitative information about site visitors that can be used for formulating additional test ideas? These would be tools like heatmap, scrollmap, visitor recordings, exit surveys, page-level surveys, and visual form funnels. If the testing platform does not have these integrated, does it allow integration with third-party tools for these services?
Does the tool allow for personalization? If test results are segmented and it is discovered that one type of content works best for one segment and another type of content works better for a second segment, does the tool allow you to permanently serve these different experiences for different audience segments?
That said, there is still a set of experts or people who would opt for alternatives such as triangulating data over an A/A test. Using this procedure means you have two sets of performance data to cross-check with each other. Use one analytics platform as the base to compare all other outcomes against, to check if there is something wrong or something that needs fixing.
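As a rough sketch of what triangulating could look like in practice, the snippet below compares visitor counts reported by a testing tool against a base analytics platform and flags large gaps. All counts and the 5% tolerance are hypothetical values chosen for illustration.

```python
def relative_discrepancy(tool_count, analytics_count):
    """Gap between the testing tool and the base analytics platform, relative to the base."""
    return abs(tool_count - analytics_count) / analytics_count

# Hypothetical visitor counts: (testing tool, analytics platform)
counts = {
    "control": (10_000, 10_120),
    "variation": (10_000, 11_400),
}
TOLERANCE = 0.05  # assumed threshold; pick one that suits your own setup

flagged = {}
for name, (tool, analytics) in counts.items():
    gap = relative_discrepancy(tool, analytics)
    flagged[name] = gap > TOLERANCE
    status = "needs a closer look" if flagged[name] else "ok"
    print(f"{name}: {gap:.1%} discrepancy, {status}")
```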
And then there is the argument: why just A/A test when you can get more meaningful insights by running an A/A/B test? Doing this, you can still compare two identical versions while also testing some changes in the B variant.
When businesses face the decision of implementing a new testing software application, they need to run a thorough check on the tool. A/A testing is one method that some organizations use for checking the efficiency of the tool. Along with personalization and segmentation capabilities and some other pointers mentioned in this post, this technique can help check if the software application is good for implementation.
Did you find the post insightful? Drop us a line in the comments section with your feedback.
On Sunday, June 19, the Cleveland Cavaliers beat the Golden State Warriors in Game 7 of the NBA Finals. The franchise, founded in 1970, had never won an NBA championship.
A few weeks after the Cavs’ victory, Nike released a spot called “Worth the Wait”.
As of this article being published, the video on YouTube has over 5.6 million views.
Every time I watch this video, my throat tightens and I tear up a little. I’m not from Ohio (in fact, I’m from a notorious rival state), the Cavs are not my team, I’m not even a huge basketball fan. But this ad makes me feel. It taps into something deeply human, feelings of community and triumph.
Nike is incredible at this. From their 2012 “Find Your Greatness” campaign to their 2014 ad for the World Cup “Winner Stays” (which has more than 40 million views on YouTube), Nike knows how to elicit emotion.
And it’s clear they spend big bucks to do it. Why?
Because Nike knows that we — consumers, people, humans — don’t buy products or services…we buy feelings.
Comfort. Acceptance. Power. Freedom. Control. Love. We are all longing to find satisfaction for our intangible desires. If you can provide a payoff for your prospects’ unspoken needs, you will find yourself handsomely rewarded.
If you’re a marketer, chances are you’ve heard about the ‘old’, ‘middle’ and ‘new’ brains in relation to how we make (buying) decisions. The 3 brains refer to the structure of the brain in relation to its evolutionary history. Here’s a brief overview.
In the 1960s, Paul MacLean popularized the triune brain theory, where he categorized the brain into 3 parts: Reptilian (old, sensory), Limbic (middle, emotional) and Neocortex (new, rational).
The reptilian brain evolved first and controls the body’s core functions from heart rate to breathing to balance. It’s called the reptilian brain because it includes the brainstem and cerebellum (the main structures found in a reptile’s brain).
The limbic brain came next and includes the hippocampus, the amygdala and the hypothalamus. This is the part of your brain that records memories of behaviors that produced pleasant or unpleasant experiences: it’s responsible for your emotions and value judgements.
The last to evolve, the neocortex is credited with the development of human language, abstract thought, imagination and consciousness. It includes the two large cerebral hemispheres and has almost infinite learning abilities.
So, which of the 3 brains buys?
In classic economic theory, consumers are rational economic actors who make choices after considering all relevant information, using the new brain. While this may well hold true for large purchases, like insurance or a house, recent research has pointed to the power of our older brains in everyday purchase decisions (like buying that pair of Nikes).
Neuroscientist Joseph LeDoux explained “…the wiring of the brain at this point in our evolutionary history is such that connections from the emotional systems to the cognitive systems are stronger than connections from the cognitive systems to the emotional systems.”
LeDoux is suggesting that our brain waves flow from old brain to new brain, meaning our decision-making processes are much less rational than we’d all like to believe.
Moreover, feelings happen before thought and they happen far faster.
We have gut reactions in three seconds or less. In fact, emotions process sensory input in only one-fifth the time our conscious, cognitive brain takes to assimilate that same input. Quick emotional processing also happens with cascading impact. Our emotional reaction to a stimulus resounds more loudly in our brain than does our rational response, triggering the action to follow.
In recent years, the science dubbed neuromarketing has begun to emerge; it “bridges the study of consumer behavior with neuroscience”. The first piece of neuromarketing research was published in Neuron in 2004 by Read Montague, Professor of Neuroscience at Baylor College of Medicine.
Dr. Montague studied a group of people as they drank either a Pepsi or Coca-Cola while their brains were scanned with an fMRI machine. The results suggested that a strong brand (like Coca-Cola) could “own” a piece of a person’s frontal cortex.
The brain is responsible for all consumer behaviors…we only use about 20% of our brains consciously. Worse, we do not control the bulk of our attention since we are too busy scanning the environment for potential threats. Because nothing matters more than survival, we are in fact largely controlled by the ancient part of our brain known as the R-complex or the reptilian brain.
Morin goes on to quote neuroscientist Antonio Damasio who said, “We are not thinking machines that feel, we are feeling machines that think.” We are proud of our thinking abilities, but the fact of the matter is, our brains have relied on instinct for millions of years.
Research would suggest that we can optimize our marketing messaging by speaking to consumers’ reptilian brains.
The old brain’s responsiveness to openings and finales
The old brain’s affinity for visuals
The old brain’s responsiveness to emotional persuasion
And we’re back to emotions. To that Nike ad that makes me cry. And then really want some Nikes.
Note: Neuromarketing is not without its critics who voice ethical concerns akin to those that arose in the days of subliminal messaging. There are concerns that this research could lead to manipulation of consumers. It’s up to the marketing community to use this know-how to benefit the consumer first. With great knowledge, comes great responsibility.
System 1 and System 2
Dual-process theory is another cognitive theory about how we make decisions; it originated in the 1970s and 1980s and has been developed in more recent years.
The “dual” refers to the 2 cognitive systems we use every day. In 1999, Keith E. Stanovich, Professor of Applied Psychology at the University of Toronto, dubbed the two systems (rather generically) System 1 and System 2 in order to label the 2 different sets of properties. The terminology stuck.
This table showcases clusters of attributes frequently associated with the dual-process theory of higher cognition.
Characteristics to note within the intuitive process are fast, nonconscious, automatic, and experience-based decision making. In other words, our intuitive cognitive system is easier, requiring less focus and energy.
It follows that, if you can tap into your customers’ natural affinity for old brain, system 1 decision making, you’ll most likely see an uptick in conversions.
The level of dominance of each process at a particular time is the key determinant of purchasing decisions. Visitors are more likely to add a product to their cart when the emotional process takes control as they are directed by ‘how it feels’ and not ‘is it worth it.’…Advertising is above all a way to groom the emotional state.
It happens often: during our Explore phase, a client’s users will tell us (via surveys and other forms of qualitative feedback) that they want more information to…well…inform their purchase. Users often vocalize a desire for more description, more specs, presumably so that they can make a rational, thoughtful decision.
We also often have clients who come to us assuming that their users need more information to make a purchase decision, particularly if their product is technically complex. And yet, time and time again, we test more information against a Control and more information loses.
Get rid of half the words on each page, then get rid of half of what’s left.
Of course, you must take this suggestion with a grain of salt. Your users may, in fact, respond to more information versus less (we’ve seen that too!) but given all of the research that points toward “we buy feelings and rationalize our decisions later” it’s certainly worth testing more concise product descriptions, information hidden behind tabs, etc.
We can’t all be Nike, and Nike’s tactics certainly wouldn’t work for all of us. But when you’re considering your customers’ decision-making, be sure to take into account how you can up the feels.
In his book, You Should Test That!, Chris Goward discusses the “Intangible Benefits” of your Value Proposition. This is where the feelings associated with your brand sit. The question is, how can you highlight these intangibles?
Test video case studies and testimonials against written ones (visuals appeal to the old brain). Test copy that emphasizes your credibility and trustworthiness (alleviate consumer anxiety), test copy that emphasizes social proof (tap into consumer FOMO and yearning for community). Make your users feel: happy, sad, afraid, connected, angry.
Because we don’t buy things. We buy feelings.
How do you make your users feel? How do you emphasize the intangible benefits of your offering? Let us know in the comments!
Just getting those first 100 followers on Instagram can seem like a challenge. But let’s reach for the stars and learn how to get 1,000! Below are some incredibly useful tips for both personal and business Instagram accounts.
In 2015, Microsoft launched its first new browser in 20 years: Microsoft Edge. After 8 months, it’s on a great trajectory but we’re just getting started. Join us to learn about the progress we’ve made, feedback we’ve heard, and a whirlwind tour of improvements coming soon.
If you are a web developer who cares about quality, most probably you have heard of Selenium and the advantages of using such a tool for test automation. Now, if you are a mobile developer, you might know how much harder it is to test your app due to the existence of different platforms, different OS versions and even variety of devices.
Imagine how great it would be to write your tests only once and run them on different platforms. If so, then maybe today is your lucky day, because I want to tell you about Appium, a tool inspired by the Selenium WebDriver that allows you to write tests against multiple platforms using the same API.