Hypothesis Testing: A systematic way to select samples from a group or population with the intent of making a determination about the expected behavior of the entire group. Part of the field of inferential statistics, hypothesis testing is also known as significance testing, since significance (or the lack of it) is usually the bar that determines whether or not the hypothesis is accepted. A hypothesis is similar to a theory. If you believe something might be true but don’t yet have definitive proof, it is considered a theory until that proof is provided. Turning theories into accepted statements of fact is…
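
As a rough illustration of how significance acts as the bar for a hypothesis, here is a minimal sketch of a two-sided z-test comparing two conversion rates. The function name, the visitor counts, and the 0.05 threshold are assumptions chosen for the example; they do not come from the original entry.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a difference at least this large if the null were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
alpha = 0.05  # assumed significance threshold
significant = p_value < alpha
```

With these made-up counts the p-value falls below 0.05, so the difference would be called significant at that threshold. The normal approximation used here is only reasonable for fairly large samples.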

Correlation: The existence of a relationship between two or more variables or factors where dependence between them occurs in a way that cannot be attributed to chance alone. If an experiment or study is designed to determine which factors might influence other factors of interest, you are testing the correlation between these factors. For example, you may have noticed that men prefer diet cola and women prefer mineral water. Proving this type of correlation allows you to establish a predictive relationship for future behavior. The concept of correlation was first attributed to Sir Francis Galton, a cousin of Charles Darwin,…
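
For numeric variables, the strength of a linear relationship is commonly summarized with the Pearson correlation coefficient, which runs from -1 to 1. The sketch below computes it by hand; the function name and the two data series are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant series has zero variance and would divide by zero here.
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements of two factors on five subjects.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
```

A value near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests little or none. On its own, the coefficient does not show that one factor causes the other.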

A/B Testing, or “Split Testing” as it is also known, can be one of the most useful and powerful tools available for CRO, when used correctly. Without careful planning and analysis, however, the potential benefits of an A/B test may be outweighed by the combined impact of errors, noise and false assumptions. For these reasons, we created The Crazy Egg A/B Test Planning Guide. Our user-friendly guide provides a roadmap through the A/B test planning process. In addition, it serves as a convenient way to record and store your testing history for future review. What is an A/B Test? If…

Multivariate Testing: A type of hypothesis testing where multiple variables are tested simultaneously to determine how the variables and combinations of variables influence the output. If several different variables (factors) are believed to influence the results or output of a test, multivariate testing can be used to test all of these factors at once. Using an organized matrix which includes each potential combination of factors, the effects of each factor on the result can be determined. Also known as Design of Experiments (DOE), the first multivariate testing was performed in 1747 by Scottish physician James Lind as a means of identifying (or…
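
The "organized matrix" of every combination of factors can be sketched as a full factorial design. The factor names and levels below are hypothetical page elements, not taken from the original entry.

```python
from itertools import product

# Hypothetical factors and their levels for a landing-page test.
factors = {
    "headline": ["A", "B"],
    "button_color": ["green", "orange"],
    "image": ["photo", "illustration"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


matrix = full_factorial(factors)
for row in matrix:
    print(row)
```

Three factors with two levels each give 2 x 2 x 2 = 8 test cells. The count grows multiplicatively with each added factor, which is why multivariate tests generally need more traffic than a simple two-variant test.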

Bandit Testing: A term used to describe test methods or algorithms that continuously shift traffic in reaction to the real-time performance of the test. Also known as “multi-armed bandit testing”, the name is derived from the behavior of casino slot machine players who often play several machines at once in order to optimize their payout. Rather than stay with a single machine, the gambler will often play some percentage of the time on several other nearby machines. In this way, the new “hot” machine can be identified without leaving the original machine behind. When used in website testing, bandit testing represents a…
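
One simple way to shift traffic toward the better performer is the epsilon-greedy strategy, sketched below as a simulation. It is only one of several bandit algorithms and is not necessarily the one any given testing tool uses. The conversion rates, the 10% exploration share, and the seed are all assumptions made for the example.

```python
import random


def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy traffic allocation across page variations."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)  # visitors sent to each variation
    wins = [0] * len(true_rates)   # conversions observed per variation
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: send this visitor to a random variation.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: send this visitor to the best observed rate so far.
            estimates = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
            arm = estimates.index(max(estimates))
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins


# Hypothetical true conversion rates, unknown to the algorithm.
pulls, wins = epsilon_greedy([0.05, 0.10, 0.30])
```

Most of the simulated traffic should end up on the variation with the highest true rate, while the small exploration share keeps sampling the others in case one of them turns "hot".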

Confidence Interval: A range of values calculated such that there is a known probability that the true value of a population parameter lies within it. The science of statistics is all about predicting results by sampling a portion of a population. Since you can never be 100% certain of that prediction, the result is often expressed as a possible range of values. This range is also known as the confidence interval. For example, you might estimate average body weight based on a random sample of 500 men and 500 women. Your sample results will vary, so you need to add a…
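
A minimal sketch of how such a range can be computed for a sample mean, using the normal approximation. The function name and the body-weight values are hypothetical. For a sample this small a t-based interval would usually be preferred, so treat the result as illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    n = len(sample)
    # z-score that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width


# Hypothetical body weights in kilograms.
weights = [70, 72, 68, 75, 71, 69, 74, 73]
low, high = mean_confidence_interval(weights)
```

Raising the `confidence` argument widens the interval, and collecting more observations narrows it.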

Confidence Level: The percentage of time that a statistical result would be correct if you took numerous random samples. Confidence is often associated with assuredness, and the statistical meaning is closely related to this common usage of the term. To state a percentage value for confidence in something is essentially stating a level of how “sure” you are that it will happen. In statistical terms, it is the expected percentage of time that your range of values will be correct if you were to repeat the same experiment over and over again. Unfortunately, there is no such thing as a…
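
The "repeat the same experiment over and over" idea can be checked with a small simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval contains the true mean. The population values, sample size, repeat count, and seed are all assumptions made for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(confidence=0.95, true_mean=100.0, true_sd=15.0,
             n=50, repeats=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        half_width = z * stdev(sample) / sqrt(n)
        # The interval "is correct" when the true mean falls inside it.
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / repeats


observed = coverage(0.95)
```

The observed fraction should land close to the stated confidence level, which is what the level means in practice.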

Margin of Error: An expression for the maximum expected difference between the true population parameter and a sample estimate of that parameter. When you are analyzing a statistical experiment or study and progress from discussing the test sample results to discussing the whole population that the sample represents, there will always be a margin of error attached to any estimated values. The margin of error will be stated with a “plus or minus” (+/-) in front of it, meaning you are just as likely to be above or below your estimated value by the same amount. Despite the word “error”…
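
For a proportion estimated from a simple random sample, the margin of error is often approximated as z * sqrt(p(1 - p) / n). The sketch below applies that formula. The function name and the survey numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for a sample proportion p with n observations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


# Hypothetical survey: 50% of 1000 respondents choose an option.
moe = margin_of_error(0.5, 1000)
print(f"50% +/- {moe * 100:.1f} percentage points")
```

The margin is widest at p = 0.5 and shrinks only with the square root of the sample size, so halving it takes roughly four times as many observations.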

Sample Size: The number (n) of observations taken from a population through which statistical inferences for the whole population are made. The concept of sampling from a larger population to determine how that population behaves, or is likely to behave, is one of the basic premises behind the science of applied statistics. For example, if you have a population of 10 million adults, and you sample 10 of them to find out their favorite television show, intuitively you will realize the sample size you have chosen is not large enough to draw valid conclusions. Knowing just how large “n”…
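
One common way to decide how large "n" should be when estimating a proportion is to pick a target margin of error and solve n = z^2 * p(1 - p) / e^2. The sketch below assumes a very large population and simple random sampling. The function name and target margins are illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # p = 0.5 is the most conservative assumption (largest required n).
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_3pt = required_sample_size(0.03)  # target: +/- 3 percentage points
n_5pt = required_sample_size(0.05)  # target: +/- 5 percentage points
```

Under these assumptions, a margin of 5 points at 95% confidence needs 385 respondents and a margin of 3 points needs 1068. Both are far more than the 10 people in the example above, yet tiny next to a population of 10 million.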

The debating, campaigning and speculating will soon be over. In just a few short weeks, our new President will be elected. Somehow, through this long and arduous process, the parallels between website A/B testing and the election became apparent to me as the campaign continued to grind on. This shouldn’t come as much of a surprise. After all, an election, particularly a Presidential election, has essentially become the ultimate marketing campaign. Of course, one of the objectives of this particular marketing campaign is to make it appear as if it is NOT a marketing campaign…anyone remember “New Coke”? With a…