Bayesian A/B Testing
A practical guide to using Bayes' rule, Beta priors, credible intervals, and posterior comparison for A/B tests.
A/B testing is a disciplined way to ask whether a change improved an outcome. You split traffic between two experiences, measure the same success event in each group, and compare the evidence. In a product setting, the success event might be a signup, a checkout, a click, or any other binary conversion.
The frequentist version often asks whether the observed difference would be surprising if there were no real difference. A Bayesian version asks a slightly different and often more practical question:
Given what we believed before the experiment and what we observed during the experiment, what do we believe now?
That question is powered by Bayes' rule:

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta)\, P(\theta)}{P(\text{data})}$$
Read left to right, the posterior belief about a hypothesis after seeing data is proportional to the likelihood of the data under that hypothesis times the prior belief in the hypothesis. The denominator is the normalizing constant that makes the probabilities add up correctly.
In an A/B test, the "hypothesis" is usually not one single claim like "B wins." It is a possible conversion rate. Variant A has an unknown conversion rate $p_A$, Variant B has an unknown conversion rate $p_B$, and the experiment helps update our uncertainty about both.
A Practical Bayesian A/B Workflow
Start by defining the estimand. For a binary conversion test, each visitor either converts or does not convert, so the natural target is the conversion probability for each variant:

$$p_A = P(\text{conversion} \mid \text{Variant A}), \qquad p_B = P(\text{conversion} \mid \text{Variant B})$$
The data for each variant can be summarized as $k$ conversions out of $n$ visitors. A binomial likelihood says that if the true conversion rate is $p$, then the probability of observing $k$ conversions in $n$ visitors is:

$$P(k \mid n, p) = \binom{n}{k}\, p^k (1-p)^{n-k}$$
Next choose a prior. For conversion rates, the Beta distribution is useful because it lives between $0$ and $1$ and is conjugate to the binomial likelihood. Conjugate means the posterior stays in the same family as the prior, which makes the update easy to compute and easy to explain.
If the prior is:

$$p \sim \mathrm{Beta}(\alpha, \beta)$$

and the experiment observes $k$ conversions in $n$ visitors, the posterior is:

$$p \mid \text{data} \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)$$

This is the Beta-binomial update. It is also a nice mental model: conversions add to $\alpha$, non-conversions add to $\beta$. A prior of $\mathrm{Beta}(\alpha, \beta)$ behaves like a modest starting belief centered on a particular conversion rate because its mean is:

$$\mathbb{E}[p] = \frac{\alpha}{\alpha + \beta}$$
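Because the update is conjugate, the whole computation is addition. A minimal sketch (the prior parameters here are placeholders for whatever starting belief you choose; the uniform Beta(1, 1) below is just an illustration):

```python
def beta_binomial_update(alpha, beta, conversions, visitors):
    """Conjugate update: Beta prior + binomial data -> Beta posterior.

    Conversions add to alpha, non-conversions add to beta.
    """
    return alpha + conversions, beta + (visitors - conversions)

# A uniform Beta(1, 1) prior, updated with 52 conversions out of 420 visitors.
post_a = beta_binomial_update(1.0, 1.0, 52, 420)
post_b = beta_binomial_update(1.0, 1.0, 72, 415)
print(post_a)  # (53.0, 369.0)
print(post_b)  # (73.0, 344.0)
```

No integration or simulation is needed for the update itself; sampling only enters later, when comparing the two posteriors.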
Now imagine an experiment with these observations:
| Variant | Conversions | Visitors | Posterior |
|---|---|---|---|
| A | 52 | 420 | $\mathrm{Beta}(\alpha + 52,\; \beta + 368)$ |
| B | 72 | 415 | $\mathrm{Beta}(\alpha + 72,\; \beta + 343)$ |
The posterior mean for each arm after the update is $\frac{\alpha + k}{\alpha + \beta + n}$. For a weak prior, this sits close to the raw rates: about $52/420 \approx 12.4\%$ for A and $72/415 \approx 17.3\%$ for B.

That puts Variant B roughly five percentage points higher than Variant A. But a Bayesian analysis should not stop at the means. The whole posterior distribution matters because it tells us how much uncertainty remains.
A credible interval gives a range of conversion rates that contains a chosen amount of posterior probability. A 95% credible interval, for example, can be read directly as "given the model and data, there is a 95% posterior probability that the true conversion rate is in this interval." That interpretation is one reason Bayesian intervals are often easier to communicate than confidence intervals.
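With a Beta posterior, a credible interval is just two quantiles. A sketch assuming SciPy is available, using the illustrative Beta(53, 369) posterior that a uniform prior would give Variant A's data:

```python
from scipy.stats import beta

# Posterior for Variant A under a uniform Beta(1, 1) prior:
# 52 conversions and 368 non-conversions -> Beta(53, 369).
a, b = 53, 369

# Equal-tailed 95% credible interval: the 2.5% and 97.5% posterior quantiles.
lo, hi = beta.ppf([0.025, 0.975], a, b)
print(f"95% credible interval for p_A: [{lo:.3f}, {hi:.3f}]")
```

The interval here is equal-tailed; a highest-density interval is a common alternative and differs slightly for skewed posteriors.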
The most decision-shaped quantity is often:

$$P(p_B > p_A \mid \text{data})$$
You can estimate this by drawing many samples from both posterior distributions and counting how often the sampled value for B is larger than the sampled value for A. In this example, that probability comes out around 98% under a weak prior. That does not mean Variant B is guaranteed to win forever. It means that under this model, with this prior and this evidence, most posterior comparisons favor B.
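The sampling estimate is a few lines with NumPy. A sketch assuming a uniform Beta(1, 1) prior as the starting point:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 200_000

# Posteriors under a uniform prior: Beta(1 + k, 1 + n - k).
samples_a = rng.beta(1 + 52, 1 + 420 - 52, n_samples)
samples_b = rng.beta(1 + 72, 1 + 415 - 72, n_samples)

# Fraction of joint posterior draws in which B beats A.
prob_b_beats_a = (samples_b > samples_a).mean()
print(f"P(p_B > p_A | data) ~= {prob_b_beats_a:.3f}")
```

The same draws can be reused for other summaries, such as the posterior distribution of the lift `samples_b - samples_a`.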
Before the test starts, choose the minimum lift worth shipping and the posterior probability needed to act. Otherwise it is too easy to move the goalposts after seeing a friendly chart.
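Those pre-registered thresholds plug straight into the posterior samples. A sketch where the one-percentage-point minimum lift and the 95% action threshold are hypothetical numbers chosen for illustration, again under an assumed uniform prior:

```python
import numpy as np

rng = np.random.default_rng(0)
min_lift = 0.01    # hypothetical minimum absolute lift worth shipping
threshold = 0.95   # hypothetical posterior probability required to act

# Posterior draws under a uniform Beta(1, 1) prior for each variant.
samples_a = rng.beta(1 + 52, 1 + 368, 200_000)
samples_b = rng.beta(1 + 72, 1 + 343, 200_000)

# Posterior probability that B's lift over A exceeds the minimum lift.
p_meaningful = (samples_b - samples_a > min_lift).mean()
decision = "ship B" if p_meaningful >= threshold else "keep testing"
print(f"P(lift > {min_lift:.0%}) = {p_meaningful:.3f} -> {decision}")
```

Fixing `min_lift` and `threshold` before looking at the data is what prevents the goalpost-moving the text warns about.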
Watching Evidence Become a Posterior
The animation below uses the same numbers from the walkthrough. Run the update and watch the shared prior split into two posterior beliefs after the conversion evidence arrives.
The important thing to notice is not just that B ends higher. Notice that both curves still have width. A Bayesian result is not a single verdict; it is a distribution of plausible conversion rates after combining prior information with observed evidence.
Conclusion
Bayesian A/B testing is useful because it turns experiment results into decision-ready probability statements. You can say what you believed before the test, show how the evidence updated that belief, and compare variants in terms like posterior lift, credible intervals, and $P(p_B > p_A \mid \text{data})$.
That does not make Bayesian testing automatically better than every other approach. The prior has to be defensible, the experiment still needs clean randomization, and the success metric still needs to match the product decision. The real advantage is clarity: the method keeps uncertainty visible while making the decision question explicit.