Bayesian vs Frequentist Statistics: What's the Difference?

Bayesian vs frequentist statistics - the philosophical split that decides whether you get p-values or posteriors, and when each one actually wins.

By Rob Griffiths · Updated 11 June 2026 · 13 min read

Bayesian and frequentist statistics are two answers to the same question: what does "probability" actually mean? Frequentists treat it as the long-run frequency of an event in repeated trials. Bayesians treat it as a degree of belief that can be updated with evidence. From that single philosophical split falls almost every practical difference between the two - p-values versus posterior distributions, confidence intervals versus credible intervals, null-hypothesis tests versus Bayes factors.

The split matters because it changes what your numbers mean. A 95% confidence interval and a 95% credible interval can look identical on a chart yet answer subtly different questions. Picking the wrong framework for the problem you have produces results that are technically correct and practically misleading.

The Philosophical Split

The disagreement starts with the definition of probability itself.

The frequentist view: probability is the long-run relative frequency of an event in an infinitely repeated experiment. Saying "this coin has a 50% probability of heads" means "if you flip it forever, the fraction of heads converges to 0.5." Under this definition, statements like "there is a 70% probability that the drug works" are nonsense - the drug either works or it does not; the truth is a fixed, unknown parameter, not a random variable. Probabilities only apply to repeatable random events, not to fixed states of the world.
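A quick simulation makes the frequentist definition concrete. The sketch below uses only the Python standard library; the fair coin (p_heads = 0.5), the seed and the checkpoints are illustrative assumptions built into the simulation. It prints the running fraction of heads, which drifts toward 0.5 as the number of flips grows.

```python
import random

def running_fraction_of_heads(n_flips, checkpoints, p_heads=0.5, seed=1):
    """Flip a simulated coin n_flips times; record the fraction of heads at each checkpoint."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    fractions = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < p_heads  # a True comparison adds 1 head
        if i in checkpoints:
            fractions[i] = heads / i
    return fractions

checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n, frac in sorted(running_fraction_of_heads(100_000, checkpoints).items()):
    print(f"{n:>7} flips: fraction of heads = {frac:.4f}")
```

Early checkpoints can wander well away from 0.5; the long-run fraction is what the frequentist definition refers to.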

The Bayesian view: probability is a quantified degree of belief, calibrated against evidence. Saying "there is a 70% probability that the drug works" is a statement about what you know, not about what is. The drug's true effect is fixed but unknown; the 70% is your uncertainty about it given the evidence you have. New data updates that belief via Bayes' theorem.
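In symbols, writing θ for the unknown parameter (the drug's true effect, say) and D for the observed data (notation introduced here for illustration), the update is Bayes' theorem:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)},
\qquad
P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta
```

The prior P(θ) encodes belief before the data, the likelihood P(D | θ) says how probable the data are under each value of θ, and the posterior P(θ | D) is the updated belief. The denominator P(D) only normalises the result so that the posterior integrates to 1.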

Both views are internally consistent. They disagree about which kinds of statements are meaningful, which methods are admissible, and how you should phrase a conclusion. Once you commit to one, the rest of the machinery follows.

What the Two Frameworks Actually Compute

| | Frequentist | Bayesian |
|---|---|---|
| Probability is | Long-run frequency in repeated trials | Quantified degree of belief, updated with evidence |
| Parameters are | Fixed unknowns; no probability distribution | Uncertain quantities with prior and posterior distributions |
| Data is | Random; treated as samples from a distribution | Fixed once observed; conditioned on |
| Output | p-values, confidence intervals, point estimates | Posterior distributions, credible intervals, Bayes factors |
| Prior knowledge | Not formally incorporated | Encoded explicitly as a prior distribution |
| Headline question | How surprising is the data if the null is true? | Given the data, what is the probability of each parameter value? |
| Sample-size logic | Power calculation set in advance; results valid only at the chosen stopping point | Can stop and look as often as you like; the posterior's interpretation does not depend on the stopping rule |
| Best for | Repeatable experiments, regulatory trials, clean A/B tests | One-off decisions, sequential evidence, problems with strong priors |

Confidence Intervals vs Credible Intervals

This is the textbook example of two methods producing similar-looking numbers that mean different things.

A 95% frequentist confidence interval is a statement about the procedure, not the parameter. It means: if you ran this experiment many times and constructed an interval the same way each time, 95% of those intervals would contain the true parameter. It does not mean there is a 95% probability that the true parameter is in this particular interval - the parameter is fixed, so any specific interval either contains it or does not, with no probability attached.
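The repeated-sampling meaning of the 95% can be checked by simulation. The sketch below assumes a normal population with a known standard deviation, so the textbook z-interval (mean ± 1.96·σ/√n) applies; the true mean, σ, sample size, number of repeats and seed are arbitrary illustrative choices. It builds many intervals with the same procedure and counts how many contain the true mean.

```python
import math
import random
import statistics

def z_interval(sample, sigma, z=1.96):
    """95% confidence interval for a normal mean when sigma is known."""
    centre = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

def coverage(true_mean=10.0, sigma=2.0, n=25, repeats=10_000, seed=42):
    """Fraction of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= true_mean <= hi  # each single interval either contains it or not
    return hits / repeats

print(f"coverage: {coverage():.3f}")
```

The printed coverage lands close to 0.95 but not exactly on it. Each individual interval either contains 10.0 or it does not; the 95% describes the long-run hit rate of the procedure.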

A 95% Bayesian credible interval is a statement about the parameter. It means: given the data and the prior, there is a 95% probability that the true parameter lies in this interval. This is the statement most people incorrectly attribute to confidence intervals.

For large samples with weak priors, the two intervals are often nearly identical numerically. They still answer different questions, and the interpretive labels matter - confusing them is one of the most common errors in applied statistics.

When Frequentist Methods Win

Frequentism is the right tool when the problem genuinely fits its assumptions:

  • Repeatable experiments with well-defined designs. A pharmaceutical trial with a pre-registered protocol, fixed sample size, and a single primary endpoint is the canonical frequentist case. The long-run frequency interpretation is meaningful because the experiment is, in principle, infinitely repeatable.
  • Regulatory contexts demanding methodological objectivity. The FDA, EMA, MHRA, and similar bodies have traditionally expected frequentist analyses for confirmatory trials, in large part because a frequentist analysis has no prior for the analyst to tune. "Objective" here means "requires no subjective choice of prior," which sidesteps a regulatory headache even when a Bayesian analysis would be more informative.
  • Quality control and process monitoring. Statistical process control charts, defect-rate hypothesis tests, and other repeated-measurement quality work map cleanly onto frequentist assumptions because the process is, by design, a stable repeating system.
  • Cases with truly uninformative priors. When there is genuinely no prior information, the Bayesian answer with a flat prior often matches the frequentist answer numerically. In that limit, frequentism's lack of priors becomes a feature rather than a bug - there is nothing to encode and nothing to defend.
  • Cases where the audience expects p-values. Most published research in medicine, psychology, and economics still uses p-values. If the readership cannot interpret a posterior distribution, the elegant Bayesian analysis loses to the inelegant frequentist one that gets read.
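To see the flat-prior agreement numerically, the sketch below takes a hypothetical 700 heads in 1,000 flips (invented numbers, not from a real experiment) and compares the standard Wald 95% confidence interval with the equal-tailed 95% credible interval from a flat Beta(1, 1) prior. The posterior quantiles are estimated from Monte Carlo draws using Python's `random.betavariate`; the number of draws and the seed are arbitrary.

```python
import math
import random

heads, n = 700, 1_000  # hypothetical data for illustration

# Frequentist: Wald 95% confidence interval, p_hat ± 1.96 * sqrt(p_hat * (1 - p_hat) / n)
p_hat = heads / n
half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - half, p_hat + half)

# Bayesian: flat Beta(1, 1) prior -> Beta(1 + heads, 1 + tails) posterior.
# Estimate the 2.5% and 97.5% posterior quantiles from sorted Monte Carlo draws.
rng = random.Random(0)
draws = sorted(rng.betavariate(1 + heads, 1 + n - heads) for _ in range(100_000))
credible = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])

print(f"Wald confidence interval:     ({wald[0]:.3f}, {wald[1]:.3f})")
print(f"flat-prior credible interval: ({credible[0]:.3f}, {credible[1]:.3f})")
```

At this sample size the two intervals agree to roughly two decimal places. The numbers converge even though their interpretations do not.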

When Bayesian Methods Win

Bayesian methods are the right tool whenever the data are limited, the priors are informative, or the question is not naturally repeatable:

  • One-off decisions. "Should we acquire this company?" is not a repeatable experiment. The frequentist interpretation of probability does not apply - there will only be one outcome, and probabilities about it can only be statements of belief. Bayesian decision theory is the formal tool for exactly this situation.
  • Strong prior information. When the base rate is known - disease prevalence, historical conversion rates, regulatory pass rates - encoding it explicitly as a prior produces more accurate posteriors than ignoring it. Base-rate neglect is common in applied analysis partly because frequentist methods give you no syntactic place to put the base rate.
  • Sequential evidence. Frequentist sample-size planning is brittle: peek at the data halfway through and you inflate your false-positive rate unless the design explicitly corrects for it. Bayesian posteriors update continuously - you can stop the experiment when the credible interval gets narrow enough, and the posterior means the same thing whenever you stop.
  • Small samples. Frequentist methods often fall back on asymptotic approximations that need large n to be reliable. Bayesian inference does not rely on large-sample approximations: given the model and the prior, the posterior is valid at any sample size, and the answer simply has wider credible intervals when n is small.
  • Hierarchical models. Multi-level data - patients within hospitals, students within schools, observations within subjects - is much easier to model in a Bayesian framework. Partial pooling falls out of the prior structure naturally, whereas the frequentist analogue (mixed-effects models) requires more care.
  • Integrating disparate sources. When you have a small clinical trial plus historical data plus expert judgment, Bayesian methods let you combine them as priors-plus-data. Frequentist methods force you to pick one source and treat the rest as background.
  • A/B testing in product environments. Bayesian A/B testing reports "there is a 92% probability variant B is better than variant A" - directly interpretable by product managers. The frequentist equivalent reports a p-value of 0.04 and leaves the actual decision-relevant probability undefined.
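A Bayesian A/B comparison of conversion rates can be sketched in a few lines. The example assumes binary conversions, independent flat Beta(1, 1) priors on each variant's rate, and hypothetical counts (120 of 1,000 for A, 145 of 1,000 for B, invented for illustration). The helper name `prob_b_beats_a` is likewise illustrative. It estimates P(rate_B > rate_A | data) by drawing from the two Beta posteriors.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=7):
    """Monte Carlo estimate of P(rate_B > rate_A | data) under flat Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)  # posterior draw for A
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)  # posterior draw for B
        wins += rate_b > rate_a
    return wins / draws

p = prob_b_beats_a(conv_a=120, n_a=1_000, conv_b=145, n_b=1_000)
print(f"P(B better than A) = {p:.2f}")
```

The output is directly a probability statement about the two rates, which is the quantity a product team usually wants. It depends on the prior, so a real analysis would also report how it moves under other reasonable priors.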

Worked Comparison: A Coin That Might Be Biased

To make the difference concrete, take a small, contrived example. A coin is flipped 10 times and lands heads 7 times. Is it biased?

Frequentist analysis. The null hypothesis is that the coin is fair (p = 0.5). Under the null, the probability of seeing 7 or more heads in 10 flips is given by the binomial tail: 176/1024, or about 0.172. That is the one-sided p-value (the two-sided version, which also counts 3 or fewer heads, is about 0.344). Since 0.172 is above the conventional 5% significance level, the null is not rejected. The frequentist conclusion: the data are not inconsistent with a fair coin. No probability is attached to the hypothesis itself.

Bayesian analysis. Start with a uniform prior on the coin's bias (Beta(1, 1) - any bias is equally likely a priori). The posterior after observing 7 heads and 3 tails is Beta(1 + 7, 1 + 3) = Beta(8, 4). From that posterior we can compute the probability that the coin is biased toward heads: P(p > 0.5 | data) ≈ 0.89. The equal-tailed 95% credible interval for p is roughly (0.39, 0.89). The Bayesian conclusion: the data raise the probability that the coin is biased toward heads from 50% under the prior to about 89%, but the credible interval is wide because 10 flips is not many.

Same data, two answers. The frequentist answer is a statement about how surprised we should be under the null; the Bayesian answer is a statement about what we believe about the coin. Neither is wrong. They answer different questions.
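Both analyses can be reproduced with the Python standard library. The sketch below is illustrative rather than a reference implementation: the helper names are hypothetical, and instead of a statistics package it uses the identity that, for integer parameters, P(Beta(a, b) ≤ x) = P(Binomial(a + b - 1, x) ≥ a), with bisection to invert the CDF for the equal-tailed credible interval.

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b (binomial identity)."""
    return binom_tail_ge(a, a + b - 1, x)

def beta_quantile_int(q, a, b, tol=1e-10):
    """Invert beta_cdf_int by bisection; the CDF is increasing in x."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

heads, flips = 7, 10

# Frequentist: one-sided p-value, P(X >= 7 | fair coin)
p_value = binom_tail_ge(heads, flips, 0.5)

# Bayesian: uniform Beta(1, 1) prior -> Beta(1 + heads, 1 + tails) posterior
a, b = 1 + heads, 1 + (flips - heads)
prob_biased_heads = 1 - beta_cdf_int(0.5, a, b)
ci = (beta_quantile_int(0.025, a, b), beta_quantile_int(0.975, a, b))

print(f"one-sided p-value: {p_value:.3f}")
print(f"posterior: Beta({a}, {b})")
print(f"P(p > 0.5 | data): {prob_biased_heads:.3f}")
print(f"95% equal-tailed credible interval: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same seven heads feed both calculations; only the question being asked of them differs.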

The Common Misinterpretations

Treating a p-value as the probability the null is true

A p-value is P(data at least this extreme | null), which is not P(null | data). The two are linked by Bayes' theorem but are almost never equal. A p-value of 0.04 does not mean there is a 4% chance the null is true - that statement requires a prior, which the p-value framework deliberately excludes.
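The gap between the two quantities can be made concrete with the coin from the worked comparison. The sketch assumes, purely for illustration, two competing hypotheses with equal prior weight: H0, the coin is exactly fair, and H1, the bias is uniformly distributed on [0, 1]. Under H1 the probability of exactly 7 heads in 10 flips integrates to C(10, 7)·B(8, 4) = 1/11, where B is the Beta function. The ratio of the two likelihoods is the Bayes factor BF01.

```python
from math import comb, factorial

heads, flips = 7, 10
tails = flips - heads

# Likelihood of exactly 7 heads under H0 (fair coin)
like_h0 = comb(flips, heads) * 0.5**flips

# Marginal likelihood under H1 (uniform prior on the bias):
# integral of C(n, k) p^k (1 - p)^(n - k) dp = C(n, k) * B(k + 1, n - k + 1)
beta_fn = factorial(heads) * factorial(tails) / factorial(flips + 1)
like_h1 = comb(flips, heads) * beta_fn

prior_h0 = 0.5  # assumed prior probability that the coin is exactly fair
bayes_factor_01 = like_h0 / like_h1
post_h0 = prior_h0 * like_h0 / (prior_h0 * like_h0 + (1 - prior_h0) * like_h1)

p_value = sum(comb(flips, i) for i in range(heads, flips + 1)) / 2**flips
print(f"one-sided p-value: {p_value:.3f}")
print(f"Bayes factor BF01: {bayes_factor_01:.2f}")
print(f"P(fair | data):    {post_h0:.2f}")
```

Under these assumptions the data leave the fair-coin hypothesis at roughly 56%, nowhere near the p-value of 0.172. A different prior weight or a different alternative hypothesis would give a different number, which is exactly why the p-value cannot supply it.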

Calling a 95% confidence interval a 95% probability range

The 95% refers to the procedure, not the parameter. A specific interval either contains the parameter or does not. The credible-interval interpretation people instinctively reach for is the Bayesian one - accidental Bayesianism wearing a frequentist label.

Picking the prior to get the answer you want

Bayesian methods are sometimes accused of being "subjective" because the prior is an input. The defence is to use weakly informative priors by default and to perform sensitivity analyses showing how the posterior changes if the prior is varied. A robust posterior survives across a range of reasonable priors; a fragile one does not, and the fragility itself is informative.
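A sensitivity analysis can be as simple as rerunning the coin example under several priors. The sketch below assumes symmetric Beta(a, a) priors with integer a, ranging from flat to strongly concentrated around a fair coin (the specific values are arbitrary choices). It reports P(p > 0.5 | 7 heads in 10) for each, using the identity that for integer parameters P(Beta(α, β) ≤ x) = P(Binomial(α + β - 1, x) ≥ α).

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b, via the binomial identity."""
    n = a + b - 1
    return sum(comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(a, n + 1))

heads, tails = 7, 3
for a in (1, 2, 5, 20):  # Beta(a, a) priors, increasingly concentrated on 0.5
    post_a, post_b = a + heads, a + tails  # conjugate Beta-binomial update
    prob = 1 - beta_cdf_int(0.5, post_a, post_b)
    print(f"prior Beta({a}, {a}): P(p > 0.5 | data) = {prob:.3f}")
```

With a flat prior the answer is about 0.89. As the prior concentrates on a fair coin, the posterior probability of a heads bias falls, because ten flips are not enough to override a strong prior. That is exactly the kind of fragility a sensitivity analysis is meant to expose.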

Believing frequentism is prior-free

Frequentist methods make many implicit modelling choices - likelihood functions, link functions, error distributions, stopping rules - that play the same role as priors. The choice is just less visible because it is buried in the model rather than stated explicitly. Neither framework is assumption-free.

Stopping a frequentist experiment early because the result "looks good"

Optional stopping in frequentist null-hypothesis significance testing (NHST) inflates the false-positive rate dramatically. Bayesian posteriors are not distorted in the same way, because the posterior reflects whatever data you have regardless of the stopping rule. Mixing the two cultures - running a frequentist test but peeking like a Bayesian - produces the worst of both worlds.
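The inflation is easy to demonstrate. The sketch below simulates a fair coin, so every "significant" result is a false positive. It compares a single two-sided z-test after 200 flips with a peeking analyst who runs the same test every 20 flips and would stop at the first |z| > 1.96. The sample sizes, peeking schedule, number of simulations and seed are arbitrary assumptions, and the z-test uses the normal approximation to the binomial.

```python
import math
import random

def z_stat(heads, n):
    """Normal-approximation z statistic for H0: p = 0.5."""
    return (heads - 0.5 * n) / math.sqrt(0.25 * n)

def false_positive_rates(n_max=200, look_every=20, sims=4_000, seed=3):
    """Return (single-test rate, peeking rate) of rejecting a true null."""
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(sims):
        heads = 0
        rejected_at_some_look = False
        for n in range(1, n_max + 1):
            heads += rng.random() < 0.5  # the null is true: the coin is fair
            if n % look_every == 0 and abs(z_stat(heads, n)) > 1.96:
                # The peeking analyst would stop here; we keep flipping so the
                # planned single test can be evaluated on the same sequence.
                rejected_at_some_look = True
        fixed_hits += abs(z_stat(heads, n_max)) > 1.96  # one test at the planned n
        peek_hits += rejected_at_some_look
    return fixed_hits / sims, peek_hits / sims

fixed, peeking = false_positive_rates()
print(f"test once at n=200:  {fixed:.3f}")
print(f"peek every 20 flips: {peeking:.3f}")
```

The single test stays near its nominal 5% (not exactly, because the binomial is discrete), while the peeking analyst's false-positive rate is much higher, even though every individual look uses the same 1.96 threshold.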

How to Choose for a Specific Problem

  1. Ask what kind of question you are answering

    A statement about a procedure ("if I ran this 1,000 times, what fraction would reject the null?") is frequentist by construction. A statement about the world ("what is the probability this effect is positive?") is Bayesian by construction. The grammar of the question often dictates the framework.

  2. Ask whether prior information exists and is worth using

    If there is a meaningful base rate, historical data, or expert judgment, encoding it as a prior produces sharper inferences. If there is genuinely nothing prior to encode, the two frameworks tend to agree numerically and frequentism's syntactic simplicity is a fair tiebreaker.

  3. Ask whether the audience can interpret the output

    Posterior probabilities and credible intervals are directly interpretable; p-values and confidence intervals require careful framing. If the reader will reach for the wrong interpretation either way, lean toward whichever framework makes the wrong interpretation accidentally correct.

  4. Ask whether the design allows peeking

    If the experiment is sequential or might be stopped early, Bayesian methods preserve their interpretation; frequentist methods do not. Either commit to a pre-registered sample size and stick to it, or use a Bayesian framework that supports continuous monitoring.

  5. Ask what the regulatory or publishing environment expects

    Pharmaceutical trials, peer-reviewed medical research, and most economics journals still expect frequentist analyses. Tech-product A/B testing, modern machine learning, and bespoke decision support are increasingly Bayesian. Pick the framework that matches the audience, then use the other internally as a sanity check.

Why the Argument Is Less Heated Than It Used to Be

For most of the twentieth century, the Bayesian–frequentist split was an ideological war. Frequentists accused Bayesians of subjectivism; Bayesians accused frequentists of incoherence. Two things have largely defused the conflict.

First, computational advances. Bayesian inference used to be analytically tractable only for a handful of conjugate-prior cases. Markov chain Monte Carlo, variational inference, and modern probabilistic programming languages (Stan, PyMC, NumPyro, Pyro) made it possible to fit Bayesian models for realistic problems on a laptop. Before these tools, much of applied statistics had no practical Bayesian option even when a practitioner wanted one.
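To give a flavour of what MCMC does, here is a minimal random-walk Metropolis sampler for the coin posterior from the worked comparison. It is a teaching sketch, not how Stan or PyMC work internally (both default to more sophisticated Hamiltonian Monte Carlo samplers); the step size, chain length, burn-in and seed are arbitrary choices. Because the exact posterior is Beta(8, 4), with mean 8/12 ≈ 0.667, the sampler's output can be checked against it.

```python
import math
import random

def log_posterior(p, heads=7, tails=3):
    """Unnormalised log posterior for a coin's bias under a flat Beta(1, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero density outside (0, 1)
    return heads * math.log(p) + tails * math.log(1.0 - p)

def metropolis(n_samples=50_000, step=0.1, burn_in=2_000, seed=11):
    rng = random.Random(seed)
    p = 0.5  # starting point
    current = log_posterior(p)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio), on the log scale;
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples

draws = metropolis()
mean = sum(draws) / len(draws)
share_above_half = sum(d > 0.5 for d in draws) / len(draws)
print(f"posterior mean ≈ {mean:.3f} (exact 8/12 = {8 / 12:.3f})")
print(f"P(p > 0.5 | data) ≈ {share_above_half:.3f}")
```

The sampler never needs the normalising constant P(D), only ratios of posterior densities. That property is what lets MCMC handle models with no closed-form posterior.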

Second, methodological humility. The replication crisis showed how easy it is to misuse p-values - most spectacularly, by stopping data collection when a test happened to be significant. The Bayesian world is not immune to bad practice, but the explicit prior and the continuous posterior make some of the worst frequentist failure modes harder to commit by accident. The professional consensus is now closer to "use whichever framework fits the problem and the audience" than to either camp's twentieth-century certainties.

Frequently Asked Questions

Q01. What is the main difference between Bayesian and frequentist statistics?
Frequentists define probability as the long-run frequency of an event in repeated trials and treat parameters as fixed unknowns. Bayesians define probability as a degree of belief that can be updated with evidence and treat parameters as having their own distributions. Almost every methodological difference between the two flows from that single split.
Q02. Is Bayesian or frequentist better?
Neither is universally better. Frequentist methods are well-suited to repeatable experiments with clean designs and regulatory contexts that require methodological objectivity. Bayesian methods are well-suited to one-off decisions, problems with strong prior information, and situations involving sequential evidence.
Q03. What is a p-value, and what does it actually mean?
A p-value is the probability of observing data at least as extreme as the actual data, assuming the null hypothesis is true. It is not the probability the null hypothesis is true, and it is not the probability of getting the same result if the experiment were repeated. Most popular interpretations of p-values are technically wrong.
Q04. What is the difference between a confidence interval and a credible interval?
A confidence interval is a frequentist construct: if you repeated the experiment many times, 95% of intervals constructed the same way would contain the true parameter. A credible interval is a Bayesian construct: given the data and prior, there is a 95% probability the true parameter lies in this specific interval. They look similar numerically but answer different questions.
Q05. Why do regulators usually require frequentist analyses?
Because the prior in a Bayesian analysis is an input the analyst chooses, regulators worry it can be tuned to favour a desired result. Frequentist methods sidestep this by not having a prior at all - at the cost of being unable to formally incorporate genuine prior information. The trade-off is methodological objectivity in exchange for sometimes-suboptimal inference.
Q06. Can Bayesian methods peek at data without inflating error rates?
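Yes, in the sense that matters for Bayesian inference. Bayesian posteriors reflect whatever data has been observed so far; there is no notion of "alpha spending" or "sample size set in advance," and the posterior remains a valid summary of belief whenever you stop. If a Bayesian decision rule is judged by its long-run false-positive rate, though, checking it after every batch will still trigger by chance more often than checking it once. This makes Bayesian methods natural for sequential trials, adaptive A/B testing, and any situation where stopping rules are flexible. The frequentist equivalent requires careful pre-registration and stopping-rule design.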
Q07. Do Bayesian and frequentist methods ever give the same answer?
Often, yes - particularly with large samples and weakly informative priors. The numerical answer converges; the interpretation does not. A 95% confidence interval and a 95% credible interval at large n might be the same two numbers and still mean different things.
