Correlation vs Causation: A Probabilistic Thinking Guide

Why your brain confuses the two - and how to think clearly about cause and effect.

Scatter plot showing two correlated variables, with a question mark over the arrow that would imply causation.

Updated 24 July 2026

By Rob Griffiths · 24 July 2026 · 14 min read

Correlation is when two things move together. Causation is when one of them is the reason the other moves. From the outside the two can look identical, and that's why so many confidently held beliefs about cause and effect turn out to be wrong.

Ice cream sales and drowning deaths are tightly correlated. So, inversely, are the number of pirates in the world and global temperature. So are a country's chocolate consumption and the number of Nobel laureates it produces. None of these is a causal relationship - but if you only had the data and a hunch, you'd swear they were.

This guide is about how to spot the difference. We'll cover spurious correlations and why they're so common, Simpson's paradox (where a trend reverses when you split the data), confounding variables, and the actual methods researchers use to establish causation. By the end, you'll have a working mental toolkit for separating “A and B happen together” from “A causes B.”

What does correlation actually mean?

A statistical relationship - nothing more

A correlation is a measurable, repeatable association between two variables. When one moves, the other tends to move with it (positive correlation) or against it (negative correlation). Statisticians quantify this with a correlation coefficient - most commonly Pearson's r - that ranges from -1 to +1.

r = +1: perfect positive correlation. As one variable rises, the other rises in lockstep.
r = 0: no linear relationship.
r = -1: perfect negative correlation. As one rises, the other falls in lockstep.

Real-world correlations almost never hit ±1. As a rough rule of thumb, a coefficient around 0.7 is considered strong, 0.4 moderate, and 0.1 barely there, though conventions vary by field. But here's the catch: even a perfect correlation tells you nothing about why the variables move together. Correlation is descriptive. It says “this pattern exists.” It does not say what causes the pattern.
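To make the definition concrete, here is a minimal Python sketch of Pearson's r computed from scratch. The function name pearson_r is our own; Python 3.10+ also ships statistics.correlation, which computes the same quantity. The last call shows why r = 0 means no linear relationship rather than no relationship at all.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement around the means, scaled by each variable's own spread.
    # The 1/n factors in covariance and variance cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))         # 1.0: rises in lockstep
print(pearson_r(xs, [10 - x for x in xs]))            # -1.0: falls in lockstep
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: y = x**2, a perfect but non-linear pattern
```

Note that nothing in the formula refers to cause: swap xs and ys and you get exactly the same number.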

Two variables can correlate strongly because:

One actually causes the other (real causation)
The relationship runs the other way around (reverse causation)
Both are caused by a third, hidden variable (confounding)
The pattern is a coincidence in your particular sample (spurious correlation)
You're looking at a non-random subset of the data (selection effects)

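Of those five, only the first means that A causes B. Reverse causation is real causation running the other way, and the remaining three involve no causal link between A and B at all. Together, these alternatives explain a large share of the correlations you'll encounter in life, science, and journalism.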

What is a spurious correlation?

Patterns that exist only because the universe is large

Tyler Vigen's Spurious Correlations project compiled hundreds of these. A few favourites:

The number of films Nicolas Cage appeared in correlates 0.66 with the number of people who drowned by falling into a pool.
US per-capita cheese consumption correlates 0.95 with the number of people who died by becoming tangled in their bedsheets.
The divorce rate in Maine correlates 0.99 with per-capita margarine consumption.

None of these are causal. They're statistical noise - patterns that emerge when you trawl enough variables looking for matches. With sufficient data, you can find a strong correlation between almost any two unrelated time series, especially if both happen to be trending in the same direction over the period you measure.

This is the first lesson of correlation vs causation: the existence of a correlation, on its own, is not strong evidence of anything causal. It's a starting hypothesis at best.

What are the most famous spurious correlations?

The Tyler Vigen archive and decades of statistical writing have surfaced specific examples that make the principle vivid:

Ice cream sales + drowning deaths (the classic). Both peak in summer because of warm weather (the confounder). Eating ice cream doesn't cause drowning.
US per-capita cheese consumption + civil-engineering doctorates awarded (correlation coefficient 0.96 over 9 years, per Tyler Vigen). Both are slow-moving trends with no plausible causal link - pure coincidence over a small time window.
Margarine consumption + Maine divorce rate (0.99 over 10 years per Tyler Vigen). Same pattern - two declining time series that happen to track each other.
Autism diagnoses + organic food sales (rose together 2000-2015). The two series correlate strongly, but both are driven by broad social trends (rising parental awareness, economic shifts, demographic changes) rather than by each other. The widely circulated implication of a causal link is false.
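Number of pirates + global temperature (the parody example from Bobby Henderson's open letter to the Kansas school board). Pirate numbers fell as temperatures rose, producing a strong negative correlation. The point: any two series that trend steadily over the same period will correlate strongly (positively if they move in the same direction, negatively if they move in opposite directions), regardless of any causal connection.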
Storks + birth rates in Europe (~0.62 correlation across data from 17 European countries). Larger countries have room for more storks AND have more people having babies, so the likely confounder is country size (land area), not the stork.
Country chocolate consumption + Nobel Prize winners per capita (a 2012 New England Journal of Medicine humour piece reporting r=0.79). Confounders include national wealth, education spending, and research infrastructure. Not a chocolate-causes-Nobels finding.
Shoe size + reading ability in primary schoolchildren (strongly correlated). The confounder is age - older children have bigger feet AND read better. Reading ability isn't caused by foot size.
HDL cholesterol + cardiovascular health (long thought causal; not borne out in RCTs). For decades, HDL was assumed to protect against heart disease because the correlation was so strong. When drugmakers developed drugs that specifically raised HDL (CETP inhibitors), the RCTs largely showed no reduction in cardiovascular events. The correlation existed; the causal effect the drugs were counting on didn't show up.
Hormone replacement therapy + reduced heart disease (observational vs RCT contradiction). 1990s observational studies showed HRT users had lower heart-disease rates. The 2002 Women's Health Initiative RCT found that HRT (combined estrogen plus progestin) actually INCREASED heart-disease risk. The leading explanation is a selection effect: women who chose HRT tended to be healthier overall to begin with. It is one of the most-cited cautionary tales in modern epidemiology.

The lesson across all ten: two variables can move together because of shared trends, hidden confounders, selection effects, or plain coincidence, and any of these can produce a spectacular-looking correlation coefficient. Whether the variables are causally linked is a separate question that the correlation alone cannot answer.

What is a confounding variable?

When something else is driving both

Ice cream sales and drowning deaths are correlated - both go up in the summer. A naive analysis might suggest that ice cream causes drowning, or that drowning causes ice cream sales. Obviously neither is true. The real cause is a third variable: hot weather. Hot days drive up ice cream consumption, and they also send more people swimming, which raises the number of drownings.

This third variable is called a confounder. It causes both observed variables, creating a correlation between them that has nothing to do with one causing the other.

Confounders are everywhere in observational data:

Coffee drinkers live longer than non-drinkers

Confounder: people who can afford and tolerate coffee tend to be healthier overall.

Children with bigger feet have better vocabularies

Confounder: age. Older kids have bigger feet and bigger vocabularies.

Universities with smaller class sizes produce better grades

Confounder: smaller classes are often at wealthier institutions with better-prepared students.

Vitamin supplement users are healthier than non-users

Confounder: people who buy supplements are also more likely to exercise, sleep well, and avoid smoking.

The presence of a plausible confounder doesn't disprove a causal claim - but it forces you to control for it before taking the correlation seriously. Good observational research identifies the plausible confounders, measures them, and adjusts for them statistically, knowing it can only adjust for what it measured. Bad research ignores them and reports the headline number.

When does reverse causation flip the answer?

B causes A, not A causes B

Sometimes two things really are causally linked, but the direction is the opposite of what intuition suggests.

People who use stand-up desks report better focus. Does the desk cause better focus? Possibly. But it's at least as plausible that people who already have good focus and discipline are more likely to buy stand-up desks in the first place. Without a controlled experiment, you genuinely can't tell which way the arrow goes.

Other classic reverse-causation traps:

People who attend therapy report worse mental health than people who don't. Does therapy cause poor mental health, or do struggling people seek therapy?
Hospitals have higher death rates than gyms. Do hospitals kill people, or do dying people end up in hospitals?
Successful CEOs tend to take cold showers. Does cold showering build success, or do successful people buy into wellness fads?

Reverse causation is particularly sneaky because it produces real correlations and feels like a causal mechanism. Short of running a controlled experiment, the simplest defence is to ask: “If I imagine the arrow going the other way, does that story also fit the evidence?” If yes, the data alone can't tell you which way the arrow points.

What is Simpson's paradox?

A pattern that reverses when you split the data

Simpson's paradox is one of the strangest things in statistics. It's a situation where a trend appears in several groups of data but reverses when the groups are combined - or vice versa.

The classic example is from a 1973 admissions audit at UC Berkeley. Looking at overall admissions, men were admitted at a noticeably higher rate than women, suggesting bias against women. But when researchers split the data by department, the pattern flipped: most departments were actually slightly biased in favour of women. The aggregate looked discriminatory because women happened to apply more often to highly competitive departments with low acceptance rates for everyone.

The same paradox appears repeatedly:

Treatment A looks better than Treatment B overall

But Treatment B is better in every patient subgroup. The difference comes from how patients were assigned.

A baseball player has a higher batting average each year than another

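But over the years combined, the second player has the higher average. The reversal comes from how each player's at-bats were spread across years in which both hit well and years in which both hit poorly.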

A school's average test score drops year-on-year

But every demographic subgroup is improving. The composition of the student body changed.

Simpson's paradox is one of the strongest arguments for never trusting an aggregate statistic without seeing the breakdown. The lesson is simple: a correlation that holds at one level can vanish or invert at another. Always ask, “What groups are inside this average, and what happens when I look at them separately?”
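The treatment version of the paradox is easy to reproduce with a handful of counts. The numbers below are hypothetical, chosen so that Treatment A goes mostly to mild cases (270 of its 350 patients) and Treatment B mostly to severe ones (263 of 350).

```python
# Hypothetical counts: (recovered, treated) for each treatment and severity group.
OUTCOMES = {
    "A": {"mild": (234, 270), "severe": (55, 80)},
    "B": {"mild": (81, 87), "severe": (192, 263)},
}


def recovery_rates(outcomes):
    """Per-group and pooled recovery rates for each treatment."""
    rates = {}
    for treatment, groups in outcomes.items():
        rates[treatment] = {g: rec / n for g, (rec, n) in groups.items()}
        total_rec = sum(rec for rec, _ in groups.values())
        total_n = sum(n for _, n in groups.values())
        rates[treatment]["overall"] = total_rec / total_n
    return rates


rates = recovery_rates(OUTCOMES)
for treatment, by_group in rates.items():
    print(treatment, {g: f"{r:.1%}" for g, r in by_group.items()})
```

B has the higher recovery rate among mild patients (about 93% vs 87%) and among severe patients (about 73% vs 69%), yet A wins overall (about 83% vs 78%), because A's pool is dominated by easy cases.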

How do researchers actually establish causation?

Beyond correlation - the gold standards

Establishing genuine causation is hard. It typically requires one of these approaches, in roughly increasing order of reliability:

Bradford Hill criteria
A checklist for evaluating observational evidence: strength of association, consistency across studies, specificity, temporal sequence (cause precedes effect), dose-response relationship, biological plausibility, coherence with existing knowledge, experimental evidence, and analogy. Used in epidemiology when experiments aren't possible; the case that smoking causes lung cancer is the classic example of this kind of reasoning.
Natural experiments
Situations where some external factor (a policy change, a natural disaster, a lottery) sorts people into different conditions in a way that is unrelated, or nearly unrelated, to their own characteristics. Researchers can then study the consequences almost as if it were a controlled experiment, even though no one designed it.
Instrumental variables
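A statistical technique that uses a third variable (the instrument) that shifts the cause, affects the outcome only through the cause, and is unrelated to the confounders, to isolate the causal effect. Common in economics - for example, using rainfall as an instrument for farm income when studying how income affects local school attendance, which is valid only if rainfall influences attendance through income alone.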
Regression discontinuity
When eligibility for an intervention depends on crossing a threshold (a test score, an income cutoff), comparing people just above and just below the line approximates a randomised experiment.
Randomised controlled trials (RCTs)
The gold standard. Randomly assign people to treatment or control, measure the outcome. Because assignment is random, confounders, measured or not, are balanced between the groups on average, and increasingly closely as the sample grows. Used in medicine, increasingly in policy and tech (A/B testing is essentially this).

If you're reading a study that claims X causes Y and none of these methods is described - just a correlation in observational data - be sceptical. The study may still be useful as a hypothesis generator, but it almost certainly hasn't proved causation.

How can you spot correlation-vs-causation errors in real life?

Five questions to ask before believing any causal claim

You're not going to run an RCT next time you read a news headline. But you can ask these five questions, and they'll filter out most weak causal claims:

Is there a plausible confounder?
What third variable could be driving both the cause and the effect? If you can think of one easily, the claim probably hasn't accounted for it.
Could the arrow point the other way?
Try the reverse causation story. Does it also fit the evidence? If yes, you can't tell direction from this data alone.
Is the effect size suspiciously large?
Tiny interventions rarely produce huge outcomes. If a single weekly habit allegedly doubles your income, lifespan, or happiness, the claim is almost certainly oversold.
Where does the data come from?
Self-reports, social media, online surveys, observational studies - all are vulnerable to selection bias. Random samples and intervention studies are far more trustworthy.
Has it been replicated?
A single study, no matter how dramatic, is rarely conclusive. Wait for replication, especially in psychology, nutrition, and social science where reproducibility rates are low.

Frequently Asked Questions

Can correlation ever prove causation?

On its own, no. A correlation can be perfectly consistent with a causal relationship, but it's also consistent with confounding, reverse causation, coincidence, and selection effects. Causation requires either a controlled experiment or a careful argument that rules out the alternatives.

If correlation isn't causation, why do scientists use observational studies at all?

Because experiments are often impossible - you can't randomly assign people to smoke for 30 years or live in different countries. Observational studies are how we generate hypotheses and gather circumstantial evidence. The Bradford Hill criteria exist precisely to evaluate this kind of evidence systematically when randomised trials aren't an option.

How strong does a correlation need to be before it's worth taking seriously?

Strength matters less than mechanism. A correlation of 0.3 with a clear causal pathway and replication across studies is more meaningful than 0.9 with no plausible mechanism. That said, weak correlations (under ~0.2) in noisy real-world data are often just statistical artefacts, especially in small samples, where chance alone can produce an r of that size.
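Whether a weak correlation is distinguishable from chance depends heavily on sample size. This sketch uses the Fisher z-transformation, a standard approximation that assumes roughly bivariate-normal data, to put a 95% confidence interval around r = 0.2 at a few sample sizes. The function name is our own.

```python
import math
from statistics import NormalDist


def r_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transform."""
    z = math.atanh(r)                               # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)                       # standard error on that scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


for n in (20, 100, 1000):
    lo, hi = r_confidence_interval(0.2, n)
    print(f"r = 0.2, n = {n:>4}: 95% CI ({lo:+.2f}, {hi:+.2f})")
```

At n = 20 the interval comfortably includes zero; at n = 1,000 it does not. Even then, an r that is reliably non-zero is still only a correlation.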

What's the difference between confounding and Simpson's paradox?

Confounding is a hidden cause of both variables that creates a misleading correlation. Simpson's paradox is what happens when a confounder is so strong that it actually reverses the apparent direction of an effect when you slice the data differently. Simpson's is a special, dramatic case of confounding.

How does this apply to investing?

Strongly. Most market “edges” that look like causation in backtests are spurious correlations from a small sample, confounded by overall market direction, or the result of survivorship bias. See our expected value and probabilistic thinking guides for how to evaluate investment claims more rigorously.

Are A/B tests the same as randomised controlled trials?

Yes, in the statistical sense. A well-designed A/B test randomly assigns users to variants and measures the outcome - exactly the same logic as an RCT. They're the most reliable causal tool most product teams have. The catch is that they only work for short-term effects on the variables you measure; long-term and second-order effects are still observational.
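The analysis side of a simple A/B test can be sketched in a few lines. This uses a standard two-proportion z-test with a normal approximation; the function name and the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical test: 10,000 users randomly assigned to each variant.
lift, z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"lift {lift:+.2%}, z = {z:.2f}, p = {p:.3f}")
```

Here a 0.6-point lift (5.0% to 5.6%) gives z of about 1.89 and a two-sided p-value of about 0.06: suggestive, but not conclusive at the conventional 0.05 level. Randomisation makes the comparison causal; it does not make a small sample decisive.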

The Probabilistic Bottom Line

Default scepticism, calibrated belief

Probabilistic thinkers don't reject every correlation - they treat correlations as evidence with a weight that depends on the design behind them. A randomised trial moves the needle a lot. A natural experiment moves it some. A messy observational study with obvious confounders barely moves it at all.
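The idea of evidence "moving the needle" by different amounts has a precise form: Bayes' rule in odds form, where posterior odds equal prior odds times a likelihood ratio. The prior and the likelihood ratios below are hypothetical, picked only to illustrate the ordering described above; real values depend on the specific study.

```python
def update(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prior = 0.10  # hypothetical starting belief that "A causes B"
# Hypothetical likelihood ratios: how much more likely each finding is if A causes B than if not.
evidence = {
    "messy observational study": 1.2,
    "natural experiment": 3.0,
    "randomised trial": 10.0,
}
for design, lr in evidence.items():
    print(f"{design:>26}: {prior:.0%} -> {update(prior, lr):.0%}")
```

Starting from a 10% prior, these made-up ratios move belief to about 12%, 25%, and 53% respectively.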

Most causal claims you'll meet - in the news, on social media, in management books, in folk wisdom - are based on the weakest type of evidence. That doesn't make them all wrong. It does mean you should hold them loosely. Be willing to update if better evidence arrives, and be willing to discard them if the underlying study was a fishing expedition.

The goal isn't to become a sceptic of everything. It's to allocate your belief in proportion to the actual strength of the evidence - which is exactly what probabilistic thinking is for.

Continue the Series

New to probabilistic thinking?

Start with our foundational series on expected value, base rates, and decision-making under uncertainty.

Start with Expected Value
