Correlation vs Causation: A Probabilistic Thinking Guide
Correlation is not causation. It's the most-quoted line in statistics, and the most often misunderstood. Here's what it actually means, why your brain confuses the two, and how to think clearly about cause and effect.
Why your brain confuses the two — and how to think clearly about cause and effect.
Correlation is when two things move together. Causation is when one of them is the reason the other moves. They look identical from the outside, and that's why almost every confidently held belief about cause and effect is wrong.
Ice cream sales and drowning deaths are tightly correlated. So are the number of pirates in the world and global temperature. So are national chocolate consumption and Nobel laureates per capita. None of these is a causal relationship — but if you only had the data and a hunch, you'd swear they were.
This guide is about how to spot the difference. We'll cover spurious correlations and why they're so common, Simpson's paradox (where a trend reverses when you split the data), confounding variables, and the actual methods researchers use to establish causation. By the end, you'll have a working mental toolkit for separating “A and B happen together” from “A causes B.”
What Correlation Actually Means
A statistical relationship — nothing more
A correlation is a measurable, repeatable association between two variables. When one moves, the other tends to move with it (positive correlation) or against it (negative correlation). Statisticians quantify this with a correlation coefficient — usually labelled r — that ranges from -1 to +1.
- r = +1: perfect positive correlation. As one variable rises, the other rises in lockstep.
- r = 0: no linear relationship.
- r = -1: perfect negative correlation. As one rises, the other falls in lockstep.
Real-world correlations almost never hit ±1. A coefficient of 0.7 is considered very strong; 0.4 is moderate; 0.1 is barely there. But here's the catch: even a perfect correlation tells you absolutely nothing about why the variables move together. Correlation is descriptive. It says “this pattern exists.” It does not say what causes the pattern.
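To make r concrete, here's a minimal sketch (NumPy, with synthetic data invented for illustration) that computes the coefficient for a positive, a negative, and an unrelated pair:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=500)
noise = rng.normal(size=500)

pos = 2 * x + 0.5 * noise         # moves with x
neg = -2 * x + 0.5 * noise        # moves against x
unrelated = rng.normal(size=500)  # no relationship at all

for name, y in [("positive", pos), ("negative", neg), ("unrelated", unrelated)]:
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    print(f"{name:>9}: r = {r:+.2f}")
```

The first two print near ±1, the third near 0 — and nothing in that output tells you anything about mechanism.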
Two variables can correlate strongly because:
- One actually causes the other (real causation)
- The relationship runs the other way around (reverse causation)
- Both are caused by a third, hidden variable (confounding)
- The pattern is a coincidence in your particular sample (spurious correlation)
- You're looking at a non-random subset of the data (selection effects)
Of those five, only the first is genuine causation. The other four account for the overwhelming majority of correlations you'll encounter in life, science, and journalism.
Spurious Correlations: When Coincidence Looks Like Truth
Patterns that exist only because the universe is large
Tyler Vigen's Spurious Correlations project compiled hundreds of these. A few favourites:
- The number of films Nicolas Cage appeared in each year correlates at r = 0.66 with the number of people who drowned by falling into a pool.
- US per-capita cheese consumption correlates at r = 0.95 with the number of people who died by becoming tangled in their bedsheets.
- The divorce rate in Maine correlates at r = 0.99 with per-capita margarine consumption.
None of these are causal. They're statistical noise — patterns that emerge when you trawl enough variables looking for matches. With sufficient data, you can find a strong correlation between almost any two unrelated time series, especially if both happen to be trending in the same direction over the period you measure.
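You can watch this happen in a few lines of code. The sketch below (synthetic data only) generates 50 unrelated random walks and then trawls for the strongest pairwise correlation — it routinely turns up |r| above 0.9 between series that share nothing but a drift:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)

# 50 completely unrelated random walks, 30 "yearly" observations each.
# Random walks drift, so many pairs will trend together purely by chance.
series = rng.normal(size=(50, 30)).cumsum(axis=1)

# Trawl every pair for the strongest correlation, Vigen-style.
best = max(
    (abs(np.corrcoef(series[i], series[j])[0, 1]), i, j)
    for i, j in combinations(range(50), 2)
)
print(f"strongest 'relationship' found: |r| = {best[0]:.2f} "
      f"between series {best[1]} and {best[2]}")
```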
This is the first lesson of correlation vs causation: the existence of a correlation, on its own, is not strong evidence of anything causal. It's a starting hypothesis at best.
Confounding Variables: The Hidden Third Cause
When something else is driving both
Ice cream sales and drowning deaths are correlated — both go up in the summer. A naive analysis might suggest that ice cream causes drowning, or that drowning causes ice cream sales. Obviously neither is true. The real cause is a third variable: hot weather. Hot days drive both ice cream consumption and the likelihood of swimming in the first place.
This third variable is called a confounder. It causes both observed variables, creating a correlation between them that has nothing to do with one causing the other.
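A tiny simulation makes the mechanism visible. In the sketch below (numbers invented purely for illustration), temperature drives both variables and neither touches the other, yet the two end up strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hot weather (the confounder) drives both variables; neither causes the other.
temperature = rng.normal(25, 8, size=365)                        # daily temp, °C
ice_cream = 100 + 10 * temperature + rng.normal(0, 40, size=365)  # sales
drownings = 0.3 * temperature + rng.normal(0, 2, size=365)        # incidents

# Strong positive correlation, with zero causal link between the two.
print(f"ice cream vs drownings: r = {np.corrcoef(ice_cream, drownings)[0, 1]:.2f}")
```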
Confounders are everywhere in observational data:
- Coffee drinkers live longer. Confounder: people who can afford and tolerate coffee tend to be healthier overall.
- Children with bigger feet have bigger vocabularies. Confounder: age. Older kids have bigger feet and bigger vocabularies.
- Students in smaller classes perform better. Confounder: smaller classes are often at wealthier institutions with better-prepared students.
- Supplement takers have better health outcomes. Confounder: people who buy supplements are also more likely to exercise, sleep well, and avoid smoking.
The presence of a plausible confounder doesn't disprove a causal claim — but it forces you to control for it before taking the correlation seriously. Good observational research lists every plausible confounder and adjusts for it statistically. Bad research ignores them and reports the headline number.
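One common way to "adjust statistically" is to put the confounder into the regression alongside the supposed cause. A minimal sketch, using the same kind of simulated ice-cream data as above: once temperature enters the model, the ice cream coefficient collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
temperature = rng.normal(25, 8, size=365)
ice_cream = 100 + 10 * temperature + rng.normal(0, 40, size=365)
drownings = 0.3 * temperature + rng.normal(0, 2, size=365)

# Naive model: drownings ~ ice_cream. The coefficient looks meaningful.
X_naive = np.column_stack([np.ones(365), ice_cream])
beta_naive, *_ = np.linalg.lstsq(X_naive, drownings, rcond=None)

# Adjusted model: drownings ~ ice_cream + temperature. With the
# confounder in the model, ice cream has nothing left to explain.
X_adj = np.column_stack([np.ones(365), ice_cream, temperature])
beta_adj, *_ = np.linalg.lstsq(X_adj, drownings, rcond=None)

print(f"naive ice-cream coefficient:    {beta_naive[1]:+.4f}")
print(f"adjusted ice-cream coefficient: {beta_adj[1]:+.4f}")  # near zero
```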
Reverse Causation: When the Arrow Points the Other Way
B causes A, not A causes B
Sometimes two things really are causally linked, but the direction is the opposite of what intuition suggests.
People who use stand-up desks report better focus. Does the desk cause better focus? Possibly. But it's at least as plausible that people who already have good focus and discipline are more likely to buy stand-up desks in the first place. Without a controlled experiment, you genuinely can't tell which way the arrow goes.
Other classic reverse-causation traps:
- People who attend therapy report worse mental health than people who don't. Does therapy cause poor mental health, or do struggling people seek therapy?
- Hospitals have higher death rates than gyms. Do hospitals kill people, or do dying people end up in hospitals?
- Successful CEOs tend to take cold showers. Does cold showering build success, or do successful people buy into wellness fads?
Reverse causation is particularly sneaky because it produces real correlations and feels like a causal mechanism. The only reliable defence is to ask: “If I imagine the arrow going the other way, does that story also fit the evidence?” If yes, you don't have a causal claim.
Simpson's Paradox: When the Data Lies in Both Directions
A pattern that reverses when you split the data
Simpson's paradox is one of the strangest things in statistics. It's a situation where a trend appears in several groups of data but reverses when the groups are combined — or vice versa.
The classic example is from a 1973 admissions audit at UC Berkeley. Looking at overall admissions, men were admitted at a noticeably higher rate than women, suggesting bias against women. But when researchers split the data by department, the pattern flipped: most departments were actually slightly biased in favour of women. The aggregate looked discriminatory because women happened to apply more often to highly competitive departments with low acceptance rates for everyone.
The same paradox appears repeatedly:
- Treatment A has the higher success rate overall, but Treatment B is better in every patient subgroup. The difference comes from how patients were assigned.
- One baseball player out-hits a second in every individual season, but over a career, the second player has the higher average. The mix of years explains the reversal.
- A school's average test score falls, but every demographic subgroup is improving. The composition of the student body changed.
Simpson's paradox is the strongest argument for never trusting an aggregate statistic without seeing the breakdown. The lesson is simple: a correlation that holds at one level can vanish or invert at another. Always ask, “What groups are inside this average, and what happens when I look at them separately?”
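Here's the arithmetic in miniature. The numbers below are invented (not the real Berkeley figures) but reproduce the structure: women out-admit men in every department yet trail in the aggregate.

```python
# Toy admissions data in the spirit of the 1973 Berkeley audit.
# Each entry: (applicants, admitted).
data = {
    "easy dept": {"men": (100, 80), "women": (20, 18)},
    "hard dept": {"men": (20, 4),   "women": (100, 30)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in data.items():
    for sex, (applied, admitted) in groups.items():
        totals[sex][0] += applied
        totals[sex][1] += admitted
        print(f"{dept:9} {sex:5}: {admitted / applied:5.0%} admitted")

for sex, (applied, admitted) in totals.items():
    print(f"overall   {sex:5}: {admitted / applied:5.0%} admitted")

# Women win in every department (90% vs 80%, 30% vs 20%) yet lose
# overall (40% vs 70%): they applied mostly to the hard department.
```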
How Researchers Actually Establish Causation
Beyond correlation — the gold standards
Establishing genuine causation is hard. It typically requires one of these approaches, in roughly increasing order of reliability:
Bradford Hill criteria
A checklist for evaluating observational evidence: strength of association, consistency across studies, specificity, temporal sequence (cause precedes effect), dose-response relationship, and biological plausibility, among others. Used in epidemiology when experiments aren't possible (smoking and lung cancer was established this way).
Natural experiments
Situations where some external factor (a policy change, a natural disaster, a lottery) randomly assigns people to different conditions. Researchers can then study the consequences as if it were a controlled experiment, even though no one designed it.
Instrumental variables
A statistical technique that uses a third variable correlated with the cause but not directly with the outcome to isolate the causal effect. Common in economics — for example, using rainfall to study how farm income affects local school attendance.
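A compact sketch of two-stage least squares, with made-up data mirroring the rainfall example (variable names and coefficients are illustrative, not from any real study): the naive regression is biased by an unobserved confounder, while the instrument recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Rainfall shifts farm income but affects attendance only through income.
# An unobserved confounder (say, regional wealth) biases the naive estimate.
rainfall = rng.normal(size=n)    # the instrument
confounder = rng.normal(size=n)  # unobserved by the researcher
income = 1.0 * rainfall + 1.0 * confounder + rng.normal(size=n)
attendance = 0.5 * income - 1.0 * confounder + rng.normal(size=n)  # true effect: 0.5

def fit_line(x, y):
    """OLS of y on x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS is pulled off 0.5 by the confounder.
print(f"naive OLS estimate: {fit_line(income, attendance)[1]:.2f}")

# Stage 1: predict income from rainfall alone (strips out the confounder).
a, b = fit_line(rainfall, income)
income_hat = a + b * rainfall
# Stage 2: regress attendance on the predicted income.
print(f"IV (2SLS) estimate: {fit_line(income_hat, attendance)[1]:.2f}")  # ~0.5
```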
Regression discontinuity
When eligibility for an intervention depends on crossing a threshold (a test score, an income cutoff), comparing people just above and just below the line approximates a randomised experiment.
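In code, the idea is just "compare means in a narrow band around the cutoff". The sketch below uses a hypothetical scholarship awarded at a test score of 70; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical scholarship for students scoring >= 70 on a test.
score = rng.uniform(0, 100, size=n)
treated = score >= 70
# Outcome rises smoothly with ability (proxied by score), plus a
# true jump of 5 points caused by the scholarship itself.
outcome = 0.2 * score + 5.0 * treated + rng.normal(0, 2, size=n)

# Naive treated-vs-untreated comparison mixes in the ability gap.
print(f"naive gap: {outcome[treated].mean() - outcome[~treated].mean():.1f}")

# RD: compare only students within a narrow band around the cutoff,
# where ability is nearly identical on both sides of the line.
band = np.abs(score - 70) < 2
jump = outcome[band & treated].mean() - outcome[band & ~treated].mean()
print(f"RD estimate near the cutoff: {jump:.1f}")  # close to the true 5.0
```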
Randomised controlled trials (RCTs)
The gold standard. Randomly assign people to treatment or control, measure the outcome. Random assignment eliminates confounders by construction. Used in medicine, increasingly in policy and tech (A/B testing is essentially this).
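A simulation shows why randomisation works. In the sketch below (synthetic data; "motivation" stands in for any hidden confounder), self-selection wildly inflates the apparent effect, while a coin flip recovers the true one:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

motivation = rng.normal(size=n)  # hidden confounder
true_effect = 0.2                # what the treatment really does

# Observational: motivated people opt in, and motivation also raises the outcome.
opted_in = motivation + rng.normal(size=n) > 0
outcome_obs = true_effect * opted_in + motivation + rng.normal(size=n)
print(f"observational difference: "
      f"{outcome_obs[opted_in].mean() - outcome_obs[~opted_in].mean():.2f}")

# RCT: a coin flip decides who gets the treatment, so motivation is
# balanced across the two arms by construction.
assigned = rng.random(n) < 0.5
outcome_rct = true_effect * assigned + motivation + rng.normal(size=n)
print(f"randomised difference:    "
      f"{outcome_rct[assigned].mean() - outcome_rct[~assigned].mean():.2f}")  # ~0.2
```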
If you're reading a study that claims X causes Y and none of these methods is described — just a correlation in observational data — be sceptical. The study may still be useful as a hypothesis generator, but it almost certainly hasn't proved causation.
A Mental Checklist for Real Life
Five questions to ask before believing any causal claim
You're not going to run an RCT next time you read a news headline. But you can ask these five questions, and they'll filter out most weak causal claims:
Is there a plausible confounder?
What third variable could be driving both the cause and the effect? If you can think of one easily, the claim probably hasn't accounted for it.
Could the arrow point the other way?
Try the reverse causation story. Does it also fit the evidence? If yes, you can't tell direction from this data alone.
Is the effect size suspiciously large?
Tiny interventions rarely produce huge outcomes. If a single weekly habit allegedly doubles your income, lifespan, or happiness, the claim is almost certainly oversold.
Where does the data come from?
Self-reports, social media, online surveys, observational studies — all are vulnerable to selection bias. Random samples and intervention studies are far more trustworthy.
Has it been replicated?
A single study, no matter how dramatic, is rarely conclusive. Wait for replication, especially in psychology, nutrition, and social science where reproducibility rates are low.
Frequently Asked Questions
Can correlation ever prove causation?
Not on its own. A correlation becomes evidence of causation only when it sits inside a design that rules out the alternatives: random assignment, a natural experiment, or observational evidence strong enough to satisfy criteria like Bradford Hill's.
If correlation isn't causation, why do scientists use observational studies at all?
Because experiments are often impossible or unethical; no one can randomly assign people to smoke for thirty years. Observational studies generate hypotheses and, when confounders are carefully controlled and results replicate, can build a convincing cumulative case, which is how smoking and lung cancer was established.
How strong does a correlation need to be before it's worth taking seriously?
Strength alone doesn't settle it: the Maine divorce/margarine correlation is 0.99 and means nothing. A correlation deserves attention when it's reasonably strong, replicates across samples, and comes with a plausible mechanism and a design that addresses confounding.
What's the difference between confounding and Simpson's paradox?
Confounding is the underlying cause: a hidden third variable drives both things you measured. Simpson's paradox is one way that shows up in the numbers: an aggregate trend that reverses once you split the data along the lurking variable.
How does this apply to investing?
Backtested strategies and fund track records are correlations mined from an enormous space of candidates, which is exactly the setup that manufactures spurious patterns. Treat a strategy supported only by historical correlation as a hypothesis to test, not a proven edge.
Are A/B tests the same as randomised controlled trials?
Structurally, yes: users are randomly assigned to variants and an outcome is measured, so confounders are eliminated by construction there too. They typically differ in rigour, with less pre-registration, shorter horizons, and noisier metrics than a clinical trial.
The Probabilistic Bottom Line
Default scepticism, calibrated belief
Probabilistic thinkers don't reject every correlation — they treat correlations as evidence with a weight that depends on the design behind them. A randomised trial moves the needle a lot. A natural experiment moves it some. A messy observational study with obvious confounders barely moves it at all.
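One way to make "moves the needle" concrete is Bayes' rule in odds form. In the sketch below, the likelihood ratios attached to each study design are illustrative guesses, not measured values:

```python
# Bayes in odds form: posterior odds = prior odds × likelihood ratio.
# The likelihood ratios are illustrative only: stronger designs are assumed
# to better discriminate a true causal link from a spurious one.

def update(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

prior = 0.10  # start out fairly sceptical of the causal claim

for design, lr in [("messy observational study", 1.5),
                   ("natural experiment", 4.0),
                   ("randomised controlled trial", 20.0)]:
    print(f"{design:28}: {prior:.0%} -> {update(prior, lr):.0%}")
```

Under these assumed weights, the messy study nudges you from 10% to about 14%, while the trial pushes you close to 70% — the same correlation, very different evidential force.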
Most causal claims you'll meet — in the news, on social media, in management books, in folk wisdom — are based on the weakest type of evidence. That doesn't make them all wrong. It does mean you should hold them loosely. Be willing to update if better evidence arrives, and be willing to discard them if the underlying study was a fishing expedition.
The goal isn't to become a sceptic of everything. It's to allocate your belief in proportion to the actual strength of the evidence — which is exactly what probabilistic thinking is for.
Continue the Series
Base Rate Neglect
Why your intuitions about probability are systematically wrong, and how to fix them.
Bayesian Thinking
How to update your beliefs the right amount when you get new evidence.
Thinking in Probabilities
Why your brain is bad at risk — and how to get better at it.
New to probabilistic thinking? Start with our foundational series on expected value, base rates, and decision-making under uncertainty.