Correlation vs Causation: A Probabilistic Thinking Guide
Why your brain confuses the two - and how to think clearly about cause and effect.

Correlation is when two things move together. Causation is when one of them is the reason the other moves. They look identical from the outside, and that's why so many confidently held beliefs about cause and effect turn out to be wrong.
Ice cream sales and drowning deaths are tightly correlated. So, inversely, are the number of pirates in the world and global temperature. So are a country's chocolate consumption and the number of Nobel laureates it produces. None of these relationships is causal - but if you had only the data and a hunch, you'd swear they were.
This guide is about how to spot the difference. We'll cover spurious correlations and why they're so common, Simpson's paradox (where a trend reverses when you split the data), confounding variables, and the actual methods researchers use to establish causation. By the end, you'll have a working mental toolkit for separating “A and B happen together” from “A causes B.”
What does correlation actually mean?
A statistical relationship - nothing more
A correlation is a measurable, repeatable association between two variables. When one moves, the other tends to move with it (positive correlation) or against it (negative correlation). Statisticians quantify this with a correlation coefficient - usually labelled r - that ranges from -1 to +1.
- r = +1: perfect positive correlation. As one variable rises, the other rises in lockstep.
- r = 0: no linear relationship.
- r = -1: perfect negative correlation. As one rises, the other falls in lockstep.
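For readers who want to see the arithmetic, here is a minimal sketch of the most common version of r, Pearson's coefficient: the covariance of the two variables divided by the product of their standard deviations. The helper name `pearson_r` is our own, not a library API. Note the last case: y is completely determined by x, yet r is 0, because r only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)

# y = x squared: a perfect relationship, but not a linear one.
sym = [-2, -1, 0, 1, 2]
print(pearson_r(sym, [x * x for x in sym]))  # 0.0
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson coefficient.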
Real-world correlations almost never hit ±1. As a rough rule of thumb (conventions vary by field), a coefficient around 0.7 in either direction is considered very strong, 0.4 moderate, and 0.1 barely there. But here's the catch: even a perfect correlation tells you absolutely nothing about why the variables move together. Correlation is descriptive. It says “this pattern exists.” It does not say what causes the pattern.
Two variables can correlate strongly because:
- One actually causes the other (real causation)
- The relationship runs the other way around (reverse causation)
- Both are caused by a third, hidden variable (confounding)
- The pattern is a coincidence in your particular sample (spurious correlation)
- You're looking at a non-random subset of the data (selection effects)
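Of those five, only the first means that A causes B. Reverse causation is causal too, but with the arrow pointing the other way; the remaining three involve no causal link between A and B at all. Together, these alternatives explain a large share of the correlations you'll encounter in life, science, and journalism.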
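Selection effects are the least intuitive item on that list, so here is a small simulation with invented numbers. Two traits, called `talent` and `charm` purely for illustration, are generated independently. We then keep only people whose combined score clears a bar, as a hypothetical admissions filter might. Inside the selected group the traits become clearly negatively correlated (about -0.6 in theory for this rule), even though neither affects the other. This pattern is often called Berkson's paradox.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(42)  # fixed seed so the run is reproducible

# Two traits drawn independently: in the full population, r should be close to 0.
talent = [rng.gauss(0, 1) for _ in range(10_000)]
charm = [rng.gauss(0, 1) for _ in range(10_000)]

# Hypothetical selection rule: only people whose combined score clears a bar get in.
selected = [(t, c) for t, c in zip(talent, charm) if t + c > 1.0]
sel_talent = [t for t, _ in selected]
sel_charm = [c for _, c in selected]

r_population = pearson_r(talent, charm)
r_selected = pearson_r(sel_talent, sel_charm)
print(f"whole population: r = {r_population:+.2f}")
print(f"selected subset ({len(selected)} people): r = {r_selected:+.2f}")
```

If you only ever observe the selected group (say, people who were admitted), you would conclude that talent and charm trade off against each other, when the trade-off was created entirely by the filter.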
What is a spurious correlation?
Patterns that exist only because the universe is large
Tyler Vigen's Spurious Correlations project compiled hundreds of these. A few favourites:
- The number of films Nicolas Cage appeared in correlates 0.66 with the number of people who drowned by falling into a pool.
- US per-capita cheese consumption correlates 0.95 with the number of people who died by becoming tangled in their bedsheets.
- The divorce rate in Maine correlates 0.99 with per-capita margarine consumption.
None of these are causal. They're statistical noise - patterns that emerge when you trawl enough variables looking for matches. With sufficient data, you can find a strong correlation between almost any two unrelated time series, especially if both happen to be trending in the same direction over the period you measure.
This is the first lesson of correlation vs causation: the existence of a correlation, on its own, is not strong evidence of anything causal. It's a starting hypothesis at best.
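The trawling effect is easy to reproduce. The simulation below (illustrative only, not a reanalysis of Vigen's data) creates one short target series of 10 values, about the length of many headline spurious correlations, plus 1,000 candidate series of pure random noise. Nothing is related to the target by construction, yet searching through the candidates reliably turns up an impressive-looking match.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(7)
YEARS = 10          # short series, like many headline spurious correlations
CANDIDATES = 1_000  # how many unrelated series we "trawl" through

target = [rng.gauss(0, 1) for _ in range(YEARS)]
candidates = [[rng.gauss(0, 1) for _ in range(YEARS)] for _ in range(CANDIDATES)]

# Correlate every candidate with the target and keep the most impressive one.
scores = [pearson_r(target, c) for c in candidates]
best = max(scores, key=abs)
share_strong = sum(abs(r) > 0.6 for r in scores) / CANDIDATES

print(f"best |r| found by searching: {abs(best):.2f}")
print(f"share of pure-noise candidates with |r| > 0.6: {share_strong:.1%}")
```

With only 10 points, roughly 1 in 15 pure-noise series clears |r| > 0.6 on its own, so a search over a thousand candidates is almost guaranteed to find something striking.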
What are the most famous spurious correlations?
The Tyler Vigen archive and decades of statistical writing have surfaced specific examples that make the principle vivid:
- Ice cream sales + drowning deaths (the classic). Both peak in summer because of warm weather (the confounder). Eating ice cream doesn't cause drowning.
- US per-capita mozzarella cheese consumption + civil-engineering doctorates awarded (correlation coefficient 0.96 over 9 years, per Tyler Vigen). Both are slow-moving trends with no plausible causal link - pure coincidence over a small time window.
- Margarine consumption + Maine divorce rate (0.99 over 10 years, per Tyler Vigen). Same pattern - two declining time series that happen to track each other.
- Autism diagnoses + organic food sales (rose together 2000-2015). The two correlate strongly, but both trends share plausible common drivers (rising parental awareness, economic shifts, demographic changes). The widely circulated implication of a causal link is false.
- Number of pirates + global temperature (the parody example from Bobby Henderson's 2005 open letter to the Kansas State Board of Education). The point: any two steadily trending series will correlate (here negatively, since pirate numbers fell as temperatures rose), regardless of causal connection.
- Storks + human births in Europe (~0.62 correlation across 17 European countries, in Robert Matthews' tongue-in-cheek 2000 analysis). The likely confounder is country size: larger countries have more land for storks AND more people having babies. The stork isn't doing the delivering.
- Country chocolate consumption + Nobel Prize winners per capita (a 2012 New England Journal of Medicine humour piece reporting r=0.79). Confounders include national wealth, education spending, and research infrastructure. Not a chocolate-causes-Nobels finding.
- Shoe size + reading ability in primary schoolchildren (strongly correlated). The confounder is age - older children have bigger feet AND read better. Reading ability isn't caused by foot size.
- HDL cholesterol + cardiovascular health (long thought causal; largely failed in RCTs). For decades, HDL was assumed to causally protect against heart disease because the correlation was so strong. When pharma developed drugs that raised HDL specifically (CETP inhibitors), the RCTs showed little or no reduction in cardiovascular events. The correlation existed; the causation didn't.
- Hormone replacement therapy + reduced heart disease (observational vs RCT contradiction). 1990s observational studies showed HRT users had lower heart-disease rates. The 2002 Women's Health Initiative RCT showed HRT actually INCREASED heart-disease risk. The confounder was that HRT users tended to be healthier overall (selection effect). One of the most-cited cautionary tales in modern epidemiology.
The pattern across all ten: two variables that move together, whether as trending time series or across groups in an observational study, can produce a spectacular-looking correlation coefficient. Whether they're causally linked is a separate question, and answering it requires more than the correlation itself.
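Many of these examples are pairs of trending time series. A common first check, sketched below on two made-up series, is to correlate the period-to-period changes instead of the levels. Two series that merely share an upward drift correlate strongly in levels, while their changes are unrelated. Differencing is a standard diagnostic rather than proof in either direction, and the series here are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def changes(series):
    """Period-to-period differences; this removes a shared linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(2024)
PERIODS = 100

# Two hypothetical series that both drift upward, each with its own independent noise.
# Neither depends on the other.
series_a = [2.0 * t + rng.gauss(0, 5) for t in range(PERIODS)]
series_b = [3.0 * t + rng.gauss(0, 8) for t in range(PERIODS)]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(changes(series_a), changes(series_b))
print(f"r of levels:  {r_levels:+.2f}")   # close to +1: both trend upward
print(f"r of changes: {r_changes:+.2f}")  # small: nothing links the changes
```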
What is a confounding variable?
When something else is driving both
Ice cream sales and drowning deaths are correlated - both go up in the summer. A naive analysis might suggest that ice cream causes drowning, or that drowning causes ice cream sales. Obviously neither is true. The real cause is a third variable: hot weather. Hot days drive both ice cream consumption and the likelihood of swimming in the first place.
This third variable is called a confounder. It causes both observed variables, creating a correlation between them that has nothing to do with one causing the other.
Confounders are everywhere in observational data:
Coffee drinkers live longer than non-drinkers
Confounder: people who can afford and tolerate coffee tend to be healthier overall.
Children with bigger feet have better vocabularies
Confounder: age. Older kids have bigger feet and bigger vocabularies.
Universities with smaller class sizes produce better grades
Confounder: smaller classes are often at wealthier institutions with better-prepared students.
Vitamin supplement users are healthier than non-users
Confounder: people who buy supplements are also more likely to exercise, sleep well, and avoid smoking.
The presence of a plausible confounder doesn't disprove a causal claim - but it forces you to control for it before taking the correlation seriously. Good observational research identifies the plausible confounders, measures them, and adjusts for them statistically; even then, it can't adjust for a confounder nobody measured. Bad research ignores them and reports the headline number.
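Here is what “adjusting for a confounder” can look like in the simplest case, using a simulated version of the ice cream example (every coefficient is invented). Temperature drives both series and ice cream has no effect on drowning. The raw correlation is strong, but after regressing each variable on temperature and correlating what is left over, the association disappears. That residual-on-residual correlation is known as a partial correlation. It only works here because the confounder was measured.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, xs):
    """Residuals from an ordinary least-squares fit of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(1)
N_DAYS = 5_000

# Hypothetical data-generating process: temperature causes both outcomes.
temperature = [rng.gauss(25, 5) for _ in range(N_DAYS)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson_r(ice_cream, drownings)
partial_r = pearson_r(residuals(ice_cream, temperature), residuals(drownings, temperature))
print(f"raw correlation:                 {raw_r:+.2f}")      # about +0.7
print(f"after adjusting for temperature: {partial_r:+.2f}")  # close to 0
```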
When does reverse causation flip the answer?
B causes A, not A causes B
Sometimes two things really are causally linked, but the direction is the opposite of what intuition suggests.
People who use stand-up desks report better focus. Does the desk cause better focus? Possibly. But it's at least as plausible that people who already have good focus and discipline are more likely to buy stand-up desks in the first place. Without a controlled experiment, you genuinely can't tell which way the arrow goes.
Other classic reverse-causation traps:
- People who attend therapy report worse mental health than people who don't. Does therapy cause poor mental health, or do struggling people seek therapy?
- Hospitals have higher death rates than gyms. Do hospitals kill people, or do dying people end up in hospitals?
- Suppose successful CEOs disproportionately take cold showers. Does cold showering build success, or do successful people buy into wellness fads?
Reverse causation is particularly sneaky because it produces real correlations and feels like a causal mechanism. The simplest defence is to ask: “If I imagine the arrow going the other way, does that story also fit the evidence?” If yes, the data alone can't tell you which way the arrow points, and you don't yet have a causal claim.
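A quick simulation shows why the data alone cannot settle the direction. Below, two hypothetical worlds are generated with invented numbers: in one, focus drives standing-desk use; in the other, desk use drives focus. (Treating desk use as a continuous score is a simplifying assumption.) Both worlds produce essentially the same correlation, because Pearson's r is symmetric and carries no information about which variable came first.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(3)
N = 5_000

# World 1 (hypothetical): focus -> desk use.
focus_1 = [rng.gauss(0, 1) for _ in range(N)]
desk_1 = [f + rng.gauss(0, 1) for f in focus_1]

# World 2 (hypothetical): desk use -> focus, same strength of link, arrow reversed.
desk_2 = [rng.gauss(0, 1) for _ in range(N)]
focus_2 = [d + rng.gauss(0, 1) for d in desk_2]

r_world_1 = pearson_r(focus_1, desk_1)
r_world_2 = pearson_r(focus_2, desk_2)
print(f"focus causes desk use: r = {r_world_1:.2f}")
print(f"desk use causes focus: r = {r_world_2:.2f}")
# Both come out near 0.71 (1/sqrt(2)); the number cannot tell the worlds apart.
```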
What is Simpson's paradox?
A pattern that reverses when you split the data
Simpson's paradox is one of the strangest things in statistics. It's a situation where a trend appears in several groups of data but reverses when the groups are combined - or vice versa.
The classic example is a study of UC Berkeley's 1973 graduate admissions. Looking at overall admissions, men were admitted at a noticeably higher rate than women (roughly 44% vs 35%), suggesting bias against women. But when researchers split the data by department, the picture changed: in most departments, women were admitted at similar or slightly higher rates than men. The aggregate looked discriminatory because women applied more often to highly competitive departments with low acceptance rates for everyone.
The same paradox appears repeatedly:
Treatment A looks better than Treatment B overall
But Treatment B is better in every patient subgroup. The difference comes from how patients were assigned.
A baseball player has a higher batting average each year than another
But over a career, the second player has the higher average. The mix of years explains the reversal.
A school's average test score drops year-on-year
But every demographic subgroup is improving. The composition of the student body changed.
Simpson's paradox is the strongest argument for never trusting an aggregate statistic without seeing the breakdown. The lesson is simple: a correlation that holds at one level can vanish or invert at another. Always ask, “What groups are inside this average, and what happens when I look at them separately?”
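The treatment example above can be checked with a few lines of arithmetic. The counts below are invented for illustration: Treatment A went mostly to mild cases, which recover more often whatever the treatment, while Treatment B went mostly to severe ones.

```python
from fractions import Fraction

# Hypothetical counts: (recovered, treated) for each treatment within each subgroup.
data = {
    "mild": {"A": (90, 100), "B": (19, 20)},
    "severe": {"A": (6, 20), "B": (40, 100)},
}


def rate(recovered, treated):
    """Exact recovery rate, so comparisons aren't muddied by float rounding."""
    return Fraction(recovered, treated)


# Within each subgroup, compare recovery rates.
for group, arms in data.items():
    a, b = rate(*arms["A"]), rate(*arms["B"])
    better = "B" if b > a else "A"
    print(f"{group:>6}: A {float(a):.0%}  B {float(b):.0%}  -> {better} better")

# Pool the subgroups and compare again.
totals = {}
for arm in ("A", "B"):
    recovered = sum(arms[arm][0] for arms in data.values())
    treated = sum(arms[arm][1] for arms in data.values())
    totals[arm] = rate(recovered, treated)
print(f"overall: A {float(totals['A']):.0%}  B {float(totals['B']):.0%}")
```

B wins in both subgroups (95% vs 90% for mild cases, 40% vs 30% for severe ones), yet A wins overall (80% vs about 49%), simply because most of A's patients were mild cases.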
How do researchers actually establish causation?
Beyond correlation - the gold standards
Establishing genuine causation is hard. It typically requires one of these approaches, in roughly increasing order of reliability:
Bradford Hill criteria
A checklist for evaluating observational evidence. Hill's nine considerations are strength of association, consistency across studies, specificity, temporal sequence (cause precedes effect), dose-response relationship, biological plausibility, coherence with existing knowledge, experimental evidence, and analogy. Used in epidemiology when experiments aren't possible (the case that smoking causes lung cancer was built largely this way).
Natural experiments
Situations where some external factor (a policy change, a natural disaster, a lottery) assigns people to different conditions in a way that is random, or at least unrelated to the outcome being studied. Researchers can then study the consequences as if it were a controlled experiment, even though no one designed it.
Instrumental variables
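A statistical technique that uses a third variable (the instrument) to isolate the causal effect. The instrument must shift the cause, must affect the outcome only through that cause, and must be unrelated to the confounders. Common in economics - for example, using rainfall to study how farm income affects local school attendance, on the assumption that rainfall affects attendance only through income.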
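A minimal simulated sketch of the idea, using the simplest instrumental-variable estimator: the ratio cov(Z, Y) / cov(Z, X), which equals two-stage least squares when there is one instrument and one cause. Everything here is hypothetical: `z` plays the role of something like rainfall, `x` the cause, `y` the outcome, and `u` an unmeasured confounder, with the true effect of x on y set to 2. The ordinary regression slope is biased by u; the instrument-based ratio recovers the true effect because z moves x but reaches y only through x.

```python
import random


def cov(xs, ys):
    """Sample covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


rng = random.Random(11)
N = 10_000
TRUE_EFFECT = 2.0  # by construction

u = [rng.gauss(0, 1) for _ in range(N)]  # unmeasured confounder
z = [rng.gauss(0, 1) for _ in range(N)]  # instrument: independent of u
# The cause depends on both the instrument and the confounder.
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# The outcome depends on the cause and the confounder, but NOT directly on z.
y = [TRUE_EFFECT * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

naive_slope = cov(x, y) / cov(x, x)  # ordinary regression slope, confounded by u
iv_estimate = cov(z, y) / cov(z, x)  # instrument-based estimate

print(f"naive slope: {naive_slope:.2f}")  # biased upward (about 3 in this setup)
print(f"IV estimate: {iv_estimate:.2f}")  # close to the true effect of 2
```

The whole method rests on the exclusion restriction (z affects y only through x), which the simulation guarantees by construction but which real analyses have to argue for.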
Regression discontinuity
When eligibility for an intervention depends on crossing a threshold (a test score, an income cutoff), comparing people just above and just below the line approximates a randomised experiment, provided people can't precisely control which side of the line they land on.
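A bare-bones sketch of a sharp regression discontinuity on simulated data (the cutoff, bandwidth, and effect size are all hypothetical). People scoring 50 or more receive a programme worth 5 points on the outcome, and the outcome also rises smoothly with the score. Comparing everyone above the cutoff with everyone below badly overstates the effect; fitting a straight line on each side within a narrow window and measuring the jump at the cutoff recovers it. Real analyses choose the bandwidth carefully and check for sorting around the line.

```python
import random

rng = random.Random(5)
CUTOFF = 50.0     # hypothetical eligibility threshold
BANDWIDTH = 10.0  # only use people within 10 points of the cutoff
TRUE_JUMP = 5.0   # the programme's effect, by construction


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


scores = [rng.uniform(0, 100) for _ in range(20_000)]
treated = [s >= CUTOFF for s in scores]
# The outcome rises smoothly with the score, plus a jump for the treated.
outcome = [0.1 * s + TRUE_JUMP * t + rng.gauss(0, 1) for s, t in zip(scores, treated)]


def value_at_cutoff(side_treated):
    """Fit a line on one side of the cutoff (score centred at the cutoff) and
    return its intercept, i.e. the predicted outcome right at the line."""
    pts = [(s - CUTOFF, y) for s, y, t in zip(scores, outcome, treated)
           if t == side_treated and abs(s - CUTOFF) <= BANDWIDTH]
    intercept, _ = fit_line([p[0] for p in pts], [p[1] for p in pts])
    return intercept


above = [y for y, t in zip(outcome, treated) if t]
below = [y for y, t in zip(outcome, treated) if not t]
naive = sum(above) / len(above) - sum(below) / len(below)
rd_estimate = value_at_cutoff(True) - value_at_cutoff(False)

print(f"naive difference: {naive:.2f}")        # about 10: includes the score trend
print(f"RD estimate:      {rd_estimate:.2f}")  # close to the true jump of 5
```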
Randomised controlled trials (RCTs)
The gold standard. Randomly assign people to treatment or control, then measure the outcome. Because assignment is random, it can't be related to any confounder, measured or unmeasured, so on average the groups differ only in the treatment. Used in medicine, and increasingly in policy and tech (A/B testing is essentially this).
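To see why randomisation matters, here is a simulation of the vitamin example from earlier, with every number made up. A hidden trait, health-consciousness, makes people both more likely to take supplements and healthier, while the supplement itself does nothing. In the observational version people choose whether to take it; in the trial version a coin flip decides.

```python
import random

rng = random.Random(9)
N = 20_000
TRUE_EFFECT = 0.0  # by assumption, the supplement does nothing


def simulate(randomised):
    """Return the mean health score of takers minus that of non-takers."""
    takers, non_takers = [], []
    for _ in range(N):
        health_conscious = rng.gauss(0, 1)  # hidden confounder
        if randomised:
            takes = rng.random() < 0.5  # coin flip, unrelated to the confounder
        else:
            # Health-conscious people are more likely to choose the supplement.
            takes = rng.random() < (0.8 if health_conscious > 0 else 0.2)
        health = 2.0 * health_conscious + TRUE_EFFECT * takes + rng.gauss(0, 1)
        (takers if takes else non_takers).append(health)
    return sum(takers) / len(takers) - sum(non_takers) / len(non_takers)


observational = simulate(randomised=False)
trial = simulate(randomised=True)
print(f"observational difference: {observational:+.2f}")  # large, entirely confounding
print(f"randomised difference:    {trial:+.2f}")          # near 0, the true effect
```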
If you're reading a study that claims X causes Y and none of these methods is described - just a correlation in observational data - be sceptical. The study may still be useful as a hypothesis generator, but it almost certainly hasn't proved causation.
How can you spot correlation-vs-causation errors in real life?
Five questions to ask before believing any causal claim
You're not going to run an RCT next time you read a news headline. But you can ask these five questions, and they'll filter out most weak causal claims:
Is there a plausible confounder?
What third variable could be driving both the cause and the effect? If you can think of one easily, the claim probably hasn't accounted for it.
Could the arrow point the other way?
Try the reverse causation story. Does it also fit the evidence? If yes, you can't tell direction from this data alone.
Is the effect size suspiciously large?
Tiny interventions rarely produce huge outcomes. If a single weekly habit allegedly doubles your income, lifespan, or happiness, the claim is almost certainly oversold.
Where does the data come from?
Self-reports, social media, online surveys, observational studies - all are vulnerable to selection bias. Random samples and intervention studies are far more trustworthy.
Has it been replicated?
A single study, no matter how dramatic, is rarely conclusive. Wait for replication, especially in psychology, nutrition, and social science, where reproducibility rates are low.
Frequently Asked Questions
Can correlation ever prove causation?
If correlation isn't causation, why do scientists use observational studies at all?
How strong does a correlation need to be before it's worth taking seriously?
What's the difference between confounding and Simpson's paradox?
How does this apply to investing?
Are A/B tests the same as randomised controlled trials?
The Probabilistic Bottom Line
Default scepticism, calibrated belief
Probabilistic thinkers don't reject every correlation - they treat correlations as evidence with a weight that depends on the design behind them. A randomised trial moves the needle a lot. A natural experiment moves it some. A messy observational study with obvious confounders barely moves it at all.
Most causal claims you'll meet - in the news, on social media, in management books, in folk wisdom - are based on the weakest type of evidence. That doesn't make them all wrong. It does mean you should hold them loosely. Be willing to update if better evidence arrives, and be willing to discard them if the underlying study was a fishing expedition.
The goal isn't to become a sceptic of everything. It's to allocate your belief in proportion to the actual strength of the evidence - which is exactly what probabilistic thinking is for.