Correlation vs Causation: A Probabilistic Thinking Guide

Correlation is not causation. It's the most-quoted line in statistics, and the most often misunderstood. Here's what it actually means, why your brain confuses the two, and how to think clearly about cause and effect.

Correlation is when two things move together. Causation is when one of them is the reason the other moves. They look identical from the outside, and that's why almost every confidently held belief about cause and effect is wrong.

Ice cream sales and drowning deaths are tightly correlated. So are the number of pirates in the world and global temperature. So are nations that consume more chocolate and the number of Nobel laureates they produce. None of these is a causal relationship — but if you only had the data and a hunch, you'd swear they were.

This guide is about how to spot the difference. We'll cover spurious correlations and why they're so common, Simpson's paradox (where a trend reverses when you split the data), confounding variables, and the actual methods researchers use to establish causation. By the end, you'll have a working mental toolkit for separating “A and B happen together” from “A causes B.”

What Correlation Actually Means

A statistical relationship — nothing more

A correlation is a measurable, repeatable association between two variables. When one moves, the other tends to move with it (positive correlation) or against it (negative correlation). Statisticians quantify this with a correlation coefficient — usually labelled r — that ranges from -1 to +1.

  • r = +1: perfect positive correlation. As one variable rises, the other rises in lockstep.
  • r = 0: no linear relationship.
  • r = -1: perfect negative correlation. As one rises, the other falls in lockstep.

Real-world correlations almost never hit ±1. A coefficient of 0.7 is considered very strong; 0.4 is moderate; 0.1 is barely there. But here's the catch: even a perfect correlation tells you absolutely nothing about why the variables move together. Correlation is descriptive. It says “this pattern exists.” It does not say what causes the pattern.
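The coefficient itself is trivial to compute. A minimal sketch with NumPy, using made-up illustrative numbers (not from any study):

```python
import numpy as np

# Hours of daily exercise vs resting heart rate (hypothetical illustrative data)
exercise = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
heart_rate = np.array([78, 74, 71, 69, 66, 64])

r = np.corrcoef(exercise, heart_rate)[0, 1]
print(round(r, 2))  # strongly negative: the two variables move in opposite directions
```

A single number summarises the whole linear relationship, which is exactly why it is so tempting to over-interpret it.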

Two variables can correlate strongly because:

  1. One actually causes the other (real causation)
  2. The relationship runs the other way around (reverse causation)
  3. Both are caused by a third, hidden variable (confounding)
  4. The pattern is a coincidence in your particular sample (spurious correlation)
  5. You're looking at a non-random subset of the data (selection effects)

Of those five, only the first is genuine causation. The other four account for the overwhelming majority of correlations you'll encounter in life, science, and journalism.

Spurious Correlations: When Coincidence Looks Like Truth

Patterns that exist only because the universe is large

Tyler Vigen's Spurious Correlations project compiled hundreds of these. A few favourites:

  • The number of films Nicolas Cage appeared in correlates 0.66 with the number of people who drowned by falling into a pool.
  • US per-capita cheese consumption correlates 0.95 with the number of people who died by becoming tangled in their bedsheets.
  • The divorce rate in Maine correlates 0.99 with per-capita margarine consumption.

None of these are causal. They're statistical noise — patterns that emerge when you trawl enough variables looking for matches. With sufficient data, you can find a strong correlation between almost any two unrelated time series, especially if both happen to be trending in the same direction over the period you measure.
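You can reproduce the effect yourself. This sketch correlates two completely unrelated synthetic series that merely share an upward drift over time (the variable names are hypothetical labels, and the data is seeded random noise):

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(50)

# Two unrelated series that both happen to trend upward over the same period
a = 2.0 * years + rng.normal(0, 10, size=50)   # stand-in for "cheese consumption"
b = 0.5 * years + rng.normal(0, 3, size=50)    # stand-in for "bedsheet deaths"

r = np.corrcoef(a, b)[0, 1]
print(round(r, 2))  # strong positive correlation, produced entirely by the shared trend
```

Detrending both series first (regressing each on time and correlating the residuals) makes the correlation collapse, which is one quick diagnostic for this failure mode.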

This is the first lesson of correlation vs causation: the existence of a correlation, on its own, is not strong evidence of anything causal. It's a starting hypothesis at best.

Confounding Variables: The Hidden Third Cause

When something else is driving both

Ice cream sales and drowning deaths are correlated — both go up in the summer. A naive analysis might suggest that ice cream causes drowning, or that drowning causes ice cream sales. Obviously neither is true. The real cause is a third variable: hot weather. Hot days drive both ice cream consumption and the likelihood of swimming in the first place.

This third variable is called a confounder. It causes both observed variables, creating a correlation between them that has nothing to do with one causing the other.

Confounders are everywhere in observational data:

Coffee drinkers live longer than non-drinkers

Confounder: people who can afford and tolerate coffee tend to be healthier overall.

Children with bigger feet have better vocabularies

Confounder: age. Older kids have bigger feet and bigger vocabularies.

Universities with smaller class sizes produce better grades

Confounder: smaller classes are often at wealthier institutions with better-prepared students.

Vitamin supplement users are healthier than non-users

Confounder: people who buy supplements are also more likely to exercise, sleep well, and avoid smoking.

The presence of a plausible confounder doesn't disprove a causal claim — but it forces you to control for it before taking the correlation seriously. Good observational research lists every plausible confounder and adjusts for it statistically. Bad research ignores them and reports the headline number.
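The simplest form of statistical adjustment can be sketched directly with the ice cream example. Here the data is synthetic by construction: temperature drives both variables and there is no direct causal link between them, yet the raw correlation is strong until we regress temperature out of each variable and correlate the residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
temp = rng.normal(20, 8, 500)                     # daily temperature: the confounder
ice_cream = 3.0 * temp + rng.normal(0, 10, 500)   # driven only by temperature
drownings = 0.5 * temp + rng.normal(0, 3, 500)    # also driven only by temperature

raw_r = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(y, x):
    # Remove the part of y explained by x via a simple linear fit
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

adj_r = np.corrcoef(residuals(ice_cream, temp), residuals(drownings, temp))[0, 1]
print(round(raw_r, 2), round(adj_r, 2))  # strong raw correlation, near zero after adjustment
```

This only works when you know the confounder and have measured it; the unmeasured ones are what make observational inference hard.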

Reverse Causation: When the Arrow Points the Other Way

B causes A, not A causes B

Sometimes two things really are causally linked, but the direction is the opposite of what intuition suggests.

People who use stand-up desks report better focus. Does the desk cause better focus? Possibly. But it's at least as plausible that people who already have good focus and discipline are more likely to buy stand-up desks in the first place. Without a controlled experiment, you genuinely can't tell which way the arrow goes.

Other classic reverse-causation traps:

  • People who attend therapy report worse mental health than people who don't. Does therapy cause poor mental health, or do struggling people seek therapy?
  • Hospitals have higher death rates than gyms. Do hospitals kill people, or do dying people end up in hospitals?
  • Successful CEOs tend to take cold showers. Does cold showering build success, or do successful people buy into wellness fads?

Reverse causation is particularly sneaky because it produces real correlations and feels like a causal mechanism. The only reliable defence is to ask: “If I imagine the arrow going the other way, does that story also fit the evidence?” If yes, you don't have a causal claim.

Simpson's Paradox: When the Data Lies in Both Directions

A pattern that reverses when you split the data

Simpson's paradox is one of the strangest things in statistics. It's a situation where a trend appears in several groups of data but reverses when the groups are combined — or vice versa.

The classic example is from a 1973 admissions audit at UC Berkeley. Looking at overall admissions, men were admitted at a noticeably higher rate than women, suggesting bias against women. But when researchers split the data by department, the pattern flipped: most departments were actually slightly biased in favour of women. The aggregate looked discriminatory because women happened to apply more often to highly competitive departments with low acceptance rates for everyone.

The same paradox appears repeatedly:

Treatment A looks better than Treatment B overall

But Treatment B is better in every patient subgroup. The difference comes from how patients were assigned.

One baseball player out-hits another in every individual season

But over their careers, the second player has the higher batting average. The mix of at-bats across seasons explains the reversal.

A school's average test score drops year-on-year

But every demographic subgroup is improving. The composition of the student body changed.

Simpson's paradox is the strongest argument for never trusting an aggregate statistic without seeing the breakdown. The lesson is simple: a correlation that holds at one level can vanish or invert at another. Always ask, “What groups are inside this average, and what happens when I look at them separately?”
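The Berkeley-style reversal is easy to verify with arithmetic. The numbers below are hypothetical (chosen to make the effect stark, not taken from the actual audit): women have the higher admission rate in every department, yet men have the higher rate overall because most women applied to the hard department.

```python
# Hypothetical two-department admissions data: (applicants, admitted) per group
data = {
    "Dept A (easy)": {"men": (800, 480), "women": (100, 65)},   # 60% vs 65%
    "Dept B (hard)": {"men": (200, 20),  "women": (900, 135)},  # 10% vs 15%
}

for dept, d in data.items():
    m_apps, m_adm = d["men"]
    w_apps, w_adm = d["women"]
    print(dept, f"men {m_adm/m_apps:.0%}, women {w_adm/w_apps:.0%}")

# Aggregating across departments reverses the per-department trend
men_total = sum(d["men"][1] for d in data.values()) / sum(d["men"][0] for d in data.values())
women_total = sum(d["women"][1] for d in data.values()) / sum(d["women"][0] for d in data.values())
print(f"overall: men {men_total:.0%}, women {women_total:.0%}")
```

The confounder here is department choice: it determines both which group you belong to and how hard it is to get in.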

How Researchers Actually Establish Causation

Beyond correlation — the gold standards

Establishing genuine causation is hard. It typically requires one of these approaches, in roughly increasing order of reliability:

1. Bradford Hill criteria

A set of nine viewpoints for evaluating observational evidence, including strength of association, consistency across studies, specificity, temporal sequence (cause precedes effect), dose-response relationship, and biological plausibility. Used in epidemiology when experiments aren't possible (the link between smoking and lung cancer was established this way).

2. Natural experiments

Situations where some external factor (a policy change, a natural disaster, a lottery) randomly assigns people to different conditions. Researchers can then study the consequences as if it were a controlled experiment, even though no one designed it.

3. Instrumental variables

A statistical technique that exploits a third variable (the instrument) that shifts the supposed cause but affects the outcome only through that cause, isolating the causal effect. Common in economics; for example, using rainfall to study how farm income affects local school attendance.

4. Regression discontinuity

When eligibility for an intervention depends on crossing a threshold (a test score, an income cutoff), comparing people just above and just below the line approximates a randomised experiment.

5. Randomised controlled trials (RCTs)

The gold standard. Randomly assign people to treatment or control, then measure the outcome. Random assignment balances confounders across the groups by construction, including ones nobody thought to measure. Used in medicine, and increasingly in policy and tech (A/B testing is essentially this).
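The logic of randomisation is simple enough to simulate. In this synthetic sketch, "motivation" is a hidden confounder: when people self-select into the treatment, the naive comparison wildly overstates the true effect, but a coin-flip assignment recovers it:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
motivation = rng.normal(0, 1, n)   # hidden confounder that also improves the outcome
true_effect = 0.2                  # the causal effect we want to recover

# Observational data: motivated people opt into the treatment more often
chose = rng.random(n) < 1 / (1 + np.exp(-2 * motivation))
outcome_obs = true_effect * chose + motivation + rng.normal(0, 1, n)
naive = outcome_obs[chose].mean() - outcome_obs[~chose].mean()

# RCT: a coin flip decides, so motivation is balanced across the two groups
assigned = rng.random(n) < 0.5
outcome_rct = true_effect * assigned + motivation + rng.normal(0, 1, n)
rct = outcome_rct[assigned].mean() - outcome_rct[~assigned].mean()

print(round(naive, 2), round(rct, 2))  # naive estimate is badly inflated; RCT is close to 0.2
```

Note that the simulation never tells the estimator about motivation; randomisation neutralises it anyway, which is the whole point.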

If you're reading a study that claims X causes Y and none of these methods is described — just a correlation in observational data — be sceptical. The study may still be useful as a hypothesis generator, but it almost certainly hasn't proved causation.

A Mental Checklist for Real Life

Five questions to ask before believing any causal claim

You're not going to run an RCT next time you read a news headline. But you can ask these five questions, and they'll filter out most weak causal claims:

1. Is there a plausible confounder?

What third variable could be driving both the cause and the effect? If you can think of one easily, the claim probably hasn't accounted for it.

2. Could the arrow point the other way?

Try the reverse causation story. Does it also fit the evidence? If yes, you can't tell direction from this data alone.

3. Is the effect size suspiciously large?

Tiny interventions rarely produce huge outcomes. If a single weekly habit allegedly doubles your income, lifespan, or happiness, the claim is almost certainly oversold.

4. Where does the data come from?

Self-reports, social media, online surveys, observational studies — all are vulnerable to selection bias. Random samples and intervention studies are far more trustworthy.

5. Has it been replicated?

A single study, no matter how dramatic, is rarely conclusive. Wait for replication, especially in psychology, nutrition, and social science where reproducibility rates are low.

Frequently Asked Questions

Can correlation ever prove causation?
On its own, no. A correlation can be perfectly consistent with a causal relationship, but it's also consistent with confounding, reverse causation, coincidence, and selection effects. Causation requires either a controlled experiment or a careful argument that rules out the alternatives.
If correlation isn't causation, why do scientists use observational studies at all?
Because experiments are often impossible — you can't randomly assign people to smoke for 30 years or live in different countries. Observational studies are how we generate hypotheses and gather circumstantial evidence. The Bradford Hill criteria exist precisely to evaluate this kind of evidence systematically when randomised trials aren't an option.
How strong does a correlation need to be before it's worth taking seriously?
Strength matters less than mechanism. A correlation of 0.3 with a clear causal pathway and replication across studies is more meaningful than 0.9 with no plausible mechanism. That said, weak correlations (under ~0.2) in noisy real-world data are often just statistical artefacts.
What's the difference between confounding and Simpson's paradox?
Confounding is a hidden cause of both variables that creates a misleading correlation. Simpson's paradox is what happens when a confounder is so strong that it actually reverses the apparent direction of an effect when you slice the data differently. Simpson's is a special, dramatic case of confounding.
How does this apply to investing?
Strongly. Most market “edges” that look like causation in backtests are spurious correlations from a small sample, confounded by overall market direction, or the result of survivorship bias. See our expected value (/blog/expected-value-explained) and probabilistic thinking (/blog/thinking-in-probabilities) guides for how to evaluate investment claims more rigorously.
Are A/B tests the same as randomised controlled trials?
Yes, in the statistical sense. A well-designed A/B test randomly assigns users to variants and measures the outcome — exactly the same logic as an RCT. They're the most reliable causal tool most product teams have. The catch is that they only work for short-term effects on the variables you measure; long-term and second-order effects are still observational.

The Probabilistic Bottom Line

Default scepticism, calibrated belief

Probabilistic thinkers don't reject every correlation — they treat correlations as evidence with a weight that depends on the design behind them. A randomised trial moves the needle a lot. A natural experiment moves it some. A messy observational study with obvious confounders barely moves it at all.

Most causal claims you'll meet — in the news, on social media, in management books, in folk wisdom — are based on the weakest type of evidence. That doesn't make them all wrong. It does mean you should hold them loosely. Be willing to update if better evidence arrives, and be willing to discard them if the underlying study was a fishing expedition.

The goal isn't to become a sceptic of everything. It's to allocate your belief in proportion to the actual strength of the evidence — which is exactly what probabilistic thinking is for.

Continue the Series

New to probabilistic thinking?

Start with our foundational series on expected value, base rates, and decision-making under uncertainty.

Start with Expected Value