[Image: probability Venn diagram drawn on a whiteboard — a visual explanation of conditional probability]

Conditional Probability Explained Simply, With Examples


Conditional probability is the chance that one event happens given that another event has already happened. Written P(A|B) and read "the probability of A given B", it's the single most useful idea in probability for everyday decision-making — and the source of nearly every famous probability puzzle that looks impossible at first glance. This guide builds it up from scratch with worked examples, then shows how it underpins Bayes' theorem, the Monty Hall problem, medical-test interpretation, and weather forecasting.

What conditional probability actually means

Re-asking the question once you know more

The phrase conditional probability sounds intimidating, but the idea is an everyday one. Imagine you're told a card has been drawn from a standard 52-card deck. Without any other information, the chance it's the Ace of Spades is 1 in 52. Now suppose someone tells you the card is black. The relevant universe just shrank from 52 cards to 26, and the chance it's the Ace of Spades is now 1 in 26. The probability changed because new information narrowed what's possible.

That's conditional probability in one sentence: it's the probability of an event in a smaller world, where the smaller world is defined by something you've just learned. Mathematically, P(A|B) = P(A and B) / P(B). In words: out of all the times B happens, how often does A happen too? The denominator throws away every outcome where B didn't happen, leaving you with a fresh, smaller sample space to ask the question in.
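The card example is small enough to check by brute force. Here's a minimal Python sketch (the deck representation and helper names are mine, purely for illustration): conditioning on "black" is literally just recomputing the probability inside a smaller list.

```python
from fractions import Fraction

# Build the 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

def prob(event, space):
    """P(event): favourable outcomes over the size of the sample space."""
    return Fraction(sum(1 for card in space if event(card)), len(space))

is_ace_of_spades = lambda card: card == ("A", "spades")
is_black = lambda card: card[1] in ("spades", "clubs")

print(prob(is_ace_of_spades, deck))         # 1/52

# Conditioning on "black" = recomputing in the smaller world of black cards.
black_cards = [card for card in deck if is_black(card)]
print(prob(is_ace_of_spades, black_cards))  # 1/26
```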

Conditional probability is what your brain does informally all day. "How likely am I to be late given I leave at 8.30am?" is conditional. "How likely is it to rain given there are dark clouds?" is conditional. The formal version just lets you assign numbers and combine them without contradicting yourself.

Visualising it with a Venn diagram

Two circles, one shared region — the picture that makes the formula obvious

Draw a rectangle (the whole sample space) and two overlapping circles inside it called A and B. The full rectangle has probability 1. Each circle has its own probability — P(A) and P(B) — and the lens-shaped overlap in the middle is P(A and B), the probability that both happen.

To compute P(A|B), mentally erase everything outside circle B. Circle B is now your entire universe — its probability is rescaled to 1. The lens-shaped overlap, which used to be P(A and B) within the rectangle, now occupies a fraction of circle B. That fraction is exactly P(A and B) / P(B). That's the formula, derived purely by squinting at a diagram.

This is why the formula has a division in it, and why the denominator is P(B) rather than P(A). We're conditioning on B, so B becomes the new total. Swap to P(B|A) and the denominator becomes P(A) instead — same overlap, different rescaling. The two conditional probabilities are usually not equal, and confusing them is the root cause of most probability mistakes (we'll see this in the medical testing section).

Worked example 1: a tree diagram

Two coin flips and a question about what we know

You flip a fair coin twice. Someone tells you at least one of the flips came up heads. What's the probability that both came up heads?

Most people answer one-half, because the unknown flip is fifty-fifty. The right answer is one-third. Here's why.

Draw a tree. The first flip branches into Heads and Tails. Each of those branches into Heads and Tails again. You get four equally likely outcomes: HH, HT, TH, TT — each with probability 1/4.

The information "at least one heads" is event B. It's true for HH, HT, and TH, but not for TT. So P(B) = 3/4. The event "both heads" is event A, true only for HH, so P(A and B) = P(HH) = 1/4. The conditional probability is:

P(both heads | at least one heads) = P(A and B) / P(B) = (1/4) / (3/4) = 1/3.

The intuition that trips people up is the assumption that one specific flip is now "locked in" as heads. But all you actually know is that at least one of the two flips was heads. Three of the four equally likely scenarios (HH, HT, TH) are still on the table, and only one of them is HH. Conditioning correctly means restricting to the right subset of outcomes — not arbitrarily fixing a coin's value.
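If you don't trust the tree, enumerate. A minimal Python sketch of the same restriction:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

b = [o for o in outcomes if "H" in o]        # at least one heads: HH, HT, TH
a_and_b = [o for o in b if o == ("H", "H")]  # both heads, within B

print(Fraction(len(a_and_b), len(b)))        # 1/3
```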

Worked example 2: medical testing

The base-rate trap that catches doctors as often as patients

Suppose a disease affects 1 in 1,000 people in the general population. A test for the disease is 99% accurate, in the sense that it correctly flags 99% of people who have the disease and correctly clears 99% of people who don't. You take the test and it comes back positive. What's the probability you actually have the disease?

The intuitive answer is 99%. The actual answer is closer to 9%.

To see why, imagine running the test on 100,000 people. About 100 of them genuinely have the disease (the 1-in-1,000 base rate). The test correctly catches 99 of those — true positives. The remaining 99,900 don't have the disease, but the test wrongly flags 1% of them — that's 999 false positives. So out of 99 + 999 = 1,098 total positive results, only 99 are real. Conditional on a positive test, the chance you have the disease is 99 / 1,098 ≈ 9%.
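The frequency table translates directly into code. A short Python sketch of the same arithmetic (variable names are mine; sensitivity and specificity are both set to the 99% figure from the example):

```python
# Reproducing the counts for a 1-in-1,000 disease and a 99%-accurate test.
population = 100_000
base_rate = 1 / 1_000
sensitivity = 0.99   # P(positive | disease)
specificity = 0.99   # P(negative | no disease)

diseased = population * base_rate              # 100 people
healthy = population - diseased                # 99,900 people

true_positives = diseased * sensitivity        # 99
false_positives = healthy * (1 - specificity)  # 999

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(round(p_disease_given_positive, 3))      # 0.09, i.e. about 9%
```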

This is conditional probability done correctly. The 99% accuracy figure is P(positive test | disease). The number you actually want is P(disease | positive test). They're not the same number — and when the underlying disease is rare, they can differ by an order of magnitude. This is a real problem in medicine. Studies have repeatedly shown that even experienced doctors often invert these probabilities, advising patients much more pessimistically than the data warrants.

The technical name for this confusion is the base-rate fallacy. The fix is mechanical: always ask whether the headline number is the chance of the symptom given the condition, or the chance of the condition given the symptom. They are conditional probabilities running in opposite directions, and switching between them is exactly what Bayes' theorem is for.

Bayes' theorem: switching the direction

The single equation that lets you flip a conditional probability around

Bayes' theorem is just a rearrangement of the conditional probability formula. Starting from P(A|B) = P(A and B) / P(B) and P(B|A) = P(A and B) / P(A), you can solve both for P(A and B), set them equal, and rearrange to:

P(A|B) = P(B|A) × P(A) / P(B)

That's the whole theorem. In the medical-testing example, A is having the disease and B is testing positive. We knew P(B|A) = 0.99 (the test's sensitivity), P(A) = 0.001 (the base rate), and we can compute P(B) ≈ 0.011 (the share of the population who'd test positive at all). Plugging in: P(A|B) = 0.99 × 0.001 / 0.011 ≈ 0.09. Same 9% answer, derived directly.
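In code the theorem is a one-liner; the only real work is assembling P(B) via the law of total probability, since a positive test comes either from a true positive or a false positive. A minimal sketch:

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

sensitivity = 0.99          # P(positive | disease)
base_rate = 0.001           # P(disease)
false_positive_rate = 0.01  # P(positive | no disease)

# P(positive): true positives plus false positives, over the whole population.
p_positive = sensitivity * base_rate + false_positive_rate * (1 - base_rate)

print(round(bayes(sensitivity, base_rate, p_positive), 3))  # 0.09
```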

The reason Bayes' theorem matters is that the world rarely hands you the conditional probability you want. It hands you the one running the other way. A weather forecast tells you P(this morning's clouds | rain later) implicitly through years of meteorology data, but you want P(rain later | this morning's clouds). A spam filter knows P(this word appears | spam), but you want P(spam | this word appears). Bayes' theorem is the bridge.

For a deeper walk through Bayesian reasoning in everyday decisions — including how to update your beliefs as evidence accumulates — see our guide to Bayesian thinking in everyday decisions.

The Monty Hall problem

The famous game-show puzzle that turns on conditional probability

You're on a game show. Three doors hide a car and two goats. You pick door 1. The host, who knows where the car is, opens door 3 to reveal a goat. He then offers you the chance to switch to door 2. Should you?

Most people say it doesn't matter — two doors left, one car, fifty-fifty. The right answer is to switch. Switching wins the car two-thirds of the time; staying wins one-third.

The correct way to see this is via conditional probability. Before the host opens any door, the probability the car is behind your chosen door 1 is 1/3, and the probability it's behind one of the other two doors (door 2 or door 3) is 2/3. The host's reveal of door 3 doesn't change either of those facts about your initial choice — it just tells you which of the other two doors definitely doesn't have the car. The 2/3 probability that was distributed across doors 2 and 3 is now concentrated entirely behind door 2.

The reason this trips people up is that they treat the host's action as if it were random, when it isn't. The host knows where the car is and never opens the door with the car. That structural asymmetry is the source of the conditional information. If the host opened doors at random and happened to reveal a goat, switching really would be a fifty-fifty proposition. The puzzle hinges entirely on what you condition on — the host's deliberate, informed reveal — not on the bare fact that one door has been eliminated.
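If the argument still feels slippery, simulation settles it. A short Monte Carlo sketch in Python; note the line where the host deliberately avoids the car, which is where the conditional information enters:

```python
import random

def play(switch: bool) -> bool:
    """One round of the game; returns True if the player wins the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host, who knows where the car is, opens a goat door
    # that is not the player's pick.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print(sum(play(True) for _ in range(trials)) / trials)   # ~0.667 switching
print(sum(play(False) for _ in range(trials)) / trials)  # ~0.333 staying
```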

Worked example 3: weather forecasting

What "30% chance of rain" actually means in conditional terms

A 30% chance of rain in tomorrow's forecast is a conditional probability statement. Specifically, it's P(rain in your area tomorrow | the atmospheric conditions the forecaster has observed today). The conditioning is invisible because we don't usually write it out, but every weather forecast you've ever read is implicitly conditional on whatever the forecaster currently knows.

This matters when the conditioning changes. "30% chance of rain tomorrow" said on Monday morning is a different number from "30% chance of rain tomorrow" said on Tuesday morning. Both reference the same underlying weather, but they're conditioning on different information sets — Tuesday's forecast incorporates everything that happened in the intervening day. Forecast probabilities aren't fixed properties of the future; they're updated estimates that move as evidence comes in. That's exactly the Bayesian view of probability we touched on above.

It also explains why aggregate forecast accuracy is judged by calibration, not by individual prediction outcomes. A forecaster who says 30% on a hundred days and sees rain on roughly 30 of them is well-calibrated — even though they were "wrong" 70 times. Conditioning on the same 30%-rain prediction repeatedly tells you what 30% really means in their hands. We've covered the calibration concept in more depth in the guide to probability calibration training.
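A calibration check is itself an exercise in conditioning: group past forecasts by the stated probability, then look at the observed rain frequency within each group. A minimal sketch with a made-up forecast log:

```python
from collections import defaultdict

# Hypothetical forecast log: (stated probability of rain, did it rain?).
history = [(0.3, True), (0.3, False), (0.3, False), (0.7, True), (0.7, True),
           (0.7, False), (0.3, False), (0.1, False), (0.1, False), (0.9, True)]

buckets = defaultdict(list)
for stated, rained in history:
    buckets[stated].append(rained)

# A well-calibrated forecaster's observed frequency tracks the stated number.
for stated in sorted(buckets):
    outcomes = buckets[stated]
    print(f"said {stated:.0%}: rained {sum(outcomes)}/{len(outcomes)} times")
```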

Independence: when conditioning changes nothing

The special case that simplifies (and trips up) probability calculations

Two events are independent if knowing one happened tells you nothing about the other. Formally, A and B are independent when P(A|B) = P(A). The conditional probability collapses to the unconditional one — knowing B doesn't move the needle on A.

Coin flips are the classic example. The first flip and the second flip are independent: knowing the first came up heads tells you nothing about the second. So P(second is heads | first is heads) = P(second is heads) = 1/2. This is also why the gambler's fallacy is wrong — a roulette wheel that has just landed on red ten times in a row still has the same probability of red on the next spin, because consecutive spins are independent.
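Independence is easy to verify empirically: condition on the first flip and watch the second flip's frequency sit at one-half regardless. A quick simulation sketch:

```python
import random

trials = 100_000
first_heads = 0
both_heads = 0
for _ in range(trials):
    first = random.random() < 0.5   # first flip: heads?
    second = random.random() < 0.5  # second flip: heads?
    if first:
        first_heads += 1
        if second:
            both_heads += 1

# P(second heads | first heads): conditioning changes nothing here.
print(both_heads / first_heads)  # ~0.5
```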

Lots of real-world events aren't independent, and treating them as if they were is one of the most expensive mistakes in probability. House prices in two neighbouring cities aren't independent — they share macro drivers. Test scores from twins aren't independent — they share genes and environment. Stock returns across companies in the same sector aren't independent — they share industry conditions. Conditional probability is what you reach for when independence breaks down.

Common mistakes when working with conditional probability

The four traps that catch almost everyone, including statisticians

1. Inverting the direction. Confusing P(A|B) with P(B|A) is the single most common conditional-probability mistake. It's the medical-testing trap, the prosecutor's fallacy, and a hundred everyday misreadings of statistics. Always check which way round the conditioning runs in any number you're quoted.

2. Ignoring the base rate. Even when the conditional direction is correct, the answer changes drastically with the underlying frequency. A 99%-accurate test means very different things for a 1-in-1,000 disease and a 1-in-3 disease. We've written a deeper piece on this trap in base-rate neglect.

3. Assuming independence that isn't there. Multiplying probabilities together — P(A and B) = P(A) × P(B) — only works when A and B are genuinely independent. When they aren't, the right formula is P(A and B) = P(A) × P(B|A), which can be much smaller or much larger than the naive product depending on the direction of the dependence (see the worked sketch after this list).

4. Conditioning on the wrong thing. Sometimes the conditioning event you're given isn't the one you should be using. The Monty Hall problem is the textbook case: people condition on "door 3 is open" when the correct conditioning event is "the host, knowing where the car is, deliberately opened a door with no car behind it". Different conditioning, different answer.
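To make trap 3 concrete: drawing two aces from a deck without replacement. The naive product treats the two draws as independent; the multiplication rule conditions the second draw on the first. A minimal sketch:

```python
from fractions import Fraction

# Two cards drawn without replacement: the second draw depends on the first.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace already gone

correct = p_first_ace * p_second_ace_given_first  # P(A) * P(B|A)
naive = p_first_ace * Fraction(4, 52)             # wrongly assumes independence

print(correct)  # 1/221
print(naive)    # 1/169, overstates the true chance by ~31%
```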

Frequently asked questions

What's the difference between conditional probability and joint probability?
Joint probability is P(A and B), the chance both events happen at all. Conditional probability is P(A|B), the chance A happens given that B has already happened. The relationship is P(A|B) = P(A and B) / P(B). Joint probability looks at the overlap; conditional probability rescales the overlap into the world where B is known to have occurred.
How is conditional probability related to Bayes' theorem?
Bayes' theorem is a rearrangement of the conditional probability formula that lets you flip the direction of conditioning. If you know P(B|A) but want P(A|B), Bayes gives you P(A|B) = P(B|A) × P(A) / P(B). That's all it is. The reason it's so famous is that real-world data usually arrives in the wrong direction — you have the test's accuracy but want the patient's prognosis — and Bayes is the bridge.
Are conditional probability and dependent events the same thing?
Not quite. Two events are dependent when knowing one happened changes the probability of the other — i.e. when P(A|B) ≠ P(A). Conditional probability is the language we use to express that dependence with numbers. Independent events still have conditional probabilities, but they collapse to the unconditional ones.
When should I use conditional probability instead of joint probability?
Use joint probability when you're asking how often both events occur in absolute terms over the whole sample space. Use conditional probability when you've already learned that one event has occurred and want to know the chance of the other. Most real-world questions — medical tests, weather forecasts, fraud detection — are conditional, because you start with some information and want to update.
Why is the answer to the two-coin-flip problem 1/3 instead of 1/2?
Because the information given is "at least one of the two flips came up heads", which is true for three of the four equally likely outcomes (HH, HT, TH). Of those three, only one (HH) is two heads. So the conditional probability of two heads given at least one heads is 1/3, not 1/2. The 1/2 answer comes from imagining you've been told a specific coin (say the first) is heads, which is a different conditioning event.
What does P(A|B) = 0 mean?
It means A is impossible given B has happened — the two events are mutually exclusive. For example, P(card is red | card is the ace of spades) = 0, because the ace of spades is black. Note this is different from P(A) = 0: an event can have substantial unconditional probability (red cards are half the deck) and still be impossible once B is known. The reverse can't happen, though: if P(A) = 0, then P(A and B) = 0 too, so P(A|B) = 0 for any conditioning event B with positive probability.

Conditional probability is the workhorse of careful thinking under uncertainty. Once you can read a probability statement and immediately ask "conditional on what?", you'll spot the base-rate trap, you'll switch the direction of Bayes' theorem in your head, and you'll find that the famous "impossible" puzzles — Monty Hall, the two-child problem, the Tuesday-boy problem — all reduce to the same mechanical question of which subset of outcomes you're really restricting to.