*Image: MiG 19. Public Domain.*

The base-rate fallacy happens when **general statistical data (the base rate) is ignored in favor of case-specific data when making a probability judgment**.

The C.I.A. gives this example to illustrate the problem:

During the Vietnam War, a fighter plane made a non-fatal strafing attack on a US aerial reconnaissance mission at twilight. Both Cambodian and Vietnamese jets operated in the area. You know the following facts:

(a) Specific case information: The US pilot identified the fighter as Cambodian. The pilot’s aircraft recognition capabilities were tested under appropriate visibility and flight conditions. When presented with a sample of fighters (half with Vietnamese markings and half with Cambodian) the pilot made correct identifications 80 percent of the time and erred 20 percent of the time.

(b) Base rate data: 85 percent of the jet fighters in that area are Vietnamese; 15 percent are Cambodian.

Question: What is the probability that the fighter was Cambodian rather than Vietnamese?

A common procedure in answering this question is to reason as follows: We know the pilot identified the aircraft as Cambodian. We also know the pilot’s identifications are correct 80 percent of the time; therefore, there is an 80 percent probability the fighter was Cambodian. This reasoning appears plausible but is incorrect. It ignores the base rate: that 85 percent of the fighters in that area are Vietnamese. The base rate, or prior probability, is what you can say about any hostile fighter in that area before you learn anything about the specific sighting.

The correct way to do this is to use Bayesian reasoning:

Suppose there are 100 enemy fighter planes in total: 85 are Vietnamese and 15 are Cambodian.

From paragraph (a), we know the pilot identifies enemy planes correctly 80% of the time. So of the 85 Vietnamese planes, he would identify 68 correctly (85 × 0.80 = 68) and misidentify 17 as Cambodian (85 × 0.20 = 17).

Of the 15 Cambodian aircraft, he would correctly identify 12 (15 × 0.80 = 12) and mistake 3 for Vietnamese (15 × 0.20 = 3).

That makes a total of 71 Vietnamese sightings (68 + 3) and 29 Cambodian sightings (12 + 17). Only 12 of the 29 Cambodian sightings are correct; the other 17 are misidentified Vietnamese aircraft.

Therefore, when the pilot claims the attack was by a Cambodian fighter, the probability that the craft was actually Cambodian is only 12/29, or about 41 percent, even though the pilot’s identifications are correct 80 percent of the time.
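The counting argument above can be sketched in a few lines of Python (a minimal illustration; the function name and default values are mine, taken from the CIA example's numbers):

```python
# Posterior probability via simple counting, mirroring the worked example above.
def posterior_cambodian(total=100, base_rate_c=0.15, accuracy=0.80):
    """P(plane is Cambodian | pilot says Cambodian)."""
    cambodian = total * base_rate_c             # 15 planes
    vietnamese = total * (1 - base_rate_c)      # 85 planes
    true_c = cambodian * accuracy               # 12 correctly called Cambodian
    false_c = vietnamese * (1 - accuracy)       # 17 Vietnamese miscalled Cambodian
    return true_c / (true_c + false_c)          # 12 / 29

print(round(posterior_cambodian(), 3))  # 0.414
```

Note that the 80 percent accuracy only dominates when the two groups are similar in size; here the large Vietnamese fleet generates more false Cambodian sightings (17) than the Cambodian fleet generates true ones (12).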

Ignore the base rate in favor of specific data at your own risk! In some cases, it makes a huge difference.

*Normal (left) versus cancerous (right) mammography image. Public Domain image.*

Another example to make this kind of reasoning clearer:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

So, based on the numbers above, what is the probability that a woman who gets a *positive* mammography really has breast cancer?

Let’s go through it. If 10,000 women are screened, 1% have breast cancer: that’s 100 women. 80% of those get a positive result: 80 women. That leaves 9,900 women who don’t have breast cancer, and 9.6% of them get a false-positive result: about 950 women (9,900 × 0.096 ≈ 950).

You see where this is going?

So out of 10,000 women tested, 80 get a true positive and about 950 get a false positive, for a total of roughly 1,030 positive results. Of those, only 80 actually have cancer: about 7.8 percent (80 / 1,030 ≈ 0.078).
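The same counting logic, sketched in Python (again a minimal illustration with my own function name; the parameters are the numbers from the problem statement):

```python
# P(cancer | positive mammography) by counting expected outcomes in a cohort.
def posterior_cancer(n=10_000, prevalence=0.01, sensitivity=0.80, false_pos_rate=0.096):
    with_cancer = n * prevalence                        # 100 women
    without_cancer = n - with_cancer                    # 9,900 women
    true_pos = with_cancer * sensitivity                # 80 true positives
    false_pos = without_cancer * false_pos_rate         # ~950 false positives
    return true_pos / (true_pos + false_pos)

print(round(posterior_cancer(), 4))  # 0.0776
```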

**So if, with these numbers, you got a positive result on your mammography, you would still have only about a 7.8% chance of actually having breast cancer.**

Counter-intuitive, but true.

**Sources:**

- Base rate fallacy at Wikipedia
- Base-Rate Fallacy by the CIA
- An Intuitive Explanation of Bayesian Reasoning by Eliezer Yudkowsky

**See also:** Rationality Resources

November 25, 2007 at 4:32 am |

Mike,

I have two observations. In the first example, you don’t actually give the Bayesian probabilities for false alarms from the actual data, and assume that in each case it is 0.8. More likely, they will have different false-alarm rates. Imagine, for instance, that all of the MiGs were correctly identified and that 30% of the Cambodian a/c. This would bring your probability of a correct Cambodian sighting down to about 10%.
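The point about separate error rates is easy to check with a small sketch (the rates in the second call are hypothetical, not from the CIA example): once the hit rate and the false-alarm rate differ, the posterior moves accordingly.

```python
# P(Cambodian | pilot says Cambodian) with separate error rates for each side.
def posterior(base_c, hit_c, false_alarm_v):
    """hit_c = P(says Cambodian | Cambodian);
    false_alarm_v = P(says Cambodian | Vietnamese)."""
    num = base_c * hit_c
    return num / (num + (1 - base_c) * false_alarm_v)

print(posterior(0.15, 0.80, 0.20))  # symmetric 80% accuracy -> ~0.414
print(posterior(0.15, 0.30, 0.20))  # hypothetical: pilot spots only 30% of Cambodians -> ~0.209
```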

In the second example, the statistic that scares me is that of those who get a second mammogram (standard procedure when you have an initial positive, as you show), about 90 women out of 10,000 will have TWO false positives, and then be put on the track of having to treat breast cancer until found otherwise. Furthermore, 7 of those with breast cancer will have a second false negative. You are thus left with a conundrum in either case after the first positive.

November 25, 2007 at 1:15 pm |

Hi Nu,

I’m using the data from the CIA’s example (see link at the end). I’m sure that in a real-life situation things would be more complex. Also, I think the implication is that both the Vietnamese and the Cambodians used the same (Soviet-made) planes, so the witness is tested on how well he recognizes the markings on the planes, not the plane types.

I put a picture of a MiG 19 on top because it is one of the planes used in the war, but I didn’t do the research to see if both Cambodia and Vietnam used those. I know MiG 17 and 21 were more frequent, though.

Very good observation about the second example. Scary stuff, really.

