Last post on probability for now. This one doesn’t involve propositions or questions of ignorance vs. knowledge, just straightforward mathematics. The point is to illustrate how counterintuitive conditional probabilities can be.
Suppose in a given population there are an equal number of males and females. This population is subject to a disease, which I’ll call Bayesitis. 1 out of 10,000 males has Bayesitis, while 1 out of 12,500 females has it. There is a test for Bayesitis. If a person has the disease, the test returns a positive result 98% of the time. If a person does not have the disease, the test returns a negative result 99% of the time. The test results are indifferent to the sex of the person. UPDATE: It can be assumed that the probabilities work exactly; that is, the size of the population is a common multiple of 10,000 and 12,500, and exactly 1 out of 10,000 males and 1 out of 12,500 females has the disease.
A member of the population is selected at random and tested for Bayesitis. The result is positive.
1) What is the probability that the person has the disease?
2) What is the probability that the person is male?
Answers should be given to at least five decimal places.
A little clue:
Most doctors get this wrong, due to making the same mistake that the commentator on your last post made (one of the mistakes, that is), namely confusing the probability that A is true given B, with the probability that B is true given A.
Or confusing the probability that A is true given B, with the simple probability that A is true.
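For reference, the two conditional probabilities are tied together by Bayes' theorem. The notation here is an assumption of mine, not the post's: D means "has the disease" and + means "tests positive".

```latex
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
```

The test's 98% figure is P(+ | D). Question 1 asks for P(D | +), which also depends on how rare the disease is and on how often healthy people test positive.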
1) What is the probability that the person has the disease?
~.0087436
2) What is the probability that the person is male?
~.5004808
It is clear without even doing the exact calculations that the probability of the person having the disease must be rather low, since at most 1 person in 10,000 has the disease, and a positive result can only make it about 100 times more probable; hence the probability cannot be above 1%.
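That back-of-the-envelope bound can be checked with the odds form of Bayes' rule. The following is only a sketch: it assumes the worst case in which everyone has the male rate of 1 in 10,000, and the variable names are mine.

```python
from fractions import Fraction

# Assumption for the bound: nobody's prior exceeds the male rate of 1 in 10,000.
prior = Fraction(1, 10_000)
sensitivity = Fraction(98, 100)      # P(positive | disease)
false_positive = Fraction(1, 100)    # P(positive | no disease)

prior_odds = prior / (1 - prior)                  # 1 : 9999
likelihood_ratio = sensitivity / false_positive   # how much a positive multiplies the odds
posterior_odds = prior_odds * likelihood_ratio
upper_bound = posterior_odds / (1 + posterior_odds)

print(likelihood_ratio, upper_bound, float(upper_bound))
```

The multiplier is 98 rather than a full 100, and it applies to the odds, so the bound lands slightly under 1%.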
You can also think about this problem by considering what the overall test results would be if everyone in the population was tested.
This also implies that the test is almost worthless if you have no other information to gauge whether you have the disease: whether the result is negative or positive, the probability that you actually have the disease is so small that in most cases it wouldn’t matter.
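To put numbers on that, here is a rough sketch comparing the chance of having the disease before testing, after a positive result, and after a negative one. It uses only the rates given in the post; the variable names are my own.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Prior for a randomly chosen person: half male, half female.
p_disease = half * Fraction(1, 10_000) + half * Fraction(1, 12_500)
sens = Fraction(98, 100)   # P(positive | disease)
spec = Fraction(99, 100)   # P(negative | no disease)

p_pos = p_disease * sens + (1 - p_disease) * (1 - spec)
p_neg = 1 - p_pos

p_disease_given_pos = p_disease * sens / p_pos
p_disease_given_neg = p_disease * (1 - sens) / p_neg

for label, p in [("prior", p_disease),
                 ("after positive", p_disease_given_pos),
                 ("after negative", p_disease_given_neg)]:
    print(f"{label:>15}: {float(p):.7f}")
```

A positive result raises the probability roughly a hundredfold and a negative one lowers it about fiftyfold, yet all three values stay below 1%.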
I got the first probability as ~ .0087427…, the ratio being 1112006 / 127191909.
For the second probability I got the same result as above.
Let’s say there are 20,000,000 people. Then there are 10,000,000 males and 10,000,000 females.
Males w/ Bayesitis (B) = 1,000
Females w/ B = 800
People w/ B = 1,800
People w/ B who test positive = 1,764
People w/ B who test negative = 36
People w/o B = 19,998,200
People w/o B who test positive = 199,982
People w/o B who test negative = 19,798,218
Probability that someone who tests positive has the disease = (number who have it and test positive) / (total number who test positive) = 1,764 / (199,982 + 1,764) = 1,764 / 201,746 = ~0.0087436677
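The same head count can be split by sex, which also gives the answer to the second question. This is a minimal sketch using the 20,000,000-person population from the comment above; the helper `positives` is a hypothetical name of mine.

```python
# Integer head counts for a population of 20,000,000,
# chosen so that every count comes out whole.
males = females = 10_000_000
sick_m = males // 10_000       # males with Bayesitis
sick_f = females // 12_500     # females with Bayesitis

def positives(sick, total):
    """Return (true positives, false positives) for one group."""
    true_pos = sick * 98 // 100          # 98% of the sick test positive
    false_pos = (total - sick) // 100    # 1% of the healthy test positive
    return true_pos, false_pos

tp_m, fp_m = positives(sick_m, males)
tp_f, fp_f = positives(sick_f, females)
all_pos = tp_m + fp_m + tp_f + fp_f

p_disease = (tp_m + tp_f) / all_pos   # question 1
p_male = (tp_m + fp_m) / all_pos      # question 2
print(all_pos, round(p_disease, 7), round(p_male, 7))
```

The male share of positives is only a hair above one half, because false positives, which are nearly identical in number for both sexes, make up almost all of the positive results.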
How did you get that ratio, Dominic?
Good question. I just reworked the problem, using the same method as before, but the result I got this time was 882 / 100873 (i.e., 1764 / 201746). I was working with a CAS so I must have mistyped a number and not noticed.
The formula I used was (1/2)*((1/10000 + 1/12500)*(98/100)) / ((1/2)*((1/10000 + 1/12500)*(98/100) + (9999/1000000 + 12499/1250000))).
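For anyone without a CAS to hand, the formula can be evaluated exactly with Python's standard `fractions` module. This is just a sketch that transcribes the expression above term by term; the names `true_pos` and `false_pos` are mine.

```python
from fractions import Fraction as F

# Sick and testing positive, summed over both sexes (before the 1/2 weighting).
true_pos = (F(1, 10_000) + F(1, 12_500)) * F(98, 100)
# Healthy but testing positive: (9999/10000)(1/100) + (12499/12500)(1/100).
false_pos = F(9_999, 1_000_000) + F(12_499, 1_250_000)

p = (F(1, 2) * true_pos) / (F(1, 2) * (true_pos + false_pos))
print(p, float(p))
print(p == F(1_112_006, 127_191_909))   # compare with the earlier, mistyped ratio
```

The factor of 1/2 cancels, so the exact result reduces to the 882 / 100873 quoted above.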
Regarding Joseph’s comment, see this page for citation information for one of the original studies as well as many others.
I assumed a population of 100000, equal male and female proportions. I got the probability of the person having the disease as 9/(((.98 x 5) + (.01 x 49995)) +((.98 x 4) + (.01 x 49996))) = .00892
What did I do wrong?
I deleted my first response because I was assuming 100,000 each of males and females.
The mistake is that you divided the number having the disease by the number who test positive. You need to divide the number who have the disease and test positive by the total number who test positive. This is because it is given that the result was positive, so you need to exclude the small fraction who have the disease but test negative anyway.
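The difference between the two numerators is easy to see side by side. Below is a small sketch using the 100,000-person population from the question; the variable names are hypothetical.

```python
# Population of 100,000: 50,000 males and 50,000 females.
sick_m, sick_f = 5, 4
healthy_m, healthy_f = 49_995, 49_996

true_pos = 0.98 * (sick_m + sick_f)           # sick AND testing positive
false_pos = 0.01 * (healthy_m + healthy_f)    # healthy but testing positive
all_pos = true_pos + false_pos

mistaken = (sick_m + sick_f) / all_pos   # numerator counts all the sick
corrected = true_pos / all_pos           # numerator counts only sick positives
print(f"{mistaken:.5f} {corrected:.5f}")
```

The denominator was already right; multiplying the numerator by 0.98 is the only change needed to bring the answer in line with the others.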