Bayes' Theorem

John Melesky

2007-05-10

What is Bayes' Theorem?

It's a method for determining conditional probabilities.

Duh.

See? Wikipedia says so.

Um, dude, whut?

Seriously, I don't know, either.

An example

1000 Python programmers walk into a conference.

10 of them (1%) harbour a dark secret.

A test!

"Do you prefer list comprehensions to 'map'?"

How accurate is it?

If you harbour the dark secret, it will return positive 90% of the time
If you don't harbour that secret, it will return positive only 10% of the time

A question

Joe takes the test, and it returns positive.

How likely is he to harbour this dark secret?

The answer

90% !

Duh.

The answer

Probably not what you think.

The answer: Bayes' Theorem

Bayes' Theorem tells us how to compute this.

But we don't know Bayes' Theorem yet, so let's puzzle through it.

The question

Joe takes the test, and it returns positive.

How likely is he to harbour this dark secret?

The question, step-by-step

If we test everyone, how many people will get positive results?

And of those, how many will actually be evil?

We can use those two numbers to figure out how likely Joe is to be evil.

The population

1000 Python programmers

1% harbour the secret

The population

1000 Python programmers

990 Pure and stout of heart
10 Dark-hearted functional bastards

The bad population

10 Dark souls

... with 90% positive test results ...

... means that if we administer the test to everyone, we'll identify 9 functional bastards.

... oh, and 1 will get through.

The good population

990 Stout hearts

... with a mere 10% positive test results ...

... means that if we administer the test to everyone, we'll clear 891 pure pythonic souls.

... BUT 99 will be falsely accused!

The total population

1000 Python programmers

10 evildoers
- 9 rightly identified
- 1 slips through the cracks
990 true and stout-hearted pythonists
- 99 falsely accused
- 891 cleared by the test
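
The breakdown above can be sketched in a few lines of Python. This is only an illustration: the function name `split_population` and its parameter names are made up for this example, and the counts are expected values, not a simulation.

```python
def split_population(total, prevalence, true_positive_rate, false_positive_rate):
    """Split a population into the four expected test outcomes.

    All names here are illustrative assumptions, not part of the talk.
    """
    guilty = total * prevalence
    innocent = total - guilty
    return {
        "rightly identified": guilty * true_positive_rate,
        "slips through the cracks": guilty * (1 - true_positive_rate),
        "falsely accused": innocent * false_positive_rate,
        "cleared by the test": innocent * (1 - false_positive_rate),
    }


# 1000 programmers, 1% incidence, 90% and 10% positive test rates
counts = split_population(1000, 0.01, 0.90, 0.10)
for outcome, n in counts.items():
    print(f"{outcome}: {n:.0f}")
```

The four counts should add back up to the full 1000, which is a handy sanity check.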

The positive-test population

108 people will test positive (99 + 9)

9 of whom are actual wrong-headed fools.

Which means that 8.33% (9 / 108) of those with a positive test result will be nasty, worthless functional programmers.

Which is a far cry from 90%

The total population

1000 Python programmers

1% incidence of dark secret, 90% positive test rate for those who harbour dark secret, 10% positive test rate for those who are innocent.

1% - 10 functional programmers
- .9% - 9 rightly identified
- .1% - 1 slips through the cracks
99% - 990 true pythonists
- 9.9% - 99 falsely accused
- 89.1% - 891 cleared by the test

The total population

1000 Python programmers

1% incidence of dark secret, 95% positive test rate for those who harbour dark secret, 10% positive test rate for those who are innocent.

1% - 10 functional programmers
- .95% - 9.5 rightly identified
- .05% - 0.5 slips through the cracks
99% - 990 true pythonists
- 9.9% - 99 falsely accused
- 89.1% - 891 cleared by the test

The total population

1000 Python programmers

1% incidence of dark secret, 95% positive test rate for those who harbour dark secret, 2% positive test rate for those who are innocent.

1% - 10 functional programmers
- .95% - 9.5 rightly identified
- .05% - 0.5 slips through the cracks
99% - 990 true pythonists
- 1.98% - 19.8 falsely accused
- 97.02% - 970.2 cleared by the test
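
To see what the three sets of test rates above do to Joe's odds, here is a small sketch. The helper name `share_of_positives_guilty` is hypothetical; it just divides the true positives by all positives, using the same percentages as the three breakdowns.

```python
def share_of_positives_guilty(prevalence, true_positive_rate, false_positive_rate):
    """Fraction of positive results that belong to actual secret-harbourers."""
    true_positives = prevalence * true_positive_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# (positive rate if guilty, positive rate if innocent) for the three scenarios
scenarios = [(0.90, 0.10), (0.95, 0.10), (0.95, 0.02)]
for tpr, fpr in scenarios:
    p = share_of_positives_guilty(0.01, tpr, fpr)
    print(f"{tpr:.0%} / {fpr:.0%}: {p:.2%}")
```

With these numbers, catching more of the guilty (90% to 95%) barely moves the result, from about 8.33% to about 8.76%. Cutting the false accusations (10% to 2%) lifts it to about 32.42%.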

Bayes' Theorem

probability(guilt, given a positive test result) =

probability(positive test result, given guilt) * probability(guilt)

over

probability(positive test result)
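
For those who like their theorems in symbols, the same statement can be written as below. The symbols are assumptions made for this note: G stands for "guilt" and + for "positive test result".

```latex
P(G \mid +) = \frac{P(+ \mid G)\, P(G)}{P(+)}
```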

Bayes' Theorem

probability(guilt, given a positive test result) =

probability(positive test result, given guilt) * 1%

over

probability(positive test result)

Bayes' Theorem

probability(guilt, given a positive test result) =

90% * 1%

over

probability(positive test result)

Bayes' Theorem

probability(guilt, given a positive test result) =

90% * 1%

over

hmmm....

probability(positive test result)

population with positive test result

over

total population

probability(positive test result)

population of true positives + population of false positives

over

total population

probability(positive test result)

9 + population of false positives

over

total population

probability(positive test result)

9 + 99

over

total population

probability(positive test result)

9 + 99

over

1000

probability(positive test result)

(9 + 99) / 1000 = 10.8%
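
The same number falls out without counting heads, by weighting each group's positive rate by that group's share of the population. A minimal sketch, with a made-up function name:

```python
def probability_positive(prevalence, true_positive_rate, false_positive_rate):
    """Overall chance of a positive test: true positives plus false positives."""
    true_positives = true_positive_rate * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives + false_positives


# 90% * 1% + 10% * 99%
p_positive = probability_positive(0.01, 0.90, 0.10)
print(f"{p_positive:.1%}")
```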

Bayes' Theorem

probability(guilt, given a positive test result) =

90% * 1%

over

10.8%

Bayes' Theorem

probability(guilt, given a positive test result) =

90% * 1% / 10.8% =

8.33%

And it's just that simple
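
As a sketch, the whole theorem fits in a one-line function. The name `bayes` and its argument names are illustrative only; `Fraction` is used here so the arithmetic stays exact instead of picking up floating-point fuzz.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """P(A given B) = P(B given A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# 90% * 1% / 10.8%
p_guilt_given_positive = bayes(Fraction(90, 100), Fraction(1, 100), Fraction(108, 1000))
print(p_guilt_given_positive)                  # exact fraction
print(f"{float(p_guilt_given_positive):.2%}")  # as a percentage
```

The exact answer comes out as 1/12, which is the 8.33% from the slides.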

But....

What's this got to do with spam?