# Bayes’ Theorem: Some Intuitive Principles, Part I

Ideal rational thinking is Bayesian, meaning that it is based on Bayes’ theorem. Bayes’ theorem tells us how to update our confidence in a theory based on our experiences. The theorem itself has a mathematical form, and its mathematical form can be intimidating to people who aren’t comfortable with algebra. In my classes on rational thinking, I’ve tried to express the practical content of Bayes’ theorem in terms of three intuitive principles. This blog post is about the first principle, and I’ll follow up with two more posts on the remaining principles.

To explain these principles, the best thought experiment I’ve been able to find involves gaming dice. If you’ve ever played Dungeons and Dragons, you’re familiar with 4-sided and 20-sided dice. A 4-sided die is a tetrahedron, and a 20-sided die is an icosahedron. Of course, the sides on a 4-sided die are numbered 1-4, and the sides on a 20-sided die are numbered 1-20.

Now, suppose that I have a 4-sided die and a 20-sided die, and I pick one at random, and roll the chosen die. I don’t reveal which die I rolled, but I tell you that I rolled a 3. Which die did I most likely roll? Is it more likely I rolled the 4-sided die or the 20-sided die?

This question asks you which of two theories is more likely to be true:

THEORY T1: I rolled the 4-sided die.

THEORY T2: I rolled the 20-sided die.

T2 is a possible explanation for the evidence, i.e., for rolling a 3. The theory T2 is more flexible in the sense that T2 is compatible with more possible outcomes (20 of them, in fact). T1 is less flexible in that it is compatible with only 4 outcomes. Yet, as most of you have already intuited, T1 is more likely. It’s more likely that I rolled the 4-sided die. How can we see this more explicitly?

Imagine that we’re going to play this dice game 200 times in a row. That is, I’ll be picking a die at random 200 times, each time rolling the chosen die and reporting the outcome.

Because I’m choosing my die at random, in 200 plays, I expect to pick the 4-sided die about 100 times and the 20-sided die about 100 times. Of the 100 times that I choose the 4-sided die, I expect the resulting rolls to be evenly split across the sides of the die. That means we’ll expect to see each face of the 4-sided die come up 25 times (100 rolls divided by 4 sides).

Of the 100 times that I choose the 20-sided die, I expect to see each face come up 5 times (100 rolls divided by 20 faces).

So, out of the 200 times we play the game, we expect a 3 to be rolled 25 times on the 4-sided die and 5 times on the 20-sided die, for a total of 30 instances of rolling a 3 in 200 plays.
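The counting argument so far can be checked with a few lines of Python. This is only a sketch of the arithmetic: the names `PLAYS`, `DICE`, and `expected_threes` are made up for illustration, and the code assumes each die is picked exactly half the time and that every face is equally likely.

```python
from fractions import Fraction

PLAYS = 200                            # number of times we play the game
DICE = {"4-sided": 4, "20-sided": 20}  # die name -> number of faces


def expected_threes(plays, dice):
    """Expected number of 3s rolled on each die.

    Assumes each die is chosen equally often and every face is equally likely.
    """
    picks_per_die = Fraction(plays, len(dice))  # 100 picks of each die
    return {name: picks_per_die / sides for name, sides in dice.items()}


counts = expected_threes(PLAYS, DICE)
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count} expected rolls of 3")
print(f"total expected rolls of 3: {total}")
```

Using `Fraction` keeps the arithmetic exact, so the counts come out as whole numbers rather than floating-point approximations.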

Thus, we expect that 25 out of every 30 times a 3 is rolled in our dice game, the rolled die will have been the 4-sided die. Consequently, given that the 3 was rolled, we infer that there is an 83.3% chance that it was the 4-sided die that was rolled.
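If you would rather see this happen than take the arithmetic on trust, the game can be simulated. The sketch below is one possible simulation, not part of the original thought experiment: the function name `simulate`, the fixed seed, and the choice of 200,000 plays (rather than 200, to smooth out random noise) are all assumptions made for illustration.

```python
import random


def simulate(plays, target=3, seed=0):
    """Play the dice game `plays` times and count rolls of `target` per die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = {4: 0, 20: 0}       # number of faces -> times `target` came up
    for _ in range(plays):
        sides = rng.choice([4, 20])             # pick a die at random
        if rng.randint(1, sides) == target:     # roll it (inclusive range)
            hits[sides] += 1
    return hits


hits = simulate(200_000)
share = hits[4] / (hits[4] + hits[20])
print(f"3s from the 4-sided die:  {hits[4]}")
print(f"3s from the 20-sided die: {hits[20]}")
print(f"fraction of 3s that came from the 4-sided die: {share:.3f}")
```

With this many plays, the printed fraction should land close to 5/6, though the exact digits depend on the random draws.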

If we express this as a fraction, we get the following formula:

$$
\frac{\text{Expected number of rolls of 3 on the 4-sided die}}{\text{Expected number of rolls of 3 on the 4-sided die} + \text{Expected number of rolls of 3 on the 20-sided die}}
$$

Or, in terms of the probability of rolling a 3 in each theory, the formula looks like

$$
\frac{P(\text{roll a 3} \mid T_1)}{P(\text{roll a 3} \mid T_1) + P(\text{roll a 3} \mid T_2)} = \frac{25\%}{25\% + 5\%} = 83.33\ldots\%
$$
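The same formula can be written as a small Python function. This is a minimal sketch that only applies when every theory starts out equally likely, as in this game where the die is picked at random; the name `posterior_equal_priors` is hypothetical and chosen just for this example.

```python
from fractions import Fraction


def posterior_equal_priors(likelihoods):
    """Probability of each theory given the data, assuming equal priors.

    `likelihoods` maps a theory name to P(data | theory).
    """
    total = sum(likelihoods.values())
    if total == 0:
        raise ValueError("no theory is compatible with the data")
    return {theory: lk / total for theory, lk in likelihoods.items()}


# P(roll a 3 | T1) = 1/4 and P(roll a 3 | T2) = 1/20
likelihoods = {"T1": Fraction(1, 4), "T2": Fraction(1, 20)}
posterior = posterior_equal_priors(likelihoods)

for theory, p in posterior.items():
    print(f"{theory}: {p} = {float(p):.2%}")
```

Dividing each likelihood by the sum of all likelihoods is exactly the fraction above. It stops being correct as soon as one die is more likely to be picked than the other.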

What have we learned from this thought experiment?

Principle I:
All things being equal, the theory which predicts a higher likelihood for the observed data is the theory most likely to be true.

When we don’t know which outcomes are more likely in the theory, or if the theory says each outcome is equally likely (as with casting a die), we can state this differently:

Principle I:
All things being equal, the theory which predicts the greatest number of alternatives to what is observed is the theory that’s least likely to be true.
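To see this version of the principle at work with more than two theories, imagine a bag holding one each of several dice. The set of dice below (4, 6, 8, 12, and 20 sides) and the function name `posterior_for_roll` are assumptions added for illustration; the sketch also assumes each die is equally likely to be drawn and that every face is equally likely.

```python
from fractions import Fraction


def posterior_for_roll(roll, dice_sides):
    """Probability that each die produced `roll`, assuming equal priors.

    A die with n faces gives likelihood 1/n to each face from 1 to n,
    and likelihood 0 to any number it cannot show.
    """
    likelihoods = {
        n: Fraction(1, n) if 1 <= roll <= n else Fraction(0)
        for n in dice_sides
    }
    total = sum(likelihoods.values())
    if total == 0:
        raise ValueError("no die in the bag can show that roll")
    return {n: lk / total for n, lk in likelihoods.items()}


bag = [4, 6, 8, 12, 20]
posterior = posterior_for_roll(3, bag)

for n, p in posterior.items():
    print(f"{n}-sided die: {float(p):.3f}")
```

Every die in the bag can show a 3, yet the more faces a die has (that is, the more alternatives to a 3 it allows), the smaller its share of the probability.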

When we’re trying to answer the question “Which theory probably accounts for what we see?”, we’re often tricked (or we trick ourselves) into answering a different question: “Which theory can possibly account for what we see?”

When we’re tricked this way, we are deceived into thinking that an elegant theory is a theory that is compatible with any observation we might possibly make.

When we answer the real question, “Which theory probably accounts for what we see?”, we’re not asking which theory is compatible with the data, but which theory is most likely to result in the data we’ve actually observed.

What matters is probability, not merely possibility.

Next Up: Principle II – All things are not equal!
