Most of us who’ve studied probability theory at university level will have learned that it is formalised using the Kolmogorov axioms. However, there is an interesting alternative way to formalise probability theory, due to R. T. Cox. You can get a quick overview from the Wikipedia page on Cox’s theorem, although it doesn’t motivate the approach very well, so if you’re interested you’re much better off downloading the first couple of chapters of Probability Theory: The Logic of Science by Edwin Jaynes, which is an excellent book (although sadly an incomplete one, because Jaynes died before he could write the second volume) and should be read by all scientists, preferably while they’re still impressionable undergraduates.
For Cox, probability theory is nothing less than the extension of logic to deal with uncertainty. Probabilities, in Cox’s approach, apply not to “events” but to statements of propositional logic. To say p(A)=1 is the same as saying “A is true”, to say p(A)=0 is to say “A is false”, and saying p(A)=0.5 means “I really have no idea whether A is true or not”. A conditional probability p(A|B) can be thought of as the extent to which B implies A.
There are a couple of interesting differences between Cox’s probabilities and Kolmogorov’s. Cox’s is more general, but also less formal (people are still working on getting it properly axiomatised). One important difference is that in Cox’s approach a conditional probability p(A|B) can have a definite value even when p(B)=0. This can’t happen in Kolmogorov’s formalisation, because there p(A|B) is defined as p(AB)/p(B), which is undefined when p(B)=0. It means that, unlike the logical implication B → A, which is automatically true whenever B is false, the probabilistic statement p(A|B)=1 does not become true just because B is false. So conditional probabilities are like logical implications, only better, since they don’t suffer from that little weirdness.
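To see the Kolmogorov side of this concretely, here is a minimal Python sketch of the ratio definition p(A|B) = p(AB)/p(B) over a finite sample space of equally likely outcomes. The die, the events and the helper names (`prob`, `kolmogorov_conditional`) are illustrative assumptions of mine; the point is only that when p(B)=0 the ratio has nothing to divide by, so the function returns `None` instead of a number.

```python
from fractions import Fraction

# Hypothetical finite sample space: one roll of a fair six-sided die.
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes), assuming equally likely outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def kolmogorov_conditional(a, b):
    """p(A|B) defined as the ratio p(AB)/p(B); returns None when p(B) == 0."""
    p_b = prob(b)
    if p_b == 0:
        return None  # the ratio definition assigns no value at all here
    return prob(a & b) / p_b

even = {2, 4, 6}
high = {4, 5, 6}
seven = {7}  # 7 is not a face of the die, so p(seven) == 0

print(kolmogorov_conditional(even, high))   # 2/3
print(kolmogorov_conditional(even, seven))  # None
```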
Anyway, that’s cool but what I really wanted to write about was this: in Cox’s version of probability theory, it’s meaningful to talk about the probability of a probability. That is, you can write stuff like p(p(A|B)=1/2)=5/6 and have it make sense. I’ll get to an example of this in a bit.
First I have to explain an important formal detail of Cox’s approach: for Cox, all probabilities are conditional. This is because logic, by and large, isn’t about which statements are true and which are false; it’s about what we can infer by assuming things are true or false. It’s implications (or conditional probabilities) all the way down. When we write something like p(A)=0.6, we’re implicitly saying “the probability of A is 0.6, given everything I know about it.” We can write this as p(A|K), where K represents our “state of knowledge”, i.e. everything we know.
An example: imagine we’ve hidden a ball under one of three cups, and Bob doesn’t know where it is. Let the statement A be “the ball is under cup 1” and the statement B be Bob’s state of knowledge. (This is to be thought of as a statement of propositional logic as well. It looks something like this: “my name is Bob AND the sky is blue AND all men are mortal AND … AND a ball is hidden under one of these three cups.”) If Bob has no information about which cup the ball is under, then nothing in B favours one cup over another, so by symmetry each cup must get the same probability and we can determine that p(A|B)=1/3.
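One way to make p(A|B)=1/3 computable is to treat a state of knowledge as the list of possible worlds it leaves open and to weight those worlds equally, which is what the symmetry argument amounts to here. This is a simplifying assumption of the sketch, not Cox’s general machinery, and the names `prob_given` and `ball_under_cup_1` are hypothetical.

```python
from fractions import Fraction

# Each "world" is the number of the cup hiding the ball.
# Bob's knowledge B rules out nothing beyond "the ball is under one of three cups".
worlds_consistent_with_B = [1, 2, 3]

def prob_given(statement, worlds):
    """p(statement | knowledge), assuming every world the knowledge leaves open is equally likely."""
    true_in = [w for w in worlds if statement(w)]
    return Fraction(len(true_in), len(worlds))

def ball_under_cup_1(world):
    """Statement A: 'the ball is under cup 1'."""
    return world == 1

print(prob_given(ball_under_cup_1, worlds_consistent_with_B))  # 1/3
```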
Now let’s contrive a more complicated example. Let’s say we’ve flipped a fair coin and put it into a sealed box. We’ve peeked in the box and know the coin is showing tails, but Bob hasn’t, so he has no reason to favour heads over tails and must assign an equal probability to each. If we let our state of knowledge be represented by C, Bob’s by B, and the statement “the coin is showing heads” by A, then p(A|C)=0 (we know that A is false) and p(A|B)=1/2 (Bob has no idea).
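The same equal-weighting sketch handles the two observers, again under the assumption that a state of knowledge is just the set of possible worlds it leaves open, each weighted equally. Our knowledge C leaves only the tails world open, while Bob’s knowledge B leaves both faces open. The helper names are hypothetical.

```python
from fractions import Fraction

def prob_given(statement, worlds):
    """p(statement | knowledge), assuming every world the knowledge leaves open is equally likely."""
    true_in = [w for w in worlds if statement(w)]
    return Fraction(len(true_in), len(worlds))

def shows_heads(world):
    """Statement A: 'the coin is showing heads'."""
    return world == "heads"

# Our knowledge C: we peeked, so only the tails world remains open.
worlds_C = ["tails"]
# Bob's knowledge B: he hasn't peeked, so both faces remain open.
worlds_B = ["heads", "tails"]

print(prob_given(shows_heads, worlds_C))  # 0
print(prob_given(shows_heads, worlds_B))  # 1/2
```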
What happens next is this: we will leave the room, and Bob (whom we trust) will roll a fair die. If he rolls a one then he’ll look in the box and see the coin, but otherwise he won’t do anything. After we’ve left the room and waited a little while, our state of knowledge is as follows: there is a one in six chance that the die came up 1. In that case Bob will know the coin is showing tails (i.e. A is false), so p(A|B)=0. But there is also a five in six chance that the die came up with some other number and Bob didn’t look in the box. In this case p(A|B) will still be 1/2. We can sum this up as follows:
p(p(A|B)=0 | C) = 1/6 (i.e. given our state of knowledge C, there’s a one in six chance that Bob knows A is false), and
p(p(A|B)=1/2 | C) = 5/6 (there’s a five in six chance that Bob still has no idea what the coin is showing).
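The two-line summary above can be checked by brute force. This sketch enumerates the six die rolls, which are equally likely given our knowledge C, works out which coin faces Bob still considers possible after each roll, computes his p(A|B) with the same equal-weighting assumption as before, and tallies how much of our probability lands on each value of Bob’s probability. `ACTUAL_FACE`, `bobs_open_worlds` and the other names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

ACTUAL_FACE = "tails"  # part of our knowledge C: we peeked before leaving

def prob_given(statement, worlds):
    """p(statement | knowledge), assuming every world the knowledge leaves open is equally likely."""
    true_in = [w for w in worlds if statement(w)]
    return Fraction(len(true_in), len(worlds))

def shows_heads(world):
    """Statement A: 'the coin is showing heads'."""
    return world == "heads"

def bobs_open_worlds(die_roll):
    """Coin faces Bob still considers possible after the given die roll."""
    if die_roll == 1:
        return [ACTUAL_FACE]       # he looked in the box
    return ["heads", "tails"]      # he didn't look

# Distribution, given C, over the value of Bob's probability p(A|B).
dist = defaultdict(Fraction)  # Fraction() == 0, so missing keys start at zero
for roll in range(1, 7):
    bobs_p = prob_given(shows_heads, bobs_open_worlds(roll))
    dist[bobs_p] += Fraction(1, 6)  # fair die: each roll has probability 1/6 under C

for value, weight in sorted(dist.items()):
    print(f"p(p(A|B)={value!s} | C) = {weight!s}")
# p(p(A|B)=0 | C) = 1/6
# p(p(A|B)=1/2 | C) = 5/6
```

The two weights add up to 1, as any probability distribution should, even though the quantity being distributed is itself a probability.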
That second statement, p(p(A|B)=1/2 | C) = 5/6, is the example I promised earlier. This type of thing isn’t without its practical applications, although to the best of my knowledge they are few. I just wanted to show you this because it’s interesting that mathematics gives us a precise way to express our uncertain reasoning about someone else’s uncertainty.