## Measure Theoretic Probability for Dummies: Part I

Nothing makes me empathise more with those struggling with probability theory than reading things like this on Wikipedia:

Let (Ω, F, P) be a measure space with P(Ω)=1. Then (Ω, F, P) is a probability space, with sample space Ω, event space F and probability measure P.

This is written so that only people who already know what it is saying can understand it. The only possible value of this sentence would be to someone who managed to study measure theory without being exposed to its most widespread application; in other words: no one! Whilst the attitude that this, and soooo many other Wikipedia pages, displays encourages people to be precise in a way that mathematicians cherish, it also alienates a lot of perfectly capable, intelligent people who just run out of patience in the face of the relentless influx of oblique statements.

Personally, I think that understanding probability spaces is very important, but, for reasons including those mentioned above, most people find the measure theoretic formalisation daunting. Here I have tried to outline the most widely used formalisation, which has turned out to be far more work than I expected…

## Probability Spaces

A probability space is made of three components. You need all three to be specified, or else your calculations will, much further down the line, run into things that are ambiguous. You need:

1. A list of possible outcomes.
2. A list of questions that we can ask about those outcomes.
3. The probability that the answer to each of those questions is “yes”.

The less obvious thing on this list is the list of questions that you are allowed. (I will spend most of this post explaining this).

Anyway, as an example of how this works, the probability space describing a fair coin toss is made of:

1. Two possible mutually exclusive outcomes: heads and tails
2. Four possible questions:
   1. is it heads?
   2. is it tails?
   3. is it either heads or tails?
   4. is it neither heads nor tails?

There are plenty of logically valid statements that we exclude here, I will come back to this in the “Events” section.

3. With probabilities:
   1. heads: 0.5
   2. tails: 0.5
   3. either: 1
   4. neither: 0
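The three components above can be written down as plain data structures. Here is a minimal sketch in Python (the names `omega`, `events` and `mu` are my own choices, not standard identifiers); `frozenset` is used so that events, which are sets of outcomes, can be dictionary keys:

```python
omega = frozenset({"heads", "tails"})       # 1. the possible outcomes

events = {                                  # 2. the allowed questions,
    frozenset(),                            #    each as a set of outcomes:
    frozenset({"heads"}),                   #    "is it heads?"
    frozenset({"tails"}),                   #    "is it tails?"
    frozenset({"heads", "tails"}),          #    "either heads or tails?"
}                                           #    (empty set = "neither")

mu = {                                      # 3. the probability of each question
    frozenset(): 0.0,
    frozenset({"heads"}): 0.5,
    frozenset({"tails"}): 0.5,
    frozenset({"heads", "tails"}): 1.0,
}

print(mu[frozenset({"heads"})])             # 0.5
```

Note that the probabilities are attached to the questions, not the outcomes; that distinction is the point of the rest of this post.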

## Possible Outcomes

The list of possible outcomes, known as the sample space, can be pretty much anything – but the outcomes must be mutually exclusive. For the coin toss it is the set written {heads, tails}. If we were counting the number of heads found in 10 coin tosses, we would choose the numbers {0,1,2,3,4,5,6,7,8,9,10}. If we were measuring heights of people, a good choice would be the positive real numbers, etc.

This set is also known as the support. It is often denoted with an uppercase greek omega: “Ω”.

## Events

The idea of events is probably the most difficult thing in this formalisation. Events are basically questions that are allowed to be asked about the outcomes. There are some rules to defining them, but in my opinion they are pretty reasonable.

The rules are:

1. If I am allowed to ask a question, I must also be allowed to ask the opposite. In the coin example I am allowed to ask if it is heads, so I must also be allowed to ask if it is not heads (i.e. tails).
2. If I am allowed to ask two different questions, A and B, I must be able to ask the question A OR B. In the coin example we are allowed to ask whether it is heads and we can ask whether it is tails, so we must be allowed to ask if it is heads or tails.
3. There are two questions that one must be allowed to ask:
1. We must be allowed to ask the question “is it none of the above”. This is needed for completeness. In the coin example, this is “neither heads nor tails”. Whilst I am allowed to ask it, the probability will always be zero (by definition).
2. We must be allowed to ask the question “is it any of the above”. In the coin example this is the same as “either heads or tails”. Although I can ask it the probability will always be one (by definition).
4. There is one more rule that I will add for completeness rather than necessity: if we can ask two questions, A and B, we must be able to ask the question A AND B. In the coin example we are allowed to ask if it is heads as well as whether it is tails, so we must also be allowed to ask if it is heads and tails. In this case, as heads and tails are mutually exclusive, asking “is it heads and tails?” is the same as asking “is it none of the above?”. As another example, consider the outcomes of rolling a dice. Let’s say we allow asking the question “is it an odd number?” and the question “is it less than 4?”. If so, we must also be allowed to ask “is it an odd number that is less than 4?”, which corresponds to the outcomes 1 and 3.

I consider these rules to be fair: if you can interpret “the cat has a black spot on its paw” in terms of probability, then you should also be able to interpret “the cat does not have a black spot on its paw” in terms of probability too. A system that breaks these rules would be very strange.

The power of these rules is that when they hold we have a “complete” list of questions: there is no combination of questions using NOT, AND, and OR that is not equivalent to a question already on the list. They pin down a completeness and consistency in which probabilities are defined. [An interesting exercise to try later is to ask what happens to logical implication. Hint: p→q = ¬p∨q]

Once put into the formal framework all these rules get mathematical names: closure under complement, closure under countable union, containing the empty set, containing the support and closure under countable intersection, respectively.

Let’s have a look at how you can represent the questions described by this set of rules. For the coin example we can easily write down the set of outcomes associated with each question.

| Question | Corresponding Outcomes |
|---|---|
| is it heads? | {heads} |
| is it tails? | {tails} |
| is it either heads or tails? | {heads, tails} |
| is it neither heads nor tails? | {} |

From here we can check all the rules that I mentioned. If I ask the question “is it heads?”, is there a set of outcomes corresponding to its negation? Yes: {tails}. Is there a “none of the above” question? Yes: it corresponds to the empty set, {}. And so on; I’ll leave it to you to convince yourself that all the rules hold.

If we gather the sets of outcomes corresponding to all the allowable questions, we get something called an event space, or σ-algebra (from measure theory). It is often written with an upper case Greek sigma, Σ, or with an F (for event; E is already used for something else). For the coin example:

Σ = { {}, {heads}, {tails}, {heads, tails} }

If you want to go into the maths, all of the rules above can be written in terms of the set Σ, using standard set theory operations.

1. If E is in Σ, then its complement (Ω \ E) is in Σ.
2. If E1 and E2 are in Σ, then their union (E1 ∪ E2) is in Σ.
3. Two sets must always be in Σ:
   1. The empty set, {}, is in Σ.
   2. The support, Ω, is in Σ.
4. If E1 and E2 are in Σ, then their intersection (E1 ∩ E2) is in Σ.
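For a finite sample space, these set-theoretic rules can be checked mechanically. A minimal sketch in Python (the function name `is_event_space` is mine, not standard terminology):

```python
from itertools import combinations

def is_event_space(omega, sigma):
    """Check the closure rules above on a finite collection of events."""
    omega = frozenset(omega)
    if frozenset() not in sigma or omega not in sigma:   # empty set and support
        return False
    if any(omega - e not in sigma for e in sigma):       # closure under complement
        return False
    for a, b in combinations(sigma, 2):                  # closure under union
        if a | b not in sigma or a & b not in sigma:     # ...and intersection
            return False
    return True

coin = {"heads", "tails"}
sigma = {frozenset(), frozenset({"heads"}),
         frozenset({"tails"}), frozenset(coin)}
print(is_event_space(coin, sigma))                       # True
print(is_event_space(coin, {frozenset({"heads"})}))      # False
```

(The full measure-theoretic definition requires closure under *countable* unions; for a finite sample space, pairwise closure is all there is to check.)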

## This event space stuff all sounds overly complicated, why would you bother?

The standard answer to this is to refer to continuity, which provides a very good reason indeed, but I shall leave this bit for part II.

Personally, I think the really nice thing about the event space formalization is how it gives a way of knowing which probabilities are implicitly needed to answer particular collections of probabilistic questions. I will give an example to demonstrate this. This example might be a bit harder to understand for those without an intuition for set theory, so those who are already putting in a lot of mental effort should probably skip to the next section…

### Example

Say I have a standard six-sided dice. The possible outcomes (Ω) of a roll are {1,2,3,4,5,6}, but let’s say I only care about the probabilities of rolling an odd or even number. This raises a question: is it OK to only consider the probabilities of odd and even, without assigning individual probabilities to each number? With the event space formalism, we can answer this.

What we can do is find the “smallest” set of questions that both fulfills the rules above and also let us ask about odd and even.

The set of odd outcomes is {1,3,5} and the set of even outcomes is {2,4,6}. So these give the set: {{1,3,5},{2,4,6}}. We can apply the rules above to this: if I can ask if it is odd, and I can ask if it is even, then I should be able to ask if it is (1) both even and odd, and (2) either even or odd. These correspond to the sets {} and {1,2,3,4,5,6} = Ω respectively. As these are not in our set, on its own {{1,3,5},{2,4,6}} does not make a complete, self-contained system of questions; in other words, it is not an event space. At least not as it stands.

To make it an event space, we just apply our rules and, whenever one of them fails, add the element that makes it hold. For example, being able to ask the question “is it either odd or even?” requires us to add {1,2,3,4,5,6} to our list. We keep doing this until all the rules hold; quickly we find that:

Σ = {{},{1,3,5},{2,4,6},{1,2,3,4,5,6}}

So, the set of questions: “does the dice show neither an even nor an odd number?”, “does it show an odd number?”, “does it show an even number?”, “does it show either an odd or an even number?”, form a consistent system of questions. You can talk about the probability of a dice roll being odd or even without having to know about the probabilities of specific numbers.

There are definitely collections of questions about dice that do require you to know the individual probabilities. For example, knowing the individual probabilities is implicit in being able to assign probabilities to: {“does the dice show 6 or less?”, “does the dice show 5 or less?”, “does the dice show 4 or less?”, … “does the dice show 1?”}.
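The “keep applying the rules until they hold” procedure can be automated for finite sample spaces, and it makes both claims above checkable. A sketch (the function name `close_under_rules` is mine); closing under complement and union is enough, since intersection then follows by De Morgan’s laws:

```python
def close_under_rules(omega, generators):
    """Repeatedly add complements and pairwise unions until nothing new appears."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in sigma} - sigma            # missing complements
        new |= {a | b for a in sigma for b in sigma} - sigma  # missing unions
        if not new:
            return sigma
        sigma |= new

# Starting from "odd" and "even" only: four events suffice.
odd_even = close_under_rules(range(1, 7), [{1, 3, 5}, {2, 4, 6}])
print(len(odd_even))   # 4

# Starting from the "k or less" questions: every singleton {k} is forced,
# and from the singletons every subset, i.e. the full power set of size 64.
prefixes = close_under_rules(range(1, 7),
                             [set(range(1, k + 1)) for k in range(1, 7)])
print(len(prefixes))   # 64
```

So the “k or less” questions really do implicitly commit you to a probability for each individual number, while odd/even do not.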

## The Probability Measure

So far, I have described the support – the list of possible outcomes, and the event space – the list of possible questions. Now I need to describe the last component: the probability measure.

In a sense, this is the simplest bit. But there is one very, very important thing to remember: the measure assigns a probability to an event (a question about the outcomes), NOT to the outcomes directly! Often there are events that correspond exactly to each individual outcome, but this is not necessarily true.

The measure is often written with a lowercase Greek mu, μ, or with a P. All it is is a function that assigns a probability to each question (from now on, I shall use the usual terminology of “event”). There are, of course, some constraints on this function; the first two are that the probability of “none of the above” (the event E = {}) is zero, and the probability of “any of the above” (E = Ω) is one. For the coin example we have:

| E | μ(E) |
|---|---|
| {} | 0 |
| {heads} | 0.5 |
| {tails} | 0.5 |
| {heads, tails} | 1 |

The rules that μ needs to obey are pretty much obvious when you consider them in the context of probability. These rules are:

1. Probabilities must be non-negative, μ(E) ≥ 0
2. Probabilities must be less than or equal to one, μ(E) ≤ 1
3. The probability of one or other of two mutually exclusive events must be the sum of the individual probabilities (a special case of what is known as σ-additivity). For example, for the coin we have μ({heads}) + μ({tails}) = μ({heads, tails}), or in numbers: 0.5 + 0.5 = 1.
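For a finite event space, these rules can also be verified mechanically. A minimal sketch (the function name is mine), where the measure is a dictionary mapping each event to its probability:

```python
def is_probability_measure(omega, mu):
    """Check the three measure rules on a finite event space."""
    omega = frozenset(omega)
    if mu[frozenset()] != 0 or mu[omega] != 1:           # "none" = 0, "any" = 1
        return False
    if any(not 0 <= p <= 1 for p in mu.values()):        # rules 1 and 2
        return False
    for a in mu:                                         # rule 3: additivity for
        for b in mu:                                     # disjoint events
            if not (a & b) and (a | b) in mu:
                if abs(mu[a] + mu[b] - mu[a | b]) > 1e-9:
                    return False
    return True

coin_mu = {frozenset(): 0.0,
           frozenset({"heads"}): 0.5,
           frozenset({"tails"}): 0.5,
           frozenset({"heads", "tails"}): 1.0}
print(is_probability_measure({"heads", "tails"}, coin_mu))   # True
```

If you changed μ({tails}) to 0.4 the additivity check would fail, because 0.5 + 0.4 ≠ μ({heads, tails}) = 1.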

The probability space is then just the three things I have mentioned taken together: the outcomes, the questions, and the probabilities of the questions. We write the space as the collection of these three things, in the present case: (Ω, Σ, μ).

That’s basically it, now all that is left is to tie it into some other ideas.

## Notations

It is good to look at how we express probabilities in this framework. Instead of “or” we write the union of events, for “and” we write the intersection, and for “not” we write the complement (we should remember that, formally, the mathematical objects in the different notations are quite different).

| Word Statement | Probability/Logic Notation | Measure Notation |
|---|---|---|
| Probability of A | Pr(A) | μ(A) |
| Probability of not A | Pr(¬A) | μ(Aᶜ) or μ(Ω \ A) |
| Probability of A or B | Pr(A ∨ B) | μ(A ∪ B) |
| Probability of A and B | Pr(A, B) or Pr(A ∧ B) | μ(A ∩ B) |
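To see the correspondence concretely, here is a quick sketch using Python sets and the dice questions from earlier (A for “is it odd?”, B for “is it less than 4?”):

```python
omega = frozenset(range(1, 7))   # dice outcomes {1,...,6}
A = frozenset({1, 3, 5})         # "is it odd?"
B = frozenset({1, 2, 3})         # "is it less than 4?"

print(sorted(omega - A))         # not A (complement):   [2, 4, 6]
print(sorted(A | B))             # A or B (union):       [1, 2, 3, 5]
print(sorted(A & B))             # A and B (intersection): [1, 3]
```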

I think that will do for now… I will do conditional probabilities and continuous variables next time…

### 5 Responses to “Measure Theoretic Probability for Dummies: Part I”

1. Excellent descriptions of the bases here. I’m going to have to think more carefully about the way I introduce probability when next I teach it to a non-math-majors audience, since some of your plain-English explanations (particularly about needing to be able to ask about the opposite of the event occurring) I realize I wasn’t making clear.

2. Good! Really appreciate you taking everything from the ground up. When I study this sort of thing, not understanding the notation ({}, ∩) is sometimes more discouraging than the complexity of the problem itself.

One thing, you say ‘Often there are events that correspond exactly each outcome, but it is not necessarily true.’ Can you give an example?

Looking forward to an equally lucid breakdown of Bayes

• Was I clear enough about {} and ∩?

There is the example in the post (the one I said you can skip). Basically, you do not need to be able to talk about the probability of every individual outcome to be consistent, but you can always build a probability space that lets you (as people usually do). It’s only in rare cases that you would choose an event space that does anything different.

In many cases though, we do not care about the events corresponding to a single outcome, as the probability of exactly one particular outcome is zero. It is defined, but it is zero. These are usually continuous distributions. In these cases, talking about the probabilities of single outcomes is not helpful, so I thought it would be important to point out that it is not needed first (before the next post).

For example, if you have a normal distribution, the probability of getting something between -1 and +1 standard deviations is about 0.68, and between +2 and +3 it is about 0.02, etc. But the probability of getting some particular number, say exactly 1, is 0; this is because there are an infinite number of outcomes and their total probability is one.

More formally, in these cases we would say: if x is a member of the sample space (an outcome) then the event {x} has probability measure zero: μ({x}) = 0.
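(Those numbers are easy to check numerically; here is a quick sketch using the standard normal CDF written in terms of the error function from Python’s standard library:)

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

print(round(phi(1) - phi(-1), 4))   # within one standard deviation: 0.6827
print(round(phi(3) - phi(2), 4))    # between +2 and +3: 0.0214
print(phi(1) - phi(1))              # "exactly 1": 0.0
```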

• The {} and ∩ was clear, I meant many other supposedly introductory texts forget to explain basic notation. Good job not forgetting.

Ah, continuous distributions, got it.