Paradoxes of probability theory: the two envelopes

by Nathaniel Virgo

This post is about a classic probability puzzle. It goes something like this: I place two envelopes on the table in front of you. One of them contains a Prize, which is an amount of money in pounds, but you don’t know how much it is. The other one contains a Special Bonus Prize, which is worth exactly twice as much money as the Prize. It’s your lucky day — but you can only choose one envelope. Which do you choose?

“Well,” you say to yourself, “it doesn’t matter, they’re both the same,” so you pick one at random. Let’s say it’s the one on the left. But now I ask you if you want to change your mind.

“Well,” you might say to yourself, “let x be the amount of money in the envelope I’m holding. This envelope has a 50% chance of being the Prize, in which case the other envelope contains 2x. On the other hand, there’s a 50% chance that this is the Special Bonus Prize, in which case the other envelope contains 0.5x. But still, the expected value of the other envelope is 0.5*2x + 0.5*0.5x = 1.25x. So on the balance of probabilities I should definitely switch.” But then I offer to let you switch again, and again, and again, and every time you go through the same reasoning, never managing to settle on a particular envelope because each one seems like it should contain more money than the other.  Clearly something is wrong with this reasoning, but what is it?

In this post, I’ll solve this problem in what I consider to be the proper Bayesian way, pinpointing exactly where the problem is.  You might want to think about the question for a bit and come up with your own idea of its solution before reading on.

One thing to note before we start is that the problem goes away if I tell you how much money the Prize is worth. For example, if the Prize is £10 and the Bonus Prize is £20, the reasoning goes like this: if the envelope in your hand is the Prize then the other envelope is worth £20, otherwise the other envelope is worth £10, so its expected value is £15. That’s the same as the expected value of the envelope in your hand, so there’s no problem. So whatever the difficulty is, it has something to do with the fact that the value of the Prize is unknown.
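As a quick sanity check of the known-Prize case, here is a minimal Python sketch. The function name `expected_values` is made up for this illustration; the only inputs are the £10 example and the 50/50 chance of holding either envelope.

```python
from fractions import Fraction

def expected_values(prize):
    """Expected contents of the held and the other envelope when the Prize is known."""
    bonus = 2 * prize
    half = Fraction(1, 2)
    # The held envelope is the Prize or the Bonus Prize with probability 1/2 each.
    held = half * prize + half * bonus
    # The other envelope is simply whichever one we are not holding.
    other = half * bonus + half * prize
    return held, other

held, other = expected_values(10)
print(held, other)  # 15 15
```

Both expectations come out as 1.5 times the Prize, so with a known Prize there is nothing to gain by switching.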

How can we model this problem using probability theory? I decided to use three jointly distributed random variables: A, which represents the value of the envelope on the left (call it envelope A); B, which represents the amount of money in the other envelope; and E, which can take on two values, a or b. The variable E represents whether envelope A or envelope B contains the Special Bonus Prize.

We have to come up with a prior distribution for these three variables. In order to make them represent the things they’re supposed to represent, this distribution must have the following properties:

  • The marginal distributions must be the same for the amount of money in the two envelopes. That is, p(A=x) = p(B=x), for every x.
  • Each envelope should have a 50/50 chance of containing the Bonus Prize: p(E=a) = p(E=b) = \frac{1}{2}.
  • The Bonus envelope should contain twice as much money as the Prize envelope: p(A=2B|E=a)=1 and p(B=2A|E=b)=1.
  • The joint distribution shouldn’t change if you swap the two envelopes: p(A=x, B=y) = p(B=x, A=y).

These aren’t necessarily independent, since the last one implies the first one and possibly the first three imply the last one (I’m not sure), but they’ll all be used below.
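To see that these four properties can all hold at once, here is a minimal Python sketch that builds one such joint distribution and checks each property. The prior over the Prize value (`PRIZE_PRIOR`) and the helper names are assumptions invented for this example; any proper prior over the Prize could be substituted.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical prior over the Prize value (an assumption for illustration only).
PRIZE_PRIOR = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}

def build_joint(prize_prior):
    """Return {(a, b, e): probability} for envelope values a, b and bonus label e."""
    joint = defaultdict(Fraction)
    for m, p in prize_prior.items():
        joint[(2 * m, m, "a")] += p / 2  # envelope A holds the Bonus Prize
        joint[(m, 2 * m, "b")] += p / 2  # envelope B holds the Bonus Prize
    return dict(joint)

def marginal(joint, index):
    """Marginal distribution of one coordinate (0 = A, 1 = B, 2 = E)."""
    out = defaultdict(Fraction)
    for key, p in joint.items():
        out[key[index]] += p
    return dict(out)

joint = build_joint(PRIZE_PRIOR)

# Property 1: A and B have identical marginal distributions.
same_marginals = marginal(joint, 0) == marginal(joint, 1)
# Property 2: each envelope holds the Bonus Prize with probability 1/2.
bonus_split = marginal(joint, 2)
# Property 3: the Bonus envelope holds exactly twice the other one.
doubling = all((a == 2 * b) if e == "a" else (b == 2 * a) for a, b, e in joint)
# Property 4: swapping the envelopes leaves p(A, B) unchanged.
pair = defaultdict(Fraction)
for (a, b, _e), p in joint.items():
    pair[(a, b)] += p
symmetric = all(pair.get((b, a), 0) == p for (a, b), p in pair.items())

print(same_marginals, doubling, symmetric)  # True True True
```

The construction draws the Prize value first and then flips a fair coin to decide which envelope gets the Bonus Prize, which is one natural way (though not necessarily the only one) to satisfy the list above.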

Now, with this notation, let’s go through the paradoxical argument again. I’ve marked where the problem is.

  1. Envelope A has a 50% probability of being the Special Bonus envelope. p(E=a)=\frac{1}{2}, or p(B=2A)=p(B=\frac{1}{2}A) = \frac{1}{2}.
  2. (The incorrect step.) Let the contents of A be x. Then, from (1), envelope B contains 2x with probability 0.5, and 0.5x with probability 0.5. That is, p(B=2x | A=x) = \frac{1}{2} and p(B=\frac{1}{2}x | A=x) = \frac{1}{2}, for any given x.
  3. Therefore the expected value of B is equal to \frac{1}{2}(2x+\frac{1}{2}x)=\frac{5}{4}x, where x is the value of envelope A.
  4. Therefore the expected value of B is greater than the expected value of A, and I should switch.
  5. By symmetry I should also switch if I choose B, so by induction I should keep switching forever and become infinitely rich.

Step 2 is the problem. It’s true that p(B=2A)=p(B=\frac{1}{2}A) = \frac{1}{2}, but step 2 (and hence the whole argument) hinges on  claiming that this is still the case when conditioned on the actual value of A. That is,

p(B=2x | A=x) = p(B=\frac{1}{2}x | A=x) = \frac{1}{2}

for every x. But in fact, since B=2x together with A=x means that envelope B holds the Bonus Prize (E=b), Bayes’ theorem gives p( B=2x | A=x ) as

p( A=x | E=b )\frac{p( E=b )}{p( A=x )} = \frac{p( A=x | E=b )}{2p( A=x )},

whereas p( B=\frac{1}{2}x | A=x ), which corresponds to E=a, is given by

p( A=x | E=a )\frac{p( E=a )}{p( A=x )} = \frac{p( A=x | E=a )}{2p( A=x )}.

Thus p( B=2x | A=x ) can only be equal to p( B=\frac{1}{2}x | A=x ) if p( A=x | E=a ) = p( A=x | E=b ), for every x. That is, step 2 only works if A is conditionally independent of E.

In other words, it only works if knowing whether A is the bonus prize tells you nothing about how much money is in it. This should raise a red flag already — if I told you that the envelope you’d selected was, in fact, the bonus prize, it would be quite strange if you didn’t then expect it to contain more money.
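To make this concrete, here is a minimal Python sketch that applies Bayes’ theorem with a small, made-up prior over the Prize value. The prior `PRIZE_PRIOR` and the function name `prob_other_is_double` are assumptions for illustration, and the labels follow the definition of E given earlier (E=a means envelope A holds the Bonus Prize).

```python
from fractions import Fraction

# Hypothetical prior over the Prize value (an assumption for illustration only).
PRIZE_PRIOR = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}

def prob_other_is_double(x, prize_prior):
    """p(B = 2x | A = x): the chance that the held amount x is the smaller one."""
    zero = Fraction(0)
    # If A holds the Prize (E = b), then A = x means the Prize is x.
    like_b = prize_prior.get(x, zero)
    # If A holds the Bonus Prize (E = a), then A = x means the Prize is x / 2.
    like_a = prize_prior.get(Fraction(x) / 2, zero)
    evidence = (like_a + like_b) / 2  # p(A = x), using p(E = a) = p(E = b) = 1/2
    if evidence == 0:
        raise ValueError(f"A = {x} has zero prior probability")
    return (like_b / 2) / evidence

for x in (1, 2, 4, 8):
    print(x, prob_other_is_double(x, PRIZE_PRIOR))
# 1 1
# 2 1/3
# 4 1/2
# 8 0
```

With this prior the conditional probability ranges from 1 down to 0 as x grows, so the blanket claim that it equals 1/2 for every x fails as soon as the amount in the envelope carries any information about whether it is the Bonus Prize.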

But an argument one sometimes hears in regards to this thought experiment is that if you know nothing — literally nothing — about the value of A, then this will in fact be true. Let’s try to codify this idea mathematically and see where it leads us.

Now, it’s quite difficult to say, a priori, what it really means to “know nothing” about something in probability theory. There’s a whole theory of so-called ignorance priors in Bayesian probability theory, but they’re quite fiddly and subtle things, so I’m not going to start out by trying to construct one. Instead I’ll just accept the claim in the previous paragraph (that knowing nothing means A is conditionally independent of E) and see where it leads.

Now, with this assumption of conditional independence, we have that

p(A=x) = \frac{1}{2}(p(A=x | E=a) + p(A=x | E=b)) = p( A=x|E=a ) = p(A=x | E=b).             (i)

But p( A=x | E=a ) = p( B = \frac{1}{2}x | E=a ). By similar reasoning to (i), this is also equal to p(B = \frac{1}{2}x), and by exchange of A and B this is equal to p(A = \frac{1}{2}x).

So we have that this particular notion of “literally not knowing anything” implies that the marginal prior for A has the property that

p( A=x ) = p( A=\frac{1}{2}x ),

for every x. You can construct various fancy priors that have this property, such as p(A=x) \propto 1+\sin(2\pi\log_2(x)), but the one that looks most like an ignorance prior is the uniform prior. Uniform priors for unbounded quantities are a bit odd and have a few formal subtleties, but you can deal with them. They’re improper priors, meaning that you can’t normalise them, but essentially, this ignorance prior assigns the same infinitesimal probability density to every value of x.
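The halving property of the “fancy” prior is easy to check numerically. The following is a small Python sketch, with `fancy_density` a name chosen here for the unnormalised density quoted above; the test points are arbitrary.

```python
import math

def fancy_density(x):
    """Unnormalised density 1 + sin(2*pi*log2(x)) from the text, for x > 0."""
    return 1.0 + math.sin(2.0 * math.pi * math.log2(x))

# Halving x shifts log2(x) by exactly 1, i.e. the sine argument by 2*pi,
# so the density should be unchanged up to floating-point rounding.
for x in (0.3, 1.7, 10.0, 12345.6):
    assert math.isclose(fancy_density(x), fancy_density(x / 2), abs_tol=1e-9)
```

The same check passes trivially for a constant (uniform) density, which is the simplest function with this property.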

Using such a uniform prior for an amount of money is a bit weird. It means, for example, that the probability that the envelope contains £10 is the same as the probability that it contains £10^{10^{100}}, which is the same as the probability that it contains £10^{-1000}. But more than that, it means that the expected amount of money in the envelope is infinite. (This is also true of the “fancy” priors mentioned above.)

So now, finally, we can fully explain the paradox. If you really really “didn’t know anything” about the amount of money in the envelopes, you might be justified in assigning a uniform marginal prior to the value of each envelope’s contents. Then when you select envelope A, you can ask yourself “what is the expected amount of money in this envelope?” The answer is infinity. Should you switch? Well, the expected amount of money in envelope B is infinity, which is equal to infinity*5/4. So there’s no paradox. It doesn’t matter if you switch or not, because you’ll expect to become infinitely rich in any case.

But of course, if this was a real situation then the uniform prior would be a rather silly one. The contents of my envelopes cannot be less than 1p, and they can’t be more than the total amount of money in existence, which I guess is in the trillions of pounds. (Of course, you’re free to pick a smaller upper bound if you want.) Any prior you can come up with that fulfils these constraints will have finite expectations, and won’t allow you to conclude step 2 from step 1 in the argument above.
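Here is a minimal Python sketch of what happens with a bounded prior. The prior `PRIZE_PRIOR` is made up for this example (it is not meant to be realistic), and `switching_gains` is a hypothetical helper that computes the expected gain from switching, conditional on each possible amount x in the held envelope.

```python
from fractions import Fraction

# Hypothetical bounded prior over the Prize value (an assumption for illustration).
PRIZE_PRIOR = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}

def switching_gains(prize_prior):
    """Return {x: (p(A = x), E[B - A | A = x])} for every possible held amount x."""
    # weight[x] accumulates p(A = x); total[x] accumulates the sum of p * (B - A).
    weight, total = {}, {}
    for m, p in prize_prior.items():
        # Each Prize value m gives two equally likely cases: we hold m or we hold 2m.
        for held, other in ((m, 2 * m), (2 * m, m)):
            weight[held] = weight.get(held, 0) + p / 2
            total[held] = total.get(held, 0) + (p / 2) * (other - held)
    return {x: (weight[x], total[x] / weight[x]) for x in weight}

gains = switching_gains(PRIZE_PRIOR)
for x in sorted(gains):
    print(x, gains[x][1])
# 1 1
# 2 0
# 4 1
# 8 -4

# Averaging the conditional gains over p(A = x) gives the unconditional gain.
overall = sum(p * gain for p, gain in gains.values())
print(overall)  # 0
```

For this prior the conditional gain is positive for small x and strongly negative at the largest possible x, and these cancel exactly, so on average switching neither helps nor hurts.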

So to conclude, the two envelopes paradox is not a paradox at all, but just an intuitively reasonable argument that has a hard-to-spot error. (Confusing a conditional distribution with an unconditional one.) If you work through the problem in a proper Bayesian fashion, you realise that you can’t avoid considering your prior knowledge of the envelopes’ contents. As long as you choose a sensible prior, the problem evaporates.

11 Comments to “Paradoxes of probability theory: the two envelopes”

  1. An editorial note: this post comes from a discussion thread on the previous post (http://jellymatter.com/2012/09/24/more-wrong-interpretations-of-p-values-repeated-sampling/), where Lucas said this paradox had something to do with ancillary statistics. I was preparing a reply along the lines of “I don’t see how that has anything to do with it – to my mind the resolution to the paradox is this…”, when it became long enough to make into a full post. I’d still like to know what, if anything, it has to do with ancillary statistics.

  2. Pretty much agreed!

    Although what about the case where you are just trying to get the highest number you can, of two numbers written on pieces of paper, but with the rules otherwise as above. Then there is no such limit as the amount of money in the world.

    With the same proviso that anyone interested might want to think about this additional issue, before reading on….

    IMHO there is no problem, for the reasons Nathaniel gives, for *any* *physical* instance of the problem. e.g. you can’t encode arbitrarily large numbers in any finite physical encoding system.

    And for me, that is enough to make the problem (fully) go away.

    Some others might be more Platonist, and think there is still a problem in the maths itself (in some sense or other), which cannot be solved simply by saying that the problem could not arise in any physical instantiation of the situation. (I would, I suppose, want to disagree that there is any sense to be made of the idea of the maths itself, above and beyond the physical situations it abstracts over.)

    I don’t know if you would agree with some, any or none of the above, Nathaniel?

    • I agree, absolutely, with all of that.

      I would add, though, that if you were being Platonist about it and decided to invent an imaginary game where any number at all can be in the envelope, then I suspect there still isn’t a really big problem. At least, as far as my intuition goes, it seems quite permissible to assign an unbounded uniform prior in that case, if you want to. If you do that then it is indeed true that, conditional on any finite value of A the expected value of B is 1.25 times greater, /and/ that conditional on any finite value of B, the expected value of A is 1.25 times greater. But the unconditional expected values of A and B are both infinite, and that probably makes it OK. Effectively, choosing envelope A and expecting its contents to have a finite value is an inconsistent thing to do, and that’s why you get a sort-of-inconsistent result. If you do manage to get into that state of knowledge, you have to keep swapping (for infinity) until you no longer believe that A’s contents are likely to be finite. (I don’t know how to make that argument rigorous, but I think it could be done.)

  3. So, the way to resolve it using the ancillary/sufficient statistic distinction is to say that the only thing I care about, the statistic sufficient for my decision, is whether one envelope contains more than the other, and simply ask what is the probability of the statistic s = [[A > B]], ignoring the magnitude of their difference or what their actual values are. Then it doesn’t matter what the prior over these ancillary statistics is, as they don’t feature in my decision at all. The difference between the approaches is that here, if it doesn’t matter, we don’t include it, whereas, for the approach above, the desire (platonic desire ;) ) is to include it anyway. If it does matter, then we would include it and basically do what Nathaniel describes.

    • That’s fair enough I guess. If I was actually in this situation I would simply say “I have no reason to believe one envelope contains more money than the other” and pick one at random. But I wouldn’t really call it a resolution of the paradox, because it doesn’t explain why the naïve argument in favour of switching is wrong.

      • It explains it by asking why on earth one should be asking about the expected values in the envelopes in the first place.

      • You’re right, there’s absolutely no reason to consider the conditional expectations in order to solve this problem. I was just trying to show that you *can* consistently consider them if you want to.

        It occurs to me that this might be a much better puzzle if it went like this: I let you open the envelope (revealing an amount of money, say £10, but you don’t know whether it was the prize or the bonus prize) and *then* offer to let you switch. That way you would actually need to consider the expectations.

  4. Ok, just one thought: when you say the problem goes away if you know the prize money (e.g. £10 and £20), because the expected value of the money in the other envelope is e.g. £15, you don’t actually have to know the prize money. You just have to allow basic algebra without assigning probabilities to everything.

    Say the prize money is M, then the super prize money is 2M. The expected value in the “other” envelope is 1.5M as is the expected value in the chosen envelope. So long as you don’t have what I think Lucas is calling “platonic desire” to put a probability distribution on M (or something related to it, you haven’t used the variable M at all in your analysis), then there’s no problem. Maybe I’m just repeating your last two comments in a different way though…

    Note though that you can put an expected value (probability distribution) on the prize money in a given envelope as a function of M (that doesn’t mean you are putting an expected value on M!)

    E[ amount of money in chosen envelope ] = 1.5M

    E[ M ] = meaningless

    • Sure, that’s absolutely fine. It’s a correct way to do the calculation without making a mistake, as is Lucas’ argument based on symmetry between the envelopes. But I can’t stress strongly enough that **the point of this exercise is to explain why the naïve argument is wrong**. You can’t do that just by giving a different argument that doesn’t make the same mistake – you have to actually point out where the mistake is.

      The mistake in the original argument is to say “if I know nothing about the amount of money in the other envelope then 0.5x is just as likely as 2x”. I can’t think of a way to explain what’s wrong with that without exploring what it really means to “know nothing” about a quantity, and why that does or doesn’t make sense in particular situations. There may be other ways to mathematically formalise these notions, but probability theory is the only one in common use and the only one I know, hence the introduction of a Bayesian prior distribution in order to help explain the problem. It has nothing to do with Platonism, since it has nothing to do with the ontological existence of anything – it’s about a logical tool that can be used to clarify the problem and make its solution explicit. Note that the joint distribution p(A,B) is a Bayesian prior and does not have to correctly represent the “actual” potential for variation in the amounts of money in the envelopes in order for the argument to work. The probability distribution exists purely in the realm of the contestant’s reasoning about the problem, and not at all in the world or in any kind of ontologically existing Platonic realm.

      In particular, *if* you choose a prior such that “if I know nothing about the amount of money in the other envelope then 0.5x is just as likely as 2x” is true, *then* E[ M ] is not meaningless but infinite, and this is at the root of the problem. This is not Platonism but just logic – the one statement implies the other, as I showed in the post.

  5. Ok sure, but my explanation of what it means to “know nothing” about the value of the money in the envelopes is to say that I know that there is a value of the money, but I don’t know the numerical quantity, so I assign it a letter M. That’s it, I don’t assign a distribution to M so the expectation of it is meaningless. This formalises the problem (of how to represent my uncertainty about M) mathematically in a way that

    a) is in common use
    b) you know about
    c) isn’t probability

    Because it’s just elementary algebra.

    I can’t speak for Lucas, “platonic desire” was his phrase, but it is about the ontological existence of “things which I don’t know what they are, but I know they are something, and I can’t quantify my uncertainty with probability (even as an “improper” prior)”. The mistake, in my view, is to phrase the problem in a way that denies the existence of such things.

    That said, I see your point that if you do take the Bayesian option, then clearly the way to resolve it is to use some kind of prior.

    • What you say is correct, but it doesn’t explain why it’s *not* correct to say “if I know nothing about the amount of money in the other envelope then 0.5x is just as likely as 2x”, which is what I was trying to address in my post. That statement can’t really be addressed without using probability theory, since “just as likely” is (pretty much) a statement about probabilities.

      I didn’t say anything against or in favour of the ontological existence of “things which I don’t know what they are, but I know they are something, and I can’t quantify my uncertainty with probability (even as an “improper” prior)”. What I said was that *if* you make the claim that “if I know nothing about the amount of money in the other envelope then 0.5x is just as likely as 2x” *then* you are not dealing with a “thing which you don’t know what it is, but you know it is something, and you can’t quantify your understanding with probability (even as an “improper” prior)”; instead you are implicitly using the (improper) uniform prior, and that’s where the problem arises. Your approach avoids this mistake by avoiding assigning a probability distribution, which is fine, and good, and desirable; but it won’t help someone who’s made that mistake to understand where they went wrong.
