The relationship between probability and information is interesting and fun. The table below is a work in progress, but I think it’s kind of cool already. The idea is to compare information quantities with the probabilities of unlikely events. For instance, if I flipped a coin for every bit on my hard drive and they all came up heads, it would be pretty unlikely. But what else would be that improbable?
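To make the comparison concrete: an event with probability p corresponds to -log2(p) bits of information, so n fair coin flips all landing heads is an n-bit event. Here is a minimal Python sketch of that conversion; the function names are my own, chosen for illustration.

```python
import math

def bits_of_surprise(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

def all_heads_probability(n_flips):
    """Probability that n fair coin flips all come up heads."""
    return 0.5 ** n_flips

# Ten heads in a row is a 1-in-1024 event, which is worth 10 bits.
print(bits_of_surprise(all_heads_probability(10)))
```

For something the size of a hard drive the probability is far too small to store as an ordinary floating-point number, which is a good reason to compare the bit counts rather than the probabilities themselves.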
Everybody loves chaos theory. Chaotic systems are unpredictable, in that a small change in their initial conditions will result in very large changes to their behaviour after a little time. But this doesn’t mean they’re random. It can often be quite easy to tell the difference.
For instance, here’s a list of 30 numbers created using the so-called logistic map (x(t+1) = r x(t) (1 − x(t)), for some parameter r between 0 and 4), which is a classic example of a simple chaotic system:
0.3, 0.777, 0.6411027, 0.851333103795, 0.468290685658, 0.921279721721, 0.268336565448, 0.726428596439, 0.735301335645, 0.720143141342, 0.745686890084, 0.701660422552, 0.774532373712, 0.646138310401, 0.845981298662, 0.482098681611, 0.92381430836, 0.260411298509, 0.712609840236, 0.757749106588, 0.679191972796, 0.806193876476, 0.578107647032, 0.902427023258, 0.325794216521, 0.812713676509, 0.56317757914, 0.910231795928, 0.302326532356, 0.780423240701.
These numbers look pretty random at first glance, although there’s quite a lot that begin with 0.7. But let’s try doing something slightly unusual: we’ll plot x(t) against x(t+1). What do we get? We get this:
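The list above can be regenerated in a few lines of Python, which also shows why the plot is not random at all: each point (x(t), x(t+1)) sits on the parabola y = r x (1 − x). This is only a sketch. The values r = 3.7 and x(0) = 0.3 are not stated anywhere; I inferred them from the first two numbers in the list (0.777 = r × 0.3 × 0.7).

```python
def logistic_map(r, x0, n):
    """Return the first n values of x(t+1) = r * x(t) * (1 - x(t)), starting at x0."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Assumed parameters, inferred from the listed numbers rather than given in the text.
R = 3.7
xs = logistic_map(R, 0.3, 30)

# Pair each value with its successor: these are the points of the plot.
pairs = list(zip(xs[:-1], xs[1:]))

# Every point lies on the parabola y = r * x * (1 - x).
for x, y in pairs:
    assert abs(y - R * x * (1 - x)) < 1e-12

print(pairs[:3])
```

Feeding `pairs` to any plotting library as a scatter plot should trace out that parabola, whereas genuinely random numbers would fill the unit square.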
Well, it’s not exactly, but I thought I’d be argumentative. Here is the problem with entropy and disorder as I see it, which may be somewhat different from how Nathaniel sees it.
There are two things that are commonly called entropy, one of which is a specific case of the other. These two types of entropy are physical/thermodynamic entropy and statistical entropy. Thermodynamic entropy is a statistical entropy applied specifically to physical microstates. As physicists generally agree on their definition of the microstates, thermodynamic entropy is a well-defined physical quantity. Statistical entropy, on the other hand, can be applied to anything that we can define a probability measure for.
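As an illustration of how general the statistical version is, here is a small Python sketch. It assumes that “statistical entropy” means the Shannon entropy, H = −Σ p log2 p, measured in bits; the function name and the example distributions are mine.

```python
import math

def statistical_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    # Outcomes with zero probability contribute nothing, so they are skipped.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(statistical_entropy([0.5, 0.5]))   # fair coin
print(statistical_entropy([0.9, 0.1]))   # biased coin
print(statistical_entropy([1.0, 0.0]))   # certain outcome
```

Nothing in the function cares what the outcomes are. Apply it to a distribution over physical microstates and, up to the choice of units, you are computing something of the same form as thermodynamic entropy.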
This is a follow-up to Nathaniel’s post. One of the ways that the probabilities of probabilities can be used is in asking what experiments would be best for a scientist to do. We can do this because scientists would like to have a logically consistent system that describes the world, but make measurements which are not completely certain, so the interpretation of probability as uncertain logic is justified.
Let’s make a probabilistic model of scientific inquiry. To do this, the first component we need is a model of “what science knows”, or equally, “what the literature says”. For the purposes here, I will only consider what science knows about one statement: “The literature says X is true”. I’ll write this as and its negation as . This is a really minimal example.
The second law of thermodynamics, the law of entropy, is a fascinating thing. It’s the law that makes the past different from the future; it’s the law that predicts an effective end to the Universe, yet it’s the law that makes life possible. It’s also a deeply mysterious law. It took well over a century for the true meaning of entropy to be understood (in fact arguments on the subject still rage today), and we still don’t understand, on a cosmological level, exactly why it was so low in the past.
One of the things that’s often said about entropy is that it means “disorder”. This post is about that idea. It’s worth discussing for two reasons: firstly, it’s wrong. It’s close to the truth, in the same sort of way that a spectrum is close to a rainbow, but not the same. Secondly, the real truth is much more interesting.
Most of us who’ve studied probability theory at University level will have learned that it is formalised using the Kolmogorov axioms. However, there is an interesting alternative way to approach the formalisation of probability theory, due to R. T. Cox. You can get a quick overview from this Wikipedia page, although it doesn’t really motivate it very well, so if you’re interested you’re much better off downloading the first couple of chapters of Probability Theory: The Logic of Science by Edwin Jaynes, which is an excellent book (although sadly an incomplete one, because Jaynes died before he could write the second volume) and should be read by all scientists, preferably while they’re still impressionable undergraduates.
For Cox, probability theory is nothing less than the extension of logic to deal with uncertainty. Probabilities, in Cox’s approach, apply not to “events” but to statements of propositional logic. To say p(A)=1 is the same as saying “A is true”, and saying p(A)=0.5 means “I really have no idea whether A is true or not”. A conditional probability p(A|B) can be thought of as the extent to which B implies A.
There are a couple of interesting differences between Cox’s probabilities and Kolmogorov’s. Cox’s is more general, but also less formal (people are still working on getting it properly axiomatised). One important difference is that in Cox’s approach a conditional probability p(A|B) can have a definite value even when p(B)=0 (this can’t happen in Kolmogorov’s formalisation because, for Kolmogorov, p(A|B) is defined as p(AB)/p(B)). This means that, unlike the logical statement B ⇒ A, the probabilistic statement p(A|B)=1 doesn’t mean that A is true if B is false. So conditional probabilities are like logical implications only better, since they don’t suffer from that little weirdness.
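The Kolmogorov side of that difference is easy to see in code. The sketch below computes p(A|B) as p(AB)/p(B) from a joint table and has nothing to return when p(B)=0. The joint distribution is made up for illustration, and the code does not try to model Cox’s approach, in which p(A|B) could still be assigned a value.

```python
from fractions import Fraction

def conditional(joint, a, b):
    """Kolmogorov-style p(A=a | B=b) = p(A=a, B=b) / p(B=b).

    `joint` maps (a_value, b_value) pairs to probabilities.
    Returns None when p(B=b) is zero, because the ratio is undefined.
    """
    p_b = sum(p for (_, b_val), p in joint.items() if b_val == b)
    if p_b == 0:
        return None
    return joint.get((a, b), 0) / p_b

# Hypothetical joint distribution in which B is never true.
joint = {
    (True, True): Fraction(0),
    (False, True): Fraction(0),
    (True, False): Fraction(1, 3),
    (False, False): Fraction(2, 3),
}

print(conditional(joint, True, False))  # p(A | not B) is well defined
print(conditional(joint, True, True))   # p(A | B) is undefined, so None
```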
Anyway, that’s cool but what I really wanted to write about was this: in Cox’s version of probability theory, it’s meaningful to talk about the probability of a probability. That is, you can write stuff like p(p(A|B)=1/2)=5/6 and have it make sense. I’ll get to an example of this in a bit.
I think sometimes we’re too obsessed with optimisation. It’s a product of the industrial revolution, or something: we assume everything can go faster, better and cheaper, except that we all know it can’t. You have to make compromises, obviously. In economics and engineering, the problem is referred to as Pareto optimality. Basically, if you can’t make something better in one respect without making it worse in another, it is Pareto optimal. A “Pareto improvement” is a change that achieves what you want: making things better without making anything else worse, a change with no compromise. Policy makers know this (it is not an obscure theory) and are supposed to try to achieve Pareto improvements with the changes they make. The thing is, in a complex environment, getting a genuine Pareto improvement is, I suspect, almost certainly impossible.
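Those two definitions fit in a few lines of Python. This is a sketch under my own assumptions: each option is scored on several criteria where higher is always better, and the scores themselves are invented.

```python
def is_pareto_improvement(old, new):
    """True if `new` is no worse than `old` on every criterion and better on at least one."""
    pairs = list(zip(old, new))
    return all(n >= o for o, n in pairs) and any(n > o for o, n in pairs)

def pareto_optimal(options):
    """Return the options that cannot be replaced by a Pareto improvement from the same list."""
    return [a for a in options
            if not any(is_pareto_improvement(a, b) for b in options)]

# Hypothetical options scored as (speed, quality, cheapness); higher is better.
options = [(3, 1, 2), (1, 3, 2), (2, 2, 2), (1, 1, 1), (3, 1, 1)]

print(pareto_optimal(options))
```

Three of the five options survive, and none of them beats the others outright: moving between them always trades one criterion against another, which is the compromise in question.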
I, like lots of people, have a morbid fixation with the state of the nation’s finances at the moment. You often hear arguments about whether it is the debt or the deficit that is the problem. The fact that one is a running total of the other doesn’t mean they are interchangeable. It’s basically the same as the difference between your speed and your acceleration: over time, one is obviously related to the other, but you can always have one low while the other is high. For many I’m sure it’s not that complicated, but some people do seem determined to mix them up.
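A toy example in Python makes the distinction hard to miss. The figures are entirely made up; the only real content is that debt is the running total of the deficits.

```python
from itertools import accumulate

# Hypothetical annual deficits (spending minus revenue), in billions.
deficits = [150, 120, 90, 60, 30]
starting_debt = 1000

# Debt after each year is the starting debt plus every deficit so far.
debts = list(accumulate(deficits, initial=starting_debt))

print(debts)
```

The deficit falls every single year here, and the debt rises every single year. Both “the deficit is coming down” and “the debt is going up” are true of the same numbers.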