This is a follow-up to Nathaniel’s post. One of the ways that probabilities of probabilities can be used is in asking which experiments would be best for a scientist to do. We can do this because scientists would like a logically consistent system that describes the world, yet must make measurements which are not completely certain – so the interpretation of probability as uncertain logic is justified.
Let’s make a probabilistic model of scientific inquiry. To do this, the first component we need is a model of “what science knows”, or equally, “what the literature says”. For the purposes here, I will only consider what science knows about one statement: “The literature says X is true”. I’ll write this as $L$ and its negation as $\bar{L}$. This is a really minimal example.
Of course, what the literature says is determined by someone who is reading it, not absolutely. In this example, this person is the scientist who is deciding what experiments to do. The scientist’s state of knowledge about the literature can be written $P(p)$: a distribution over the probability $p = P(L)$ representing what the literature thinks. In this situation, it doesn’t matter what the literature actually says, just what the scientist thinks that the literature says.
The scientist can choose to do experiments and make a contribution to the literature, but which experiment should they choose? I reckon, the one which provides the most information to the literature. Let’s take two experiments, A and B, and consider what the scientist thinks the literature will say after each experiment is added, represented by the probability densities $P(p \mid A)$ and $P(p \mid B)$.
We can measure how much information the experimenter expects will be added to the literature. Mathematically, the quantity of interest is the expectation of the Kullback–Leibler divergence (information gain) from what the scientist thinks the literature says now, $P(p)$, to what the scientist thinks the literature will say after an experiment. Comparing experiments A and B means comparing

$$\mathbb{E}_A\!\left[D_{\mathrm{KL}}\!\big(P(p \mid A)\,\big\|\,P(p)\big)\right] \quad \text{and} \quad \mathbb{E}_B\!\left[D_{\mathrm{KL}}\!\big(P(p \mid B)\,\big\|\,P(p)\big)\right],$$

where the expectation is taken over the possible outcomes of each experiment. (Note: $P(p)$ in the first expression refers to what the scientist thinks the literature said before experiment A, in hindsight, after A is performed.)
These equations are a bit complex, so I’ll break them down. The information gain $D_{\mathrm{KL}}(q \,\|\, p)$ is minimal (zero) at $q = p$ and is convex. Here are some example curves:
These show that the greatest information gain occurs when the experiment changes what the literature thinks the most. The most information is gained where the change is in opposition to the current state, but any change is good.
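These properties of the information gain can be checked with a minimal numerical sketch. The Beta shapes and the grid below are my own illustrative choices, not from the post: the belief over $p$ is discretized into a normalized histogram, and the gain is computed for a posterior that matches the prior, one that reinforces it, and one that opposes it.

```python
import numpy as np

# Discretize p on a grid; represent beliefs about "what the literature
# thinks" as normalized histograms. The Beta shapes are arbitrary
# illustrative choices.
p = np.linspace(0.001, 0.999, 999)

def density(a, b):
    """Beta(a, b)-shaped density over p, normalized on the grid."""
    d = p ** (a - 1) * (1 - p) ** (b - 1)
    return d / d.sum()

def kl(q, prior):
    """Information gain D_KL(q || prior) in nats, on the grid."""
    return float(np.sum(q * np.log(q / prior)))

prior = density(4, 2)   # literature currently leans towards X
same  = density(4, 2)   # experiment changes nothing
more  = density(12, 2)  # experiment reinforces the current lean
flip  = density(2, 12)  # experiment opposes the current state

print(kl(same, prior))  # ~0: no change, no information gain
print(kl(more, prior))  # positive: reinforcement gains some information
print(kl(flip, prior))  # largest: opposing change gains the most
```

The ordering is exactly the point made above: no change gains nothing, any change gains something, and a change that opposes the current state gains the most.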
Consider a topic where the scientist is completely sure that the literature does not say anything about X: $P(p) = \delta(p - 0.5)$, where $\delta$ is the Dirac delta, i.e. all probability is at $p = 0.5$. There are two experiments. With A, the scientist thinks it will confirm or reject X almost conclusively, with equal chance: $P(p \mid A) = \tfrac{1}{2}\delta(p - \epsilon) + \tfrac{1}{2}\delta(p - (1 - \epsilon))$ for some small $\epsilon$. With B, the scientist thinks it will either confirm X to the same degree or not say anything, with equal chance: $P(p \mid B) = \tfrac{1}{2}\delta(p - (1 - \epsilon)) + \tfrac{1}{2}\delta(p - 0.5)$. Some simple algebra shows that the expected information gain for A is twice that for B: each of A’s outcomes shifts the belief by the same amount and contributes the same gain, while half of B’s outcomes leave the belief where it was and contribute nothing. So A, the one that will confirm/reject, wins. It seems to work fairly intuitively and is quite a general procedure. I’ll leave it to you to try other examples (of which there are many).
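The factor of two can be reproduced numerically. A sketch follows, with two hedges of my own that are not in the setup above: true Dirac deltas make the divergence infinite on a grid, so the spikes are narrow Gaussians, and the prior gets a small uniform floor so that posteriors elsewhere have finite divergence.

```python
import numpy as np

# Grid over p, the probability the literature assigns to X.
p = np.linspace(0, 1, 1001)

def spike(center, width=0.02):
    """Narrow histogram standing in for a Dirac delta (keeps KL finite)."""
    d = np.exp(-0.5 * ((p - center) / width) ** 2)
    return d / d.sum()

def kl(q, prior):
    """D_KL(q || prior) on the grid, skipping zero-mass bins of q."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / prior[mask])))

# "Sure the literature says nothing": a spike at 0.5, plus a small
# uniform floor so divergences stay finite.
prior = 0.99 * spike(0.5) + 0.01 * np.ones_like(p) / p.size

conf   = spike(0.99)  # outcome: X confirmed almost conclusively
rej    = spike(0.01)  # outcome: X rejected almost conclusively
silent = prior        # outcome: the literature still says nothing

# Expected gains: each experiment's outcomes are equally likely.
gain_A = 0.5 * kl(conf, prior) + 0.5 * kl(rej, prior)
gain_B = 0.5 * kl(conf, prior) + 0.5 * kl(silent, prior)

print(gain_A / gain_B)  # ~2: A's expected gain is twice B's
```

By symmetry the confirm and reject outcomes gain the same amount, and B’s silent outcome gains zero, so the ratio comes out as 2, matching the algebra.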
As the information gain is always non-negative, so is its expectation. If there is no expected information gain, there is no point in doing any experiments (this is only possible when the scientist thinks the experiment won’t show anything). The question of falsifiability is related. Even if the scientist thinks that no experiment could lower $p$ (i.e. that X cannot be falsified), there still might be an informative experiment (see the example above). However, if the scientist thinks that there is no experiment that has the potential to change $P(p)$, then they shouldn’t bother doing anything.
The really important thing about this is that there is no notion of scientific truth or falsehood: just a subjective scientist trying to reason the best they can about how to inform their community (the literature) about something.
EDIT: Also, notice that the biggest gain in information is when the literature is made to “change its mind” on something.