A while ago I wrote a little rant on the (mis)interpretation of P-values. I’d like to return to this subject having investigated a little more. First, in this post, I’m going to point to an interesting little subtlety noted by Fisher that I hadn’t thought about before; in the second post, I will argue that P-values aren’t as bad as they are sometimes made out to be.
So, last time, I stressed the point that you can’t interpret a P-value as a probability or frequency of anything, unless you say “given that the null hypothesis is true”. Most misinterpretations, e.g. “the probability that you would accept the null hypothesis if you tried the experiment again”, make this error. But there is one common interpretation that is less obviously false: “A P-value is the probability that the data would deviate as or more strongly from the null hypothesis in another experiment, than they did in the current experiment, given that the null hypothesis is true”. This looks like a more careful statement, but the problem is that when we calculate P-values we take into account aspects of the data that are not necessarily related to how strongly they deviate from the prediction of the null hypothesis. Put so briefly, this could be misleading, so we’ll build it up more precisely in this post.
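To make the “given that the null hypothesis is true” condition concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than something from the original rant: the two-sided z-test with known standard deviation, the sample size, and the function names `two_sided_p` and `fraction_below`. It repeatedly generates data for which the null hypothesis really is true and counts how often the P-value falls at or below a threshold.

```python
import math
import random


def two_sided_p(sample, sigma=1.0):
    """Two-sided z-test P-value for H0: mean = 0, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_below(alpha, trials=20000, n=30, seed=1):
    """Fraction of simulated experiments with P <= alpha when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Data drawn with mean 0, so the null hypothesis holds by construction.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) <= alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(fraction_below(0.05))
```

When the null hypothesis holds, the printed fraction should come out close to the threshold itself. The simulation says nothing about what happens when the null hypothesis is false, which is exactly the condition the misinterpretations drop.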
The UK government’s (ever so slightly creepily named) “Behavioural Insights Team” relatively recently released a report [PDF] called “Test, Learn, Adapt”, arguing that more policy decisions should be made on the basis of evidence from randomised controlled trials (RCTs). Its authors include Ben Goldacre, well known for the book “Bad Science”, and David Torgerson, the director of the York Trials Unit. The report is a really good plain-English explanation of what RCTs are and how they work. It also gives examples of how RCTs can perhaps help to inform policies, by testing whether interventions such as back-to-work schemes or educational programs, um, “work”. According to the report’s blurb:
RCTs are the best way of determining whether a policy or intervention is working.
It’s not hard to find opinion pieces backing up the report’s central idea, and the thesis that RCTs are the best way to “find things out”. Here’s one by Tim Harford, a writer who covers economics; a similar argument made by Paul Johnson, who is the director of an economics research group, the Institute for Fiscal Studies; and another by Prateek Buch, who is a research scientist. A phrase that keeps popping up is “gold standard”. RCTs are “the gold standard in evidence”, says Johnson, or the “gold-standard for showing that medical interventions are effective” according to Buch. Mark Henderson’s book, “The Geek Manifesto”, says that the RCT is “commonly considered the ‘gold standard’ for medical research because it seeks systematically to minimise potential bias through a series of simple safeguards”. What exactly does all this mean? I think it’s a question worth asking, since not all science involves RCTs. The Higgs boson, for example, was recently “discovered” (if that’s the word) without (as far as I can tell) the need to randomise test subjects. So are RCTs in fact the “gold standard”?
I’ve read a couple of interesting books recently: one was “The End of Science” by John Horgan, and the other was “Radical Embodied Cognitive Science” by Anthony Chemero. Horgan’s theme was the question of whether the fundamentals of science are now so solid that before long nothing genuinely “new” will be left to find, and science will be reduced either to obsolescence or to the puzzle-solving application of existing theories to particular problems. The only other type of science that still exists, according to Horgan, is “ironic” science: a kind of semi-postmodern project to explain or describe what we already know in more “beautiful” or appealing forms, but which never produces hypotheses that are empirically testable and, for this reason, doesn’t actually advance knowledge. Horgan is distinctly dismissive of this kind of science as not being “proper” science (he deliberately compares it to postmodern literary criticism, which he seems to have particular contempt for, having once been a student of it himself). Chemero would be, I’m sure, classified by Horgan as an ironic scientist. I don’t think Chemero would be able to deny that, in a sense, his philosophy is empirically untestable, but he certainly argues that it is pragmatic in the sense of being useful to scientists engaged in solving real-world problems.
Jellymatter is, we claim, not afraid of equations, but apparently scientists are. A study in PNAS claims to have found that theoretical biology papers are cited less when they are densely packed with mathematical language. The authors argue that this impedes progress, since empirical work needs to be backed up by, and commensurate with, some theory to have deeper scientific meaning.
Recently a portion of Jellymatter was involved in running some robot-building workshops with kids at Hove museum. We built some simple Braitenberg vehicles (basic light-following robots) and played some fun games. Hopefully we’ll get time to add some more details about that later, but in the meantime, I made a simulator of one of the games, which you can find along with a fuller explanation here.
Imagine a circuit that causes a little light to flash on and off. Imagine that the frequency of that flashing is itself dependent on the light at a particular sensor. Imagine that such a circuit is placed next to another identical circuit, such that the light from each circuit is directed at the sensor on the other circuit. What do you expect to see? Find out after the break…
Whilst hyperflunking across the interdimensional quantum vibration matrix, your spaceship detects three jumbled up signals. They sound like random noise, but you suspect they are in fact secret messages from Glycerol Soap Bomb, the ruler and Maximal Liapunov Exponent of the planet Cholesky Decomposition. The messages can be downloaded from the following locations:
We’ve touched on the difference between chaos and randomness before. One strange property of chaotic systems is that they are able to synchronise with each other, so that in spite of their intrinsic tendency to vary wildly, a chaotic system can (actually quite easily) be persuaded to match the behaviour of another chaotic system. As this post will show, it is possible to use this property for a kind of secret message transmission.
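As a rough illustration of how easily one chaotic system can be persuaded to follow another, here is a minimal sketch. It is not necessarily the system used in the full post: the logistic map with r = 4, the one-way “drive and response” coupling, the coupling strength of 0.8, and the names `logistic` and `run` are all assumptions made for the example. The response map is nudged towards the drive map at every step, and we record the gap between them.

```python
def logistic(x, r=4.0):
    """One step of the logistic map, which is chaotic at r = 4."""
    return r * x * (1.0 - x)


def run(coupling, steps=300, x0=0.2, y0=0.7):
    """Iterate a drive map x and a response map y; return the gap |x - y| per step.

    coupling = 0 leaves the two maps independent;
    coupling = 1 makes the response copy the drive outright.
    """
    x, y = x0, y0
    gaps = []
    for _ in range(steps):
        fx, fy = logistic(x), logistic(y)
        x = fx                                      # drive evolves on its own
        y = (1.0 - coupling) * fy + coupling * fx   # response is pulled towards the drive
        gaps.append(abs(x - y))
    return gaps


if __name__ == "__main__":
    for c in (0.0, 0.8):
        print(c, max(run(c)[-50:]))
```

With no coupling the two trajectories keep wandering apart, while with a coupling of 0.8 the gap shrinks at every step (here by a factor of at most 0.8), so the response ends up tracking the drive even though both are still behaving chaotically.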