## An interesting relationship between physics and information theory

Lately I’ve been hanging out on Physics Stack Exchange, a question-and-answer site for physicists and people interested in physics. Someone recently asked a question about the relationship between thermodynamics and a quantity from information theory. It led me to quite an interesting result, which I think is new.

The question was

“I am facing with the concept of cross entropy. I would like to know the thermodynamic and statistical meaning of cross entropy (if exists)?” — Physics Stack Exchange user emanuele

I found the question interesting and did a few calculations.  I couldn’t find a nice thermodynamic meaning for the cross entropy, but I did find one for a related quantity called the Kullback-Leibler divergence. As far as I know, this is a new result.

The cross entropy between two probability distributions is defined as

$H(P, Q) = -\sum_i p_i \log q_i.$
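In code, the definition is a one-liner. Here's a minimal sketch using natural logarithms, with two made-up distributions over three states (the numbers are purely illustrative):

```python
import math

# Hypothetical example distributions over three shared states.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * log(q_i), using natural logs."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

print(f"H(P,Q) = {cross_entropy(p, q):.4f}")  # approx. 1.0549
```

By Gibbs' inequality, $H(P,Q) \ge H(P)$, with equality only when the two distributions coincide.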

These two probability distributions should both refer to the same set of underlying states. Normally in thermodynamics we think of a system as having only one probability distribution, which represents (roughly) the range of possible states the system might be in at the present time. But systems can change over time. So let’s imagine we have a system (with constant volume) that’s initially in equilibrium with a heat bath at a temperature $T_1$. According to the usual principles of statistical mechanics, its state can be represented by the probability distribution

$p_i = \frac{1}{Z_1} e ^{-\beta_1 u_i},$

where $\beta_1=1/T_1$ (I’ve set Boltzmann’s constant equal to 1 for clarity) and $Z_1$ is a normalisation factor called the partition function. The $u_i$ are the energy levels of the system’s permitted microscopic states.

Now let’s imagine we pick up our system and put it in contact with a heat bath at a different temperature, $T_2$, and let it come to equilibrium again. Since no work has been done, all the $u_i$ values will be unchanged, and we’ll have a new distribution that looks like this:

$q_i = \frac{1}{Z_2} e ^{-\beta_2 u_i},$

where $\beta_2 = 1/T_2$.
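The two Boltzmann distributions are easy to build numerically. A minimal sketch, with hypothetical energy levels and temperatures chosen purely for illustration (units where $k_B = 1$):

```python
import math

# Hypothetical energy levels and temperatures (k_B = 1), for illustration only.
u = [0.0, 1.0, 2.0, 3.0]
T1, T2 = 1.0, 2.0

def boltzmann(u, T):
    """Return the Boltzmann distribution p_i = exp(-u_i/T)/Z and log(Z)."""
    weights = [math.exp(-ui / T) for ui in u]
    Z = sum(weights)
    return [w / Z for w in weights], math.log(Z)

p, logZ1 = boltzmann(u, T1)   # distribution P at temperature T1
q, logZ2 = boltzmann(u, T2)   # distribution Q at temperature T2
```

As a sanity check, both distributions sum to one, and the colder bath ($T_1 < T_2$) concentrates more probability on the ground state.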

Now we can do a bit of algebra to find the cross-entropy:

$H(P,Q) = -\sum_i p_i \log q_i = \frac{1}{Z_1}\sum_i e^{-\beta_1u_i}(\beta_2u_i + \log(Z_2)) = \log(Z_2) + \frac{1}{Z_1}\sum_i e^{-\beta_1 u_i}\beta_2 u_i$

$= \log(Z_2) + U_1 \beta_2,$

where $U_1$ is the mean energy of the system in its initial state.

It’s a standard result from statistical mechanics that

$S_2 = H(Q) = \log(Z_2) + U_2\beta_2.$

Solving this for $\log(Z_2)$ and substituting into the cross entropy formula we have

$H(P,Q) = S_2 - \beta_2(U_2 - U_1) = S_2 - \frac{U_2 - U_1}{T_2}.$
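This identity is easy to verify numerically. A sketch using hypothetical energy levels and temperatures (all values assumed purely for illustration, $k_B = 1$):

```python
import math

# Hypothetical energy levels and temperatures (k_B = 1), for illustration only.
u = [0.0, 1.0, 2.0, 3.0]
T1, T2 = 1.0, 2.0

def boltzmann(u, T):
    weights = [math.exp(-ui / T) for ui in u]
    Z = sum(weights)
    return [w / Z for w in weights], math.log(Z)

p, logZ1 = boltzmann(u, T1)
q, logZ2 = boltzmann(u, T2)

U1 = sum(pi * ui for pi, ui in zip(p, u))   # mean energy under P
U2 = sum(qi * ui for qi, ui in zip(q, u))   # mean energy under Q
S2 = -sum(qi * math.log(qi) for qi in q)    # Gibbs entropy of Q

# Cross entropy computed directly from the definition.
H_pq = -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Both forms derived above should agree with it:
#   H(P,Q) = log(Z2) + U1/T2  and  H(P,Q) = S2 - (U2 - U1)/T2
print(H_pq, logZ2 + U1 / T2, S2 - (U2 - U1) / T2)
```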

Physicists are, generally speaking, afraid of any quantity that has entropy units, and if they see one they like to multiply it by a temperature in order to make it look like an energy. If we multiply this by $T_2 = 1/\beta_2$ we get

$T_2H(P,Q) = T_2S_2 - \Delta U.$

It’s possible that this might have a nice thermodynamic interpretation in terms of something like the maximum amount of work that we can extract from doing this transformation under a particular set of circumstances — but if it does then I haven’t seen it just yet. The expression looks tantalisingly like a change in free energy ($\Delta U - T\Delta S$), but it’s not quite the same.

However, we can get a much more interesting result if we note that in information theory, the Kullback-Leibler divergence (aka information gain) is often seen as more fundamental than the cross entropy. The KL-divergence is defined as

$D_{KL}(P\|Q) = \sum_i p_i \log\frac{p_i}{q_i} = H(P,Q)-H(P),$

which in our case is equal to

$S_2 - S_1 - \frac{U_2-U_1}{T_2} = \Delta S - \Delta U/T_2.$

This is much more interesting than the result for the cross-entropy, because it does have a clear thermodynamic interpretation. When we put the system in contact with the second heat bath, its entropy changes by $\Delta S$, and the entropy of the heat bath changes by $-\Delta U/T_2$. (This is because entropy is heat divided by temperature – an amount $\Delta U$ leaves the system, so $-\Delta U$ enters the heat bath.) So the KL-divergence is just the total change in entropy after we put the system in contact with the new heat bath. I’m quite excited about this because I didn’t know it before, and I don’t think anyone else did either!
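The interpretation above can also be checked numerically: the KL-divergence between the two equilibrium distributions should equal the total entropy production $\Delta S - \Delta U/T_2$. A sketch with hypothetical levels and temperatures (assumed purely for illustration, $k_B = 1$):

```python
import math

# Hypothetical energy levels and temperatures (k_B = 1), for illustration only.
u = [0.0, 1.0, 2.0, 3.0]
T1, T2 = 1.0, 2.0

def boltzmann(u, T):
    weights = [math.exp(-ui / T) for ui in u]
    Z = sum(weights)
    return [w / Z for w in weights]

p = boltzmann(u, T1)
q = boltzmann(u, T2)

U1 = sum(pi * ui for pi, ui in zip(p, u))
U2 = sum(qi * ui for qi, ui in zip(q, u))
S1 = -sum(pi * math.log(pi) for pi in p)   # system entropy before
S2 = -sum(qi * math.log(qi) for qi in q)   # system entropy after

# D_KL(P||Q) versus total entropy change of system + bath.
D_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
total_entropy_change = (S2 - S1) - (U2 - U1) / T2
print(D_kl, total_entropy_change)
```

The two quantities agree exactly, and both are non-negative, as the second law requires.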

We can even take this a bit further. Let’s imagine putting a heat engine in between the system and the second heat reservoir. So we’ll try to extract some useful work from the flow of heat that takes place as the system and the heat bath equilibrate. If we do this, the total change of entropy becomes $\Delta S + (-\Delta U - W)/T_2$. By the second law this has to be greater than or equal to zero, which means that $W \le T_2\Delta S - \Delta U$.

Now, if we do that physicist thing of multiplying $D_{KL}$ by $T_2$, it becomes $T_2\Delta S - \Delta U$, which is the value for the maximum work that we just calculated. So while the thermodynamic meaning of the cross-entropy isn’t clear to me, the KL-divergence does seem to have a nice interpretation in terms of work.
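This last step can be checked too: $T_2 D_{KL}$ should coincide with the maximum-work bound $T_2\Delta S - \Delta U$. A sketch with the same kind of hypothetical levels and temperatures (assumed for illustration, $k_B = 1$):

```python
import math

# Hypothetical energy levels and temperatures (k_B = 1), for illustration only.
u = [0.0, 1.0, 2.0, 3.0]
T1, T2 = 1.0, 2.0

def boltzmann(u, T):
    weights = [math.exp(-ui / T) for ui in u]
    Z = sum(weights)
    return [w / Z for w in weights]

p, q = boltzmann(u, T1), boltzmann(u, T2)
U1 = sum(pi * ui for pi, ui in zip(p, u))
U2 = sum(qi * ui for qi, ui in zip(q, u))
S1 = -sum(pi * math.log(pi) for pi in p)
S2 = -sum(qi * math.log(qi) for qi in q)

D_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
W_max = T2 * (S2 - S1) - (U2 - U1)   # maximum extractable work

print(T2 * D_kl, W_max)
```

Since $D_{KL} \ge 0$, the maximum extractable work is non-negative as well.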


### 9 Comments to “An interesting relationship between physics and information theory”

1. Couldn’t fix the TeX either. This result seems fairly intuitive (a good thing). Perhaps it can be derived from the joint entropy under MaxEnt conditions.

• Which joint entropy do you mean?

• $\sum_i p_i \log q_i$

• as opposed to $\sum_i\sum_j p(x_i,y_j) \log p(x_i,y_j)$

I meant cross entropy.

• Ah, I see. Well, MaxEnt is implicit in all of this, since it’s where those Boltzmann distributions come from. I’ve done a bit of thinking about what all this means and how to generalise it – I’ll write it up when I get a chance.

One thing to note is that P doesn’t actually need to be an equilibrium distribution – you can substitute any expression you want for p_i and it still works. Q has to be the MaxEnt distribution. There’s an interesting connection to the probability of fluctuations as well. It’s all quite intriguing.

Nathaniel. (Not logged in as I’m posting from my phone)

2. I hope you’ll pardon my intruding with this, but how did you get the LaTeX to render at all? I’d like to include some mathematics over on my blog, but I’m hesitant to commit to anything that requires too much work on my part.

• Hi Joseph

If it’s a standard wordpress.com blog I don’t think you have to do anything special to make it work. Just start with `$latex `, then type your LaTeX code, then close with another dollar sign.

• I think nebusresearch.wordpress.com is almost exactly the same setup as here. For example, typing `$latex e^{i\pi} = -1$` renders as
$e^{i\pi} = -1$

• I’m surprised by this and, soon as I get the chance to try it out, possibly delighted. Thank you.