Monday, September 15, 2014

Nifty papers I wrote that nobody knows about (Part I: Solitons)

I suppose this happens even to the best of us: you write a paper that you think is really cool and has an important insight in it, but nobody ever reads it. Or if they read it, they don't cite it. I was influenced here by the blog post by Claus Wilke, who argues that you should continue writing papers even if nobody reads them. I'm happy to do that, but I also crave attention. If I have a good idea, I want people to notice. 

The truth is, there are plenty of papers out there that are true gems and that should be read by everybody in the field, but are completely obscure for one reason or another. I know this to be true but I have little statistical evidence because, well, the papers I am talking about are obscure. You can actually use algorithms to detect these gems, but they usually only find papers that are already fairly well known. 

In fact, this is just common sense: once in a while a paper just "slips by". You have a bad title, you submitted to the wrong journal, you wrote in a convoluted manner. But you had something of value. Something that is now, perhaps, lost. One of my favorite examples of this sort of overlooked insight is physicist Rafael Sorkin's article: "A Simple Derivation of Stimulated Emission by Black Holes", familiar to those of you who follow my efforts in this area. The article has 10 citations. In my view, it is brilliant and ground-breaking in more than one way. But it was summarily ignored. It still is, despite my advocacy.

I was curious how often this had happened to me. In the end the answer is: not so much, actually. I counted four papers that I can say have been "overlooked". I figured I would write a little vignette about each of them, why I like them (as opposed to the rest of the world), and what may have gone wrong, meaning why nobody else likes them.

Here are my criteria for a paper to be included into the list:

1.) Must be older than ten years. Obviously, papers written within the last decade may not have had a significant amount of time to "test the waters". (But truthfully, if a paper does not get some citations within the first 5 years, it probably never will.)

2.) Must have had fewer than 10 citations on Google Scholar (excluding self-citations).

3.) Must not be a re-hash of an idea published somewhere else (by me) where it did get at least some attention.

4.) Must not be a commentary about somebody else's work (obvious, this one). 

5.) Must be work that I'm actually proud of. 

When going through my Google Scholar list, I found exactly four papers that meet these criteria. 

(Without taking into account criterion 5, the list is perhaps twice as long, mind you. But some of my work is just not that interesting in hindsight. Go figure.)

These are the four papers in the final list:

1. Soliton quantization in chiral models with vector mesons, C Adami, I Zahed (1988)
2. Charmonium disintegration by field-ionization, C Adami, M Prakash, I Zahed (1989)
3. Prolegomena to a non-equilibrium quantum statistical mechanics, C Adami, NJ Cerf (1999)
4. Complex Langevin equation and the many-fermion problem, C Adami, SE Koonin (2001).

I will publish a blog post about one of these in each of the coming weeks.

I'll go in chronological order, starting with the oldest:

Physics Letters B 215 (1988) 387-391. Number of citations: 10 

This is actually my first paper ever, written at the tender age of 25. But it didn't get cited nearly as much as the follow-up paper, which was published a few months earlier: Physics Letters B 213 (1988) 373-375. 

How is this possible, you ask? 

Well, the editors at Physics Letters lost my manuscript after it was accepted, is how it happened! 

You have to remember that this was "the olden days". We had computers all right. But we used them to make plots, and send Bitnet messages. You did not send electronic manuscripts to publishers. These were sent around in large manila envelopes.  And one day I get the message (after the paper was accepted): "Please send another copy, we lost ours". Our triplicates, actually, because each reviewer gets a copy that you send in, of course. I used to keep all the correspondence about manuscripts from these days, but I guess after moving offices so many times, at one point stuff gets lost. So I can't show you the actual letter that said this (I looked for it).  Of course, after that mishap the editorial office used a new "received" date, just so that it doesn't look so embarrassing. And arxiv wouldn't exist for another 4 years to prove my point.

So that's probably the reason why the paper didn't get cited: people cited the second one that was published first, instead. But what is this paper all about?

It is about solitons, and how to quantize them. Solitons were my first exposure to theoretical physics in a way, because I had to give a talk about topological solitons called "Skyrmions" in a theoretical physics seminar at Bonn University in, oh, 1983. Solitons are pretty cool things: they are really waves that behave like particles. They were discovered by John Scott Russell, who was riding his horse alongside a canal in Scotland when he noticed a wave that just... wouldn't... dissipate.

Now, there is a non-linear field theory due to T.H.R. Skyrme that has such soliton solutions, and people suggested that maybe these Skyrmions could describe a nucleon. You know, the thing you are made of, mostly? A nucleon is a proton or a neutron, depending on charge. Nuclei are made from them. You are all nucleons and electrons really. Deal with it. 

Skyrme incidentally is the one who died just days after I submitted the very manuscript I'm writing about, which started the rumour that my publications are lethal. Irrelevant fact, here. 

Skyrme's theory was a classical one, and so the question arose what happens if you quantize that theory. This is an interesting question because usually, if you quantize a field you create fluctuations of that field, and if these fluctuations were of the right kind, they should (if they fluctuate around a nucleon) describe pions. And voilà: we would have a theory that describes how nucleons have to interact with pions. 

What are pions, you ask? Go read the Wiki page about them. But really, they are the stuff you get if you bang a nucleon and an anti-nucleon together. They have a quark and an anti-quark in them, as opposed to the nucleons, which have three quarks: "Three quarks for Muster Mark!"

Now, people actually already knew at the time what such an interaction term was supposed to look like: the so-called pion-nucleon coupling. But if the term that comes out of quantizing Skyrme's theory did not look like this, well then you could safely forget about that theory being a candidate to describe nucleons. Water waves maybe, just not the stuff we are made out of.  

So I started working this out, using the theory of quantization under constraints that Paul Dirac developed, because we (my thesis advisor Ismail Zahed and I) had stabilized the Skyrmion using another meson, namely the ω-meson. You don't have to know what this is, but what is important here is that the components of the ω field are not independent, and therefore you have to quantize under that constraint.

You very quickly run into a problem: you can't quantize the field because there are fluctuation modes that have zero energy. Indeed, because in order to do the quantization you have to take the inverse of the matrix of fluctuations, these zero modes create a matrix that cannot be inverted (its determinant vanishes). What to do?

The answer is: you find out what those zero modes are, and quantize them independently. It turns out that those zero modes were really rotations in "isospin-space", and they naturally have zero energy because you can rotate that soliton in iso-space and it costs you nothing. I figured out how to quantize those modes by themselves (you just get the Hamiltonian for a spinning top out of that), then project out these zero modes from the Skyrmion fluctuations, and quantize only those modes that are orthogonal to the zero modes. And that's what I proceeded to do. Easy as pie.
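To see why a zero mode blocks the inversion, and how projecting it out helps, here is a deliberately tiny toy in plain Python. It is not the Skyrmion calculation: the 2x2 matrix `K`, the helper names `det2` and `inv2`, and the "lift, invert, subtract" trick are all assumptions chosen for illustration. The toy matrix costs nothing when both coordinates are shifted together, which plays the role of the free rotation in iso-space.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix (only valid when the determinant is nonzero)."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

# Toy "fluctuation matrix" with one zero mode: moving both coordinates
# by the same amount costs no energy.
K = [[1.0, -1.0], [-1.0, 1.0]]
print(det2(K))  # the determinant vanishes, so K cannot be inverted directly

# Normalized zero mode z = (1, 1)/sqrt(2); zz holds the outer product z z^T.
zz = [[0.5, 0.5], [0.5, 0.5]]

# Lift the zero mode, invert, then take the zero mode out again. The result
# is the inverse of K restricted to the directions orthogonal to z.
lifted = [[K[i][j] + zz[i][j] for j in range(2)] for i in range(2)]
lifted_inv = inv2(lifted)
K_inv_perp = [[lifted_inv[i][j] - zz[i][j] for j in range(2)] for i in range(2)]
print(K_inv_perp)
```

The zero mode itself is then treated separately, which in the text above is where the spinning-top Hamiltonian comes from.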

And the result is fun too, because the resulting interaction term looks almost like the one we should have gotten, and then we realized that the "standard" term of chiral theory comes out in a particular limit, known as the "strong coupling" limit. Even better, using this interaction I could calculate the mass of the first excitation of the nucleon, the so-called Δ resonance. That would be the content of the second paper, which you now know actually got published first, and stole the thunder of this pretty calculation.  

So what did we learn in this paper in hindsight? Skyrmions are actually very nice field-theoretic objects. The effective theory is obviously not the full underlying theory that should describe you (that would be the theory of quarks and gluons called Quantum Chromodynamics, or QCD), but this approximate theory can give you very nice predictions about low-energy hadronic physics, where QCD actually is not at all predictive, because we can only calculate QCD in the high-energy limit (for example, what happens when you shoot quarks at quarks with lots of energy). Research on Skyrmions (and low-energy effective theories in general) is still going strong, it turns out. And perhaps even more surprising is this: there is now a connection (uncovered by my former advisor) between these Skyrmions and the holographic principle.

So even old things turn out to be new sometimes, and old calculations can still teach you something today. Also we learn: electronic submissions aren't as easily lost behind file cabinets. So there is that.

Next up: Charmonium Disintegration by Field-Ionization [Physics Letters B 217 (1989), 5-8]. A story involving the quark-gluon plasma, and how an old calculation by Cornel Lanczos from 1930 can shed light on what happens to the J/ψ, when suitably modernized. All of 5 citations on Google Scholar this one got. But what a fun calculation! 

Monday, August 4, 2014

On quantum measurement (Part 4: Born's rule)

Let me briefly recap parts 1-3 for those of you who like to jump into the middle of a series, convinced that they'll get the hang of it anyway. You might, but a recap is nice anyway.

Remember these posts use MathJax to render equations. Your browser can handle this, so if you see a bunch of dollar signs and LaTeX commands instead of formulas, you need to configure your browser to handle MathJax.

In Part 1 I really only reminisced about how I got interested in the quantum measurement problem, by way of discovering that quantum (conditional) entropy can be negative, and by the oracular announcement of the physicist Hans Bethe that negative entropy solves the problem of wavefunction collapse (in the sense that there isn't any). 

In Part 2 I told you a little bit about the history of the measurement problem, the roles of Einstein and Bohr, and that our hero John von Neumann had some of the more penetrating insights into quantum measurement, only to come up confused. 

In Part 3 I finally get into the mathematics of it all, and outline the mechanics of a simple classical measurement, as well as a simple quantum measurement. And then I go on to show you that quantum measurement isn't at all like its classical counterpart. In the sense that it doesn't make a measurement at all. It can't because it is procedurally forbidden to do so by the almighty no-cloning theorem. 

Recall that in a classical measurement, you want to transfer the value of the observable of your interest on to the measurement device, which is manufactured in such a way that it makes "reading off" values easy. You never really read the value of the observable off of the thing itself: you read it off of the measurement device, fully convinced that your measurement operation was designed in such a manner that the two (system and measurement device) are perfectly correlated, so reading the value off of one will reveal to you the value of the original. And that does happen in good classical measurements. 

And then I showed you that this cannot happen in a quantum measurement, unless the basis chosen for the measurement device happens to coincide exactly with the basis of the quantum system (they are said to be "orthogonal"). Because then, it turns out, you can actually perform perfect quantum cloning.

The sound of heads being scratched worldwide, exactly when I wrote the above, reminds me to remind you that the no-cloning theorem only forbids the cloning of an arbitrary unknown state. "Arbitrary" here means "given in any basis, that furthermore I'm not familiar with". You can clone specific states: for example, quantum states that you have prepared in a particular basis that is known to you, like the one you're going to measure them in. The way I like to put it is this: Once you have measured an unknown state, you have rendered it classical. After that, you can copy it to your heart's content, as there is no law against classical copying. Well, no physical law. 

Of course, none of this is probably satisfying to you, because I have not revealed to you what a quantum measurement really does. Fair enough. Let's get cooking.

Here's the thing:

When you measure a quantum system, you're not really looking at the quantum system, you're looking at the measurement device.

"Duh!", I can hear the learned audience gasp, "you just told us that already!" 

Yes I did, but I told you that in the context of a classical measurement. In the context of a quantum measurement, the same exact triviality becomes a whole lot less trivial. A whole whole lot less. So let's do this slowly.

Your measurement device is classical. This much we have to stipulate, because in the end, our eyes and nervous system are ultimately extensions of the measurement device just as JvN had surmised they would. But even though they are classical, they are made out of quantum components. That's the little tidbit that completely escaped our less learned friend Niels Bohr, who wanted to construct a theory in which quantum and classical systems both had their own epistemological status. I shudder to think how one can even conceive of such a blunderous idea.

But being classical really just means that we know which basis to measure the thing in, remember. It is not a special state of matter. 

Oh, what is that you say? You say that being classical is really something quite different, according to the textbooks? Something about $\hbar\to0$?

Forget that, old friend, that's just the kind of mumbo jumbo that the old folks of yesteryear are trying to (mis)teach you. Classicality is given entirely in terms of the relative state of systems and devices. Oh, it just so happens that a classical system, because it has so many entangled particles, must be described in terms of a basis that is so high-dimensional that it will appear orthogonal to any other high-dimensional system (simply because almost all vectors in a high-dimensional space are orthogonal). That's where classicality comes from. Yes, many particles are necessary to make something classical, but it does not have to be classical. It is just statistically so. I don't recall having read this argument anywhere, and I did once think about publishing it. But it is really trivial, which means there is no way I could ever get it published anyway. Because I will be called crazy by the reviewers.
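The claim that almost all vectors in a high-dimensional space are nearly orthogonal is easy to check numerically. Here is a minimal sketch in plain Python; it uses real rather than complex vectors, Gaussian sampling, and the helper names `random_unit_vector` and `mean_abs_overlap`, all of which are assumptions made for illustration rather than anything taken from the argument above.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a random real unit vector by normalizing Gaussian components."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_overlap(dim, samples=200, seed=0):
    """Average |<u|v>| over pairs of independent random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        u = random_unit_vector(dim, rng)
        v = random_unit_vector(dim, rng)
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / samples

# The typical overlap shrinks as the dimension grows.
for dim in (2, 16, 128, 1024):
    print(dim, round(mean_abs_overlap(dim), 3))
```

The average overlap falls off roughly like one over the square root of the dimension, so for a device made of a macroscopic number of particles it is, for all practical purposes, zero.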

Excuse that little tangent, I just had to get that off of my chest. So, back to the basics: our measurement device is classical, but it is really just a bunch of entangled quantum particles. 

There is something peculiar about the quantum particles that make up the classical system: they are all correlated. Classically correlated. What that means is that if one of the particles has a particular property or state, its neighbor does so too. They kind of have to: they are one consistent bunch of particles that are masquerading as a classical system. What I mean is that, if the macroscopic measurement device's "needle" points to "zero", then in a sense every particle within that device is in agreement. It's not like half are pointing to 'zero', a quarter to 'one', and another quarter to '7 trillion'. They are all one happy correlated family of particles, in complete agreement. And when they change state, they all do so at the same time. 

How is such a thing possible, you ask? 

Watch. It's really quite thrilling to see how this works.

Let us go back to our lonely quantum state $|x\rangle$, whose position we were measuring. Only now I will, for the sake of simplicity, measure the state of a quantum discrete variable, a qubit. The qubit is a "quantum bit", and you can think of it as a "spin-1/2" particle. Remember, the thing that can only have the states "up" and "down", only it can also take on superpositions of these states? If this was a textbook then now I would hurl the Bloch sphere at you, but this is a blog so I won't. 

I'll write the basis states of the qubit as $|0\rangle$ and $|1\rangle$. I could also (and more convincingly) have written $|\uparrow\rangle$ and $|\downarrow\rangle$, but that would have required much more tedious writing in LaTeX. An arbitrary quantum state $|Q\rangle$ can then be written as
$$|Q\rangle=\alpha|0\rangle+\beta|1\rangle.$$
Here, $\alpha$ and $\beta$ are complex numbers that satisfy $|\alpha|^2+|\beta|^2=1$, so that the quantum state is correctly normalized. But you already knew all that. Most of the time, we'll restrict ourselves to real, rather than complex, coefficients. 

Now let's bring this quantum state in touch with the measurement device. But let's do this one bit at a time. Because the device is really a quantum system that thinks it is classical. Because, as I like to say, there is really no such thing as classical physics. 

So let us treat it as a whole bunch of quantum particles, each a qubit. I'm going to call my measurement device the "ancilla" $A$. The word "ancilla" is Latin for "maid", and because the ancilla state is really helping us to do our (attempted) measurement, it is perfectly named. Let's call this ancilla state $|A_1\rangle$, where the "one" is to remind you that it is really only one out of many. An attempted quantum measurement is, as I outlined in the previous post (and as John von Neumann correctly figured out), an entanglement operation. The ancilla starts out in the state $|A_1\rangle=|0\rangle$. We discussed previously that this is not a limitation at all. Measurement does this:
$$|Q\rangle|A_1\rangle=(\alpha|0\rangle+\beta|1\rangle)|0\rangle\to\alpha|0\rangle|0\rangle+\beta|1\rangle|1\rangle.$$
I can tell you exactly which unitary operator makes this transformation possible, but then I would lose about 3/4 of my readership. Just trust me that I know. And keep in mind that the first ket vector refers to the quantum state, and the second to ancilla $A_1$. I could write the whole state like this to remind you:
$$\alpha|0\rangle_Q|0\rangle_1+\beta|1\rangle_Q|1\rangle_1,$$
but this would get tedious quickly. All right, fine, I'll do it. It really helps in order to keep track of things. 

To continue, let's remember that the ancilla is really made out of many particles. Let's first look at a second one. You know, I need at least a second one, otherwise I can't talk about the consistency of the measurement device, which needs to be such that all the elements of the device agree with each other. So there is an ancilla state $|A_2\rangle=|0\rangle_2$. At least it starts out in this state. And when the measurement is done, you find that
$$ |Q\rangle|A_1\rangle|A_2\rangle\to\alpha|0\rangle_Q|0\rangle_1|0\rangle_2+\beta|1\rangle_Q|1\rangle_1|1\rangle_2.$$
There are several ways of showing that this is true for a composite measurement device $|A_1\rangle|A_2\rangle$. But as I will show you much later (when we talk about Schrödinger's cat), the pieces of the measurement device don't actually have to measure the state at the same time. They could do so one after the other, with the same result!
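For readers who want to see the entangled state come out of an explicit operation, here is a minimal sketch in plain Python. The text does not say which unitary is used; a controlled-NOT from the quantum system onto each ancilla is one standard choice that produces exactly this state, and it is an assumption here, as are the function names `cnot` and `measure` and the sample values of $\alpha$ and $\beta$. States are stored as dictionaries from bit tuples (Q, A1, A2, ...) to amplitudes.

```python
import math

def cnot(state, control, target):
    """Controlled-NOT: flip bit `target` in every term where bit `control` is 1."""
    new_state = {}
    for bits, amp in state.items():
        flipped = list(bits)
        if bits[control] == 1:
            flipped[target] ^= 1
        key = tuple(flipped)
        new_state[key] = new_state.get(key, 0.0) + amp
    return new_state

def measure(alpha, beta, n_ancillas=2):
    """Entangle Q = alpha|0> + beta|1> with n ancilla qubits that start in |0>."""
    zeros = (0,) * n_ancillas
    state = {(0,) + zeros: alpha, (1,) + zeros: beta}
    for k in range(1, n_ancillas + 1):  # one ancilla after the other
        state = cnot(state, control=0, target=k)
    return state

alpha, beta = math.sqrt(0.3), math.sqrt(0.7)  # sample real coefficients
final = measure(alpha, beta)
for bits, amp in sorted(final.items()):
    print(bits, round(amp, 4))
```

Only the all-zeros and all-ones terms survive, with amplitudes $\alpha$ and $\beta$. Because the loop couples the ancillas one at a time, the sketch also illustrates that the pieces of the device need not act simultaneously.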

Oh yes, we will talk about Schrödinger's cat (but not in this post), and my goal is that after we're done you will never be confused by that cat again. Instead, you should go and confuse cats, in retaliation. 

Now I could introduce $n$ of those ancillary systems (and I have in the paper), but for our purposes here two is quite enough, because I can study the correlation between two systems already. So let's do that. 

We do this by looking at the measurement device, as I told you. In quantum mechanics, looking at the measurement device has a very precise meaning, in that you are not looking at the quantum system. And not looking at the quantum system means, mathematically, to trace over its states. I'll show you how to do that.

First, we must write down the density matrix that corresponds to the joint system $|QA_1A_2\rangle$ (that's my abbreviation for the long state after measurement written above). I write this as 
$$\rho_{QA_1A_2}=|QA_1A_2\rangle\langle QA_1A_2|$$
We can trace out the quantum system $Q$ by the simple operation
$$\rho_{A_1A_2}={\rm Tr}_Q (\rho_{QA_1A_2}).$$
Most of you know exactly what I mean by doing this "partial trace", but those of you who do not, consult a good book (like Asher Peres' classic and elegant book), or (gasp!) consult the Wiki page.

So making a quantum measurement means disregarding the quantum state altogether. We are looking at the measurement device, not the quantum state. So what do we get?

We get this:
$$\rho_{A_1A_2}=|\alpha|^2\,|00\rangle_{12}\langle00|+|\beta|^2\,|11\rangle_{12}\langle11|.$$
If you have $n$ ancillae, just add that many zeros inside the brackets in the first term, and that many ones in the brackets in the second term. You see, the measurement device is perfectly consistent: you either have all zeros (as in $|00....000\rangle_{12....n}\langle00....000|$) or all ones. And note that you can add your eye, and your nervous system, and what not in the ancilla state. It doesn't matter: they will all agree. No need for psychophysical parallelism, the thing that JvN had to invoke. 
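The partial trace can also be checked by brute force. Here is a minimal sketch in plain Python, assuming the post-measurement state $\alpha|000\rangle+\beta|111\rangle$ with sample real coefficients; the function name `trace_out_first` and the dictionary representation are choices made for this illustration only.

```python
import math

def trace_out_first(state):
    """Partial trace over the first qubit of a pure state.

    `state` maps bit tuples (q, a1, ..., an) to amplitudes. The result is the
    reduced density matrix of the remaining qubits as {(row, col): value}.
    """
    rho = {}
    for bits_i, amp_i in state.items():
        for bits_j, amp_j in state.items():
            if bits_i[0] == bits_j[0]:  # only terms diagonal in Q survive
                key = (bits_i[1:], bits_j[1:])
                rho[key] = rho.get(key, 0.0) + amp_i * complex(amp_j).conjugate()
    return rho

alpha, beta = math.sqrt(0.3), math.sqrt(0.7)  # sample real coefficients
# State after measurement with two ancillas: alpha|000> + beta|111>.
state = {(0, 0, 0): alpha, (1, 1, 1): beta}
rho_a1a2 = trace_out_first(state)
for (row, col), value in sorted(rho_a1a2.items()):
    print(row, col, round(value.real, 4))
```

The only entries that appear are the two diagonal ones, for `(0, 0)` and `(1, 1)`, with weights $|\alpha|^2$ and $|\beta|^2$. The cross terms between all-zeros and all-ones never show up, because they would require the traced-out system to be in two different states at once.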

I can also illustrate the partial trace quantum information-theoretically, if you prefer. Below on the left is the quantum Venn diagram after entanglement. "S" refers to the apparent entropy of the measurement device, and it is really just the Shannon entropy of the probabilities $|\alpha|^2$ and $|\beta|^2$. But note that there are minus signs everywhere, telling you that this system is decidedly quantum. When you trace out the quantum system, you simply "forget that it's there", which means you erase the line that crosses the $A_1A_2$ system, and add all the stuff up that you find. And what you get is the Venn diagram to the right, which your keen eye will identify as the Venn diagram of a classically correlated state.
Venn diagram of the full quantum system plus measurement device (left), and only of the measurement device, not looking at the quantum system (right).
What all this means is that the resulting density matrix is a probabilistic mixture, showing you the classical result "0" with probability $|\alpha|^2$, and the result "1" with probability $|\beta|^2$. 
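The entropy "S" of that mixture is just the Shannon entropy of the two outcome probabilities. A minimal sketch in plain Python, measuring entropy in bits (the base of the logarithm is an assumption here, as are the sample probabilities and the function name `shannon_entropy`):

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

alpha_sq, beta_sq = 0.3, 0.7  # sample values of |alpha|^2 and |beta|^2
print(round(shannon_entropy([alpha_sq, beta_sq]), 4))

# An equal superposition makes the device look like a fair coin.
print(shannon_entropy([0.5, 0.5]))
```

The entropy is largest for an equal superposition and drops to zero when the quantum state is already one of the basis states, in which case the device shows a definite value.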

And that, ladies and gentlemen, is just Born's rule: that the probability of quantum measurement is given by the square of the amplitude of the quantum system. Derived for you in just a few lines, with hardly any mathematics at all. And because every blog post should have a picture (and this only had a diagram), I regale you with the one of Max Born:
Max Born (1882-1970) Source: Wikimedia
A piece of trivia you may not know: Max got his own rule wrong in the paper that announced it (see Ref. [1]). He crossed it out in proof and replaced the rule (which has the probability given by the amplitude, not the square of the amplitude) by the correct one in a footnote. Saved by the bell!

Of course, having derived Born's rule isn't magical. But the way I did it tells us something fundamental about the relationship between physical and quantum reality. Have you noticed the big fat "zero" in the center of the Venn diagram on the upper left? It will always be there, and that means something fundamental. (Yes, that's a teaser). Note also, in passing, that there was no collapse anywhere. After measurement, the wavefunction is still given by $|QA_1\cdots A_n\rangle$, you just don't know it.

In Part 5, I will delve into the interpretation of what you just witnessed. I don't know yet whether I will make it all the way to Schrödinger's hapless feline, but here's hoping.

[1] M. Born. Zur Quantenmechanik der Stoßvorgänge. Zeitschrift für Physik 37 (1926) 863-867.

Thursday, July 24, 2014

On quantum measurement (Part 3: No cloning allowed)

In the previous two parts, I told you how I became interested in the quantum measurement problem (Part 1), and provided a bit of historical background (Part 2). Now we'll get to the heart of the matter. 

Note that I'm using MathJax to display equations in this blog. If your browser shows a bunch of dollar signs and gibberish where equations should appear, you probably have to figure out how to install MathJax on your browser. Don't email me: I know nothing about such intricacies.

Let me remind you that our hero John von Neumann described quantum measurement as a two-stage process. (No, I'm not showing his likeness.) The first stage is now commonly described as entanglement. This is what we'll discuss here. I'll get to the second process (the one where the wavefunction ostensibly collapses, except Hans Bethe told me that it doesn't) in Part 4. 

For the purpose of illustration, I'm going to describe the measurement of a position, but everything can be done just as well for discrete degrees of freedom, such as, you know, spins. In fact, I'll show you a bunch of spin measurements waaay later, like the Stern-Gerlach experiment, or the quantum eraser. But I'm getting ahead of myself.

Say our quantum system that we would like to measure is in state $|Q\rangle=|x\rangle$. I'm going to use Q to stand in for quantum systems a lot. Measurement devices will be called "M", or sometimes "A" or "B". 

All right. How do you measure stuff to begin with?

In classical physics, we might imagine that a system is characterized by the position variable $x$, [I'll write this as "(x)"] and to measure it, all we have to do is to transfer that label "x" to a measurement device. Say the measurement device (before measurement) points to a default location (for example '0') like this: (0). Then, we'll place that device next to the position we want to measure, and attempt to make the device "reflect" the position:
$$(x)(0)\to (x)(x)$$ 
This is just what I want, because now I can read the position of the thing I want to measure off of my measurement device. 

I once in a while get the question: "Why do you have to have a measurement device? Can't you just read the position off of the system you want to measure directly?" The answer is no, no you can't. The thing is the thing: it stands there in the corner, say. If you measure something, you have to transfer the state of the thing to something you read off of. The variable that reflects the position can be very different from the thing you are measuring. For example, a temperature can be transferred to the height of a mercury column. In a measurement, you create a correlation between two systems. 

In a classical measurement, the operation that makes that possible is a copying operation. You copy the system's state onto the measurement device's state. The copy can be made out of a very different material (for example, a photograph is a copy of a 3D scene onto a two-dimensional surface, made out of whatever material you choose). But system and measurement refer to each other.

All right, so measuring really is copying. And reading this the sophisticated reader (yes, I mean you!) starts smelling a rat right away. Because you already know that copying is just fine in classical physics, but it really is against the law in quantum physics. That's right: there is a no-cloning (or no-xeroxing) theorem, in effect in quantum mechanics. You're not allowed to make exact copies. Ever. 

So how can quantum measurement work at all, if measurement is intrinsically copying?

That, dear reader, is indeed the question. And what I'll try to convince you of is now fairly obvious, namely that quantum measurement is really impossible in principle, unless you just happen to be in the "right basis". This "right basis", basically, is a basis where everything looks classical to begin with. (We'll get to this in more detail later). What I will try to convince you of here is that quantum measurement is impossible if you want a quantum measurement to do what you expect from a classical measurement, namely that your device reflects the state of the system. 

The no-cloning theorem makes that impossible. 

I could stop here, you know. "Stop worrying about quantum measurement", I could write, "because I just showed you that quantum measurement is impossible in principle!"

But I won't, because there is so much more to be said. For example, even though quantum measurements are impossible in principle, it's not like people haven't tried, right? So what is it that people are measuring? What are the measurement devices saying? 

I'll tell you, and I guarantee you that you will not like it one bit.

But first, I owe you this piece: to show you how quantum measurement works. So our quantum system $Q$ is in state $|x\rangle$. Our measurement device is conveniently already in its default state $|0\rangle$. You can, by the way, think about what happens if the measurement device is not pointing to an agreed-upon direction (such as '0') before measurement, but Johnny vN has already done this for you on page 233 of his "Grundlagen". Here he is, by the way, discussing stuff with Ulam and Feynman, most likely in Los Alamos.
Left to right: Stanislaw Ulam, Richard Feynman, John von Neumann
To be a fly on the wall there! Note how JvN (to the right) is always better dressed than the people he hangs out with!

So investigating various possible initial states of the quantum measurement device does nothing for you, he finds, and of course he is correct. So we'll assume it points to $|0\rangle$. 

So we start with $|Q\rangle|M\rangle=|x\rangle|0\rangle$. What now? Well, the measurement operator, which of course has to be unitary (meaning it conserves probabilities, yada yada) must project the quantum state, then move the needle on the measurement device. For a position measurement, the unitary operator that does this is
$$U=e^{iX\otimes P}$$
where $X$ is the operator whose eigenstate is $|x\rangle$ (meaning $X|x\rangle=x|x\rangle$), and where $P$ is the operator conjugate to $X$. $P$ (the "momentum operator") makes spatial translations. For example, $e^{iaP}|x\rangle=|x+a\rangle$, that is, $x$ was made into $x+a$.  The $\otimes$ reminds you that $X$ acts on the first vector (the quantum system), and $P$ acts on the second (the measurement device). 

So, what this means is that 
$$U|x\rangle|0\rangle=e^{iX\otimes P}|x\rangle|0\rangle=e^{ix P}|x \rangle|0\rangle=|x\rangle|x\rangle .$$ 
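Yay: the state of the quantum system was copied onto the measurement device! Except that you already can see what happens if you try to apply this operator to a superposition of states such as $|x\rangle+|y\rangle$, which I'll abbreviate as $|x+y\rangle$ (not normalized, and not to be confused with a position eigenstate shifted by $y$):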
$$U|x+y\rangle|0\rangle=e^{iX\otimes P}|x+y\rangle|0\rangle=e^{ix P}|x \rangle|0\rangle+e^{iy P}|y \rangle|0\rangle=|x\rangle|x\rangle + |y\rangle|y\rangle .$$
And that's not at all what you would have expected if measurement was like the classical case, where you would have gotten $(|x\rangle + |y\rangle)(|x\rangle + |y\rangle)$. And what I just showed you is really just the proof that cloning is impossible in quantum physics.
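The difference between "entangled" and "copied" can be made concrete on a small lattice. The sketch below is a discrete stand-in for the position measurement, written in plain Python: the lattice size `N`, the cyclic shift used in place of $e^{iX\otimes P}$, the sample positions, and the helper names are all assumptions made for illustration. States are dictionaries from (system position, device position) to real amplitudes.

```python
import math

N = 8  # hypothetical number of lattice sites standing in for continuous position

def measure_position(state):
    """Discrete stand-in for U: shift the device by the system's position x."""
    return {(x, (m + x) % N): amp for (x, m), amp in state.items()}

def product(sys_amps, dev_amps):
    """Tensor product of a system state and a device state."""
    return {(x, m): a * b for x, a in sys_amps.items() for m, b in dev_amps.items()}

def overlap(s1, s2):
    """Inner product <s1|s2> for real amplitudes."""
    return sum(amp * s2.get(key, 0.0) for key, amp in s1.items())

x, y = 2, 5
superposition = {x: 1 / math.sqrt(2), y: 1 / math.sqrt(2)}  # normalized |x> + |y>

measured = measure_position(product(superposition, {0: 1.0}))
cloned = product(superposition, superposition)  # what a true copy would look like

print(sorted(measured))                      # which (system, device) pairs occur
print(round(overlap(measured, cloned), 4))   # would be 1 if the two states agreed
```

For a single position eigenstate the same function does produce a faithful copy. For the superposition, only the pairs where system and device agree occur, and the overlap with the would-be clone stays below one.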

So there you have it: quantum measurement is impossible unless the state that you are measuring just happens to already be in an eigenstate of the measurement operator, that is, it is not in a quantum superposition. 

Whether or not a quantum system is in a superposition depends on the basis that you choose to perform your quantum measurement. I do realize that the concept of a "basis" is a bit technical: it is totally trivial to all of you who have been working in quantum mechanics for years, but less so for those of you who are just curious. In everyday life, it is akin to measuring temperature in Celsius or Fahrenheit, for example, or location in Cartesian as opposed to polar coordinates. But in quantum mechanics, the choice of a basis is much more fundamental, and I really don't know of a good way to make it more intuitive (meaning, without a lot more math). A typical distinction is to measure photon polarization either in terms of horizontal/vertical, or left/right circular. I know, I'm not helping. Let's just skip this part for now. I might get back to it later.

So what happens when you measure a quantum system, and your measurement device is not "perfectly aligned" (basis-wise) with the quantum system? As it in fact almost never will be, by the way, unless you use a classical device to measure a classical system. Because in classical physics, we are all in the same basis automatically.  (OK, I see that I'll have to clarify this to you but trust me here.)

Look forward to Part 4 instead. Where I will finally delve into "Stage 2" of the measurement process. That is the one that baffled von Neumann, because he could not understand where exactly the wavefunction collapses. And in hindsight, there was no way he could have figured this out, because the wavefunction never collapses. Ever. What I'll show you in Part 4 is how a measurement device can be perfectly (by which I mean intrinsically) consistent, yet tell you a story about what the quantum state is and lie to you at the same time. Lie to you, through its proverbial teeth, if it had any.  

But come on, cut the measurement device some slack. It is lying to you because it has no choice. You ask it to make a copy of the quantum state, and it really is not allowed to do so. What will happen (as I will show you), is that it will respond by displaying to you a random value, with a probability given by the square of some part of the amplitude of the quantum wavefunction. In other words, I'll show you how Born's rule comes about, quite naturally. In a world where no wavefunction collapses, of course.

Part 4 is here

Monday, July 14, 2014

On quantum measurement (Part 2: Some history, and John von Neumann is confused)

This is Part 2 of the "On quantum measurement" series. Part 1: (Hans Bethe, the oracle) is here.

Before we begin in earnest, I should warn you, (or ease your mind, whichever is your preference): this sequence has math in it. I'm not in it to dazzle you with math. It's just that I know no other way to convey my thoughts about quantum measurement in a more succinct manner. Math, you see, is a way for those of us who are not quite bright enough, to hold on to thoughts which, without math, would be too daunting to formulate, too ambitious to pursue. Math is for poor thinkers, such as myself. If you are one of those too, come join me. The rest of you: why are you still reading? Oh, you're not. OK. 

Hey, come back: this historical interlude turns out to be math-free after all. But I promise math in Part 3.

Before I offer to you my take on the issue of quantum measurement, we should spend some time reminiscing, about the history of the quantum measurement "problem". If you've read my posts (and why else would you read this one?), you'll know one thing about me: when the literature says there is a "problem", I get interested. 

This particular problem isn't that old. It arose through a discussion between Niels Bohr and Albert Einstein, who disagreed vehemently about measurement, and the nature of reality itself.  

Bohr and Einstein at Ehrenfest's house, in 1925. Source: Wikimedia

The "war" between Bohr and Einstein only broke out in 1935 (via dueling papers in the Physical Review), but the discussion had been brewing for 10 years at least. 

Much has been written about the controversy (and a good summary albeit with a philosophical bent can be found in the Stanford Encyclopedia of Philosophy). Instead of going into that much detail, I'll just simplify it by saying:

Bohr believed the result of a measurement reflects a real objective quantity (the value of the property being measured).

Einstein believed that quantum systems have objective properties independent of their measurements, and that because quantum mechanics cannot properly describe them, the theory must necessarily be incomplete.

In my view, both views are wrong. Bohr's because his argument relies on a quantum wavefunction that collapses upon measurement (which as I'll show you is nonsense), and Einstein's because the idea that a quantum system has objective properties (described by one of the eigenstates of a measurement device) is wrong and that, as a consequence the notion that quantum mechanics must be incomplete is wrong as well. He was right, though, about the fact that quantum systems have properties independently of whether you measure them or not. It is just that we may not ever know what these properties are.

But enough of the preliminaries. I will begin to couch quantum measurement in terms of a formalism due to John von Neumann. If you think I'm obsessed by the guy because he seems to make an appearance in every second blog post of mine: don't blame me. He just ended up doing some very fundamental things in a number of different areas. So I'm sparing you the obligatory picture of his, because I assume you have seen his likeness enough. 

John von Neumann's seminal book on quantum mechanics is called "Mathematische Grundlagen der Quantenmechanik" (Mathematical Foundations of Quantum Mechanics), and appeared in 1932, three years before the testy exchange of papers (1) between Bohr and Einstein. 

My copy of the "Grundlagen". This is the version issued by the U.S. Alien Property Custodian from 1943 by Dover Publications. It is the verbatim German book, issued in the US in war time. The original copyright is by J. Springer, 1932.

In this book, von Neumann made a model of the measurement process that had two stages, aptly called "first stage" and "second stage". [I want to note here that JvN actually called the first stage "Process 2" and the second stage "Process 1", which today would be confusing so I reversed it.]

The first stage is unitary, which means "probability conserving". JvN uses the word "causal" for this kind of dynamics. In today's language, we call that process an "entanglement operation" (I'll describe it in more detail momentarily, which means "wait for Part 3"). Probability conservation is certainly a requisite for a causal process, and I actually like JvN's use of the word "causal". That word now seems to have acquired a somewhat different meaning.

The second stage is the mysterious one. It is (according to JvN) acausal, because it involves the collapse of the wavefunction (or as Hans Bethe called it, the "reduction of the wavepacket"). It is clear that this stage is mysterious to Johnny, because he doesn't know where the collapse occurs. He is following "type one" processes in a typical measurement (in the book, he measures temperature as an example) from the thermal expansion of the mercury fluid column, to the light quanta that scatter off the mercury column and enter our eye, where the light is refracted in the lens and forms an image on the retina, which then stimulates nerves in the visual cortex, and ultimately creates the "subjective experience" of the measurement. 

According to JvN, the boundary between what is the quantum system and what is the measurement device can be moved in an arbitrary fashion. He understands perfectly that a division into a system to be measured and a measuring system is necessary and crucial (and we'll spend considerable time discussing this), but the undeniable fact that it is not at all clear where to draw the boundary is a mystery to him. He invokes the philosophical principle of "psychophysical parallelism" (which states that there can be no causal interaction between the mind and the body) to explain why the boundary is so fluid. But it is the sentence just following this assertion that puts the finger on what is puzzling him. He writes: 

"Because experience only ever makes statements like this: 'an observer has had a (subjective) perception', but never one like this: 'a physical quantity has taken on a particular value'."(2)

This is, excuse my referee's voice, very muddled. He says: We never have the experience "X takes on x", we always experience "X looks like it is in state x". But mathematically they should be the same. He makes a distinction that does not exist. We will see later why he feels he must make that distinction. But, in short, it is because he thinks that what we perceive must also be reality. If a physical object X is perceived to take on state x, then this must mean that objectively "X takes on x". In other words, he assumes that subjective experience must mirror objective fact.

Yet, this is provably dead wrong. 

That is what Nicolas and I discovered in the article in question, and that is undoubtedly what Hans Bethe immediately realized, but struggled to put into words. 

Quantum reality, in other words, is a whole different thing than classical reality. In fact, in the "worst case" (to be made precise as we go along) they may have nothing to do with each other, as Nicolas and I  argue in a completely obscure (that is unknown) article entitled "What Information Theory Can Tell us About Quantum Reality" (3).

What you will discover when following this series of posts, is that if your measurement device claims "the quantum spin that you were measuring was in state up", then this may not actually tell you anything about the true quantum state. The way I put it colloquially is that "measurement devices tend to lie to you". They lie, because they give you an answer that is provably nonsense. 

In their (the device's) defense, they have no choice but to lie to you (I will make that statement precise when we do math). They lie because they are incapable of telling the truth. Because the truth is, in a precise information-theoretic way that I'll let you in on, bigger than they are. 

JvN tried to reconcile subjective experience with objective truth. Subjectively, the quantum state collapsed from a myriad of possibilities to a single truth. But in fact, nothing of the sort happens. Your subjective experience is not reflecting an objective truth. The truth is out there, but it won't show itself in our apparatus. The beauty of theoretical physics is that we can find out about how the wool is being pulled over our eyes (how classical measurement devices are conspiring to deceive us) when our senses would never allow us a glimpse of the underlying truth.

Math supporting all that talk will start in Part 3. 

(1) Einstein (with Podolsky and Rosen) wrote a paper entitled "Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?". It appeared in Phys. Rev. 47 (1935) 777-780. Four pages: nowadays it would be a PRL. I highly recommend reading it. Bohr was (according to historical records and the narrative in Zurek's great book about it all) incensed. Bohr reacted by writing a paper with the same exact title as Einstein's, that has (in my opinion) nothing in it. It is an astonishing paper because it is content-free, but was meant to serve as a statement that Bohr refutes Einstein, when in fact Bohr had nothing. 

(2) Denn die Erfahrung macht nur Aussagen von diesem Typus: ein Beobachter hat eine bestimmte (subjektive) Wahrnehmung gemacht, und nie eine solche: eine physikalische Größe hat einen bestimmten Wert. 

(3) C. Adami & N.J. Cerf, Lect. Notes in Comp. Sci. 1509 (1999) 258-268

Part 3 (No cloning allowed) continues here

Sunday, June 22, 2014

On quantum measurement (Part 1: Hans Bethe, the oracle)

For this series of posts, I'm going to take you on a ride through the bewildering jungle that is quantum measurement. I've no idea how many parts will be enough, but I'm fairly sure there will be more than one. After all, the quantum mechanics of measurement has been that subject's "mystery of mysteries" for ages, it now seems. 

Before we begin, I should tell you how I became interested in the quantum measurement problem. Because for the longest time I wasn't. During graduate school (at the University of Bonn), the usual thing happened: the Prof (in my case Prof. Werner Sandhas, who I hope turned eighty this past April 14th) says that they'll tell us about quantum measurement towards the end of the semester, and never actually get there. I have developed a sneaking suspicion that this happened a lot, in quantum mechanics classes everywhere, every time. Which would explain a lot of the confusion that still reigns. 

However, to tell you how I became interested in this problem is a little difficult, because I risk embarrassing myself. The embarrassment that I'm risking is not the usual type. It is because the story that I will tell you will seem utterly ridiculous, outrageously presumptuous, and altogether improbable. But it occurred just as I will attempt to tell it. There is one witness to this story, my collaborator in this particular endeavor, the Belgian theoretical physicist Nicolas Cerf.  

Now, because Nicolas and I worked together very closely on a number of different topics in quantum information theory when we shared an office at Caltech, you might surmise that he would corroborate any story I write (and thus not be an independent witness). I'm sure he remembers the story (wait for it, I know I'm teasing) differently, but you would have to ask him. All I can say is that this is how I remember it.

Nicolas and I had begun to work in quantum information theory around 1995-1996. After a while we were studying the quantum communication protocols of quantum teleportation and quantum superdense coding, and in our minds (that is, our manner of counting), information did not add up. But, we thought, information must be conserved. We were certain. (Obviously that has been an obsession of mine for a while, those of you who have read my black hole stories will think to yourselves).
Space-time diagrams for the quantum teleportation process (a) and superdense coding process (b). EPR stands for an entangled Einstein-Podolsky-Rosen pair. Note the information values for the various classical and quantum bits in red. Adapted from Ref. [1]. The letters 'M' and 'U' stand for a measurement and a unitary operation, respectively. A and B are the communication partners 'Alice' and 'Bob'.

But information cannot be conserved, we realized, unless you can have negative bits. Negative entropy: anti-qubits (see the illustration above). This discovery of ours is by now fairly well-known (so well-known, in fact, that sometimes articles about negative quantum entropy don't seem to feel it necessary to refer to our original paper at all). But it is only the beginning of the story (ludicrous as it may well appear to you) that I want to tell. 
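To make "negative bits" concrete, here is a small numerical sketch. It assumes the standard definition of the quantum conditional entropy, $S(A|B)=S(AB)-S(B)$, and evaluates it for one entangled EPR (Bell) pair. The helper name `von_neumann_entropy` and the choice of the Bell state are mine, picked for illustration.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    nonzero = eigenvalues[eigenvalues > 1e-12]  # 0 log 0 is taken to be 0
    return float(-np.sum(nonzero * np.log2(nonzero)))

# Bell state (|00> + |11>)/sqrt(2) shared between A and B.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_ab = np.outer(bell, bell)

# Reduced state of B: trace out A. Tensor indices are (a, b, a', b').
rho_b = np.einsum('abac->bc', rho_ab.reshape(2, 2, 2, 2))

s_ab = von_neumann_entropy(rho_ab)  # joint entropy: zero for a pure state
s_b = von_neumann_entropy(rho_b)    # marginal entropy of B: one bit
s_a_given_b = s_ab - s_b            # conditional entropy S(A|B)

print(s_ab, s_b, s_a_given_b)
```

The joint state is pure, so its entropy vanishes, while B alone looks maximally random. The difference comes out at minus one bit, something that cannot happen for classical (Shannon) conditional entropies.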

After Nicolas and I wrote the negative entropy paper, we realized that quantum measurement was, after all, reversible. That fact was obvious once you understood these quantum communication protocols, but it was even more obvious once you understood the quantum erasure experiment. Well, for all we knew, this was flying in the face of accepted lore, which (ever since Niels Bohr) would maintain that quantum measurement required an irreversible collapse of the quantum wavefunction. Ordinarily, I would now put up a picture of the Danish physicist who championed wave function collapse, but I cannot bring myself to do it: I have come to loathe the man. I'm sure I'm being petty here.

With this breakthrough discovery in mind ("Quantum measurement is reversible!") Nicolas and I went to see Hans Bethe, who was visiting Caltech at the time. At this point, Hans and I had become good friends, as he visited Caltech regularly. I wrote up my recollections of my first three weeks with him (and also our last meeting) in the volume commemorating his life. (If you don't want to buy that book but read the story, try this link. But you should really buy the book: there's other fun stuff in it). The picture below is from Wikipedia, but that is not how I remember him. I first met him when he was 85. 
Hans A. Bethe (1906-2005) (Source: Wikimedia)

Alright, enough of the preliminaries. Nicolas Cerf and I decided to ask for Hans's advice, and enter his office, then on the 3rd floor of Caltech's Kellogg Radiation Laboratory. For us, that meant one flight of stairs up. We tell him right away that we think we have discovered something important that is relevant to the physics of quantum measurement, and start explaining our theory. I should tell you that what we have at this point isn't much of a theory: it is the argument, based on negative conditional quantum entropies, that quantum measurement can in principle be reversed. 

Hans listens patiently. Once in a while he asks a question that forces us to be more specific.

After we are done, he speaks.

"I am not that much interested in finding that quantum measurement is reversible. What I find much more interesting is that you have solved the quantum measurement problem."

After that, there is a moment of silence. Both Nicolas and I are utterly stunned. 

I am first to ask the obvious. 
"Can you explain to us why?"

You see, it is fairly improbable that a physicist of the caliber of Hans Bethe tells you that you have solved the "mystery of mysteries". Neither Nicolas nor I had seen this coming from a mile away. And we certainly had no idea why he just said that.

We were waiting with--shall we say--bated breath. Put yourself into our position. How would you have reacted? What came after was also wholly unexpected.

After I asked him to explain that last statement, he was silent for--I don't know--maybe three seconds. In a conversation like this, that is bordering on a perceived eternity.

My recollection is fuzzy at this point. Either he began by saying "I can't explain it to you", or he immediately told the story of the Mathematics Professor who lectures on a complex topic and fills blackboard after blackboard, until a student interrupts him and asks: "Can you explain this last step in your derivation to me?"

The Professor answers: "It is obvious". The student insists. "If it is obvious, can you explain it?", and the Professor answers: "It is obvious, but I'll have to get back to you to explain it tomorrow".

At this point of Hans telling this story, the atmosphere is a little awkward. Hans tells us that it is obvious that we solved the quantum measurement problem, but he can't tell us exactly why he thinks it is obvious that we did. It certainly is not obvious to us.

I know Hans well enough at this point that I press on. I cannot let that statement go just like that. He did go on to try to explain what he meant. Now of course I wish I had taken notes but I didn't. But what he said resonated in my mind for a long time (and I suspect that this is true for Nicolas as well). After what he said, we both dropped everything we were doing, and worked only on the quantum measurement problem, for six months, culminating in this paper.

What he said was something like this: "When you make a measurement, its outcome is conditional on the measurements made on that quantum system before that, and so on, giving rise to a long series of measurements, all conditional on each other".

This is nowhere near an exact rendition of what he said. All I remember is him talking about atomic decay, and measuring the product of the decay and that this is conditional on previous events, and (that is the key thing I remember) that this gives rise to these long arcs of successive measurements whose outcomes are conditional on the past, and condition the future. 

Both Nicolas and I kept trying to revive that conversation in our memory when we worked on the problem for the six months following. (Hans left Caltech that year the day after our conversation). Hans also commented that our finding had deep implications for quantum statistical mechanics, because it showed that the theory is quite different from the classical theory after all. We did some work on the quantum Maxwell Demon in reaction to that, but never really had enough time to finish it. Other people after us did. But for the six months that followed, Nicolas and I worked with only this thought in our mind:

"He said we solved the problem. Let us find out how!"

In the posts that follow this one, I will try to give you an idea of what it is we did discover (most of it contained in the article mentioned above). You will easily find out that this article isn't published (and I'll happily tell you the story how that happened some other time). While a good part of what's in that paper did get published ultimately, I think the main story is still untold. And I am attempting to tell this story still, via a preprint I have about consecutive measurements, that I'm also still working on. But consecutive measurement is what Hans was telling us about in this brief session, that changed the scientific life of both Nicolas and me. He knew what he was talking about, but he didn't know how to tell us just then. It was obvious to him. I hope it will be obvious to me one day too.

Even though the conversation with Hans happened as I described, I should tell you that 18 years after Hans said this to us (and thinking about it and working on it for quite a while) I don't think he was altogether right. We had solved something, but I don't think we solved "the whole thing". There is more to it. Perhaps much more.

Stay tuned for Part 2, where I will explain the very basics of quantum measurement, what von Neumann had to say about it, as well as what this has to do with Everett and the "Many-world" interpretation. And if this all works out as I plan, perhaps I will ultimately get to the point that Hans Bethe certainly did not foresee: that the physics of quantum measurement is intimately linked to Gödel incompleteness. But I'm getting ahead of myself.

[1] N.J. Cerf and C. Adami. Negative entropy and information in quantum mechanics. Phys. Rev. Lett. 79 (1997) 5194-5197.

Note added: upon reading the manuscript again after all this time, I found in the acknowledgements the (I suppose more or less exact) statement that Hans had made. He stated that "negative entropy solves the problem of the reduction of the wave packet". Thus, it appears he did not maintain that we had "solved the measurement problem" as I had written above, only a piece of it.

Part 2 (Some history, and John von Neumann is confused) continues here.

Sunday, June 8, 2014

Whose entropy is it anyway? (Part 2: The so-called Second Law)

This is the second part of the "Whose entropy is it anyway?" series. Part 1: "Boltzmann, Shannon, and Gibbs" is here.

Yes, let's talk about that second law in light of the fact we just established, namely that Boltzmann and Shannon entropy are fundamentally describing the same thing: they are measures of uncertainty applied to different realms of inquiry, making us thankful that Johnny vN was smart enough to see this right away. 

The second law is usually written like this: 

"When an isolated system approaches equilibrium from a non-equilibrium state, its entropy almost always increases"

I want to point out here that this is a very curious law, because there is, in fact, no proof for it. Really, there isn't. Not every thermodynamics textbook is honest enough to point this out, but I have been taught this early on, because I learned Thermodynamics from the East-German edition of Landau and Lifshitz's tome "Statistische Physik", which is quite forthcoming about this (in the English translation):

"At the present time, it is not certain whether the law of increase of entropy thus formulated can be derived from classical mechanics"

From that, L&L go on to speculate that the arrow of time may be a consequence of quantum mechanics.

I personally think that quantum mechanics has nothing to do with it (but see further below). The reason the law cannot be derived is because it does not exist. 

I know, I know. Deafening silence. Then:

"What do you mean? Obviously the law exists!"

What I mean, to be more precise, is that strictly speaking Boltzmann's entropy cannot describe what goes on when a system not at equilibrium approaches said equilibrium, because Boltzmann's entropy is an equilibrium concept. It describes the value that is approached when a system equilibrates. It cannot describe its value as it approaches that constant. Yes, Boltzmann's entropy is a constant: it counts how many microstates can be taken on by a system at fixed energy. 

When a system is not at equilibrium, fewer microstates are actually occupied by the system, but the number it could potentially take on is constant. Take, for example, the standard "perfume bottle" experiment that is so often used to illustrate the second law:
An open "perfume bottle" (left) about to release its molecules into the available space (right)

The entropy of the gas inside the bottle is usually described as being small, while the entropy of the gas on the right (because it occupies a large space) is believed to be large. But Boltzmann's formula is actually not applicable to the situation on the left, because it assumes (on account of the equilibrium condition), that the probability distributions in phase space of all particles involved are independent. But they are clearly not, because if I know the location of one of the particles in the bottle, I can make very good predictions about the other particles because they occupy such a confined space. (This is much less true for the particles in the larger space at right, obviously).

What should we do to correct this? 

We need to come up with a formula for entropy that is not explicitly true only at equilibrium, and that allows us to quantify correlations between particles. Thermodynamics cannot do this, because equilibrium thermodynamics is precisely that theory that deals with systems whose correlations have decayed long ago, or as Feynman put it, systems "where all the fast things have happened but the slow things have not". 

Shannon's formula, it turns out, does precisely what we are looking for: quantify correlations between all particles involved. Thus, Shannon's entropy describes, in a sense, nonequilibrium thermodynamics. Let me show you how.

Let's go back to Shannon's formula applied to a single molecule, described by a random variable $A_1$, and call this entropy $H(A_1)$. 

I want to point out right away something that may shock and disorient you, unless you followed the discussion in the post "What is Information? (Part 3: Everything is conditional)" that I mentioned earlier. This entropy $H(A_1)$ is actually conditional. This will become important later, so just store this away for the moment. 

OK. Now let's look at a two-atom gas. Our second atom is described by random variable $A_2$, and you can see that we are assuming here that the atoms are distinguishable. I do this only for convenience, everything can be done just as well for indistinguishable particles.

If there are no correlations between the two atoms, then the entropy of the joint system is $H(A_1A_2)=H(A_1)+H(A_2)$, that is, entropy is extensive. Thermodynamical entropy is extensive because it describes things at equilibrium. Shannon entropy, on the other hand, is not. It can describe things that are not at equilibrium, because then
$$H(A_1A_2)=H(A_1)+H(A_2)-H(A_1:A_2) ,$$
where $H(A_1:A_2)$ is the correlation entropy, or shared entropy, or information, between $A_1$ and $A_2$. It is what allows you to predict something about $A_2$ when you know $A_1$, which is precisely what we already knew we could do in the picture of the molecules crammed into the perfume bottle on the left. This is stunning news for people who only know thermodynamics.
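Here is a small sketch of that bookkeeping for two binary variables. The joint distribution is hypothetical (I made up the numbers so that the two "atoms" agree 90% of the time), and the function name `shannon_entropy` is mine. The code computes $H(A_1)$, $H(A_2)$, and $H(A_1A_2)$ directly, and then reads off the shared entropy from the formula above.

```python
import numpy as np

def shannon_entropy(p):
    """H = -sum p log2 p over the nonzero entries of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint distribution p(a1, a2) for two binary variables:
# the two atoms are found in the same state 90% of the time.
joint = np.array([[0.45, 0.05],
                  [0.05, 0.45]])

h1 = shannon_entropy(joint.sum(axis=1))   # H(A_1)
h2 = shannon_entropy(joint.sum(axis=0))   # H(A_2)
h12 = shannon_entropy(joint)              # H(A_1 A_2)
shared = h1 + h2 - h12                    # H(A_1:A_2)

# For comparison: uncorrelated atoms with the same single-atom statistics.
independent = np.outer(joint.sum(axis=1), joint.sum(axis=0))
h12_independent = shannon_entropy(independent)

print(h1, h2, h12, shared, h12_independent)
```

With correlations, the joint entropy falls short of $H(A_1)+H(A_2)$ by exactly the shared entropy. For the uncorrelated comparison case the shortfall vanishes and the entropy is extensive again.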

What if we have more particles? Well, we can quantify those correlations too. Say we have three variables, and the third one is (with very little surprise) described by variable $A_3$. It is then a simple exercise to write the joint entropy $H(A_1A_2A_3)$ as
$$H(A_1A_2A_3)=H(A_1)+H(A_2)+H(A_3)-H_{\rm corr},\quad H_{\rm corr}=H(A_1:A_2)+H(A_1:A_3)+H(A_2:A_3)-H(A_1:A_2:A_3) .$$
Entropy Venn diagram for three random variables, with the correlation entropies indicated.

We find thus that the entropy of the joint system of variables can be written in terms of the extensive entropy (the sum of the subsystem entropies) minus the correlation entropy $H_{\rm corr}$, which includes correlations between pairs of variables, triplets of variables, and so forth. Indeed, the joint entropy of an $n$-particle system can be written in terms of a sum that features the (extensive) sum of single-particle entropies plus (or minus) the possible many-particle correlation entropies (the sign always alternates between even and odd number of participating particles, and each pair, triplet, and so on is counted once):
$$H(A_1,...,A_n)=\sum_{i=1}^n H(A_i)-\sum_{i<j}H(A_i:A_j)+\sum_{i<j<k} H(A_i:A_j:A_k)-\cdots. $$
This formula quickly becomes cumbersome, which is why Shannon entropy isn't a very useful formulation of non-equilibrium thermodynamics unless the correlations are somehow confined to just a few variables. 
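A three-variable check of this expansion can be sketched as follows. The example distribution is an assumption of mine (two fair coins and their XOR), chosen because all the correlation sits in the triplet term. I take the triplet entropy to be $H(A_1:A_2:A_3)=H(A_1:A_2)-H(A_1:A_2|A_3)$, which is the usual Venn-diagram definition, and the helper names are made up.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical example: A_1 and A_2 are fair coins, A_3 = A_1 XOR A_2.
joint = np.zeros((2, 2, 2))
for a1, a2 in itertools.product((0, 1), repeat=2):
    joint[a1, a2, a1 ^ a2] = 0.25

def H(*keep):
    """Entropy of the variables whose axes are listed, summing out the rest."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    marginal = joint.sum(axis=drop) if drop else joint
    return entropy(marginal)

def pair(i, j):
    """Shared entropy H(A_i:A_j) between two of the variables."""
    return H(i) + H(j) - H(i, j)

singles = H(0) + H(1) + H(2)
pairs = pair(0, 1) + pair(0, 2) + pair(1, 2)

# Triplet term: H(A_1:A_2) minus the same quantity conditioned on A_3.
pair_given_third = H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2)
triple = pair(0, 1) - pair_given_third

reconstructed = singles - pairs + triple
print(H(0, 1, 2), reconstructed, pairs, triple)
```

In this example every pair of variables is uncorrelated, yet the joint entropy is two bits rather than three. The missing bit is carried entirely by the triplet term, which comes out negative here.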

Now, let's look at what happens when the gas in the bottle escapes into the larger area. Initially, the entropy is small, because the correlation entropy is large. Let's write this entropy as 
$$H(A_1,...,A_n|I)=H(A_1,...,A_n)-I ,$$
where $I$ is the information I have because I know that the molecules are in the bottle. You now see why the entropy is small: you know a lot (in fact, $I$) about the system. The unconditional piece is the entropy of the system when all the fast things (the molecules escaping the bottle) have happened.  

Some of you may have already understood what happens when the bottle is opened: the information $I$ that I have (or any other observer, for that matter, has) decreases. And as a consequence, the conditional entropy $H(A_1,...,A_n|I)$ increases. It does so until $I=0$, and the maximum entropy state is achieved. Thus, what is usually written as the second law is really just the increase of the conditional entropy as information becomes outdated. Information, after all, is that which allows me to make predictions with accuracy better than chance. If the symbols that I have in my hand (and that I use to make the predictions) do not predict anymore, then they are not information anymore: they have turned to entropy. Indeed, in the end this is all the second law is about: how information turns into entropy.
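A toy calculation makes the trade-off between information and conditional entropy explicit. Everything in it is an assumption for illustration: space is chopped into equal cells, each molecule can sit in any cell of the room, and the observer knows only that all molecules are inside some region (the bottle at first, then a growing cloud). The function name `entropies` and the numbers are hypothetical.

```python
import math

def entropies(n_molecules, total_cells, known_cells):
    """Return (unconditional entropy, information I, conditional entropy) in bits.

    Toy model: each of the n molecules could be in any of `total_cells` cells,
    and the observer knows that all of them are within `known_cells` cells.
    """
    unconditional = n_molecules * math.log2(total_cells)
    information = n_molecules * math.log2(total_cells / known_cells)
    return unconditional, information, unconditional - information

n, room = 10, 1024
# The region the molecules are known to occupy grows as the gas escapes.
for region in (16, 64, 256, 1024):
    h, info, h_cond = entropies(n, room, region)
    print(region, h, info, h_cond)
```

The unconditional entropy never changes. What changes is $I$, which shrinks to zero as the known region grows to fill the room, and the conditional entropy rises by the same amount.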

You have probably already noticed that I could now take the vessel on the right of the figure above and open that one up. Then you realize that you did have information after all, namely you knew that the particles were confined to the larger area. This example teaches us that, as I pointed out in "What is Information? (Part I)", the entropy of a system is not a well-defined quantity unless we specify what measurement device we are going to use to measure it with, and as a consequence what the range of values of the measurements are going to be. 

The original second law, being faulty, should therefore be reformulated like this: 

In a thermodynamical equilibrium or non-equilibrium process, the unconditional (joint) entropy of a closed system remains a constant. 

The "true second law", I propose, should read:

When an isolated system approaches equilibrium from a non-equilibrium state, its conditional entropy almost always increases

Well, that looks suspiciously like the old law, except with the word "conditional" in front of "entropy". It seems like an innocuous change, but it took two blog posts to get there, and I hope I have convinced you that this change is not at all trivial. 

Now to close this part, let's return to Gibbs's entropy, which really looks exactly like Shannon's. And indeed, the $p_i$ in Gibbs's formula 
$$S=-\sum_i p_i\log p_i$$
could just as well refer to non-equilibrium distributions. If it does refer to equilibrium, we should use the Boltzmann distribution (I set here Boltzmann's constant to $k=1$, as it really just renormalizes the entropy)
$$p_i=\frac1Z e^{-E_i/T}$$
where $Z=\sum_ie^{-E_i/T}$ is known as the "partition function" in thermodynamics (which just makes sure that the $p_i$ are correctly normalized), and $E_i$ is the energy of the $i$th microstate. Oh yeah, T is the temperature, in case you were wondering.

If we plug this $p_i$ into Gibbs's (or Shannon's) formula, we get 
$$S=\log Z+E/T$$
where $E=\sum_i p_iE_i$ is the mean energy. This is, of course, a well-known thermodynamical relationship: because $F=-T\log Z$ is also known as the Helmholtz free energy, it is just $F=E-TS$. 
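The step behind this result is that $\log p_i=-\log Z-E_i/T$, so that $-\sum_i p_i\log p_i=\log Z+\frac1T\sum_i p_iE_i$. A quick numerical check, as a sketch with made-up energy levels and temperature (the values in `energies` and `T` are hypothetical, and $k=1$ as above):

```python
import numpy as np

energies = np.array([0.0, 1.0, 2.0, 5.0])  # hypothetical energy levels E_i
T = 1.5                                     # temperature, with k = 1

weights = np.exp(-energies / T)
Z = weights.sum()                           # partition function
p = weights / Z                             # Boltzmann distribution

S_gibbs = -np.sum(p * np.log(p))            # Gibbs (or Shannon) formula
E = np.sum(p * energies)                    # mean energy
S_thermo = np.log(Z) + E / T                # thermodynamic expression
F = -T * np.log(Z)                          # Helmholtz free energy

print(S_gibbs, S_thermo, F, E - T * S_gibbs)
```

Both expressions for the entropy agree, and so do $F$ and $E-TS$, for any choice of levels and temperature you care to plug in.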

As we have just seen that this classical formula is the limiting case of using the Boltzmann (equilibrium) distribution within Gibbs's (or Shannon's) formula, we can be pretty confident that the relationship between information theory and thermodynamics I just described is sound. 

As a last thought: how did von Neumann know that Shannon's formula was the (non-equilibrium) entropy of thermodynamics? He had been working on quantum statistical mechanics in 1927, and deduced that the quantum entropy should be written in terms of the quantum density matrix $\rho$ as (here "Tr" stands for the matrix trace)
$$S(\rho)=-{\rm Tr} \rho\log \rho.$$
Quantum mechanical density matrices are in general non-diagonal. But were they to become classical, they would approach a diagonal matrix where all the elements on the diagonal are probabilities $p_1,...,p_n$. In that case, we just find
$$S(\rho)\to-\sum_{i=1}^n p_i\log p_i, $$ 
in other words, Shannon's formula is just the classical limit of the quantum entropy that was invented twenty-one years before Shannon thought of it, and you can bet that Johnny immediately saw this!
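The classical limit can be checked in a few lines. This is a sketch with hypothetical probabilities: one density matrix is diagonal, the other is a pure superposition with the very same diagonal entries, so the only difference between them is the off-diagonal part. The helper names are mine.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    nonzero = eigenvalues[eigenvalues > 1e-12]  # 0 log 0 is taken to be 0
    return float(-np.sum(nonzero * np.log(nonzero)))

def shannon_entropy(p):
    """H = -sum p log p (natural log, to match the formula in the text)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

probabilities = np.array([0.5, 0.3, 0.2])   # hypothetical p_1, ..., p_n

# Classical (diagonal) density matrix.
rho_classical = np.diag(probabilities)

# Pure quantum state with the same diagonal but nonzero off-diagonal terms.
amplitudes = np.sqrt(probabilities)
rho_pure = np.outer(amplitudes, amplitudes)

print(von_neumann_entropy(rho_classical), shannon_entropy(probabilities))
print(von_neumann_entropy(rho_pure))
```

The diagonal matrix reproduces Shannon's value exactly, while the pure state has zero entropy even though a measurement in this basis would return the same probabilities.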

In other words, there is a very good reason why Boltzmann's, Gibbs's, and Shannon's formulas are all called entropy, and Johnny von Neumann didn't make this suggestion to Shannon in jest.

Is this the end of "Whose entropy is it anyway?". Perhaps, but I have a lot more to write about the quantum notion of entropy, and whether considering quantum mechanical measurements can say anything about the arrow of time (as Landau and Lifshitz suggested). Because considering the quantum entropy of the universe can also say something about the evolution of our universe and the nature of the "Big Bang", perhaps a Part 3 will be appropriate. 

Stay tuned!