Monday, August 4, 2014

On quantum measurement (Part 4: Born's rule)

Let me briefly recap parts 1-3 for those of you who like to jump into the middle of a series, convinced that they'll get the hang of it anyway. You might, but a recap is nice anyway.

Remember these posts use MathJax to render equations. Your browser can handle this, so if you see a bunch of dollar signs and LaTeX commands instead of formulas, you need to configure your browser to handle MathJax.

In Part 1 I really only reminisced about how I got interested in the quantum measurement problem, by way of discovering that quantum (conditional) entropy can be negative, and by the oracular announcement of the physicist Hans Bethe that negative entropy solves the problem of wavefunction collapse (in the sense that there isn't any). 

In Part 2 I told you a little bit about the history of the measurement problem, the roles of Einstein and Bohr, and that our hero John von Neumann had some of the more penetrating insights into quantum measurement, only to come up confused. 

In Part 3 I finally get into the mathematics of it all, and outline the mechanics of a simple classical measurement, as well as a simple quantum measurement. And then I go on to show you that quantum measurement isn't at all like its classical counterpart. In the sense that it doesn't make a measurement at all. It can't because it is procedurally forbidden to do so by the almighty no-cloning theorem. 

Recall that in a classical measurement, you want to transfer the value of the observable of your interest on to the measurement device, which is manufactured in such a way that it makes "reading off" values easy. You never really read the value of the observable off of the thing itself: you read it off of the measurement device, fully convinced that your measurement operation was designed in such a manner that the two (system and measurement device) are perfectly correlated, so reading the value off of one will reveal to you the value of the original. And that does happen in good classical measurements. 

And then I showed you that this cannot happen in a quantum measurement, unless the basis chosen for the measurement device happens to coincide exactly with the basis of the quantum system (they are said to be "orthogonal"). Because then, it turns out, you can actually perform perfect quantum cloning.

The sound of heads being scratched worldwide, exactly when I wrote the above, reminds me to remind you that the no-cloning theorem only forbids the cloning of an arbitrary unknown state. "Arbitrary" here means "given in any basis, that furthermore I'm not familiar with". You can clone specific states. Like, for example, quantum states that you have prepared in a particular basis that is known to you, like the one you're going to measure it in, for example. The way I like to put it is this: Once you have measured an unknown state, you have rendered it classical. After that, you can copy it to your heart's content, as there is no law against classical copying. Well, no physical law. 

Of course, none of this is probably satisfying to you, because I have not revealed to you what a quantum measurement really does. Fair enough. Let's get cooking.

Here's the thing:

When you measure a quantum system, you're not really looking at the quantum system, you're looking at the measurement device.

"Duh!", I can hear the learned audience gasp, "you just told us that already!" 

Yes I did, but I told you that in the context of a classical measurement. In the context of a quantum measurement, the same exact triviality becomes a whole lot less trivial. A whole whole lot less. So let's do this slowly.

Your measurement device is classical. This much we have to stipulate, because in the end, our eyes and nervous system are ultimately extensions of the measurement device just as JvN had surmised they would. But even though they are classical, they are made out of quantum components. That's the little tidbit that completely escaped our less learned friend Niels Bohr, who wanted to construct a theory in which quantum and classical systems both had their own epistemological status. I shudder to think how one can even conceive of such a blunderous idea.

But being classical really just means that we know which basis to measure the thing in, remember. It is not a special state of matter. 

Oh, what is that you say? You say that being classical is really something quite different, according to the textbooks? Something about $\hbar\to0$?

Forget that, old friend, that's just the kind of mumbo jumbo that the old folks of yesteryear are trying to (mis)teach you. Classicality is given entirely in terms of the relative state of systems and devices. Oh, it just so happens that a classical system, because it has so many entangled particles, must be described in terms of a basis that is so high-dimensional that it will appear orthogonal to any other high-dimensional system (simply because almost all vectors in a high-dimensional space are orthogonal). That's where classicality comes from. Yes, many particles are necessary to make something classical, but it does not have to be classical. It is just statistically so. I don't recall having read this argument anywhere, and I did once think about publishing it. But it is really trivial, which means there is no way I could ever get it published anyway. Because I will be called crazy by the reviewers.
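If you want to see the "almost all vectors in a high-dimensional space are orthogonal" claim in numbers, here is a minimal sketch. It assumes real-valued vectors drawn uniformly at random (true quantum states have complex amplitudes, but the trend is the same), and the helper names `random_unit_vector` and `mean_abs_overlap` are made up for this illustration.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a random direction: Gaussian components, then normalize."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_overlap(dim, samples, rng):
    """Average |<u|v>| over `samples` pairs of independent random unit vectors."""
    total = 0.0
    for _ in range(samples):
        u = random_unit_vector(dim, rng)
        v = random_unit_vector(dim, rng)
        total += abs(sum(a * b for a, b in zip(u, v)))
    return total / samples

rng = random.Random(0)  # fixed seed so the run is repeatable
overlaps = {dim: mean_abs_overlap(dim, 200, rng) for dim in (2, 10, 100, 1000)}
for dim, value in overlaps.items():
    print(dim, round(value, 3))
```

The average overlap shrinks as the dimension grows, roughly like one over the square root of the dimension. A macroscopic device lives in a space whose dimension is exponential in the number of particles, so the overlap is zero for all practical purposes.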

Excuse that little tangent, I just had to get that off of my chest. So, back to the basics: our measurement device is classical, but it is really just a bunch of entangled quantum particles. 

There is something peculiar about the quantum particles that make up the classical system: they are all correlated. Classically correlated. What that means is that if one of the particles has a particular property or state, its neighbor does so too. They kind of have to: they are one consistent bunch of particles that are masquerading as a classical system. What I mean is that, if the macroscopic measurement device's "needle" points to "zero", then in a sense every particle within that device is in agreement. It's not like half are pointing to 'zero', a quarter to 'one', and another quarter to '7 trillion'. They are all one happy correlated family of particles, in complete agreement. And when they change state, they all do so at the same time. 

How is such a thing possible, you ask? 

Watch. It's really quite thrilling to see how this works.

Let us go back to our lonely quantum state $|x\rangle$, whose position we were measuring. Only now I will, for the sake of simplicity, measure the state of a quantum discrete variable, a qubit. The qubit is a "quantum bit", and you can think of it as a "spin-1/2" particle. Remember, the thing that can only have the states "up" and "down", only it can also take on superpositions of these states? If this was a textbook then now I would hurl the Bloch sphere at you, but this is a blog so I won't. 

I'll write the basis states of the qubit as $|0\rangle$ and $|1\rangle$. I could also (and more convincingly) have written $|\uparrow\rangle$ and $|\downarrow\rangle$, but that would have required much more tedious writing in LaTeX. An arbitrary quantum state $|Q\rangle$ can then be written as
$$|Q\rangle=\alpha|0\rangle+\beta|1\rangle.$$
Here, $\alpha$ and $\beta$ are complex numbers that satisfy $|\alpha|^2+|\beta|^2=1$, so that the quantum state is correctly normalized. But you already knew all that. Most of the time, we'll restrict ourselves to real, rather than complex, coefficients. 

Now let's bring this quantum state in touch with the measurement device. But let's do this one bit at a time. Because the device is really a quantum system that thinks it is classical. Because, as I like to say, there is really no such thing as classical physics. 

So let us treat it as a whole bunch of quantum particles, each a qubit. I'm going to call my measurement device the "ancilla" $A$. The word "ancilla" is Latin for "maid", and because the ancilla state is really helping us to do our (attempted) measurement, it is perfectly named. Let's call this ancilla state $|A_1\rangle$, where the "one" is to remind you that it is really only one out of many. An attempted quantum measurement is, as I outlined in the previous post (and as John von Neumann correctly figured out), an entanglement operation. The ancilla starts out in the state $|A_1\rangle=|0\rangle$. We discussed previously that this is not a limitation at all. Measurement does this:
$$|Q\rangle|A_1\rangle=(\alpha|0\rangle+\beta|1\rangle)|0\rangle\to\alpha|0\rangle|0\rangle+\beta|1\rangle|1\rangle.$$
I can tell you exactly which unitary operator makes this transformation possible, but then I would lose about 3/4 of my readership. Just trust me that I know. And keep in mind that the first ket vector refers to the quantum state, and the second to ancilla $A_1$. I could write the whole state like this to remind you:
$$\alpha|0\rangle_Q|0\rangle_1+\beta|1\rangle_Q|1\rangle_1$$
but this would get tedious quickly. All right, fine, I'll do it. It really helps in order to keep track of things. 
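For the quarter of you still reading: one unitary that performs this entanglement step on two qubits is the controlled-NOT (CNOT) gate, with the quantum system as control and the ancilla as target. Treat the sketch below as an illustration under that assumption, not as the operator used in the paper. The helper names `kron` and `cnot_first_controls_second` and the value 0.3 are made up for this example.

```python
import math

def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]

def cnot_first_controls_second(state):
    """Apply CNOT to a two-qubit state ordered as |00>, |01>, |10>, |11>."""
    a00, a01, a10, a11 = state
    # When the first qubit is 1 (last two entries), the second qubit is flipped.
    return [a00, a01, a11, a10]

alpha, beta = math.cos(0.3), math.sin(0.3)   # illustrative real coefficients
Q = [alpha, beta]                            # alpha|0> + beta|1>
A1 = [1.0, 0.0]                              # ancilla starts out in |0>

before = kron(Q, A1)                         # amplitudes on |00> and |10>
after = cnot_first_controls_second(before)   # amplitudes on |00> and |11>

print("before:", before)
print("after: ", after)
```

Before the operation the amplitudes sit on $|0\rangle_Q|0\rangle_1$ and $|1\rangle_Q|0\rangle_1$. Afterwards they sit on $|0\rangle_Q|0\rangle_1$ and $|1\rangle_Q|1\rangle_1$: system and ancilla are entangled, and nothing has been cloned.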

To continue, let's remember that the ancilla is really made out of many particles. Let's first look at a second one. You know, I need at least a second one, otherwise I can't talk about the consistency of the measurement device, which needs to be such that all the elements of the device agree with each other. So there is an ancilla state $|A_2\rangle=|0\rangle_2$. At least it starts out in this state. And when the measurement is done, you find that
$$ |Q\rangle|A_1\rangle|A_2\rangle\to\alpha|0\rangle_Q|0\rangle_1|0\rangle_2+\beta|1\rangle_Q|1\rangle_1|1\rangle_2.$$
There are several ways of showing that this is true for a composite measurement device $|A_1\rangle|A_2\rangle$. But as I will show you much later (when we talk about Schrödinger's cat), the pieces of the measurement device don't actually have to measure the state at the same time. They could do so one after the other, with the same result!
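Here is a small sketch of the "one after the other" claim for two ancilla qubits. It assumes (my choice for this illustration, not necessarily the construction in the paper) that each ancilla couples to the quantum system through a controlled-NOT gate. The function `apply_cnot` and the bit ordering $(Q, A_1, A_2)$ are hypothetical conventions.

```python
import math

def apply_cnot(state, control, target, n_qubits):
    """Flip `target` wherever `control` is 1; qubit 0 is the leftmost label in the ket."""
    out = [0.0] * len(state)
    for index, amp in enumerate(state):
        control_bit = (index >> (n_qubits - 1 - control)) & 1
        if control_bit:
            new_index = index ^ (1 << (n_qubits - 1 - target))
        else:
            new_index = index
        out[new_index] = amp
    return out

alpha, beta = math.cos(0.3), math.sin(0.3)   # illustrative real coefficients

# Start in (alpha|0> + beta|1>)_Q |0>_1 |0>_2; the index bits are (Q, A1, A2).
start = [0.0] * 8
start[0b000] = alpha
start[0b100] = beta

# Each ancilla couples to Q separately, in either order.
a1_then_a2 = apply_cnot(apply_cnot(start, 0, 1, 3), 0, 2, 3)
a2_then_a1 = apply_cnot(apply_cnot(start, 0, 2, 3), 0, 1, 3)

# The state written above: alpha|000> + beta|111>.
expected = [0.0] * 8
expected[0b000] = alpha
expected[0b111] = beta

print(a1_then_a2 == expected, a2_then_a1 == expected)
```

Either ordering ends in $\alpha|0\rangle_Q|0\rangle_1|0\rangle_2+\beta|1\rangle_Q|1\rangle_1|1\rangle_2$, so in this toy model it does not matter which piece of the device measures first.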

Oh yes, we will talk about Schrödinger's cat (but not in this post), and my goal is that after we're done you will never be confused by that cat again. Instead, you should go and confuse cats, in retaliation. 

Now I could introduce $n$ of those ancillary systems (and I have in the paper), but for our purposes here two is quite enough, because I can study the correlation between two systems already. So let's do that. 

We do this by looking at the measurement device, as I told you. In quantum mechanics, looking at the measurement device has a very precise meaning, in that you are not looking at the quantum system. And not looking at the quantum system means, mathematically, to trace over its states. I'll show you how to do that.

First, we must write down the density matrix that corresponds to the joint system $|QA_1A_2\rangle$ (that's my abbreviation for the long state after measurement written above). I write this as 
$$\rho_{QA_1A_2}=|QA_1A_2\rangle\langle QA_1A_2|$$
We can trace out the quantum system $Q$ by the simple operation
$$\rho_{A_1A_2}={\rm Tr}_Q (\rho_{QA_1A_2}).$$
Most of you know exactly what I mean by doing this "partial trace", but those of you who do not, consult a good book (like Asher Peres' classic and elegant book), or (gasp!) consult the Wiki page

So making a quantum measurement means disregarding the quantum state altogether. We are looking at the measurement device, not the quantum state. So what do we get?

We get this:
$$\rho_{A_1A_2}=|\alpha|^2\,|00\rangle_{12}\langle 00|+|\beta|^2\,|11\rangle_{12}\langle 11|.$$
If you have $n$ ancilla, just add that many zeros inside the brackets in the first term, and that many ones in the brackets in the second term. You see, the measurement device is perfectly consistent: you either have all zeros (as in $|00....000\rangle_{12....n}\langle00....000|$) or all ones. And note that you can add your eye, and your nervous system, and what not in the ancilla state. It doesn't matter: they will all agree. No need for psychophysical parallelism, the thing that JvN had to invoke. 
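If you would rather check the partial trace by brute force than consult a book, here is a minimal sketch in plain Python for the two-ancilla case. The ordering of the basis states (index $=4q+2a_1+a_2$), the helper name `partial_trace_over_Q` and the value 0.3 are conventions I made up for this example.

```python
import math

alpha, beta = math.cos(0.3), math.sin(0.3)   # illustrative real coefficients

# |Q A1 A2> = alpha|000> + beta|111>, with basis index = 4*q + 2*a1 + a2.
psi = [0.0] * 8
psi[0b000] = alpha
psi[0b111] = beta

# Density matrix of the joint system: rho = |psi><psi|.
rho = [[x * y.conjugate() for y in psi] for x in psi]

def partial_trace_over_Q(rho):
    """Trace out the first qubit (Q) of a 3-qubit density matrix (8x8 -> 4x4)."""
    dim_rest = 4
    return [[sum(rho[q * dim_rest + a][q * dim_rest + b] for q in range(2))
             for b in range(dim_rest)]
            for a in range(dim_rest)]

rho_A = partial_trace_over_Q(rho)
for row in rho_A:
    print([round(entry, 4) for entry in row])
```

The only nonzero entries of `rho_A` are the $|00\rangle\langle00|$ corner, equal to $|\alpha|^2$, and the $|11\rangle\langle11|$ corner, equal to $|\beta|^2$. The off-diagonal terms $\alpha\beta^*$ of the full density matrix connect $q=0$ with $q=1$, so the trace over $Q$ never picks them up.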

I can also illustrate the partial trace quantum information-theoretically, if you prefer. Below on the left is the quantum Venn diagram after entanglement. "S" refers to the apparent entropy of the measurement device, and it is really just the Shannon entropy of the probabilities $|\alpha|^2$ and $|\beta|^2$. But note that there are minus signs everywhere, telling you that this system is decidedly quantum. When you trace out the quantum system, you simply "forget that it's there", which means you erase the line that crosses the $A_1A_2$ system, and add all the stuff up that you find. And what you get is the Venn diagram to the right, which your keen eye will identify as the Venn diagram of a classically correlated state.
Venn diagram of the full quantum system plus measurement device (left), and only of the measurement device (not looking at the quantum system) (right).
What all this means is that the resulting density matrix is a probabilistic mixture, showing you the classical result "0" with probability $|\alpha|^2$, and the result "1" with probability $|\beta|^2$. 
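The entropies behind that diagram can be checked in a few lines. This is a sketch under stated assumptions: I use the fact that the joint state is pure (a single eigenvalue 1) and that the reduced density matrices are diagonal with entries $|\alpha|^2$ and $|\beta|^2$, so no eigenvalue solver is needed. I am also assuming that the minus signs in the diagram are conditional entropies such as $S(Q|A_1A_2)$. The names below are made up for this example.

```python
import math

def entropy_bits(eigenvalues):
    """Entropy -sum p log2(p) of a density matrix, given its eigenvalues."""
    return sum(-p * math.log2(p) for p in eigenvalues if p > 0)

alpha, beta = math.cos(0.3), math.sin(0.3)   # illustrative real coefficients
p0, p1 = alpha ** 2, beta ** 2

S_total = entropy_bits([1.0])            # |Q A1 A2> is a pure state
S_device = entropy_bits([p0, p1])        # rho_{A1 A2} = diag(p0, 0, 0, p1)
S_one_ancilla = entropy_bits([p0, p1])   # rho_{A1} = rho_{A2} = diag(p0, p1)

# Conditional entropy of the quantum system given the device: negative.
S_Q_given_device = S_total - S_device

# Mutual information between the two ancillas: as large as the device entropy.
I_A1_A2 = S_one_ancilla + S_one_ancilla - S_device

print("S(A1A2)     =", S_device)
print("S(Q|A1A2)   =", S_Q_given_device)
print("I(A1:A2)    =", I_A1_A2)
```

The conditional entropy comes out as minus the Shannon entropy of the device, which no classical system can manage. The two ancillas share exactly as much information as each one carries, which is what perfect classical correlation looks like.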

And that, ladies and gentlemen, is just Born's rule: that the probability of quantum measurement is given by the square of the amplitude of the quantum system. Derived for you in just a few lines, with hardly any mathematics at all. And because every blog post should have a picture (and this only had a diagram), I regale you with the one of Max Born:
Max Born (1882-1970) Source: Wikimedia
A piece of trivia you may not know: Max got his own rule wrong in the paper that announced it (see Ref. [1]). He crossed it out in proof and replaced the rule (which has the probability given by the amplitude, not the square of the amplitude) by the correct one in a footnote. Saved by the bell!

Of course, having derived Born's rule isn't magical. But the way I did it tells us something fundamental about the relationship between physical and quantum reality. Have you noticed the big fat "zero" in the center of the Venn diagram on the upper left? It will always be there, and that means something fundamental. (Yes, that's a teaser). Note also, in passing, that there was no collapse anywhere. After measurement, the wavefunction is still given by $|QA_1\cdots A_n\rangle$, you just don't know it.

In Part 5, I will delve into the interpretation of what you just witnessed. I don't know yet whether I will make it all the way to Schrödinger's hapless feline, but here's hoping.

[1] M. Born. Zur Quantenmechanik der Stoßvorgänge. Zeitschrift für Physik 37 (1926) 863-867.

Thursday, July 24, 2014

On quantum measurement (Part 3: No cloning allowed)

In the previous two parts, I told you how I became interested in the quantum measurement problem (Part 1), and provided a bit of historical background (Part 2). Now we'll get to the heart of the matter. 

Note that I'm using MathJax to display equations in this blog. If your browser shows a bunch of dollar signs and gibberish where equations should appear, you probably have to figure out how to install MathJax on your browser. Don't email me: I know nothing about such intricacies.

Let me remind you that our hero John von Neumann described quantum measurement as a two-stage process. (No, I'm not showing his likeness.) The first stage is now commonly described as entanglement. This is what we'll discuss here. I'll get to the second process (the one where the wavefunction ostensibly collapses, except Hans Bethe told me that it doesn't) in Part 4. 

For the purpose of illustration, I'm going to describe the measurement of a position, but everything can be done just as well for discrete degrees of freedom, such as, you know, spins. In fact, I'll show you a bunch of spin measurements waaay later, like the Stern-Gerlach experiment, or the quantum eraser. But I'm getting ahead of myself.

Say our quantum system that we would like to measure is in state $|Q\rangle=|x\rangle$. I'm going to use Q to stand in for quantum systems a lot. Measurement devices will be called "M", or sometimes "A" or "B". 

All right. How do you measure stuff to begin with?

In classical physics, we might imagine that a system is characterized by the position variable $x$, [I'll write this as "(x)"] and to measure it, all we have to do is to transfer that label "x" to a measurement device. Say the measurement device (before measurement) points to a default location (for example '0') like this: (0). Then, we'll place that device next to the position we want to measure, and attempt to make the device "reflect" the position:
$$(x)(0)\to (x)(x)$$ 
This is just what I want, because now I can read the position of the thing I want to measure off of my measurement device. 

I once in a while get the question: "Why do you have to have a measurement device? Can't you just read the position off of the system you want to measure directly?" The answer is no, no you can't. The thing is the thing: it stands there in the corner, say. If you measure something, you have to transfer the state of the thing to something you read off of. The variable that reflects the position can be very different from the thing you are measuring. For example, a temperature can be transferred to the height of a mercury column. In a measurement, you create a correlation between two systems. 

In a classical measurement, the operation that makes that possible is a copying operation. You copy the system's state onto the measurement device's state. The copy can be made out of a very different material (for example, a photograph is a copy of a 3D scene onto a two-dimensional surface, made out of whatever material you choose). But system and measurement refer to each other.

All right, so measuring really is copying. And reading this the sophisticated reader (yes, I mean you!) starts smelling a rat right away. Because you already know that copying is just fine in classical physics, but it really is against the law in quantum physics. That's right: there is a no-cloning (or no-xeroxing) theorem, in effect in quantum mechanics. You're not allowed to make exact copies. Ever. 

So how can quantum measurement work at all, if measurement is intrinsically copying?

That, dear reader, is indeed the question. And what I'll try to convince you of is now fairly obvious, namely that quantum measurement is really impossible in principle, unless you just happen to be in the "right basis". This "right basis", basically, is a basis where everything looks classical to begin with. (We'll get to this in more detail later). What I will try to convince you of here is that quantum measurement is impossible, if you want a quantum measurement to do what you expect from a classical measurement, namely that your device reflects the state of the system. 

The no-cloning theorem makes that impossible. 

I could stop here, you know. "Stop worrying about quantum measurement", I could write, "because I just showed you that quantum measurement is impossible in principle!"

But I won't, because there is so much more to be said. For example, even though quantum measurements are impossible in principle, it's not like people haven't tried, right? So what is it that people are measuring? What are the measurement devices saying? 

I'll tell you, and I guarantee you that you will not like it one bit.

But first, I owe you this piece: to show you how quantum measurement works. So our quantum system $Q$ is in state $|x\rangle$. Our measurement device is conveniently already in its default state $|0\rangle$. You can, by the way, think about what happens if the measurement device is not pointing to an agreed-upon direction (such as '0') before measurement, but Johnny vN has already done this for you on page 233 of his "Grundlagen". Here he is, by the way, discussing stuff with Ulam and Feynman, most likely in Los Alamos.
Left to right: Stanislaw Ulam, Richard Feynman, John von Neumann
To be a fly on the wall there! Note how JvN (to the right) is always better dressed than the people he hangs out with!

So investigating various possible initial states of the quantum measurement device does nothing for you, he finds, and of course he is correct. So we'll assume it points to $|0\rangle$. 

So we start with $|Q\rangle|M\rangle=|x\rangle|0\rangle$. What now? Well, the measurement operator, which of course has to be unitary (meaning it conserves probabilities, yada yada) must project the quantum state, then move the needle on the measurement device. For a position measurement, the unitary operator that does this is
$$U=e^{iX\otimes P}$$
where $X$ is the operator whose eigenstate is $|x\rangle$ (meaning $X|x\rangle=x|x\rangle$), and where $P$ is the operator conjugate to $X$. $P$ (the "momentum operator") makes spatial translations. For example, $e^{iaP}|x\rangle=|x+a\rangle$, that is, $x$ was made into $x+a$.  The $\otimes$ reminds you that $X$ acts on the first vector (the quantum system), and $P$ acts on the second (the measurement device). 

So, what this means is that 
$$U|x\rangle|0\rangle=e^{iX\otimes P}|x\rangle|0\rangle=e^{ix P}|x \rangle|0\rangle=|x\rangle|x\rangle .$$ 
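In case the middle step looks like sleight of hand, here it is spelled out (a standard power-series argument, using only the definitions above). Because $(X\otimes P)^n=X^n\otimes P^n$ and $X^n|x\rangle=x^n|x\rangle$, the operator $X$ can be replaced by its eigenvalue $x$, leaving an exponential of $P$ that acts only on the measurement device:
$$e^{iX\otimes P}|x\rangle|0\rangle=\sum_{n=0}^\infty\frac{i^n}{n!}\,X^n|x\rangle\, P^n|0\rangle=|x\rangle\sum_{n=0}^\infty\frac{(ix)^n}{n!}\,P^n|0\rangle=|x\rangle\, e^{ixP}|0\rangle=|x\rangle|0+x\rangle=|x\rangle|x\rangle.$$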
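Yay: the state of the quantum system was copied onto the measurement device! Except that you already can see what happens if you try to apply this operator to a superposition of states such as $|x\rangle+|y\rangle$, which I'll abbreviate as $|x+y\rangle$ (that is a sum of two position eigenstates, not the eigenstate at position $x+y$, and I'm not bothering with normalization):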
$$U|x+y\rangle|0\rangle=e^{iX\otimes P}|x+y\rangle|0\rangle=e^{ix P}|x \rangle|0\rangle+e^{iy P}|y \rangle|0\rangle=|x\rangle|x\rangle + |y\rangle|y\rangle .$$
And that's not at all what you would have expected if measurement was like the classical case, where you would have gotten $(|x\rangle + |y\rangle)(|x\rangle + |y\rangle)$. And what I just showed you is really just the proof that cloning is impossible in quantum physics.
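The same failure shows up in a two-level toy version, which is easier to put into code than a continuous position. The sketch below swaps in a qubit and a CNOT gate for the position operator $U$ above (an analogue I chose for illustration, not the same operator). All names are made up for this example.

```python
import math

def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]

def cnot(state):
    """CNOT on a two-qubit state ordered |00>, |01>, |10>, |11> (first qubit controls)."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def overlap(u, v):
    """Inner product <u|v> for real amplitudes."""
    return sum(a * b for a, b in zip(u, v))

zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # (|0> + |1>)/sqrt(2)

# Basis states are copied perfectly onto the device...
copied_zero = cnot(kron(zero, zero))
copied_one = cnot(kron(one, zero))

# ...but a superposition comes out entangled with the device, not cloned.
attempt = cnot(kron(plus, zero))     # (|00> + |11>)/sqrt(2)
true_clone = kron(plus, plus)        # (|0> + |1>)(|0> + |1>)/2

fidelity = overlap(attempt, true_clone) ** 2
print("basis states copied:", copied_zero == kron(zero, zero), copied_one == kron(one, one))
print("fidelity with a true clone:", fidelity)
```

The "copy" of the superposition has a fidelity of only one half with what a genuine clone would look like, while the two basis states go through unharmed.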

So there you have it: quantum measurement is impossible unless the state that you are measuring just happens to already be in an eigenstate of the measurement operator, that is, it is not in a quantum superposition. 

Whether or not a quantum system is in a superposition depends on the basis that you choose to perform your quantum measurement. I do realize that the concept of a "basis" is a bit technical: it is totally trivial to all of you who have been working in quantum mechanics for years, but less so for those of you who are just curious. In everyday life, it is akin to measuring temperature in Celsius or Fahrenheit, for example, or location in Cartesian as opposed to polar coordinates. But in quantum mechanics, the choice of a basis is much more fundamental, and I really don't know of a good way to make it more intuitive (meaning, without a lot more math). A typical distinction is to measure photon polarization either in terms of horizontal/vertical, or left/right circular. I know, I'm not helping. Let's just skip this part for now. I might get back to it later.

So what happens when you measure a quantum system, and your measurement device is not "perfectly aligned" (basis-wise) with the quantum system? As it in fact almost never will be, by the way, unless you use a classical device to measure a classical system. Because in classical physics, we are all in the same basis automatically.  (OK, I see that I'll have to clarify this to you but trust me here.)

Look forward to Part 4 instead. Where I will finally delve into "Stage 2" of the measurement process. That is the one that baffled von Neumann, because he could not understand where exactly the wavefunction collapses. And in hindsight, there was no way he could have figured this out, because the wavefunction never collapses. Ever. What I'll show you in Part 4 is how a measurement device can be perfectly (by which I mean intrinsically) consistent, yet tell you a story about what the quantum state is and lie to you at the same time. Lie to you, through its proverbial teeth, if it had any.  

But come on, cut the measurement device some slack. It is lying to you because it has no choice. You ask it to make a copy of the quantum state, and it really is not allowed to do so. What will happen (as I will show you), is that it will respond by displaying to you a random value, with a probability given by the square of some part of the amplitude of the quantum wavefunction. In other words, I'll show you how Born's rule comes about, quite naturally. In a world where no wavefunction collapses, of course.

Part 4 is here

Monday, July 14, 2014

On quantum measurement (Part 2: Some history, and John von Neumann is confused)

This is Part 2 of the "On quantum measurement" series. Part 1: (Hans Bethe, the oracle) is here.

Before we begin in earnest, I should warn you, (or ease your mind, whichever is your preference): this sequence has math in it. I'm not in it to dazzle you with math. It's just that I know no other way to convey my thoughts about quantum measurement in a more succinct manner. Math, you see, is a way for those of us who are not quite bright enough, to hold on to thoughts which, without math, would be too daunting to formulate, too ambitious to pursue. Math is for poor thinkers, such as myself. If you are one of those too, come join me. The rest of you: why are you still reading? Oh, you're not. OK. 

Hey, come back: this historical interlude turns out to be math-free after all. But I promise math in Part 3.

Before I offer to you my take on the issue of quantum measurement, we should spend some time reminiscing, about the history of the quantum measurement "problem". If you've read my posts (and why else would you read this one?), you'll know one thing about me: when the literature says there is a "problem", I get interested. 

This particular problem isn't that old. It arose through a discussion between Niels Bohr and Albert Einstein, who disagreed vehemently about measurement, and the nature of reality itself.  

Bohr and Einstein at Ehrenfest's house, in 1925. Source: Wikimedia

The "war" between Bohr and Einstein only broke out in 1935 (via dueling papers in the Physical Review), but the discussion had been brewing for 10 years at least. 

Much has been written about the controversy (and a good summary albeit with a philosophical bent can be found in the Stanford Encyclopedia of Philosophy). Instead of going into that much detail, I'll just simplify it by saying:

Bohr believed the result of a measurement reflects a real objective quantity (the value of the property being measured).

Einstein believed that quantum systems have objective properties independent of their measurements, and that because quantum mechanics cannot properly describe them, the theory must necessarily be incomplete.

In my view, both views are wrong. Bohr's because his argument relies on a quantum wavefunction that collapses upon measurement (which as I'll show you is nonsense), and Einstein's because the idea that a quantum system has objective properties (described by one of the eigenstates of a measurement device) is wrong and that, as a consequence the notion that quantum mechanics must be incomplete is wrong as well. He was right, though, about the fact that quantum systems have properties independently of whether you measure them or not. It is just that we may not ever know what these properties are.

But enough of the preliminaries. I will begin to couch quantum measurement in terms of a formalism due to John von Neumann. If you think I'm obsessed by the guy because he seems to make an appearance in every second blog post of mine: don't blame me. He just ended up doing some very fundamental things in a number of different areas. So I'm sparing you the obligatory picture of his, because I assume you have seen his likeness enough. 

John von Neumann's seminal book on quantum mechanics is called "Mathematische Grundlagen der Quantenmechanik" (Mathematical Foundations of Quantum Mechanics), and appeared in 1932, three years before the testy exchange of papers (1) between Bohr and Einstein. 

My copy of the "Grundlagen". This is the version issued by the U.S. Alien Property Custodian from 1943 by Dover Publications. It is the verbatim German book, issued in the US in war time. The original copyright is by J. Springer, 1932.

In this book, von Neumann made a model of the measurement process that had two stages, aptly called "first stage" and "second stage". [I want to note here that JvN actually called the first stage "Process 2" and the second stage "Process 1", which today would be confusing so I reversed it.]

The first stage is unitary, which means "probability conserving". JvN uses the word "causal" for this kind of dynamics. In today's language, we call that process an "entanglement operation" (I'll describe it in more detail momentarily, which means "wait for Part 3"). Probability conservation is certainly a requisite for a causal process, and I actually like JvN's use of the word "causal". That word now seems to have acquired a somewhat different meaning.

The second stage is the mysterious one. It is (according to JvN) acausal, because it involves the collapse of the wavefunction (or as Hans Bethe called it, the "reduction of the wavepacket"). It is clear that this stage is mysterious to Johnny, because he doesn't know where the collapse occurs. He is following "type one" processes in a typical measurement (in the book, he measures temperature as an example) from the thermal expansion of the mercury fluid column, to the light quanta that scatter off the mercury column and enter our eye, where the light is refracted in the lens and forms an image on the retina, which then stimulates nerves in the visual cortex, and ultimately creates the "subjective experience" of the measurement. 

According to JvN, the boundary between what is the quantum system and what is the measurement device can be moved in an arbitrary fashion. He understands perfectly that a division into a system to be measured and a measuring system is necessary and crucial (and we'll spend considerable time discussing this), but the undeniable fact that it is not at all clear where to draw the boundary is a mystery to him. He invokes the philosophical principle of "psychophysical parallelism" (which states that there can be no causal interaction between the mind and the body) to explain why the boundary is so fluid. But it is the sentence just following this assertion that puts the finger on what is puzzling him. He writes: 

"Because experience only ever makes statements like this: 'an observer has had a (subjective) perception', but never one like this: 'a physical quantity has taken on a particular value'."(2)

This is, excuse my referee's voice, very muddled. He says: We never have the experience "X takes on x", we always experience "X looks like it is in state x". But mathematically they should be the same. He makes a distinction that does not exist. We will see later why he feels he must make that distinction. But, in short, it is because he thinks that what we perceive must also be reality. If a physical object X is perceived to take on state x, then this must mean that objectively "X takes on x". In other words, he assumes that subjective experience must mirror objective fact.

Yet, this is provably dead wrong. 

That is what Nicolas and I discovered in the article in question, and that is undoubtedly what Hans Bethe immediately realized, but struggled to put into words. 

Quantum reality, in other words, is a whole different thing than classical reality. In fact, in the "worst case" (to be made precise as we go along) they may have nothing to do with each other, as Nicolas and I  argue in a completely obscure (that is unknown) article entitled "What Information Theory Can Tell us About Quantum Reality" (3).

What you will discover when following this series of posts, is that if your measurement device claims "the quantum spin that you were measuring was in state up", then this may not actually tell you anything about the true quantum state. The way I put it colloquially is that "measurement devices tend to lie to you". They lie, because they give you an answer that is provably nonsense. 

In their (the device's) defense, they have no choice but to lie to you (I will make that statement precise when we do math). They lie because they are incapable of telling the truth. Because the truth is, in a precise information-theoretic way that I'll let you in on, bigger than they are. 

JvN tried to reconcile subjective experience with objective truth. Subjectively, the quantum state collapsed from a myriad of possibilities to a single truth. But in fact, nothing of the sort happens. Your subjective experience is not reflecting an objective truth. The truth is out there, but it won't show itself in our apparatus. The beauty of theoretical physics is that we can find out about how the wool is being pulled over our eyes (how classical measurement devices are conspiring to deceive us) when our senses would never allow us a glimpse of the underlying truth.

Math supporting all that talk will start in Part 3. 

(1) Einstein (with Podolsky and Rosen) wrote a paper entitled "Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?". It appeared in Phys. Rev. 47 (1935) 777-780. Four pages: nowadays it would be a PRL. I highly recommend reading it. Bohr was (according to historical records and the narrative in Zurek's great book about it all) incensed. Bohr reacted by writing a paper with the same exact title as Einstein's, that has (in my opinion) nothing in it. It is an astonishing paper because it is content-free, but was meant to serve as a statement that Bohr refutes Einstein, when in fact Bohr had nothing. 

(2) Denn die Erfahrung macht nur Aussagen von diesem Typus: ein Beobachter hat eine bestimmte (subjektive) Wahrnehmung gemacht, und nie eine solche: eine physikalische Größe hat einen bestimmten Wert. 

(3) C. Adami & N.J. Cerf, Lect. Notes in Comp. Sci. 1509 (1999) 258-268

Part 3 (No cloning allowed) continues here

Sunday, June 22, 2014

On quantum measurement (Part 1: Hans Bethe, the oracle)

For this series of posts, I'm going to take you on a ride through the bewildering jungle that is quantum measurement. I've no idea how many parts will be enough, but I'm fairly sure there will be more than one. After all, the quantum mechanics of measurement has been that subject's "mystery of mysteries" for ages, it now seems. 

Before we begin, I should tell you how I became interested in the quantum measurement problem. Because for the longest time I wasn't. During graduate school (at the University of Bonn), the usual thing happened: the Prof (in my case Prof. Werner Sandhas, who I hope turned eighty this past April 14th) says that he'll tell us about quantum measurement towards the end of the semester, and never actually gets there. I have developed a sneaking suspicion that this happened a lot, in quantum mechanics classes everywhere, every time. Which would explain a lot of the confusion that still reigns. 

However, to tell you how I became interested in this problem is a little difficult, because I risk embarrassing myself. The embarrassment that I'm risking is not the usual type. It is because the story that I will tell you will seem utterly ridiculous, outrageously presumptuous, and altogether improbable. But it occurred just as I will attempt to tell it. There is one witness to this story, my collaborator in this particular endeavor, the Belgian theoretical physicist Nicolas Cerf.  

Now, because Nicolas and I worked together very closely on a number of different topics in quantum information theory when we shared an office at Caltech, you might surmise that he would corroborate any story I write (and thus not be an independent witness). I'm sure he remembers the story (wait for it, I know I'm teasing) differently, but you would have to ask him. All I can say is that this is how I remember it.

Nicolas and I had begun to work in quantum information theory around 1995-1996. After a while we were studying the quantum communication protocols of quantum teleportation and quantum superdense coding, and in our minds (that is, our manner of counting), information did not add up. But, we thought, information must be conserved. We were certain. (Obviously that has been an obsession of mine for a while, those of you who have read my black hole stories will think to yourselves).
Space-time diagrams for the quantum teleportation process (a) and superdense coding process (b). EPR stands for an entangled Einstein-Podolsky-Rosen pair. Note the information values for the various classical and quantum bits in red. Adapted from Ref. [1]. The letters 'M' and 'U' stand for a measurement and a unitary operation, respectively. A and B are the communication partners 'Alice' and 'Bob'.

But information cannot be conserved, we realized, unless you can have negative bits. Negative entropy: anti-qubits (see the illustration above). This discovery of ours is by now fairly well-known (so well-known, in fact, that sometimes articles about negative quantum entropy don't seem to feel it necessary to refer to our original paper at all). But it is only the beginning of the story (ludicrous as it may well appear to you) that I want to tell. 

After Nicolas and I wrote the negative entropy paper, we realized that quantum measurement was, after all, reversible. That fact was obvious once you understood these quantum communication protocols, but it was even more obvious once you understood the quantum erasure experiment. Well, for all we knew, this was flying in the face of accepted lore, which (ever since Niels Bohr) would maintain that quantum measurement required an irreversible collapse of the quantum wavefunction. Ordinarily, I would now put up a picture of the Danish physicist who championed wave function collapse, but I cannot bring myself to do it: I have come to loathe the man. I'm sure I'm being petty here.

With this breakthrough discovery in mind ("Quantum measurement is reversible!") Nicolas and I went to see Hans Bethe, who was visiting Caltech at the time. At this point, Hans and I had become good friends, as he visited Caltech regularly. I wrote up my recollections of my first three weeks with him (and also our last meeting) in the volume commemorating his life. (If you don't want to buy that book but read the story, try this link. But you should really buy the book: there's other fun stuff in it). The picture below is from Wikipedia, but that is not how I remember him. I first met him when he was 85. 
Hans A. Bethe (1906-2005) (Source: Wikimedia)

Alright, enough of the preliminaries. Nicolas Cerf and I decided to ask for Hans's advice, and enter his office, then on the 3rd floor of Caltech's Kellogg Radiation Laboratory. For us, that meant one flight of stairs up. We tell him right away that we think we have discovered something important that is relevant to the physics of quantum measurement, and start explaining our theory. I should tell you that what we have at this point isn't much of a theory: it is the argument, based on negative conditional quantum entropies, that quantum measurement can in principle be reversed. 

Hans listens patiently. Once in a while he asks a question that forces us to be more specific.

After we are done, he speaks.

"I am not that much interested in finding that quantum measurement is reversible. What I find much more interesting is that you have solved the quantum measurement problem."

After that, there is a moment of silence. Both Nicolas and I are utterly stunned. 

I am first to ask the obvious. 
"Can you explain to us why?"

You see, it is fairly improbable that a physicist of the caliber of Hans Bethe tells you that you have solved the "mystery of mysteries". Neither Nicolas nor I had seen this coming from a mile away. And we certainly had no idea why he just said that.

We were waiting with--shall we say--bated breath. Put yourself into our position. How would you have reacted? What came after was also wholly unexpected.

After I asked him to explain that last statement, he was silent for--I don't know--maybe three seconds. In a conversation like this, that is bordering on a perceived eternity.

My recollection is fuzzy at this point. Either he began by saying "I can't explain it to you", or he immediately told the story of the Mathematics Professor who lectures on a complex topic and fills blackboard after blackboard, until a student interrupts him and asks: "Can you explain this last step in your derivation to me?"

The Professor answers: "It is obvious". The student insists. "If it is obvious, can you explain it?", and the Professor answers: "It is obvious, but I'll have to get back to you to explain it tomorrow".

At this point of Hans telling this story, the atmosphere is a little awkward. Hans tells us that it is obvious that we solved the quantum measurement problem, but he can't tell us exactly why he thinks it is obvious that we did. It certainly is not obvious to us.

I know Hans well enough at this point that I press on. I cannot let that statement go just like that. He did go on to try to explain what he meant. Now of course I wish I had taken notes but I didn't. But what he said resonated in my mind for a long time (and I suspect that this is true for Nicolas as well). After what he said, we both dropped everything we were doing, and worked only on the quantum measurement problem, for six months, culminating in this paper.

What he said was something like this: "When you make a measurement, its outcome is conditional on the measurements made on that quantum system before that, and so on, giving rise to a long series of measurements, all conditional on each other".

This is nowhere near an exact rendition of what he said. All I remember is him talking about atomic decay, and measuring the product of the decay and that this is conditional on previous events, and (that is the key thing I remember) that this gives rise to these long arcs of successive measurements whose outcomes are conditional on the past, and condition the future. 

Both Nicolas and I kept trying to revive that conversation in our memory when we worked on the problem for the six months following. (Hans left Caltech that year the day after our conversation). Hans also commented that our finding had deep implications for quantum statistical mechanics, because it showed that the theory is quite different from the classical theory after all. We did some work on the quantum Maxwell Demon in reaction to that, but never really had enough time to finish it. Other people after us did. But for the six months that followed, Nicolas and I worked with only this thought in our mind:

"He said we solved the problem. Let us find out how!"

In the posts that follow this one, I will try to give you an idea of what it is we did discover (most of it contained in the article mentioned above). You will easily find out that this article isn't published (and I'll happily tell you the story of how that happened some other time). While a good part of what's in that paper did get published ultimately, I think the main story is still untold. And I am attempting to tell this story still, via a preprint I have about consecutive measurements, that I'm also still working on. But consecutive measurement is what Hans was telling us about in this brief session, that changed the scientific life of both Nicolas and me. He knew what he was talking about, but he didn't know how to tell us just then. It was obvious to him. I hope it will be obvious to me one day too.

Even though the conversation with Hans happened as I described, I should tell you that 18 years after Hans said this to us (and thinking about it and working on it for quite a while) I don't think he was altogether right. We had solved something, but I don't think we solved "the whole thing". There is more to it. Perhaps much more.

Stay tuned for Part 2, where I will explain the very basics of quantum measurement, what von Neumann had to say about it, as well as what this has to do with Everett and the "Many-world" interpretation. And if this all works out as I plan, perhaps I will ultimately get to the point that Hans Bethe certainly did not foresee: that the physics of quantum measurement is intimately linked to Gödel incompleteness. But I'm getting ahead of myself.

[1] N.J. Cerf and C. Adami. Negative entropy and information in quantum mechanics. Phys. Rev. Lett. 79 (1997) 5194-5197.

Note added: upon reading the manuscript again after all this time, I found in the acknowledgements the (I suppose more or less exact) statement that Hans had made. He stated that "negative entropy solves the problem of the reduction of the wave packet". Thus, it appears he did not maintain that we had "solved the measurement problem" as I had written above, only a piece of it.

Part 2 (Some history, and John von Neumann is confused) continues here.

Sunday, June 8, 2014

Whose entropy is it anyway? (Part 2: The so-called Second Law)

This is the second part of the "Whose entropy is it anyway?" series. Part 1: "Boltzmann, Shannon, and Gibbs" is here.

Yes, let's talk about that second law in light of the fact we just established, namely that Boltzmann and Shannon entropy are fundamentally describing the same thing: they are measures of uncertainty applied to different realms of inquiry, making us thankful that Johnny vN was smart enough to see this right away. 

The second law is usually written like this: 

"When an isolated system approaches equilibrium from a non-equilibrium state, its entropy almost always increases"

I want to point out here that this is a very curious law, because there is, in fact, no proof for it. Really, there isn't. Not every thermodynamics textbook is honest enough to point this out, but I was taught this early on, because I learned Thermodynamics from the East-German edition of Landau and Lifshitz's tome "Statistische Physik", which is quite forthcoming about this (in the English translation):

"At the present time, it is not certain whether the law of increase of entropy thus formulated can be derived from classical mechanics"

From that, L&L go on to speculate that the arrow of time may be a consequence of quantum mechanics.

I personally think that quantum mechanics has nothing to do with it (but see further below). The reason the law cannot be derived is because it does not exist. 

I know, I know. Deafening silence. Then:

"What do you mean? Obviously the law exists!"

What I mean, to be more precise, is that strictly speaking Boltzmann's entropy cannot describe what goes on when a system not at equilibrium approaches said equilibrium, because Boltzmann's entropy is an equilibrium concept. It describes the value that is approached when a system equilibrates. It cannot describe its value as it approaches that constant. Yes, Boltzmann's entropy is a constant: it counts how many microstates can be taken on by a system at fixed energy. 

When a system is not at equilibrium, fewer microstates are actually occupied by the system, but the number it could potentially take on is constant. Take, for example, the standard "perfume bottle" experiment that is so often used to illustrate the second law:
An open "perfume bottle" (left) about to release its molecules into the available space (right)

The entropy of the gas inside the bottle is usually described as being small, while the entropy of the gas on the right (because it occupies a large space) is believed to be large. But Boltzmann's formula is actually not applicable to the situation on the left, because it assumes (on account of the equilibrium condition), that the probability distributions in phase space of all particles involved are independent. But they are clearly not, because if I know the location of one of the particles in the bottle, I can make very good predictions about the other particles because they occupy such a confined space. (This is much less true for the particles in the larger space at right, obviously).

What should we do to correct this? 

We need to come up with a formula for entropy that is not explicitly true only at equilibrium, and that allows us to quantify correlations between particles. Thermodynamics cannot do this, because equilibrium thermodynamics is precisely that theory that deals with systems whose correlations have decayed long ago, or as Feynman put it, systems "where all the fast things have happened but the slow things have not". 

Shannon's formula, it turns out, does precisely what we are looking for: quantify correlations between all particles involved. Thus, Shannon's entropy describes, in a sense, nonequilibrium thermodynamics. Let me show you how.

Let's go back to Shannon's formula applied to a single molecule, described by a random variable $A_1$, and call this entropy $H(A_1)$. 

I want to point out right away something that may shock and disorient you, unless you followed the discussion in the post "What is Information? (Part 3: Everything is conditional)" that I mentioned earlier. This entropy $H(A_1)$ is actually conditional. This will become important later, so just store this away for the moment. 

OK. Now let's look at a two-atom gas. Our second atom is described by random variable $A_2$, and you can see that we are assuming here that the atoms are distinguishable. I do this only for convenience, everything can be done just as well for indistinguishable particles.

If there are no correlations between the two atoms, then the entropy of the joint system is $H(A_1A_2)=H(A_1)+H(A_2)$, that is, entropy is extensive. Thermodynamical entropy is extensive because it describes things at equilibrium. Shannon entropy, on the other hand, is not. It can describe things that are not at equilibrium, because then
$$H(A_1A_2)=H(A_1)+H(A_2)-H(A_1:A_2) ,$$
where $H(A_1:A_2)$ is the correlation entropy, or shared entropy, or information, between $A_1$ and $A_2$. It is what allows you to predict something about $A_2$ when you know $A_1$, which is precisely what we already knew we could do in the picture of the molecules crammed into the perfume bottle on the left. This is stunning news for people who only know thermodynamics.
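To make the two-atom bookkeeping concrete, here is a minimal sketch in Python. The two joint distributions (`independent` and `correlated`) are made-up toy examples, not anything taken from a real gas, and entropies are measured in bits. The point is only that the shared entropy $H(A_1:A_2)$ is exactly what is missing from the extensive sum.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def two_atom_entropies(joint):
    """joint[i][j] = P(A1 = i, A2 = j).

    Returns H(A1), H(A2), H(A1A2) and the shared entropy H(A1:A2).
    """
    p1 = [sum(row) for row in joint]        # marginal of A1
    p2 = [sum(col) for col in zip(*joint)]  # marginal of A2
    h1, h2 = entropy(p1), entropy(p2)
    h12 = entropy(p for row in joint for p in row)
    return h1, h2, h12, h1 + h2 - h12


# Toy example 1: the two atoms know nothing about each other.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Toy example 2: knowing A1 tells you A2 with certainty.
correlated = [[0.5, 0.0], [0.0, 0.5]]

for name, joint in (("independent", independent), ("correlated", correlated)):
    h1, h2, h12, shared = two_atom_entropies(joint)
    print(f"{name}: H(A1)={h1:.2f} H(A2)={h2:.2f} "
          f"H(A1A2)={h12:.2f} H(A1:A2)={shared:.2f}")
```

In the independent case the joint entropy is just the sum of the two single-atom entropies. In the correlated case one full bit is shared, and the joint entropy is smaller than the extensive sum by exactly that amount.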

What if we have more particles? Well, we can quantify those correlations too. Say we have three variables, and the third one is (with very little surprise) described by variable $A_3$. It is then a simple exercise to write the joint entropy $H(A_1A_2A_3)$ as
$$H(A_1A_2A_3)=H(A_1)+H(A_2)+H(A_3)-H(A_1:A_2)-H(A_1:A_3)-H(A_2:A_3)+H(A_1:A_2:A_3).$$
Entropy Venn diagram for three random variables, with the correlation entropies indicated.

We find thus that the entropy of the joint system of variables can be written in terms of the extensive entropy (the sum of the subsystem entropies) minus the correlation entropy $H_{\rm corr}$, which includes correlations between pairs of variables, triplets of variables, and so forth. Indeed, the joint entropy of an $n$-particle system can be written in terms of a sum that features the (extensive) sum of single-particle entropies plus (or minus) the possible many-particle correlation entropies (the sign always alternates between even and odd number of participating particles):
$$H(A_1,...,A_n)=\sum_{i=1}^n H(A_i)-\sum_{i<j}H(A_i:A_j)+\sum_{i<j<k} H(A_i:A_j:A_k)-\cdots. $$
This formula quickly becomes cumbersome, which is why Shannon entropy isn't a very useful formulation of non-equilibrium thermodynamics unless the correlations are somehow confined to just a few variables. 
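If you want to see the alternating sum at work, here is a small sketch. It assumes that the many-variable shared entropies are defined by the usual inclusion-exclusion over the entropies of sub-collections of variables (that definition is my assumption for the sketch; the function names and the toy distribution `locked` are made up for illustration). Each subset of variables is counted once.

```python
import math
from itertools import combinations


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping states to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginal distribution over the variable positions listed in `keep`."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def shared_entropy(joint, subset):
    """H(A_i : ... : A_k) for the variables in `subset`, by inclusion-exclusion."""
    total = 0.0
    for size in range(1, len(subset) + 1):
        sign = 1 if size % 2 == 1 else -1
        for part in combinations(subset, size):
            total += sign * entropy(marginal(joint, part))
    return total


def joint_entropy_from_expansion(joint, n):
    """Singles minus pairs plus triplets ... as in the alternating sum."""
    total = 0.0
    for size in range(1, n + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(range(n), size):
            total += sign * shared_entropy(joint, subset)
    return total


# Toy example: three bits that always agree (maximally correlated).
locked = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

print("direct joint entropy:", entropy(locked))
print("from the expansion:  ", joint_entropy_from_expansion(locked, 3))
print("pair term H(A1:A2):  ", shared_entropy(locked, (0, 1)))
```

The number of terms grows as $2^n-1$, which is the cumbersome part: the sketch is fine for three variables and hopeless for a mole of gas.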

Now, let's look at what happens when the gas in the bottle escapes into the larger area. Initially, the entropy is small, because the correlation entropy is large. Let's write this entropy as 
$$H(A_1,...,A_n|I)=H(A_1,...,A_n)-I,$$
where $I$ is the information I have because I know that the molecules are in the bottle. You now see why the entropy is small: you know a lot (in fact, $I$) about the system. The unconditional piece is the entropy of the system when all the fast things (the molecules escaping the bottle) have happened.  

Some of you may have already understood what happens when the bottle is opened: the information $I$ that I have (or any other observer, for that matter, has) decreases. And as a consequence, the conditional entropy $H(A_1,...,A_n|I)$ increases. It does so until $I=0$, and the maximum entropy state is achieved. Thus, what is usually written as the second law is really just the increase of the conditional entropy as information becomes outdated. Information, after all, is that which allows me to make predictions with accuracy better than chance. If the symbols that I have in my hand (and that I use to make the predictions) do not predict anymore, then they are not information anymore: they have turned to entropy. Indeed, in the end this is all the second law is about: how information turns into entropy.
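Here is a toy version of the perfume bottle, as a sketch under strong assumptions that are mine and not part of the argument above: the room is chopped into `room_cells` equal cells, the molecules are distinguishable, and all I know is that each molecule sits somewhere within the first `accessible_cells` cells. The conditional entropy is taken to be the maximal entropy minus the information $I$.

```python
import math


def max_entropy(n_molecules, room_cells):
    """Unconditional entropy (bits): every molecule could be in any cell."""
    return n_molecules * math.log2(room_cells)


def information(n_molecules, room_cells, accessible_cells):
    """What I know: each molecule is confined to `accessible_cells` cells."""
    return n_molecules * math.log2(room_cells / accessible_cells)


def conditional_entropy(n_molecules, room_cells, accessible_cells):
    """Entropy given my information: the maximal entropy minus I."""
    return (max_entropy(n_molecules, room_cells)
            - information(n_molecules, room_cells, accessible_cells))


n, room = 10, 1024
# The gas spreads out: the region I can pin the molecules to keeps growing.
for accessible in (16, 64, 256, 1024):
    i = information(n, room, accessible)
    h = conditional_entropy(n, room, accessible)
    print(f"accessible cells={accessible:5d}  I={i:6.1f} bits  H(.|I)={h:6.1f} bits")
```

The unconditional entropy never changes in this toy model. Only the information shrinks, and the conditional entropy climbs until it hits the maximum at $I=0$.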

You have probably already noticed that I could now take the vessel on the right of the figure above and open that one up. Then you realize that you did have information after all, namely you knew that the particles were confined to the larger area. This example teaches us that, as I pointed out in "What is Information? (Part I)", the entropy of a system is not a well-defined quantity unless we specify what measurement device we are going to use to measure it with, and as a consequence what the range of values of the measurements are going to be. 

The original second law, being faulty, should therefore be reformulated like this: 

In a thermodynamical equilibrium or non-equilibrium process, the unconditional (joint) entropy of a closed system remains a constant. 

The "true second law", I propose, should read:

When an isolated system approaches equilibrium from a non-equilibrium state, its conditional entropy almost always increases

Well, that looks suspiciously like the old law, except with the word "conditional" in front of "entropy". It seems like an innocuous change, but it took two blog posts to get there, and I hope I have convinced you that this change is not at all trivial. 

Now to close this part, let's return to Gibbs's entropy, which really looks exactly like Shannon's. And indeed, the $p_i$ in Gibbs's formula 
$$S=-\sum_i p_i\log p_i$$
could just as well refer to non-equilibrium distributions. If they do refer to equilibrium, we should use the Boltzmann distribution (I set here Boltzmann's constant to $k=1$, as it really just rescales the entropy)
$$p_i=\frac1Z e^{-E_i/T}$$
where $Z=\sum_ie^{-E_i/T}$ is known as the "partition function" in thermodynamics (which just makes sure that the $p_i$ are correctly normalized), and $E_i$ is the energy of the $i$th microstate. Oh yeah, T is the temperature, in case you were wondering.

If we plug this $p_i$ into Gibbs's (or Shannon's) formula, we get 
$$S=\log Z+E/T$$
This is, of course, a well-known thermodynamical relationship because $F=-T\log Z$ is also known as the Helmholtz free energy, so that $F=E-TS$. 
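You can check this relationship numerically in a few lines. The sketch below uses a made-up four-level spectrum and an arbitrary temperature (both are assumptions chosen only for illustration), natural logarithms, and $k=1$. Here $E$ is understood as the average energy $\sum_i p_i E_i$.

```python
import math


def boltzmann_distribution(energies, temperature):
    """Return the equilibrium probabilities p_i and the partition function Z."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z


def gibbs_entropy(probs):
    """S = -sum_i p_i log p_i, with natural log and k = 1."""
    return -sum(p * math.log(p) for p in probs if p > 0)


energies = [0.0, 1.0, 2.0, 5.0]  # hypothetical energy levels
T = 1.5                          # hypothetical temperature

p, Z = boltzmann_distribution(energies, T)
E = sum(pi * ei for pi, ei in zip(p, energies))  # average energy
S = gibbs_entropy(p)
F = -T * math.log(Z)                             # Helmholtz free energy

print("S           =", S)
print("log Z + E/T =", math.log(Z) + E / T)
print("F           =", F)
print("E - T S     =", E - T * S)
```

The first two printed numbers agree, and so do the last two, for any spectrum and any positive temperature you care to plug in.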

As we have just seen that this classical formula is the limiting case of using the Boltzmann (equilibrium) distribution within Gibbs's (or Shannon's) formula, we can be pretty confident that the relationship between information theory and thermodynamics I just described is sound. 

As a last thought: how did von Neumann know that Shannon's formula was the (non-equilibrium) entropy of thermodynamics? He had been working on quantum statistical mechanics in 1927, and deduced that the quantum entropy should be written in terms of the quantum density matrix $\rho$ as (here "Tr" stands for the matrix trace)
$$S(\rho)=-{\rm Tr} \rho\log \rho.$$
Quantum mechanical density matrices are in general non-diagonal. But were they to become classical, they would approach a diagonal matrix where all the elements on the diagonal are probabilities $p_1,...,p_n$. In that case, we just find
$$S(\rho)\to-\sum_{i=1}^n p_i\log p_i, $$ 
in other words, Shannon's formula is just the classical limit of the quantum entropy that was invented twenty-one years before Shannon thought of it, and you can bet that Johnny immediately saw this!
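For the smallest possible case you can see this limit without any linear algebra library. The sketch below assumes a real, symmetric $2\times 2$ density matrix with diagonal entries $p$ and $1-p$ and off-diagonal entry $c$ (the parametrization and function names are mine, chosen for illustration), for which the eigenvalues are $\frac12\pm\sqrt{(p-\frac12)^2+c^2}$. Entropies are in bits.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def von_neumann_entropy_2x2(p, c):
    """S(rho) for rho = [[p, c], [c, 1 - p]] with real c.

    A valid density matrix needs c**2 <= p * (1 - p).
    """
    if c * c > p * (1 - p) + 1e-15:
        raise ValueError("not a valid density matrix")
    radius = math.sqrt((p - 0.5) ** 2 + c * c)
    eigenvalues = (0.5 + radius, 0.5 - radius)
    # Clip tiny negative round-off before taking logarithms.
    return shannon_entropy(max(ev, 0.0) for ev in eigenvalues)


p = 0.25
print("Shannon entropy of (p, 1-p):", shannon_entropy([p, 1 - p]))
print("S(rho), diagonal (c = 0):   ", von_neumann_entropy_2x2(p, 0.0))
print("S(rho), with c = 0.2:       ", von_neumann_entropy_2x2(p, 0.2))
print("S(rho), pure state:         ", von_neumann_entropy_2x2(0.5, 0.5))
```

With $c=0$ the quantum entropy is exactly Shannon's. Switching on the off-diagonal element lowers it, all the way to zero for a pure state.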

In other words, there is a very good reason why Boltzmann's, Gibbs's, and Shannon's formulas are all called entropy, and Johnny von Neumann didn't make this suggestion to Shannon in jest.

Is this the end of "Whose entropy is it anyway?" Perhaps, but I have a lot more to write about the quantum notion of entropy, and whether considering quantum mechanical measurements can say anything about the arrow of time (as Landau and Lifshitz suggested). Because considering the quantum entropy of the universe can also say something about the evolution of our universe and the nature of the "Big Bang", perhaps a Part 3 will be appropriate. 

Stay tuned!

Saturday, June 7, 2014

Whose entropy is it anyway? (Part 1: Boltzmann, Shannon, and Gibbs )

Note: this post was slated to appear on May 31, 2014, but events outside of my control (such as grant submission deadlines, and parties at my house) delayed its issuance.

The word "entropy" is used a lot, isn't it? OK, not in your average conversation, but it is a staple of conversations between some scientists, and certainly among all nerds and geeks. You have read my introduction to information theory I suppose (and if not, go ahead and start here, right away!) But in my explanations of Shannon's entropy concept, I only obliquely referred to another "entropy", the one that came before Shannon: the thermodynamic entropy concept of Boltzmann and Gibbs. The concept was originally discussed by Clausius, but because he did not give a formula, I will just have to ignore him here. 

Why do these seemingly disparate concepts have the same name? How are they related? And what does this tell us about the second law of thermodynamics?

This is the blog post (possibly a series) where I try to throw some light on that relationship. I suspect that what follows below isn't very original (otherwise I probably should have written it up in a paper), but I have to admit that I didn't really check. I did write about some of these issues in an article that was published in a Festschrift on the occasion of the 85th birthday of Gerry Brown, who was my Ph.D. co-advisor and a strong influence on my scientific career. He passed away a year ago to this day, and I have not yet found a way to remember him properly. Perhaps a blog post on the relationship between thermodynamics and information theory is appropriate, as it bridges a subject Gerry taught often (Thermodynamics) with a subject I have come to love: the concept of information. But face it: a book chapter doesn't get a lot of readership. Fortunately, you can read it on arxiv here, and I urge you to because it does talk about Gerry in the introduction.  

Gerry Brown (1926-2013)
Before we get to the relationship between Shannon's entropy and Boltzmann's, how did they end up being called by the same name? After all, one is a concept within the realm of physics, the other from electrical engineering. What gives?

The one to blame for this confluence is none other than John von Neumann, the mathematician, physicist, engineer, computer scientist (perhaps Artificial Life researcher, sometimes moonlighting as an economist). It is difficult to appreciate the genius that was John von Neumann, not the least because there aren't many people who are as broadly trained as he was. For me, the quote that fills me with awe comes from another genius who I've had the privilege to know well, the physicist Hans Bethe. I should write a blog post about my recollections of our interactions, but there is already a write-up in the book memorializing Hans's life. While I have never asked Hans directly about his impressions of von Neumann (how I wish that I had!), he is quoted as saying (in the 1957 LIFE magazine article commemorating von Neumann's death): "I have sometimes wondered whether a brain like von Neumann's does not indicate a species superior to man".

The reason why I think that this is quite a statement is that I think Bethe's brain was in itself very unrepresentative of our species, and perhaps indicated an altogether different kind.

So, the story goes (as told by Myron Tribus in his 1971 article "Energy and Information") that when Claude Shannon had figured out his channel capacity theorem, he consulted von Neumann (both at Princeton at the time) about what he should call the "-p log p" value of the message to be sent over a channel. von Neumann supposedly replied:

"You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name. In the second place, and more importantly, no one knows what entropy really is, so in a debate you will always have the advantage.”

The quote is also reprinted in the fairly well-known book "Maxwell's Demon: Entropy, Information, and Computing", edited by Leff and Rex. Indeed, von Neumann had defined a quantity just like that as early as 1927 in the context of quantum mechanics (I'll get to that). So he knew exactly what he was talking about.

Let's assume that this is an authentic quote. I can see how it could be authentic, because the thermodynamic concept of entropy (due to the Austrian physicist Ludwig Boltzmann) can be quite, let's say, challenging. I'm perfectly happy to report that I did not understand it for the longest time, in fact not until I understood Shannon's entropy, and perhaps not until I understood quantum entropy.
Ludwig Boltzmann (1844-1906). Source: Wikimedia
Boltzmann defined entropy. In fact, his formula $S= k \log W$ is engraved on top of his tombstone, as shown here:
Google "Boltzmann tombstone" to see the entire marble edifice to Boltzmann
In this formula, $S$ stands for entropy, $k$ is now known as "Boltzmann's constant", and $W$ is the number of states (usually called "microstates" in statistical physics) a system can take on. But it is the $\log W$ that is the true entropy of the system. Entropy is actually a dimensionless quantity in thermodynamics. It takes on the form above (which has the dimensions of the constant $k$) if you measure temperature in a more manageable unit such as the Kelvin instead of in units of energy. In fact, $k$ just tells you how to do this translation:
$$k=1.38\times 10^{-23} {\rm J/K},$$
where J (for Joule) is the SI unit for energy. If you define temperature in these units, then entropy is dimensionless
$$S=\log W.   (1)$$
But this doesn't at all look like Shannon's formula, you say? 

You're quite right. We still have a bit of work to do. We haven't yet exploited the fact that $W$ is the number of microstates consistent with a macrostate at energy $E$. Let us write down the probability distribution $w(E)$ for the macrostate to be found with energy $E$. We can then see that

I'm sorry, that last derivation was censored. It would have bored the tears out of you. I know because I could barely stand it myself. I can tell you where to look it up in Landau & Lifshitz if you really want to see it.

The final result is this: Eq. (1) can be written as
$$S=-\sum_{E_i} w_i\log w_i   (2)$$
implying that Boltzmann's entropy formula looks to be exactly the same as Shannon's. 

Except, of course, that in the equation above the probabilities $w_i$ are all equal to each other. If some microstates are more likely than others, the entropy becomes simply
$$S=-\sum_{E_i} p_i\log p_i     (3)$$
where the $p_i$ are the different probabilities to occupy the different microstates $i$. 
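If you would rather check than take my word for it, here is a minimal sketch. It assumes $W=8$ microstates and a made-up non-uniform distribution `skewed` (both chosen only for illustration), with natural logarithms and $k=1$.

```python
import math


def gibbs_entropy(probs):
    """S = -sum_i p_i log p_i, with natural log and k = 1."""
    return -sum(p * math.log(p) for p in probs if p > 0)


W = 8                      # hypothetical number of microstates
uniform = [1 / W] * W      # all w_i equal, as in Eq. (2)
# A made-up non-uniform distribution over the same 8 microstates, as in Eq. (3).
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]

print("log W                    =", math.log(W))
print("Eq. (2), equal w_i       =", gibbs_entropy(uniform))
print("Eq. (3), unequal p_i     =", gibbs_entropy(skewed))
```

With equal probabilities the sum collapses to $\log W$, which is Eq. (1). As soon as some microstates are more likely than others, the entropy drops below that value.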

Equation (3) was derived by the American theoretical physicist Willard Gibbs, who is generally credited with the development of statistical mechanics. 

J. Willard Gibbs (1839-1903) Source: Wikimedia
Now Eq. (3) does precisely look like Shannon's, which you can check by comparing to Eq. (1) in the post "What is Information? (Part 3: Everything is conditional)". Thus, it is Gibbs's entropy that is like Shannon's, not Boltzmann's. But before I discuss this subtlety, ponder this:

At first sight, this similarity between Boltzmann's and Shannon's entropy appears ludicrous. Boltzmann was concerned with the dynamics of gases (and many-particle systems in general). Shannon wanted to understand whether you can communicate accurately over noisy channels. These appear to be completely unrelated endeavors. Except they are not, if you move far enough away from the particulars. Both, in the end, have to do with measurement. 

If you want to communicate over a noisy channel, the difficult part is on the receiving end (even though you quickly find out that in order to be able to receive the message in its pristine form, you also have to do some work at the sender's end). Retrieving a message from a noisy channel requires that you or I make accurate measurements that can distinguish the signal from the noise. 

If you want to characterize the state of a many-particle system, you have to do something other than measure the state of every particle (because that would be impossible). You'll have to develop a theory that allows us to quantify the state given a handful of proxy variables, such as energy, temperature, and pressure. This is, fundamentally, what thermodynamics is all about. But before you can think about what to measure in order to know the state of your system, you have to define what it is you don't know. This is Boltzmann's entropy: how much you don't know about the many-particle system. 

In Shannon's channel, a message is simply a set of symbols that can encode meaning (they can refer to something). But before it has any meaning, it is just a vessel that can carry information. How much information? This is what's given by Shannon's entropy. Thus, the Shannon entropy quantifies how much information you could possibly send across the channel (per use of the channel), that is, entropy is potential information.

Of course, Boltzmann entropy is also potential information: If you knew the state of the many-particle system precisely, then the Boltzmann entropy would vanish. You (being an ardent student of thermodynamics) already know what is required to make a thermodynamical entropy vanish: the temperature of the system must be zero. This, incidentally, is the content of the third law of thermodynamics.

"The third law?", I hear some of you exclaim. "What about the second?"

Yes, what about this so-called Second Law?

To be continued, with special emphasis on the Second Law, in Part 2