Thursday, July 24, 2014

On quantum measurement (Part 3: No cloning allowed)

In the previous two parts, I told you how I became interested in the quantum measurement problem (Part 1), and provided a bit of historical background (Part 2). Now we'll get to the heart of the matter. 

Note that I'm using MathJax to display equations in this blog. If your browser shows a bunch of dollar signs and gibberish where equations should appear, you probably have to figure out how to install MathJax on your browser. Don't email me: I know nothing about such intricacies.

Let me remind you that our hero John von Neumann described quantum measurement as a two-stage process. (No, I'm not showing his likeness.) The first stage is now commonly described as entanglement. This is what we'll discuss here. I'll get to the second process (the one where the wavefunction ostensibly collapses, except Hans Bethe told me that it doesn't) in Part 4. 

For the purpose of illustration, I'm going to describe the measurement of a position, but everything can be done just as well for discrete degrees of freedom, such as, you know, spins. In fact, I'll show you a bunch of spin measurements waaay later, like the Stern-Gerlach experiment, or the quantum eraser. But I'm getting ahead of myself.

Say our quantum system that we would like to measure is in state $|Q\rangle=|x\rangle$. I'm going to use Q to stand in for quantum systems a lot. Measurement devices will be called "M", or sometimes "A" or "B". 

All right. How do you measure stuff to begin with?

In classical physics, we might imagine that a system is characterized by the position variable $x$, [I'll write this as "(x)"] and to measure it, all we have to do is to transfer that label "x" to a measurement device. Say the measurement device (before measurement) points to a default location (for example '0') like this: (0). Then, we'll place that device next to the position we want to measure, and attempt to make the device "reflect" the position:
$$(x)(0)\to (x)(x)$$ 
This is just what I want, because now I can read the position of the thing I want to measure off of my measurement device. 

I once in a while get the question: "Why do you have to have a measurement device? Can't you just read the position off of the system you want to measure directly?" The answer is no, no you can't. The thing is the thing: it stands there in the corner, say. If you measure something, you have to transfer the state of the thing to something you read off of. The variable that reflects the position can be very different from the thing you are measuring. For example, a temperature can be transferred to the height of a mercury column. In a measurement, you create a correlation between two systems. 

In a classical measurement, the operation that makes that possible is a copying operation. You copy the system's state onto the measurement device's state. The copy can be made out of a very different material (for example, a photograph is a copy of a 3D scene onto a two-dimensional surface, made out of whatever material you choose). But system and measurement refer to each other.

All right, so measuring really is copying. And reading this the sophisticated reader (yes, I mean you!) starts smelling a rat right away. Because you already know that copying is just fine in classical physics, but it really is against the law in quantum physics. That's right: there is a no-cloning (or no-xeroxing) theorem, in effect in quantum mechanics. You're not allowed to make exact copies. Ever. 

So how can quantum measurement work at all, if measurement is intrinsically copying?

That, dear reader, is indeed the question. And what I'll try to convince you of is now fairly obvious, namely that quantum measurement is really impossible in principle, unless you just happen to be in the "right basis". This "right basis", basically, is a basis where everything looks classical to begin with. (We'll get to this in more detail later). What I will try to convince you of here is that quantum measurement is impossible, if you want a quantum measurement to do what you expect from a classical measurement, namely that your device reflects the state of the system. 

The no-cloning theorem makes that impossible. 

I could stop here, you know. "Stop worrying about quantum measurement", I could write, "because I just showed you that quantum measurement is impossible in principle!"

But I won't, because there is so much more to be said. For example, even though quantum measurements are impossible in principle, it's not like people haven't tried, right? So what is it that people are measuring? What are the measurement devices saying? 

I'll tell you, and I guarantee you that you will not like it one bit.

But first, I owe you this piece: to show you how quantum measurement works. So our quantum system $Q$ is in state $|x\rangle$. Our measurement device is conveniently already in its default state $|0\rangle$. You can, by the way, think about what happens if the measurement device is not pointing to an agreed-upon direction (such as '0') before measurement, but Johnny vN has already done this for you on page 233 of his "Grundlagen". Here he is, by the way, discussing stuff with Ulam and Feynman, most likely in Los Alamos.
Left to right: Stanislaw Ulam, Richard Feynman, John von Neumann
To be a fly on the wall there! Note how JvN (to the right) is always better dressed than the people he hangs out with!

So investigating various possible initial states of the quantum measurement device does nothing for you, he finds, and of course he is correct. So we'll assume it points to $|0\rangle$. 

So we start with $|Q\rangle|M\rangle=|x\rangle|0\rangle$. What now? Well, the measurement operator, which of course has to be unitary (meaning it conserves probabilities, yada yada) must project the quantum state, then move the needle on the measurement device. For a position measurement, the unitary operator that does this is
$$U=e^{iX\otimes P}$$
where $X$ is the operator whose eigenstate is $|x\rangle$ (meaning $X|x\rangle=x|x\rangle$), and where $P$ is the operator conjugate to $X$. $P$ (the "momentum operator") makes spatial translations. For example, $e^{iaP}|x\rangle=|x+a\rangle$, that is, $x$ was made into $x+a$.  The $\otimes$ reminds you that $X$ acts on the first vector (the quantum system), and $P$ acts on the second (the measurement device). 
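
If you want to see the translation property with your own eyes, here is a minimal numerical sketch. It is not the continuous operator from the text: it assumes a hypothetical cyclic lattice of `N` sites and builds a momentum-like generator `P` from discrete Fourier modes, with the sign chosen to match the convention used here, $e^{iaP}|x\rangle=|x+a\rangle$ (positions taken modulo `N`). The names `N`, `V`, `p`, `translate`, and `ket` are made up for the illustration.

```python
import numpy as np

N = 8  # hypothetical number of lattice sites; positions are 0..N-1 and wrap around

# Columns of V are the discrete "momentum" eigenvectors |k>.
sites = np.arange(N)
V = np.exp(2j * np.pi * np.outer(sites, sites) / N) / np.sqrt(N)

# Eigenvalues of the momentum-like generator. The minus sign is an assumption
# made so that exp(i a P)|x> = |x+a>, the convention used in the text.
p = -2 * np.pi * sites / N
P = V @ np.diag(p) @ V.conj().T  # Hermitian by construction


def translate(a):
    """Return exp(i a P), built from the eigen-decomposition of P."""
    return V @ np.diag(np.exp(1j * a * p)) @ V.conj().T


def ket(pos):
    """Position eigenstate |pos> on the cyclic lattice."""
    v = np.zeros(N, dtype=complex)
    v[pos % N] = 1.0
    return v


# Translating |2> by a = 3 should give |5>.
print(np.allclose(translate(3) @ ket(2), ket(5)))  # True
```

On the lattice this only works cleanly for integer shifts `a`; the continuum statement in the text has no such restriction.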

So, what this means is that 
$$U|x\rangle|0\rangle=e^{iX\otimes P}|x\rangle|0\rangle=e^{ix P}|x \rangle|0\rangle=|x\rangle|x\rangle .$$ 
Yay: the state of the quantum system was copied onto the measurement device! Except that you already can see what happens if you try to apply this operator to a superposition of states such as $|x\rangle+|y\rangle$ (I'm ignoring the normalization):
$$U(|x\rangle+|y\rangle)|0\rangle=e^{iX\otimes P}(|x\rangle+|y\rangle)|0\rangle=e^{ix P}|x \rangle|0\rangle+e^{iy P}|y \rangle|0\rangle=|x\rangle|x\rangle + |y\rangle|y\rangle .$$
And that's not at all what you would have expected if measurement was like the classical case, where you would have gotten $(|x\rangle + |y\rangle)(|x\rangle + |y\rangle)$. And what I just showed you is really just the proof that cloning is impossible in quantum physics.
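
The difference between "copied" and "entangled" can be checked on a toy model. The sketch below assumes a hypothetical discrete version of the position measurement: both system and device live on a cyclic lattice of `N` sites, and the measurement operator is taken to be $U=\sum_x |x\rangle\langle x|\otimes T^x$, where $T$ shifts the device needle by one site. This is a stand-in for $e^{iX\otimes P}$, not the continuous operator itself, and all names are invented for the example.

```python
import numpy as np

N = 4  # hypothetical number of discrete positions (cyclic lattice)


def ket(pos):
    """Position eigenstate |pos>."""
    v = np.zeros(N)
    v[pos % N] = 1.0
    return v


def shift(a):
    """Matrix that moves the device needle from |m> to |m+a mod N>."""
    return np.roll(np.eye(N), a, axis=0)


# Discrete stand-in for the measurement operator: if the system sits at x,
# shift the device by x.
U = sum(np.kron(np.outer(ket(x), ket(x)), shift(x)) for x in range(N))


def schmidt_rank(state):
    """Rank of the coefficient matrix: 1 for a product state, more if entangled."""
    return np.linalg.matrix_rank(state.reshape(N, N))


# An eigenstate is copied onto the device: |1>|0> -> |1>|1>.
copied = U @ np.kron(ket(1), ket(0))

# A superposition is not copied. It becomes entangled with the device.
psi = (ket(1) + ket(2)) / np.sqrt(2)
out = U @ np.kron(psi, ket(0))
entangled = (np.kron(ket(1), ket(1)) + np.kron(ket(2), ket(2))) / np.sqrt(2)
clone = np.kron(psi, psi)  # what a true copying machine would have produced

print(np.allclose(copied, np.kron(ket(1), ket(1))))  # True
print(np.allclose(out, entangled))                   # True
print(np.allclose(out, clone))                       # False
print(schmidt_rank(out), schmidt_rank(clone))        # 2 1
```

The Schmidt rank makes the point compactly: the would-be clone is a product state (rank 1), while what the unitary actually produces cannot be written as "system state times device state" at all.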

So there you have it: quantum measurement is impossible unless the state that you are measuring just happens to already be in an eigenstate of the measurement operator, that is, it is not in a quantum superposition. 

Whether or not a quantum system is in a superposition depends on the basis that you choose to perform your quantum measurement. I do realize that the concept of a "basis" is a bit technical: it is totally trivial to all of you who have been working in quantum mechanics for years, but less so for those of you who are just curious. In everyday life, it is akin to measuring temperature in Celsius or Fahrenheit, for example, or location in Cartesian as opposed to polar coordinates. But in quantum mechanics, the choice of a basis is much more fundamental, and I really don't know of a good way to make it more intuitive (meaning, without a lot more math). A typical distinction is to measure photon polarization either in terms of horizontal/vertical, or left/right circular. I know, I'm not helping. Let's just skip this part for now. I might get back to it later.

So what happens when you measure a quantum system, and your measurement device is not "perfectly aligned" (basis-wise) with the quantum system? As it in fact almost never will be, by the way, unless you use a classical device to measure a classical system. Because in classical physics, we are all in the same basis automatically.  (OK, I see that I'll have to clarify this to you but trust me here.)

Look forward to Part 4 instead. Where I will finally delve into "Stage 2" of the measurement process. That is the one that baffled von Neumann, because he could not understand where exactly the wavefunction collapses. And in hindsight, there was no way he could have figured this out, because the wavefunction never collapses. Ever. What I'll show you in Part 4 is how a measurement device can be perfectly (by which I mean intrinsically) consistent, yet tell you a story about what the quantum state is and lie to you at the same time. Lie to you, through its proverbial teeth, if it had any.  

But come on, cut the measurement device some slack. It is lying to you because it has no choice. You ask it to make a copy of the quantum state, and it really is not allowed to do so. What will happen (as I will show you), is that it will respond by displaying to you a random value, with a probability given by the square of some part of the amplitude of the quantum wavefunction. In other words, I'll show you how Born's rule comes about, quite naturally. In a world where no wavefunction collapses, of course. 

Monday, July 14, 2014

On quantum measurement (Part 2: Some history, and John von Neumann is confused)

This is Part 2 of the "On quantum measurement" series. Part 1: (Hans Bethe, the oracle) is here.

Before we begin in earnest, I should warn you, (or ease your mind, whichever is your preference): this sequence has math in it. I'm not in it to dazzle you with math. It's just that I know no other way to convey my thoughts about quantum measurement in a more succinct manner. Math, you see, is a way for those of us who are not quite bright enough, to hold on to thoughts which, without math, would be too daunting to formulate, too ambitious to pursue. Math is for poor thinkers, such as myself. If you are one of those too, come join me. The rest of you: why are you still reading? Oh, you're not. OK. 

Hey, come back: this historical interlude turns out to be math-free after all. But I promise math in Part 3.

Before I offer to you my take on the issue of quantum measurement, we should spend some time reminiscing, about the history of the quantum measurement "problem". If you've read my posts (and why else would you read this one?), you'll know one thing about me: when the literature says there is a "problem", I get interested. 

This particular problem isn't that old. It arose through a discussion between Niels Bohr and Albert Einstein, who disagreed vehemently about measurement, and the nature of reality itself.  

Bohr and Einstein at Ehrenfest's house, in 1925. Source: Wikimedia

The "war" between Bohr and Einstein only broke out in 1935 (via dueling papers in the Physical Review), but the discussion had been brewing for 10 years at least. 

Much has been written about the controversy (and a good summary albeit with a philosophical bent can be found in the Stanford Encyclopedia of Philosophy). Instead of going into that much detail, I'll just simplify it by saying:

Bohr believed the result of a measurement reflects a real objective quantity (the value of the property being measured).

Einstein believed that quantum systems have objective properties independent of their measurements, and that because quantum mechanics cannot properly describe them, the theory must necessarily be incomplete.

In my view, both views are wrong. Bohr's because his argument relies on a quantum wavefunction that collapses upon measurement (which as I'll show you is nonsense), and Einstein's because the idea that a quantum system has objective properties (described by one of the eigenstates of a measurement device) is wrong and that, as a consequence the notion that quantum mechanics must be incomplete is wrong as well. He was right, though, about the fact that quantum systems have properties independently of whether you measure them or not. It is just that we may not ever know what these properties are.

But enough of the preliminaries. I will begin to couch quantum measurement in terms of a formalism due to John von Neumann. If you think I'm obsessed by the guy because he seems to make an appearance in every second blog post of mine: don't blame me. He just ended up doing some very fundamental things in a number of different areas. So I'm sparing you the obligatory picture of his, because I assume you have seen his likeness enough. 

John von Neumann's seminal book on quantum mechanics is called "Mathematische Grundlagen der Quantenmechanik" (Mathematical Foundations of Quantum Mechanics), and appeared in 1932, three years before the testy exchange of papers (1) between Bohr and Einstein. 

My copy of the "Grundlagen". This is the version issued by the U.S. Alien Property Custodian from 1943 by Dover Publications. It is the verbatim German book, issued in the US in war time. The original copyright is by J. Springer, 1932.

In this book, von Neumann made a model of the measurement process that had two stages, aptly called "first stage" and "second stage". [I want to note here that JvN actually called the first stage "Process 2" and the second stage "Process 1", which today would be confusing so I reversed it.]

The first stage is unitary, which means "probability conserving". JvN uses the word "causal" for this kind of dynamics. In today's language, we call that process an "entanglement operation" (I'll describe it in more detail momentarily, which means "wait for Part 3"). Probability conservation is certainly a requisite for a causal process, and I actually like JvN's use of the word "causal". That word now seems to have acquired a somewhat different meaning.

The second stage is the mysterious one. It is (according to JvN) acausal, because it involves the collapse of the wavefunction (or as Hans Bethe called it, the "reduction of the wavepacket"). It is clear that this stage is mysterious to Johnny, because he doesn't know where the collapse occurs. He is following "type one" processes in a typical measurement (in the book, he measures temperature as an example) from the thermal expansion of the mercury fluid column, to the light quanta that scatter off the mercury column and enter our eye, where the light is refracted in the lens and forms an image on the retina, which then stimulates nerves in the visual cortex, and ultimately creates the "subjective experience" of the measurement. 

According to JvN, the boundary between what is the quantum system and what is the measurement device can be moved in an arbitrary fashion. He understands perfectly that a division into a system to be measured and a measuring system is necessary and crucial (and we'll spend considerable time discussing this), but the undeniable fact that it is not at all clear where to draw the boundary is a mystery to him. He invokes the philosophical principle of "psychophysical parallelism" (which states that there can be no causal interaction between the mind and the body) to explain why the boundary is so fluid. But it is the sentence just following this assertion that puts the finger on what is puzzling him. He writes: 

"Because experience only ever makes statements like this: 'an observer has had a (subjective) perception', but never one like this: 'a physical quantity has taken on a particular value'."(2)

This is, excuse my referee's voice, very muddled. He says: We never have the experience "X takes on x", we always experience "X looks like it is in state x". But mathematically they should be the same. He makes a distinction that does not exist. We will see later why he feels he must make that distinction. But, in short, it is because he thinks that what we perceive must also be reality. If a physical object X is perceived to take on state x, then this must mean that objectively "X takes on x". In other words, he assumes that subjective experience must mirror objective fact.

Yet, this is provably dead wrong. 

That is what Nicolas and I discovered in the article in question, and that is undoubtedly what Hans Bethe immediately realized, but struggled to put into words. 

Quantum reality, in other words, is a whole different thing than classical reality. In fact, in the "worst case" (to be made precise as we go along) they may have nothing to do with each other, as Nicolas and I  argue in a completely obscure (that is unknown) article entitled "What Information Theory Can Tell us About Quantum Reality" (3).

What you will discover when following this series of posts, is that if your measurement device claims "the quantum spin that you were measuring was in state up", then this may not actually tell you anything about the true quantum state. The way I put it colloquially is that "measurement devices tend to lie to you". They lie, because they give you an answer that is provably nonsense. 

In their (the device's) defense, they have no choice but to lie to you (I will make that statement precise when we do math). They lie because they are incapable of telling the truth. Because the truth is, in a precise information-theoretic way that I'll let you in on, bigger than they are. 

JvN tried to reconcile subjective experience with objective truth. Subjectively, the quantum state collapsed from a myriad of possibilities to a single truth. But in fact, nothing of the sort happens. Your subjective experience is not reflecting an objective truth. The truth is out there, but it won't show itself in our apparatus. The beauty of theoretical physics is that we can find out about how the wool is being pulled over our eyes (how classical measurement devices are conspiring to deceive us) when our senses would never allow us a glimpse of the underlying truth.

Math supporting all that talk will start in Part 3. 

(1) Einstein (with Podolsky and Rosen) wrote a paper entitled "Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?". It appeared in Phys. Rev. 47 (1935) 777-780. Four pages: nowadays it would be a PRL. I highly recommend reading it. Bohr was (according to historical records and the narrative in Zurek's great book about it all) incensed. Bohr reacted by writing a paper with the same exact title as Einstein's, that has (in my opinion) nothing in it. It is an astonishing paper because it is content-free, but was meant to serve as a statement that Bohr refutes Einstein, when in fact Bohr had nothing. 

(2) Denn die Erfahrung macht nur Aussagen von diesem Typus: ein Beobachter hat eine bestimmte (subjektive) Wahrnehmung gemacht, und nie eine solche: eine physikalische Größe hat einen bestimmten Wert. 

(3) C. Adami & N.J. Cerf, Lect. Notes in Comp. Sci. 1509 (1999) 258-268

Sunday, June 22, 2014

On quantum measurement (Part 1: Hans Bethe, the oracle)

For this series of posts, I'm going to take you on a ride through the bewildering jungle that is quantum measurement. I've no idea how many parts will be enough, but I'm fairly sure there will be more than one. After all, the quantum mechanics of measurement has been that subject's "mystery of mysteries" for ages, it now seems. 

Before we begin, I should tell you how I became interested in the quantum measurement problem. Because for the longest time I wasn't. During graduate school (at the University of Bonn), the usual thing happened: the Prof (in my case Prof. Werner Sandhas, who I hope turned eighty this past April 14th) says that they'll tell us about quantum measurement towards the end of the semester, and never actually get there. I have developed a sneaking suspicion that this happened a lot, in quantum mechanics classes everywhere, every time. Which would explain a lot of the confusion that still reigns. 

However, to tell you how I became interested in this problem is a little difficult, because I risk embarrassing myself. The embarrassment that I'm risking is not the usual type. It is because the story that I will tell you will seem utterly ridiculous, outrageously presumptuous, and altogether improbable. But it occurred just as I will attempt to tell it. There is one witness to this story, my collaborator in this particular endeavor, the Belgian theoretical physicist Nicolas Cerf.  

Now, because Nicolas and I worked together very closely on a number of different topics in quantum information theory when we shared an office at Caltech, you might surmise that he would corroborate any story I write (and thus not be an independent witness). I'm sure he remembers the story (wait for it, I know I'm teasing) differently, but you would have to ask him. All I can say is that this is how I remember it.

Nicolas and I had begun to work in quantum information theory around 1995-1996. After a while we were studying the quantum communication protocols of quantum teleportation and quantum superdense coding, and in our minds (that is, our manner of counting), information did not add up. But, we thought, information must be conserved. We were certain. (Obviously that has been an obsession of mine for a while, those of you who have read my black hole stories will think to yourselves).
Space-time diagrams for the quantum teleportation process (a) and superdense coding process (b). EPR stands for an entangled Einstein-Podolsky-Rosen pair. Note the information values for the various classical and quantum bits in red. Adapted from Ref. [1]. The letters 'M' and 'U' stand for a measurement and a unitary operation, respectively. A and B are the communication partners 'Alice' and 'Bob'.

But information cannot be conserved, we realized, unless you can have negative bits. Negative entropy: anti-qubits (see the illustration above). This discovery of ours is by now fairly well-known (so well-known, in fact, that sometimes articles about negative quantum entropy don't seem to feel it necessary to refer to our original paper at all). But it is only the beginning of the story (ludicrous as it may well appear to you) that I want to tell. 

After Nicolas and I wrote the negative entropy paper, we realized that quantum measurement was, after all, reversible. That fact was obvious once you understood these quantum communication protocols, but it was even more obvious once you understood the quantum erasure experiment. Well, for all we knew, this was flying in the face of accepted lore, which (ever since Niels Bohr) would maintain that quantum measurement required an irreversible collapse of the quantum wavefunction. Ordinarily, I would now put up a picture of the Danish physicist who championed wave function collapse, but I cannot bring myself to do it: I have come to loathe the man. I'm sure I'm being petty here.

With this breakthrough discovery in mind ("Quantum measurement is reversible!") Nicolas and I went to see Hans Bethe, who was visiting Caltech at the time. At this point, Hans and I had become good friends, as he visited Caltech regularly. I wrote up my recollections of my first three weeks with him (and also our last meeting) in the volume commemorating his life. (If you don't want to buy that book but read the story, try this link. But you should really buy the book: there's other fun stuff in it). The picture below is from Wikipedia, but that is not how I remember him. I first met him when he was 85. 
Hans A. Bethe (1906-2005) (Source: Wikimedia)

Alright, enough of the preliminaries. Nicolas Cerf and I decided to ask for Hans's advice, and enter his office, then on the 3rd floor of Caltech's Kellogg Radiation Laboratory. For us, that meant one flight of stairs up. We tell him right away that we think we have discovered something important that is relevant to the physics of quantum measurement, and start explaining our theory. I should tell you that what we have at this point isn't much of a theory: it is the argument, based on negative conditional quantum entropies, that quantum measurement can in principle be reversed. 

Hans listens patiently. Once in a while he asks a question that forces us to be more specific.

After we are done, he speaks.

"I am not that much interested in finding that quantum measurement is reversible. What I find much more interesting is that you have solved the quantum measurement problem."

After that, there is a moment of silence. Both Nicolas and I are utterly stunned. 

I am first to ask the obvious. 
"Can you explain to us why?"

You see, it is fairly improbable that a physicist of the caliber of Hans Bethe tells you that you have solved the "mystery of mysteries". Neither Nicolas nor I had seen this coming from a mile away. And we certainly had no idea why he just said that.

We were waiting with--shall we say--bated breath. Put yourself into our position. How would you have reacted? What came after was also wholly unexpected.

After I asked him to explain that last statement, he was silent for--I don't know--maybe three seconds. In a conversation like this, that is bordering on a perceived eternity.

My recollection is fuzzy at this point. Either he began by saying "I can't explain it to you", or he immediately told the story of the Mathematics Professor who lectures on a complex topic and fills blackboard after blackboard, until a student interrupts him and asks: "Can you explain this last step in your derivation to me?"

The Professor answers: "It is obvious". The student insists. "If it is obvious, can you explain it?", and the Professor answers: "It is obvious, but I'll have to get back to you to explain it tomorrow".

At this point of Hans telling this story, the atmosphere is a little awkward. Hans tells us that it is obvious that we solved the quantum measurement problem, but he can't tell us exactly why he thinks it is obvious that we did. It certainly is not obvious to us.

I know Hans well enough at this point that I press on. I cannot let that statement go just like that. He did go on to try to explain what he meant. Now of course I wish I had taken notes but I didn't. But what he said resonated in my mind for a long time (and I suspect that this is true for Nicolas as well). After what he said, we both dropped everything we were doing, and worked only on the quantum measurement problem, for six months, culminating in this paper.

What he said was something like this: "When you make a measurement, its outcome is conditional on the measurements made on that quantum system before that, and so on, giving rise to a long series of measurements, all conditional on each other".

This is nowhere near an exact rendition of what he said. All I remember is him talking about atomic decay, and measuring the product of the decay and that this is conditional on previous events, and (that is the key thing I remember) that this gives rise to these long arcs of successive measurements whose outcomes are conditional on the past, and condition the future. 

Both Nicolas and I kept trying to revive that conversation in our memory when we worked on the problem for the six months following. (Hans left Caltech that year the day after our conversation). Hans also commented that our finding had deep implications for quantum statistical mechanics, because it showed that the theory is quite different from the classical theory after all. We did some work on the quantum Maxwell Demon in reaction to that, but never really had enough time to finish it. Other people after us did. But for the six months that followed, Nicolas and I worked with only this thought in our mind:

"He said we solved the problem. Let us find out how!"

In the posts that follow this one, I will try to give you an idea of what it is we did discover (most of it contained in the article mentioned above). You will easily find out that this article isn't published (and I'll happily tell you the story how that happened some other time). While a good part of what's in that paper did get published ultimately, I think the main story is still untold. And I am attempting to tell this story still, via a preprint I have about consecutive measurements, that I'm also still working on. But consecutive measurement is what Hans was telling us about in this brief session, that changed the scientific life of both Nicolas and me. He knew what he was talking about, but he didn't know how to tell us just then. It was obvious to him. I hope it will be obvious to me one day too.

Even though the conversation with Hans happened as I described, I should tell you that 18 years after Hans said this to us (and thinking about it and working on it for quite a while) I don't think he was altogether right. We had solved something, but I don't think we solved "the whole thing". There is more to it. Perhaps much more.

Stay tuned for Part 2, where I will explain the very basics of quantum measurement, what von Neumann had to say about it, as well as what this has to do with Everett and the "Many-worlds" interpretation. And if this all works out as I plan, perhaps I will ultimately get to the point that Hans Bethe certainly did not foresee: that the physics of quantum measurement is intimately linked to Gödel incompleteness. But I'm getting ahead of myself.

[1] N.J. Cerf and C. Adami. Negative entropy and information in quantum mechanics. Phys. Rev. Lett. 79 (1997) 5194-5197.

Note added: upon reading the manuscript again after all this time, I found in the acknowledgements the (I suppose more or less exact) statement that Hans had made. He stated that "negative entropy solves the problem of the reduction of the wave packet". Thus, it appears he did not maintain that we had "solved the measurement problem" as I had written above, only a piece of it. 

Sunday, June 8, 2014

Whose entropy is it anyway? (Part 2: The so-called Second Law)

This is the second part of the "Whose entropy is it anyway?" series. Part 1: "Boltzmann, Shannon, and Gibbs" is here.

Yes, let's talk about that second law in light of the fact we just established, namely that Boltzmann and Shannon entropy are fundamentally describing the same thing: they are measures of uncertainty applied to different realms of inquiry, making us thankful that Johnny vN was smart enough to see this right away. 

The second law is usually written like this: 

"When an isolated system approaches equilibrium from a non-equilibrium state, its entropy almost always increases"

I want to point out here that this is a very curious law, because there is, in fact, no proof for it. Really, there isn't. Not every thermodynamics textbook is honest enough to point this out, but I have been taught this early on, because I learned Thermodynamics from the East-German edition of Landau and Lifshitz's tome "Statistische Physik", which is quite forthcoming about this (in the English translation):

"At the present time, it is not certain whether the law of increase of entropy thus formulated can be derived from classical mechanics"

From that, L&L go on to speculate that the arrow of time may be a consequence of quantum mechanics.

I personally think that quantum mechanics has nothing to do with it (but see further below). The reason the law cannot be derived is because it does not exist. 

I know, I know. Deafening silence. Then:

"What do you mean? Obviously the law exists!"

What I mean, to be more precise, is that strictly speaking Boltzmann's entropy cannot describe what goes on when a system not at equilibrium approaches said equilibrium, because Boltzmann's entropy is an equilibrium concept. It describes the value that is approached when a system equilibrates. It cannot describe its value as it approaches that constant. Yes, Boltzmann's entropy is a constant: it counts how many microstates can be taken on by a system at fixed energy. 

When a system is not at equilibrium, fewer microstates are actually occupied by the system, but the number it could potentially take on is constant. Take, for example, the standard "perfume bottle" experiment that is so often used to illustrate the second law:
An open "perfume bottle" (left) about to release its molecules into the available space (right)

The entropy of the gas inside the bottle is usually described as being small, while the entropy of the gas on the right (because it occupies a large space) is believed to be large. But Boltzmann's formula is actually not applicable to the situation on the left, because it assumes (on account of the equilibrium condition), that the probability distributions in phase space of all particles involved are independent. But they are clearly not, because if I know the location of one of the particles in the bottle, I can make very good predictions about the other particles because they occupy such a confined space. (This is much less true for the particles in the larger space at right, obviously).

What should we do to correct this? 

We need to come up with a formula for entropy that is not explicitly true only at equilibrium, and that allows us to quantify correlations between particles. Thermodynamics cannot do this, because equilibrium thermodynamics is precisely that theory that deals with systems whose correlations have decayed long ago, or as Feynman put it, systems "where all the fast things have happened but the slow things have not". 

Shannon's formula, it turns out, does precisely what we are looking for: quantify correlations between all particles involved. Thus, Shannon's entropy describes, in a sense, nonequilibrium thermodynamics. Let me show you how.

Let's go back to Shannon's formula applied to a single molecule, described by a random variable $A_1$, and call this entropy $H(A_1)$. 

I want to point out right away something that may shock and disorient you, unless you followed the discussion in the post "What is Information? (Part 3: Everything is conditional)" that I mentioned earlier. This entropy $H(A_1)$ is actually conditional. This will become important later, so just store this away for the moment. 

OK. Now let's look at a two-atom gas. Our second atom is described by random variable $A_2$, and you can see that we are assuming here that the atoms are distinguishable. I do this only for convenience, everything can be done just as well for indistinguishable particles.

If there are no correlations between the two atoms, then the entropy of the joint system is $H(A_1A_2)=H(A_1)+H(A_2)$, that is, entropy is extensive. Thermodynamical entropy is extensive because it describes things at equilibrium. Shannon entropy, on the other hand, is not. It can describe things that are not at equilibrium, because then
$$H(A_1A_2)=H(A_1)+H(A_2)-H(A_1:A_2) ,$$
where $H(A_1:A_2)$ is the correlation entropy, or shared entropy, or information, between $A_1$ and $A_2$. It is what allows you to predict something about $A_2$ when you know $A_1$, which is precisely what we already knew we could do in the picture of the molecules crammed into the perfume bottle on the left. This is stunning news for people who only know thermodynamics.
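To see the two-atom formula at work, here is a minimal sketch in Python. The joint distribution is made up purely for illustration (two binary variables that tend to agree), and the shared entropy is computed from its own definition rather than from the identity being checked, so the comparison is not circular.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of an iterable of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(a1, a2) of two binary "atom" variables.
# The diagonal weights are larger, so knowing A1 helps predict A2.
joint = {(0, 0): 0.4, (0, 1): 0.1,
         (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions of A1 and A2.
p1 = [sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)]
p2 = [sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)]

H1, H2 = entropy(p1), entropy(p2)
H12 = entropy(joint.values())

# Shared entropy from its definition: sum p(a,b) log[ p(a,b) / (p(a) p(b)) ].
shared = sum(p * math.log2(p / (p1[a] * p2[b]))
             for (a, b), p in joint.items() if p > 0)

print(f"H(A1) + H(A2)            = {H1 + H2:.4f} bits")
print(f"H(A1A2)                  = {H12:.4f} bits")
print(f"H(A1:A2)                 = {shared:.4f} bits")
print(f"H(A1) + H(A2) - H(A1:A2) = {H1 + H2 - shared:.4f} bits")
```

Setting all four joint probabilities to 0.25 makes the shared entropy vanish, and the joint entropy becomes extensive again.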

What if we have more particles? Well, we can quantify those correlations too. Say we have three variables, and the third one is (with very little surprise) described by variable $A_3$. It is then a simple exercise to write the joint entropy $H(A_1A_2A_3)$ as
$$H(A_1A_2A_3)=H(A_1)+H(A_2)+H(A_3)-H(A_1:A_2)-H(A_1:A_3)-H(A_2:A_3)+H(A_1:A_2:A_3).$$
Entropy Venn diagram for three random variables, with the correlation entropies indicated.

We find thus that the entropy of the joint system of variables can be written in terms of the extensive entropy (the sum of the subsystem entropies) minus the correlation entropy $H_{\rm corr}$, which includes correlations between pairs of variables, triplets of variables, and so forth. Indeed, the joint entropy of an $n$-particle system can be written in terms of a sum that features the (extensive) sum of single-particle entropies plus (or minus) the possible many-particle correlation entropies (the sign alternates between even and odd numbers of participating particles, and each pair, triplet, and so on is counted once):
$$H(A_1,...,A_n)=\sum_{i=1}^n H(A_i)-\sum_{i<j}H(A_i:A_j)+\sum_{i<j<k} H(A_i:A_j:A_k)-\cdots. $$
This formula quickly becomes cumbersome, which is why Shannon entropy isn't a very useful formulation of non-equilibrium thermodynamics unless the correlations are somehow confined to just a few variables. 

Now, let's look at what happens when the gas in the bottle escapes into the larger area. Initially, the entropy is small, because the correlation entropy is large. Let's write this entropy as 
$$H(A_1,...,A_n|I)=H(A_1,...,A_n)-I,$$
where $I$ is the information I have because I know that the molecules are in the bottle. You now see why the entropy is small: you know a lot (in fact, $I$) about the system. The unconditional piece $H(A_1,...,A_n)$ is the entropy of the system when all the fast things (the molecules escaping the bottle) have happened.  

Some of you may have already understood what happens when the bottle is opened: the information $I$ that I have (or any other observer, for that matter, has) decreases. And as a consequence, the conditional entropy $H(A_1,...,A_n|I)$ increases. It does so until $I=0$, and the maximum entropy state is achieved. Thus, what is usually written as the second law is really just the increase of the conditional entropy as information becomes outdated. Information, after all, is that which allows me to make predictions with accuracy better than chance. If the symbols that I have in my hand (and that I use to make the predictions) do not predict anymore, then they are not information anymore: they have turned to entropy. Indeed, in the end this is all the second law is about: how information turns into entropy.
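Here is a toy illustration of information becoming outdated. It is only a sketch under loud assumptions: a single molecule, a one-dimensional box chopped into `M` cells (the first `m` of them being the "bottle"), and a lazy random walk standing in for the real dynamics. None of this is from the argument above except the bookkeeping: the entropy given what I knew at $t=0$, and the information $I$ that is left.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return sum(-x * math.log(x) for x in p if x > 0)

def diffuse(p):
    """One step of a lazy random walk with reflecting walls.

    Each cell keeps half of its probability and sends a quarter to each
    neighbour; at a wall, the quarter that cannot leave stays put.
    """
    n = len(p)
    q = [0.0] * n
    for i, x in enumerate(p):
        q[i] += 0.5 * x
        q[max(i - 1, 0)] += 0.25 * x
        q[min(i + 1, n - 1)] += 0.25 * x
    return q

M, m = 20, 4                  # total cells, and cells inside the "bottle"
H_max = math.log(M)           # entropy when nothing at all is known
p = [1.0 / m] * m + [0.0] * (M - m)   # at t=0 I know: "it is in the bottle"

history = []                  # (entropy given my knowledge, information left)
for t in range(3001):
    H_cond = entropy(p)
    history.append((H_cond, H_max - H_cond))
    p = diffuse(p)

for t in (0, 10, 100, 1000, 3000):
    H_cond, info = history[t]
    print(f"t={t:5d}   H(A|I) = {H_cond:.4f}   I = {info:.4f}")
```

The information starts at $\log(M/m)$ and drains away while the conditional entropy climbs toward $\log M$. For $n$ independent molecules both numbers would simply be multiplied by $n$.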

You have probably already noticed that I could now take the vessel on the right of the figure above and open that one up. Then you realize that you did have information after all, namely you knew that the particles were confined to the larger area. This example teaches us that, as I pointed out in "What is Information? (Part I)", the entropy of a system is not a well-defined quantity unless we specify what measurement device we are going to use to measure it with, and as a consequence what the range of values of the measurements are going to be. 

The original second law, being faulty, should therefore be reformulated like this: 

In a thermodynamical equilibrium or non-equilibrium process, the unconditional (joint) entropy of a closed system remains a constant. 

The "true second law", I propose, should read:

When an isolated system approaches equilibrium from a non-equilibrium state, its conditional entropy almost always increases

Well, that looks suspiciously like the old law, except with the word "conditional" in front of "entropy". It seems like an innocuous change, but it took two blog posts to get there, and I hope I have convinced you that this change is not at all trivial. 

Now to close this part, let's return to Gibbs's entropy, which really looks exactly like Shannon's. And indeed, the $p_i$ in Gibbs's formula 
$$S=-\sum_i p_i\log p_i$$
could just as well refer to non-equilibrium distributions. If it does refer to equilibrium, we should use the Boltzmann distribution (I set here Boltzmann's constant to $k=1$, as it really just renormalizes the entropy)
$$p_i=\frac1Z e^{-E_i/T}$$
where $Z=\sum_ie^{-E_i/T}$ is known as the "partition function" in thermodynamics (which just makes sure that the $p_i$ are correctly normalized), and $E_i$ is the energy of the $i$th microstate. Oh yeah, T is the temperature, in case you were wondering.

If we plug this $p_i$ into Gibbs's (or Shannon's) formula, we get 
$$S=\log Z+E/T$$
This is, of course, a well-known thermodynamical relationship because $F=-T\log Z$ is also known as the Helmholtz free energy, so that $F=E-TS$. 
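If you would rather check this than take my word for it, here is a small numerical sketch. The energy levels and the temperature are arbitrary made-up numbers, and $E$ is read as the mean energy $\sum_i p_iE_i$.

```python
import math

T = 1.5                                  # temperature with k = 1 (arbitrary)
energies = [0.0, 0.3, 1.0, 1.7, 2.5]     # hypothetical microstate energies E_i

Z = sum(math.exp(-E / T) for E in energies)          # partition function
p = [math.exp(-E / T) / Z for E in energies]         # Boltzmann distribution

S_gibbs = sum(-pi * math.log(pi) for pi in p)        # Gibbs/Shannon entropy
E_mean = sum(pi * E for pi, E in zip(p, energies))   # mean energy E
F = -T * math.log(Z)                                 # Helmholtz free energy

print(f"S from Gibbs's formula : {S_gibbs:.6f}")
print(f"log Z + E/T            : {math.log(Z) + E_mean / T:.6f}")
print(f"(E - F)/T              : {(E_mean - F) / T:.6f}")
```

All three lines should print the same number, whatever energies and temperature you pick.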

As we have just seen that this classical formula is the limiting case of using the Boltzmann (equilibrium) distribution within Gibbs's (or Shannon's) formula, we can be pretty confident that the relationship between information theory and thermodynamics I just described is sound. 

As a last thought: how did von Neumann know that Shannon's formula was the (non-equilibrium) entropy of thermodynamics? He had been working on quantum statistical mechanics in 1927, and deduced that the quantum entropy should be written in terms of the quantum density matrix $\rho$ as (here "Tr" stands for the matrix trace)
$$S(\rho)=-{\rm Tr} \rho\log \rho.$$
Quantum mechanical density matrices are in general non-diagonal. But were they to become classical, they would approach a diagonal matrix where all the elements on the diagonal are probabilities $p_1,...,p_n$. In that case, we just find
$$S(\rho)\to-\sum_{i=1}^n p_i\log p_i, $$ 
in other words, Shannon's formula is just the classical limit of the quantum entropy that was invented twenty-one years before Shannon thought of it, and you can bet that Johnny immediately saw this!
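The classical limit can be watched happening in the smallest possible example. The sketch below assumes a single qubit with a real symmetric density matrix $\rho=\begin{pmatrix}p&c\\c&1-p\end{pmatrix}$ (my choice of illustration, not anything von Neumann wrote down), for which the eigenvalues are known in closed form, and shrinks the off-diagonal element $c$ to zero.

```python
import math

def h(probs):
    """Shannon entropy (in nats) of a list of probabilities."""
    return sum(-x * math.log(x) for x in probs if x > 0)

def von_neumann_entropy_2x2(p, c):
    """S(rho) for the real symmetric density matrix [[p, c], [c, 1-p]].

    A 2x2 matrix with unit trace has eigenvalues
    1/2 +/- sqrt((p - 1/2)^2 + c^2); S is the Shannon entropy of those.
    """
    r = math.sqrt((p - 0.5) ** 2 + c ** 2)
    r = min(r, 0.5)          # guard against rounding just past a pure state
    return h([0.5 + r, 0.5 - r])

p = 0.8
c_max = math.sqrt(p * (1 - p))   # largest allowed off-diagonal (pure state)
for frac in (1.0, 0.5, 0.1, 0.0):
    c = frac * c_max
    print(f"c = {c:.3f}   S(rho) = {von_neumann_entropy_2x2(p, c):.4f}")
print(f"Shannon entropy of (p, 1-p) = {h([p, 1 - p]):.4f}")
```

With maximal coherence the state is pure and the quantum entropy vanishes; as the off-diagonal element decays, $S(\rho)$ rises to the Shannon entropy of the diagonal.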

In other words, there is a very good reason why Boltzmann's, Gibbs's, and Shannon's formulas are all called entropy, and Johnny von Neumann didn't make this suggestion to Shannon in jest.

Is this the end of "Whose entropy is it anyway?"? Perhaps, but I have a lot more to write about the quantum notion of entropy, and whether considering quantum mechanical measurements can say anything about the arrow of time (as Landau and Lifshitz suggested). Because considering the quantum entropy of the universe can also say something about the evolution of our universe and the nature of the "Big Bang", perhaps a Part 3 will be appropriate. 

Stay tuned!

Saturday, June 7, 2014

Whose entropy is it anyway? (Part 1: Boltzmann, Shannon, and Gibbs )

Note: this post was slated to appear on May 31, 2014, but events outside of my control (such as grant submission deadlines, and parties at my house) delayed its issuance.

The word "entropy" is used a lot, isn't it? OK, not in your average conversation, but it is a staple of conversations among some scientists, and certainly among all nerds and geeks. You have read my introduction to information theory I suppose (and if not, go ahead and start here, right away!)  But in my explanations of Shannon's entropy concept, I only obliquely referred to another "entropy", the one that came before Shannon: the thermodynamic entropy concept of Boltzmann and Gibbs. The concept was originally discussed by Clausius, but because he did not give a formula, I will just have to ignore him here. 

Why do these seemingly disparate concepts have the same name? How are they related? And what does this tell us about the second law of thermodynamics?

This is the blog post (possibly a series) where I try to throw some light on that relationship. I suspect that what follows below isn't very original (otherwise I probably should have written it up in a paper), but I have to admit that I didn't really check. I did write about some of these issues in an article that was published in a Festschrift on the occasion of the 85th birthday of Gerry Brown, who was my Ph.D. co-advisor and a strong influence on my scientific career. He passed away a year ago to this day, and I have not yet found a way to remember him properly. Perhaps a blog post on the relationship between thermodynamics and information theory is appropriate, as it bridges a subject Gerry taught often (Thermodynamics) with a subject I have come to love: the concept of information. But face it: a book chapter doesn't get a lot of readership. Fortunately, you can read it on arxiv here, and I urge you to because it does talk about Gerry in the introduction.  

Gerry Brown (1926-2013)
Before we get to the relationship between Shannon's entropy and Boltzmann's, how did they end up being called by the same name? After all, one is a concept within the realm of physics, the other from electrical engineering. What gives?

The one to blame for this confluence is none other than John von Neumann, the mathematician, physicist, engineer, computer scientist (perhaps Artificial Life researcher, sometimes moonlighting as an economist). It is difficult to appreciate the genius that was John von Neumann, not least because there aren't many people who are as broadly trained as he was. For me, the quote that fills me with awe comes from another genius who I've had the privilege to know well, the physicist Hans Bethe. I should write a blog post about my recollections of our interactions, but there is already a write-up in the book memorializing Hans's life. While I have never asked Hans directly about his impressions of von Neumann (how I wish that I had!), he is quoted as saying (in the 1957 LIFE magazine article commemorating von Neumann's death): "I have sometimes wondered whether a brain like von Neumann's does not indicate a species superior to man".

The reason why I think that this is quite a statement is that I think Bethe's brain was in itself very unrepresentative of our species, and perhaps indicated an altogether different kind.

So, the story goes (as told by Myron Tribus in his 1971 article "Energy and Information") that when Claude Shannon had figured out his channel capacity theorem, he consulted von Neumann (both at Princeton at the time) about what he should call the "-p log p" value of the message to be sent over a channel. von Neumann supposedly replied:

"You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name. In the second place, and more importantly, no one knows what entropy really is, so in a debate you will always have the advantage."

The quote is also reprinted in the fairly well-known book "Maxwell's Demon: Entropy, Information, and Computing", edited by Leff and Rex. Indeed, von Neumann had defined a quantity just like that as early as 1927 in the context of quantum mechanics (I'll get to that). So he knew exactly what he was talking about.

Let's assume that this is an authentic quote. I can see how it could be authentic, because the thermodynamic concept of entropy (due to the Austrian physicist Ludwig Boltzmann) can be quite, let's say, challenging. I'm perfectly happy to report that I did not understand it for the longest time, in fact not until I understood Shannon's entropy, and perhaps not until I understood quantum entropy.
Ludwig Boltzmann (1844-1906). Source: Wikimedia
Boltzmann defined entropy. In fact, his formula $S= k \log W$ is engraved on top of his tombstone, as shown here:
Google "Boltzmann tombstone" to see the entire marble edifice to Boltzmann
In this formula, $S$ stands for entropy, $k$ is now known as "Boltzmann's constant", and $W$ is the number of states (usually called "microstates" in statistical physics) a system can take on. But it is the $\log W$ that is the true entropy of the system. Entropy is actually a dimensionless quantity in thermodynamics. It takes on the form above (which has the dimensions of the constant $k$) only because temperature is customarily measured in more manageable units, such as the Kelvin, rather than in units of energy. In fact, $k$ just tells you how to do this translation:
$$k=1.38\times 10^{-23} {\rm J/K},$$
where J (for Joule) is the SI unit for energy. If you define temperature in these units, then entropy is dimensionless
$$S=\log W. \qquad (1)$$
But this doesn't at all look like Shannon's formula, you say? 

You're quite right. We still have a bit of work to do. We haven't yet exploited the fact that $W$ is the number of microstates consistent with a macrostate at energy $E$. Let us write down the probability distribution $w(E)$ for the macrostate to be found with energy $E$. We can then see that

I'm sorry, that last derivation was censored. It would have bored the tears out of you. I know because I could barely stand it myself. I can tell you where to look it up in Landau & Lifshitz if you really want to see it.

The final result is this: Eq. (1) can be written as
$$S=-\sum_{E_i} w_i\log w_i \qquad (2)$$
implying that Boltzmann's entropy formula looks to be exactly the same as Shannon's. 

Except, of course, that in the equation above the probabilities $w_i$ are all equal to each other. If some microstates are more likely than others, the entropy becomes simply
$$S=-\sum_{E_i} p_i\log p_i \qquad (3)$$
where the $p_i$ are the probabilities of occupying the different microstates $i$. 
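For the equal-probability case, the step from Eq. (2) back to Eq. (1) is a one-liner, assuming the sum runs over the $W$ microstates and each has $w_i=1/W$:
$$-\sum_{i=1}^{W}\frac1W\log\frac1W=W\cdot\frac1W\log W=\log W.$$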

Equation (3) was derived by the American theoretical physicist Willard Gibbs, who is generally credited with the development of statistical mechanics. 

J. Willard Gibbs (1839-1903) Source: Wikimedia
Now Eq. (3) looks precisely like Shannon's, which you can check by comparing to Eq. (1) in the post "What is Information? (Part 3: Everything is conditional)". Thus, it is Gibbs's entropy that is like Shannon's, not Boltzmann's. But before I discuss this subtlety, ponder this:

At first sight, this similarity between Boltzmann's and Shannon's entropy appears ludicrous. Boltzmann was concerned with the dynamics of gases (and many-particle systems in general). Shannon wanted to understand whether you can communicate accurately over noisy channels. These appear to be completely unrelated endeavors. Except they are not, if you move far enough away from the particulars. Both, in the end, have to do with measurement. 

If you want to communicate over a noisy channel, the difficult part is on the receiving end (even though you quickly find out that in order to be able to receive the message in its pristine form, you also have to do some work at the sender's end). Retrieving a message from a noisy channel requires that you or I make accurate measurements that can distinguish the signal from the noise. 

If you want to characterize the state of a many-particle system, you have to do something other than measure the state of every particle (because that would be impossible). You'll have to develop a theory that allows us to quantify the state given a handful of proxy variables, such as energy, temperature, and pressure. This is, fundamentally, what thermodynamics is all about. But before you can think about what to measure in order to know the state of your system, you have to define what it is you don't know. This is Boltzmann's entropy: how much you don't know about the many-particle system. 

In Shannon's channel, a message is simply a set of symbols that can encode meaning (they can refer to something). But before it has any meaning, it is just a vessel that can carry information. How much information? This is what's given by Shannon's entropy. Thus, the Shannon entropy quantifies how much information you could possibly send across the channel (per use of the channel), that is, entropy is potential information.

Of course, Boltzmann entropy is also potential information: If you knew the state of the many-particle system precisely, then the Boltzmann entropy would vanish. You (being an ardent student of thermodynamics) already know what is required to make a thermodynamical entropy vanish: the temperature of the system must be zero. This, incidentally, is the content of the third law of thermodynamics.

"The third law?", I hear some of you exclaim. "What about the second?"

Yes, what about this so-called Second Law?

To be continued, with special emphasis on the Second Law, in Part 2

Saturday, May 10, 2014

The science of citations. (Also: black holes, again).

If the title of this blog post made you click on it, I owe you an apology. All others, that is, those who got here via perfectly legitimate reasons, such as checking this blog with bated breath every morning--only to be disappointed yet again ... bear with me.

Here I'll tell you why I'm writing about the same idea (concerning, what else, the physics of black holes) for the fourth time.  

You see, an eminent scientist (whose name I'm sure I forgot) once told me that if you want to get a new idea noticed among the gatekeepers of that particular field, you have to publish it five times. The same idea, he said, over and over.

I'm sure I listened when he said that, but I'm equally sure that, at the time (youthful as I was), I did not understand. Why would the idea, if it was new and (let's assume for the sake of the present argument) worthy of the attention and study of everyone working in the field, not take hold immediately?

Today I'm much wiser, because now I can understand this advice as being a perfectly commonplace application of the theory of fixation of beneficial alleles in Darwinian evolution. 

Stop shaking your head already. I know you came here to get the latest on the physics of black holes. But before I get to that (and I will), this little detour on how science is made--via the mathematics of population genetics no less--should be worth your time.

In population genetics (the theory that deals with how populations and genes evolve), a mutation (that is, a genetic change) is said to have "gone to fixation" (or "fixated") if everybody in the population carries this mutation. This implies that all those that ever did NOT carry that mutation are extinct at the time of fixation. From the point of view of scientific innovation, fixation is the ultimate triumph of an idea: if you have achieved it, then everybody who ever did NOT believe you is dead as a doornail.

Fixation means you win. What do you have to do to achieve that?

Of course, in reality such "fixation of ideas" is never fully achieved, as in our world there will always be people who doubt that the Earth is round, or that the Earth revolves around the Sun, or that humans are complicit in the Earth's climate change. I will glibly ignore these elements. (As I'm sure you do too). Progress in science is hard enough even in the absence of such irrationality.

When a new idea is born, it is a fragile little thing. It germinates in the mind of its progenitor first tentatively, then more strongly. But even though it may one day be fully formed and forcefully expelled by its creative parent into the world of ideas, the nascent idea faces formidable challenges. It might be better than all other ideas in hindsight. But today is not hindsight: here, now, at the time of its birth, this idea is just an idea among many, and it can be snuffed out at a moment's notice.

The mathematics of population genetics teaches us that much: to calculate the probability of fixation of a mutation, you start by calculating its probability to go extinct. Let's think of a new idea as the mutation of an old idea.

What does it mean for an idea to go extinct? If we stick to ideas published in the standard technical literature, it means for that idea (neatly written into a publication) to never be cited because citations are, in a very real way, the offspring your idea germinates. You remember Mendel, the German-speaking Silesian monk who discerned the mathematical laws of inheritance at a time when Darwin was still entirely in the dark about how inheritance works?
Gregor Mendel (1822-1884) Source: Wikimedia
(Incidentally, my mother was born not far from Mendel's birthplace.)

Well, Mendel's idea almost went extinct. He published his work about the inheritance of traits in 1866, in the journal Verhandlungen des naturforschenden Vereins Brünn. Very likely, this is the most famous paper to ever appear in this journal, and to boot it carried the unfortunate title "Versuche über Pflanzenhybriden" (Experiments on Plant Hybridization). Choosing a bad title can really doom the future chances of your intellectual offspring (I know, I chose a number of such unfortunate titles), and so it was for Mendel. His paper was classified under "hybridization" (rather than "inheritance"), and lay dormant for 35 years. Imagine your work lying dormant for 35 years. Now stop.

The idea of Mendelian genetics was, for all purposes, extinct. But unlike life forms, extinct ideas can be resurrected. As luck would have it, a Dutch botanist by the name of de Vries conducted a series of experiments hybridizing plant species in the 1890s, and came across Mendel's paper. While clearly influenced by that paper (because de Vries changed his nomenclature to match Mendel's), he did not cite him in his 1900 paper in the French journal Comptes Rendus de l'Académie des Sciences. And just like that, Mendel persisted in the state of extinction once more.

Except that this time, the German botanist and geneticist Carl Correns came to Mendel's rescue. Performing experiments on the hawkweed plant, he rediscovered Mendel's laws, and because he knew a botanist who had corresponded with Mendel, he was familiar with that work. Correns went on to chastise de Vries for not acknowledging Mendel, and after de Vries started citing him, the lineage of Mendel's idea was established forever.

You may think that this is an extraordinary sequence of events, but I can assure you, this is by no means an exception. Science does not progress in a linear predictable fashion, mostly because the agents involved (namely scientists) are not exactly predictable machines. They are people. Even in science, the fickle fortunes of chance reign supreme.

But chance can be described mathematically. Can we calculate the chance that a paper will become immortal?

Let us try to estimate the likelihood that a really cool paper (that advances science by a great deal) becomes so well-known that everybody in the field knows it (and therefore cites it, when appropriate). Let's assume that said paper is so good that it is better than other papers by a factor $1+s$, where $s$ is a positive fraction ($s=0.1$ if the paper is 10% better than the state of the art, for example).  Even though this paper is "fitter"--so to speak--than all the other papers it competes with for attention, it is not obvious that it will become known instantly, because whether someone picks up the paper, reads it, understands it, and then writes a new paper citing it, is a stochastic (that is, probabilistic) process. So many things have to go right.

We can model these vagaries by using what is known as a Poisson process, and calculate the probability $p(k)$ that a paper gets $k$ citations
$$p(k)=\frac{e^{-(1+s)}(1+s)^k}{k!}. \qquad (1)$$
Poisson processes are very common in nature: any time you have rare events, the cumulative distribution (of how many of these rare events ended up occurring) is distributed like (1).

Note that the mean number of citations of the paper (the average of the distribution) is $1+s$ and thus larger than one. If we moreover assume that the papers that cite your masterpiece "inherit" the same advantage (because they also use that cool new idea), then in principle the number of citations of this paper should explode exponentially with time. But that is not a guarantee.

Let's calculate the probability $P$ that the paper actually "takes off". In fact, it is easier to calculate the probability that it does NOT take off: that it languishes in obscurity forever. This is $1-P$. (Being summarily ignored is the fate of several of my papers, and indeed almost all papers everywhere).

This (eternal obscurity) can happen if the paper never gets cited in the first place (given by $p(0)$ in Eq. (1) above), plus the probability that it gets cited once (but the citing paper itself never gets cited: $p(1)(1-P)$) plus the probability that it gets cited twice, but BOTH citing papers don't make it:  $p(2)(1-P)^2$, and so on.

Of course, the terms with higher $k$ become more and more unlikely (because when you get 10 citations, say, it is quite unlikely that all ten of them will never be cited). We can calculate the total "extinction probability" as an infinite sum:
$$1-P=\sum_{k=0}^{\infty} p(k)(1-P)^k, \qquad (2)$$
but keep in mind that the sum is really dominated by its first few terms. Now we can plug formula (1) into (2), do a little dance, and you get the simple looking formula
$$1-P=e^{-(1+s)P} . \qquad (3)$$
That looks great, because you might think that we can now easily calculate $P$ from $s$. But not so fast.

You see, Eq. (3) is a recursive formula. It calculates $P$ in terms of $P$. You are of course not surprised by that, because we explicitly assumed that the likelihood that the paper never becomes well known depends on the probability that the few papers that cite it are ignored too. Your idea becomes extinct if all those you ever influenced are reduced to silence, too.
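For the record, the "little dance" that takes Eqs. (1) and (2) to Eq. (3) is nothing but the power series of the exponential function:
$$\sum_{k=0}^{\infty}\frac{e^{-(1+s)}(1+s)^k}{k!}(1-P)^k=e^{-(1+s)}\sum_{k=0}^{\infty}\frac{\left[(1+s)(1-P)\right]^k}{k!}=e^{-(1+s)}\,e^{(1+s)(1-P)}=e^{-(1+s)P}.$$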

While the formula cannot be solved analytically, you can calculate an approximate form by assuming that the paper is great but not world shaking. Then, the probability $P$ that it will become well-known is small enough that we can approximate the exponential function in (3) as
                                      $$e^{-(1+s)P}\approx 1-(1+s)P+\frac12(1+s)^2P^2,$$
that is, we keep terms up to quadratic in $P$. And the reason why $P$ is small is that the advantage $s$ isn't fantastically large either. So we'll ignore all terms of order $s^2$ or higher.

Then Eq. (3) becomes
                    $$1-P\approx 1-P-sP+\frac12 P^2 +sP^2, $$ or
$$P=\frac{2s}{1+2s}. \qquad (4)$$
What we derived here is actually a celebrated result from population genetics, due to the mathematical biologist J. B. S. Haldane.
J. B. S. Haldane (1892-1964)
If you're not already familiar with this chap, spend a little time with him. (Like, reading about him on Wikipedia). He would have had a lot to complain about in this century, I'm thinking.

So what does this result teach us? If you take equation (4) and expand it to lowest order in $s$ (this means, you neglect all terms of order $s^2$), you get the result that is usually associated with Haldane, namely that the probability of fixation of an allele with fitness advantage $s$ is
$$P(s)=2s. \qquad (5)$$
Here's what this means for us: if an idea is ten percent better than any other idea in the field, there is only a 20% chance that it will get accepted (via the chancy process of citation).
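If you are curious how good these approximations are, the recursive Eq. (3) is easy to solve numerically. The sketch below uses plain bisection (my choice; any root finder would do) to find the nonzero solution and prints it next to Eqs. (4) and (5). The function name is made up for this illustration.

```python
import math

def takeoff_probability(s, tol=1e-14):
    """Nonzero root P of 1 - P = exp(-(1+s) P), found by bisection.

    Assumes a modest advantage s > 0 (well above 1e-6), so that the
    bracket [1e-6, 1] contains exactly the root we are after.
    """
    def f(P):
        # 1 - P - exp(-(1+s) P), written with expm1 for accuracy at small P
        return -P - math.expm1(-(1.0 + s) * P)

    lo, hi = 1e-6, 1.0          # f(lo) > 0 and f(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for s in (0.01, 0.05, 0.1, 0.2):
    P = takeoff_probability(s)
    print(f"s = {s:4.2f}   Eq. (3): {P:.4f}   "
          f"Eq. (4): {2 * s / (1 + 2 * s):.4f}   Eq. (5): {2 * s:.4f}")
```

For small $s$ the numerical root sits between the two approximations, so Haldane's $2s$ errs slightly on the optimistic side.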

These are pretty poor odds for an obviously superior idea! What to make of that?

The solution to this conundrum is precisely the advice that the senior scientist imparted to me!

Try, try, again. 

I will leave it to you, dear reader, to calculate the odds of achieving "fixation" if you start the process five independent times. (Bonus points for a formula that calculates the probability of fixation for $n$ independent attempts at fixation).
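(Spoiler alert.) One way to collect the bonus points, assuming the attempts really are independent and each one takes off with the same probability $P$: the idea fails only if every attempt fails, so the chance of at least one success in $n$ tries is $1-(1-P)^n$. A minimal sketch, with a made-up function name:

```python
def fixation_after_attempts(P, n):
    """Probability that at least one of n independent attempts takes off."""
    return 1.0 - (1.0 - P) ** n

P_single = 0.2      # Haldane's estimate 2s for an advantage of s = 0.1
for n in range(1, 6):
    print(f"{n} attempt(s): {fixation_after_attempts(P_single, n):.3f}")
```

With $P=0.2$, five tries lift the odds to about two in three.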

So now we return to the first question. Why am I writing a fourth paper on the communication capacity of black holes (with a fifth firmly in preparation, I might mention)? 

Because the odds that a single paper will spread a good idea are simply too small. Three is good. Four is better. Five is much better. 

And in case you are trying to keep track, here is the litany of tries, with links when appropriate:

(1) C. Adami and G. ver Steeg. Class. Quantum Gravity 31 (2014) 075015  blogpost
(2) C. Adami and G. ver Steeg. quant-ph/0601065 (2006) [in review]  blogpost
(3) K. Brádler and C. Adami, JHEP 1405 (2014) 095 (also preprint arXiv:1310.7914) blogpost
(4) K. Brádler and C. Adami, arXiv preprint arXiv:1405.1097. Read below. 

The last paper, entitled "Black holes as bosonic Gaussian channels", was written with my colleague Kamil Brádler (just as the paper that was the object of my most watched blog post to date). In there we show, again, what happens to quantum information interacting with a black hole, only that in paper (3) we could make definitive statements only about two very extreme black hole channels: those that perfectly reflect information, and those that perfectly absorb. While we cannot calculate all capacities in the latest paper (4), we can do this now as accurately as current theory allows, using the physics of "Gaussian states", rather than the "qubits" that we used previously. Gaussian states are quantum superpositions of particles that are such that if you measured them, you would get a distribution (of the number of measured particles) that is Gaussian. Think of them as wavepackets, if you wish. They are very convenient in quantum optics, for example. And they turn out to be very convenient to describe black hole channels too.

So we are getting the message out one more time. And the message is loud and clear: Black holes don't mess with information. The Universe is fine.

Nothing to see here. Move along.

Friday, April 4, 2014

The Quantum Cloning Wars Revisited

Cloning isn't so much in the news anymore these days, as the novelty of Dolly has worn off and cloning of farm animals and pets has become commonplace. But a different form of cloning--the cloning of quantum information, that is--is still very much discussed.
Dolly the cloned sheep. (Pining for the fjords)
There are a number of fundamental rules that we abide by in quantum physics. That probabilities are given by the square of the wave function's amplitude, for example (Born's rule). Or, that quantum interference is destroyed if you try to know what's really goin' on (the lesson of the double-slit experiment).

Or, that quantum physics is linear.

It really is. What this means is that if you have quantum wavefunctions $\psi$ and $\phi$ and an operator $U$ acting on the sum of them, then you get

$U(\psi+\phi)=U\psi +U\phi$.

Pretty innocuous-looking, right? But the consequences of this little harmless statement are mighty. This linearity implies that quantum states cannot be cloned. They can't be Dolly. They can't be Xeroxed. Thou shalt not clone quantum states. How is linearity going to legislate that?

Well, let's first state what cloning of a quantum state would look like. Could we clone a state such as $\phi$? 

(I don't have to tell you what $\phi$ is to answer the question, as you will understand in a minute-and-a-half). 

If you know what $\phi$ is, then yes, you can clone this state! (Wait for the apparent contradiction to be resolved before firing off your email).

Cloning in quantum mechanics means that if you start with a state $\phi$, then after cloning you have the state $\phi\phi$: two copies of $\phi$. It is actually possible to design an operator $U$ that does precisely this:

$U\phi 0=\phi\phi$

(What happens here is that the second state "0" is turned into $\phi$ by $U$. $U$ literally measures $\phi$, and then uses this knowledge of what $\phi$ is to turn 0 into a copy of that $\phi$.) So you can clone any known state.

"What do you mean by 'known'?"

Well you see, for $U\phi$ to be $\phi$ (and so that we could turn 0 into $\phi$), you must already have known the basis that $\phi$ was prepared in. Because otherwise, the measurement would have changed $\phi$. 

But here comes the rub. Imagine your state is $\phi+\psi$. In general, these are not in the same basis. Let's try to clone that one:

When we apply this operator $U$ to $\phi +\psi$, look what happens because of linearity:

$U(\phi+\psi)0=U\phi 0+U\psi 0=\phi\phi+\psi\psi$    (1).

Neato. But that is not the cloning of $\phi+\psi$. That would have been $(\phi+\psi)(\phi+\psi)$. That's not the same as Eq. (1). 
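If you want to see this with actual numbers, here is a minimal sketch. It assumes (my choice for illustration, nothing more) that $U$ is the two-qubit CNOT gate, which copies the known basis states 0 and 1 onto a blank "0". The helper names `kron`, `apply`, `close` and `clone_attempt` are made up for this example.

```python
import math

def kron(a, b):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [x * y for x in a for y in b]

def apply(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def close(a, b, tol=1e-12):
    """True if two vectors agree component by component."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

zero = [1.0, 0.0]
one = [0.0, 1.0]
s = 1 / math.sqrt(2)
plus = [s, s]  # the superposition (0 + 1)/sqrt(2)

# Assumed cloner U: CNOT in the basis |00>, |01>, |10>, |11>.
U = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def clone_attempt(state):
    """Apply U to |state>|0> and return the two-qubit result."""
    return apply(U, kron(state, zero))

# The two known basis states are copied perfectly.
copies_zero = close(clone_attempt(zero), kron(zero, zero))
copies_one = close(clone_attempt(one), kron(one, one))

# For the superposition, linearity forces (|00> + |11>)/sqrt(2) ...
actual = clone_attempt(plus)
linear_sum = [s * (x + y) for x, y in zip(kron(zero, zero), kron(one, one))]
# ... whereas a true clone would be |+>|+>.
wanted = kron(plus, plus)
copies_plus = close(actual, wanted)
overlap = sum(w * a for w, a in zip(wanted, actual)) ** 2

print(copies_zero, copies_one, copies_plus)  # True True False
print(round(overlap, 6))                     # 0.5
```

The output of the "cloner" overlaps with the state we actually wanted only with probability 1/2 in this example: the basis states come out fine, the superposition does not.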

So it is possible to clone any one particular (known) state, but it is not possible to clone superpositions of states (so-called "non-orthogonal states"). So in general, cloning is impossible. That's the no-cloning theorem, due to Bill Wootters and Wojciech Zurek [1], as well as Dennis Dieks [2], who discovered the theorem independently in 1982. 

It turns out that if you could clone quantum states, all hell would break loose in the universe. This is because the quantum mechanics of entanglement is really quite powerful: Einstein called entanglement "spooky action-at-a-distance". Two entangled quantum states can seemingly "communicate" over large distances. Distances so large that if they were communicating, then this would have to occur at a speed larger than light. And you can see how Einstein would seriously object to such a proposition. Strenuously. He would knock you over the head, is what he would do. 

I say "seemingly", because even though entities A and B (often dubbed "Alice" and "Bob") that share an entangled state would obtain the same exact measurement results if they proceeded to measure the shared state (even if they were in different galaxies), it turns out that these measurements cannot be used for communication. Their measurement devices show the same result, but because this result is a random number, no information can be sent. Nada. Sleep well, Albert. 
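Here is a small sketch of why no information gets through, under assumptions I am making for illustration: Alice and Bob share the state $(|00\rangle+|11\rangle)/\sqrt{2}$, Bob always measures in the 0/1 basis, and Alice is free to rotate her measurement basis by an angle `theta`. The function names are hypothetical.

```python
import math

def joint_probability(theta, a, b):
    """P(Alice sees a, Bob sees b) for the shared state (|00> + |11>)/sqrt(2).

    Alice measures in a basis rotated by theta; Bob measures in the 0/1 basis.
    """
    if a == 0:
        alice = [math.cos(theta), math.sin(theta)]
    else:
        alice = [-math.sin(theta), math.cos(theta)]
    # Only the |bb> term of the shared state survives Bob's projection,
    # so the amplitude is alice[b] / sqrt(2).
    amplitude = alice[b] / math.sqrt(2)
    return amplitude ** 2

def bob_marginal(theta, b):
    """Probability that Bob sees b, whatever Alice happened to get."""
    return sum(joint_probability(theta, a, b) for a in (0, 1))

# Same basis (theta = 0): the results always agree.
print(joint_probability(0.0, 0, 0), joint_probability(0.0, 0, 1))

# Whatever basis Alice picks, Bob's statistics stay at 50/50.
for theta in (0.0, 0.3, math.pi / 4, 1.2):
    print(round(bob_marginal(theta, 0), 12), round(bob_marginal(theta, 1), 12))
```

Bob's numbers do not depend on `theta`, so nothing Alice does to her half shows up in what Bob sees on his own.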

Unless, that is, Alice or Bob (or both) could make copies of their entangled state. If they could do that, well then they could use those measurements to communicate superluminally, as was discovered by the Science Fiction writer Nick Herbert, as it turns out. (It was this conjecture that ultimately led to the formulation of the no-cloning theorem.) So, all hell would break loose if quantum cloning were allowed, because then you could communicate superluminally, which means you could travel backwards in time, kill a great-grandparent and leave the universe in the mother of all time-paradoxes.

So quantum cloning is out, and all because of the linearity of quantum mechanics, and all that's a good thing.

Or is it?

The reason I'm asking is.... black holes, of course. You may have heard about the commotion I caused by announcing that black holes do not destroy information, because that information is copied just before it disappears in the black hole's abyss. Copied, I hasten to add, by this process called "stimulated emission" that must accompany absorption, as this ubiquitous gentleman named Albert E. told us in 1917. But if I send into the black hole a quantum state $\phi$ rather than the classical "001010010001", wouldn't the black hole then violate the sacrosanct no-cloning theorem? The one whose violation should make the fabric of spacetime melt?

That is a question worth investigating in detail, which I have done in an article I wrote in 2006, and which is currently under review. Yes, it is another one of those.

Here's what we discovered (my collaborator in this was the same Greg Ver Steeg who collaborated with me in the 2004 article--published ten years later--and whose blog I'm linking here):

Black holes are quantum cloning machines.

How is this possible? Well, it is possible because black holes aren't perfect. Cloners, that is. It turns out that you are allowed to clone quantum states somewhat. You can do it as long as you are sloppy. That imperfect cloning is possible was discovered by Buzek and Hillery in 1996 [3]. They showed that you can design quantum cloning machines that take a quantum state $\phi$ and transform it--not into another quantum state--but into a density matrix $\rho$ that is fairly close to $\phi$. How close, you ask? This is measured by the fidelity of the cloning machine $F$, which is just the expectation value of the density matrix $\rho$ within the initial quantum state $\phi$:

$F=\langle \phi|\rho|\phi\rangle$    (1)

$F=1$ means you created a perfect cloning machine where $\rho=| \phi \rangle \langle \phi|$. If you were able to do this, then of course you already know what $\phi$ was (a "known" state, one that you had previously measured). And perfect cloning of known states is allowed, because in that case you are cloning classical, rather than quantum information. You're standing next to the copier feeding sheets to the machine. That's right: the difference between classical and quantum information is simply whether or not you know what kind of a state you have on your hands!
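To make the fidelity formula concrete, here is a rough sketch. The "sloppy copy" used below is an assumption chosen only for illustration: the original state mixed with pure noise, $\rho=\eta\,|\phi\rangle\langle\phi|+(1-\eta)\,\mathbb{1}/2$, with a made-up mixing parameter `eta`. The function names are hypothetical too.

```python
def fidelity(phi, rho):
    """F = <phi|rho|phi> for a state vector phi and a density matrix rho."""
    n = len(phi)
    value = sum(
        phi[i].conjugate() * rho[i][j] * phi[j]
        for i in range(n)
        for j in range(n)
    )
    return value.real

def sloppy_copy(phi, eta):
    """Assumed noisy copy: eta * |phi><phi| + (1 - eta) * identity / n."""
    n = len(phi)
    return [
        [
            eta * phi[i] * phi[j].conjugate() + (1 - eta) * (1 / n if i == j else 0)
            for j in range(n)
        ]
        for i in range(n)
    ]

# Some normalized two-state system: |0.6|^2 + |0.8i|^2 = 1.
phi = [complex(0.6, 0.0), complex(0.0, 0.8)]

for eta in (1.0, 0.5, 0.0):
    print(eta, round(fidelity(phi, sloppy_copy(phi, eta)), 12))
```

With `eta = 1` the copy is perfect and $F=1$. With `eta = 0` the machine outputs pure noise, and you are left with $F=1/2$, which is what you would get by guessing.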

Come to think of it, that's really the difference between classical and quantum mechanics right there, in a nutshell. Forget all this $\hbar\to 0$ nonsense, that is not what makes quantum different from classical. It is whether you deal with orthogonal states or not. And if your quantum states are not orthogonal, then quantumness is how non-orthogonal your state is with respect to your measurement basis.

How well can you quantum-clone then? What's the highest achievable fidelity? This question was answered by Gisin and Massar a year after quantum cloning machines were invented [4]. They found that the optimal fidelity for a machine that tries to make two copies from a single unknown quantum state is $F=5/6$. That's not bad, right? Actually, they got an even more impressive result: they showed that if you had $N$ identically prepared quantum states (prepared by someone other than you, because you remember that you are not to know what kind of quantum state you are trying to clone), and you want to create $M$ copies of this state ($M$ sort-of-copies, that is), then the best you can do with an optimal universal cloning machine is

$F=\frac{M(N+1)+N}{M(N+2)}$    (2)

The "universal" here means that this cloning machine will achieve the fidelity no matter what the initial state is. There are cloning machines that can do better for some states, and worse for others. These are called "state-dependent" cloning machines.
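If you would like to play with Eq. (2), here is a short sketch using exact fractions. The function name `optimal_fidelity` and the argument check are my own additions; the formula itself is just Eq. (2) for two-state systems.

```python
from fractions import Fraction

def optimal_fidelity(n, m):
    """Eq. (2): best universal fidelity when turning n inputs into m copies."""
    if n < 1 or m < n:
        raise ValueError("need 1 <= n <= m")
    return Fraction(m * (n + 1) + n, m * (n + 2))

print(optimal_fidelity(1, 2))  # 5/6
print(optimal_fidelity(1, 3))  # 7/9
print(optimal_fidelity(2, 3))  # 11/12
print(optimal_fidelity(3, 3))  # 1
```

The last line is a sanity check: if you ask for as many copies as you already have ($M=N$), the machine does not need to do anything and the fidelity is 1. Asking for more copies from the same input costs you fidelity, and handing the machine more inputs buys some of it back.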

Now, back to black holes. What kind of cloning machines are they? What's their fidelity? This, it turns out, depends on how reflective the black hole is.

"Reflective? Aren't black holes supposed to be black, ergo non-reflective?"

Actually, not necessarily. Black holes can reflect stuff, in particular if you hurl something at the black hole somewhat at an angle (that is, not straight on, as in the figure below)

Black holes are actually surrounded by a potential barrier, which we can choose to model by a semi-transparent mirror surrounding the black hole, with reflectivity $1-\alpha$. If you read the post about the quantum capacity of black holes, then you remember this reflectivity.

So, let's first consider black holes that perfectly reflect radiation. I called them "white holes" earlier, even though this is not exactly how white holes are defined in the literature. But I really don't care, because I believe that there is a fundamental relationship between black holes and white holes that involves time-reversal, or else flipping the inside and the outside of the holes. Keep on reading if that intrigues you.

So, nothing can enter a white hole, while stuff from inside the white hole makes it out unhindered. But this white hole is very different from a mirror, because besides the reflection, it also stimulates the emission of radiation in response to the stuff that is reflected. And these stimulated states are the clones of the incoming states. This means of course that after you send in the quantum state $\phi$, the black hole returns two almost-clones (one from the reflection, and one from stimulated emission). And guess what: the fidelity of these clones is $F=5/6$. The white hole is an optimal universal quantum cloning machine!

Now that I impressed you with this statement, let me quickly make it even more impressive. It turns out that the black hole isn't just a $1\to 2$ cloner. Because the stimulated emission of radiation produces an arbitrary number of "copies", the white hole is a $1\to M$ cloner if you need it to be. In fact, it will be an $N\to M$ cloner if you want. And it will perform this feat with the optimal fidelity of Gisin and Massar. That's Equation (2) above.

Now, if you're not duly impressed, this is perhaps because you think that white holes aren't that interesting. But you should keep in mind that in Hawking's original formulation, black holes did not absorb anything either, as paradoxical as that sounds. Also, mirrors are not universal quantum cloners. You need the stimulated emission effect to make this possible.

Now, let us look beyond the horizon. Even though the white hole perfectly reflects the quantum states you fling at it, there is actually stuff beyond the horizon. Indeed, as described in the quantum capacity blog post, there are anti-clones behind the horizon.

"Anti-clones? Are you just all-out kidding me now?"

"Anti-clone" is a good word. But it really is a thing. Anti-clones are the stimulated "twin" of the clone outside the black hole. They must be there, because you can't just stimulate a copy without violating a bunch of conservation laws, like particle number, momentum, and whatever else characterizes the thing you send in. You've got to do this in twins: particle-anti-particle, clone and anti-clone.

"What is the fidelity of the anti-clone? Is it 5/6 also?"

The answer is no. That would be bad, as I'll discuss. That anti-clone has a fidelity of 2/3. Strange, you think? Not after I tell you what this number represents. In fact, the fidelity of the anti-clones behind the horizon (any number $M$ of them, in fact), given that somebody sent in $N$ identically prepared copies (that were all reflected, of course), is independent of $M$ and given by

$F=\frac{N+1}{N+2}$ .   (3)

So, that's as good as you can hope to reconstruct an arbitrary quantum state using the anti-clones behind the horizon.

That's actually a very interesting result, because it happens to be Pierre-Simon Laplace's "rule of succession" that he derived in the 18th century. Not in the context of black holes, mind you, but in the context of our sun.
Pierre-Simon Laplace (1749-1827)
Source: Wikimedia

This is the probability that the sun will rise tomorrow given that you have observed it to rise $N$ previous times. It assumes a "prior" that posits that both the sun rising as well as not rising are possible outcomes of your "experiment", and these are added to the $N$ total observations. Thus, out of $N+2$ events, $N+1$ have the sun rising. This correspondence is surprising, because the fidelity (3) is in fact the probability to correctly estimate the state of a quantum two-state system (while the random variable "sun rising" is classical) using $N$ classical measurements only, as was shown by Massar and Popescu in 1995 [5]. Somebody should investigate this curious coincidence, which in my view is not a coincidence at all.
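A quick way to see how Eq. (3) sits next to Eq. (2) is to compare them with exact fractions. This is only a sketch, and the function names are mine. The thing to notice is that the clone fidelity of Eq. (2) exceeds the rule of succession by exactly $N/(M(N+2))$, which shrinks to nothing as the number of copies $M$ grows.

```python
from fractions import Fraction

def rule_of_succession(n):
    """Eq. (3): Laplace's (N + 1)/(N + 2)."""
    return Fraction(n + 1, n + 2)

def clone_fidelity(n, m):
    """Eq. (2): optimal universal fidelity for n inputs and m copies."""
    return Fraction(m * (n + 1) + n, m * (n + 2))

print(rule_of_succession(1))  # 2/3

# The gap between Eq. (2) and Eq. (3) is exactly N / (M (N + 2)).
for n, m in [(1, 2), (1, 10), (1, 1000), (5, 1000)]:
    gap = clone_fidelity(n, m) - rule_of_succession(n)
    print(n, m, gap, gap == Fraction(n, m * (n + 2)))
```

So if you insist on making more and more copies from the same $N$ inputs, the best quantum cloner ends up doing no better than measuring and guessing.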

All right then, to sum it up: For a white hole, the clone fidelity is the optimal 5/6, and the anti-clone fidelity (behind the curtain) is 2/3, which is the best quantum state reconstruction you can do with classical means (like, measurements).

What about perfectly absorbing black holes? I'll make that short and sweet given all that we learned. The fidelity of clones outside is 2/3, while the fidelity of the anti-clones inside is 5/6.

"It's just the opposite from the white hole situation! It is as if the inside and the outside of the black hole had been flipped!"

I told you so, didn't I. Yes, sitting inside of a black hole (if your turbine engines allow you to sit), you despair that none of your signals make it outside. Your signals are turned back towards you, as if they were reflected. As if you were looking at a white hole horizon.

To wrap this meandering post up, Greg and I did calculate the cloning fidelity (for the clones outside the horizon) for arbitrary reflectivity. It turns out that this fidelity is very close to the optimal one as long as the black hole is not too tiny (see the figure below).
Cloning fidelity as a function of the number of clones produced, for moderately-sized black holes, and different black hole absorptivities. 
So, black holes are almost optimal quantum cloners. Who knew?

Actually, this possibility was discussed briefly as early as 1990, according to Lenny Susskind in his book "The Black Hole War". Indeed, Susskind writes that he proposed (in front of Sid Coleman and Stephen Hawking) that the problem would be solved if "the region just outside the horizon is occupied by a lot of tiny invisible Xerox machines" [6, p. 227]. But he then immediately retreated from this idea, because he thought it would violate the no-cloning theorem. Which we now know it does not.

Susskind later revived the idea in his "black hole complementarity" proposal, claiming that somehow information would both fall into the black hole and be reflected at the horizon, but that the no-cloning theorem would not be violated because nobody would ever know (as you can't make an experiment both inside and outside of the black hole). This idea is, as I'm sure you can now see as clear as daylight, based on a profound misunderstanding of quantum cloning, and in particular its relation to stimulated emission of radiation.

Finally, given that Pierre-Simon Laplace occupies such an interesting place in this post, I ought to note in passing that he is the one who invented (discovered?) the special function now called "Spherical Harmonics", which plays such a fundamental role in quantum physics (as part of the wavefunction of the hydrogen atom). Welcome home, Laplace!

The subject matter discussed in this blog post is in Ref. [7], and currently under review. I will update this post with the exact reference once the paper has appeared.


[1] W. K. Wootters and W. H. Zurek, A single quantum cannot be cloned. Nature 299, 802 (1982).
[2] D. Dieks, Communication by EPR devices. Phys. Lett. A 92, 271 (1982).
[3] V. Buzek and M. Hillery, Quantum copying: Beyond the no-cloning theorem. Phys. Rev. A 54, 1844 (1996).
[4] N. Gisin and S. Massar, Optimal quantum cloning machines. Phys. Rev. Lett. 79, 2153–2156 (1997).
[5] S. Massar and S. Popescu, Optimal extraction of information from finite quantum ensembles. Phys. Rev. Lett. 74, 1259–1263 (1995).
[6] L. Susskind, The Black Hole War. Back Bay Books, 2008.
[7] C. Adami and G. Ver Steeg, Black holes are almost optimal quantum cloners, quant-ph/0601065 (2006).