Tuesday, October 7, 2014

Nifty papers I wrote that nobody knows about (Part 4: Complex Langevin equation)

This is the last installment of the "Nifty Papers" series. Here are the links to Part 1, Part 2, and Part 3.

For those outside the computational physics community, the words that follow don't mean anything.

For those who have encountered the problem, these words elicit terror. They stand for sleepless nights. They spell despair. They make grown men and women weep helplessly. The Sign Problem.

OK, I get it, you're not one of those. So let me help you out.

In computational physics, one of the main tools people use to calculate complicated quantities is the Monte Carlo method. The method relies on random sampling of distributions in order to obtain accurate estimates of means. In the lab where I was a postdoc from 1992-1995, the Monte Carlo method was used predominantly to calculate the properties of nuclei, using a shell model approach.

I can't get into the specifics of the Monte Carlo method in this post, not least because such an exposition would lose me a good fraction of what viewers/readers I have left at this point. Basically, it is a numerical method to calculate integrals (even though it can be used for other things too). It involves sampling the integrand and summing the terms. If the integrand is strongly oscillating (lots of high positives and high negatives), then the integral may be slow to converge. Such integrals appear in particular when calculating expectation values in strongly interacting systems, such as, for example, big nuclei. And yes, the group I had joined as a postdoc at that point in my career specialized in calculating properties of large nuclei computationally using the nuclear shell model. These folks would battle the sign problem on a daily basis.

And while as a Fairchild Prize Fellow (at the time it was called the "Division Prize Fellowship", because Fairchild did not yet want their name attached) I could work on anything I wanted (and I did!), I also wanted to do something that would make the life of these folks a little easier. I decided to try to tackle the sign problem. I started work on this problem in the beginning of 1993 (the first page of my notes, reproduced below, is dated February 9th, 1993, shortly after I arrived).

The last calculation, pages 720-727 of my notes, is dated August 27th, 1999, so I clearly took my good time with this project! Actually, it lay dormant for about four years as I worked on digital life and quantum information theory. But my notes were so detailed that I could pick the project back up in 1999.


The idea to use the complex Langevin equation to calculate "difficult" integrals is not mine, and not new (the literature on this topic goes back to 1985, see the review by Gausterer [1]). I actually had the idea without knowing these papers, but this is neither here nor there. I was the first to apply the method to the many-fermion problem, where I also was able to show that the complex Langevin (CL) averages converge reliably. Indeed, the CL method was, when I began working on it, largely abandoned because people did not trust those averages. But enough of the preliminaries. Let's jump into the mathematics.

Take a look at the following integral:

$$\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty d\sigma e^{-(1/2)\sigma^2}\cos(\sigma z).$$
This integral looks very much like the Gaussian integral
$$\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty d\sigma e^{-(1/2)\sigma^2}=1,$$
except for that cosine function. The exact result for the integral with the cosine function is (trust me there, but of course you can work it out yourself if you feel like it) 
$$e^{-(1/2)z^2}.$$
This result might surprise you, as the integrand itself (on account of the cos function) oscillates a lot:
The integrand \(\cos(10x)\,e^{-(1/2)x^2}\)
The factor \(e^{-(1/2) \sigma^2}\) dampens these oscillations, and in the end the result is simple: it is as if the cosine function weren't even there, and \(\sigma\) had simply been replaced by \(z\) in the Gaussian. But a Monte Carlo evaluation of this integral runs into the sign problem when \(z\) gets large and the oscillations become more and more violent. The numerical average converges very, very slowly, which means that your computer has to run for a very long time to get a good estimate.

Now imagine calculating an expectation value where this problem occurs both in the numerator and the denominator. In that case we have to deal with small, slowly converging averages in both, and the ratio converges even more slowly still. For example, imagine calculating the "mean square"
$$\langle\sigma^2\rangle_N=\frac{\int_{-\infty}^\infty d\sigma\,\sigma^2\, e^{-(1/2)\sigma^2}\cos^N(\sigma z)}{\int_{-\infty}^\infty d\sigma\, e^{-(1/2)\sigma^2}\cos^N(\sigma z)}.$$
The denominator of this ratio (for \(N=1\)) is the integral we looked at above. The numerator just has an extra \(\sigma^2\) in it. The \(N\) (the "particle number") is there just to make things worse if you choose a large one, much as larger nuclei are harder to calculate in nuclear physics. Below I show the result of calculating this expectation value using the Monte Carlo approach (data with error bars), along with the exact analytical result (solid line), and as an inset the average "sign" \(\Phi\) of the calculation. The sign here is just the expectation value of
$$\Phi(z)=\frac{\cos(\sigma z)}{|\cos(\sigma z)|}$$

You see that for increasing \(z\), the Monte Carlo average becomes very noisy, and the average sign disappears. For a \(z\) larger than three, this calculation is quite hopeless: sign 1, Monte Carlo 0.
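If you want to poke at this yourself, here is a minimal sketch of the standard reweighting trick, written by me for this post rather than taken from the paper: sample \(\sigma\) from the positive weight \(e^{-\sigma^2/2}|\cos(\sigma z)|^N\) with a simple Metropolis walk, carry the sign \(\Phi\) along, and form \(\langle\sigma^2\Phi\rangle/\langle\Phi\rangle\). The function name and all run parameters are my own choices; watch the average sign collapse as \(z\) grows.

```python
import numpy as np

def mc_average(z, N=1, n_samples=200_000, step=1.0, seed=0):
    """Metropolis sampling of the positive weight exp(-s^2/2)|cos(sz)|^N,
    carrying the sign Phi = sign(cos(sz))^N along (reweighting)."""
    rng = np.random.default_rng(seed)

    def logw(s):  # log of the positive reference weight
        return -0.5 * s**2 + N * np.log(np.abs(np.cos(s * z)) + 1e-300)

    s, num, den = 0.1, 0.0, 0.0
    for _ in range(n_samples):
        s_new = s + step * rng.normal()
        if np.log(rng.random()) < logw(s_new) - logw(s):
            s = s_new                        # Metropolis accept/reject
        phi = np.sign(np.cos(s * z)) ** N    # the "sign" of this sample
        num += s**2 * phi
        den += phi
    # <sigma^2>_N = <sigma^2 Phi> / <Phi>; den/n_samples is the average sign
    return num / den, den / n_samples

for z in (1.0, 2.0, 3.0):
    est, avg_sign = mc_average(z)
    print(f"z={z}: MC {est:+.2f}   exact (N=1) {1 - z**2:+.2f}   <sign> {avg_sign:.4f}")
```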

I want to make one thing clear here: of course you would not use the Monte Carlo method to calculate this integral if you can do it "by hand" (as you can for the example I show here). I'm using this integral as a test case, because the exact result is easy to get. The gist is: if you can solve this integral computationally, maybe you can solve those integrals for which you don't know the answer analytically in the same manner. And then you solve the sign problem. So what other methods are there?

The solution I proposed was using the complex Langevin equation. Before moving to the complex version (and explaining why we need it), let's look at using the real Langevin equation to calculate averages. The idea here is the following. When you calculate an integral using the Monte Carlo approach, what you are really doing is summing over a set of points chosen so that you (probabilistically) reject those where the integrand is small and accept those where it is large. This accept/reject procedure creates a sequence of random samples that approximates the probability distribution you want to integrate over.

But there are other methods to create sequences that appear to be drawn from a given probability distribution. One is the Langevin equation which I'm going to explain. Another is the Fokker-Planck equation, which is related to the Langevin equation but that I'm not going to explain. 

Here's the theory (not due to me, of course) of how you use the Langevin equation to calculate averages. Say you want to calculate the expectation value of a function \(O(\sigma)\). To do that, you need to average \(O(\sigma)\), which means you sum (and by that I mean integrate) this function over the probability of finding \(\sigma\). The idea here is that \(\sigma\) is controlled by a physical process: \(\sigma\) does not change randomly, but according to some laws of physics. You want to know the average of \(O\), which depends on \(\sigma\), given that \(\sigma\) changes according to some natural process.

If you think about it long enough, you realize that many many things in physics boil down to calculating averages just like that. Say, the pressure at room temperature given that the molecules are moving according to the known laws of physics. Right, almost everything in physics, then. So you see, being able to do this is important. Most of the time, Monte Carlo will serve you just fine. We are dealing with all the other cases here. 

First, we need to make sure we capture the fact that the variable \(\sigma\) changes according to some physical law. When you are first exposed to classical mechanics, you learn that the time development of any variable is described by a Lagrangian function (and then you move on to the Hamiltonian so that you are prepared to deal with quantum mechanics, but we won't go there here). The integral of the Lagrangian is called the "action" \(S\), and that is the function that is used to quantify how likely any variable \(\sigma\) is, given that it follows these laws. For example, if you are a particle following the laws of gravity, then I can write down for you the Lagrangian (and hence the action) that makes sure the particle follows the law. It is \(L=\frac12m v^2-mV(\sigma)\), where \(m\) is the mass, \(v\) is the velocity of the \(\sigma\) variable, \(v=d\sigma/dt\), and \(V(\sigma)\) is the gravitational potential.

The action is \(S=\int L(\sigma(t))\, dt\), and the equilibrium distribution of \(\sigma\) is
$$P(\sigma)=\frac1Z e^{-S},$$ where \(Z\) is the partition function \(Z=\int e^{-S}d\sigma\).

In computational physics, what you want is a process that creates this equilibrium distribution, because if you have it, then you can just sum over the variables so created and you have your integral. Monte Carlo is one method to create that distribution. We are looking for another. 

It turns out that the Langevin equation
$$\frac{d\sigma}{dt}=-\frac12 \frac{dS}{d\sigma}+\eta(t)$$
creates precisely such a process. Here, \(S\) is the action for the process, and \(\eta(t)\) is a noise term with zero mean and unit variance:
$$\langle \eta(t)\eta(t^{\prime})\rangle=\delta(t-t^\prime).$$
Note that \(t\) here is a "fictitious" time: we use it only to create a set of \(\sigma\)'s that are distributed according to the probability distribution \(P(\sigma)\) above. If we have this fictitious time series \(\sigma_0(t)\) (the solution to the differential equation above), then we can just average the observable \(O(\sigma)\):
$$\langle O\rangle=\lim_{T\to\infty}\frac1T\int_0^TO(\sigma_0(t))dt$$
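To see this recipe at work in the friendliest possible case, here is a minimal sketch (mine, not from the paper) that integrates the Langevin equation for the plain Gaussian action \(S=\frac12\sigma^2\) with a simple Euler-Maruyama step; the fictitious-time average of \(\sigma^2\) comes out close to the exact value of 1. The step size and run length are arbitrary choices.

```python
import numpy as np

# Euler-Maruyama integration of d(sigma) = -1/2 S'(sigma) dt + dW for the
# harmless action S = sigma^2/2; the fictitious-time average of sigma^2
# should come out close to the exact value 1.
rng = np.random.default_rng(1)
dt, n_steps = 0.01, 500_000
sigma, acc = 0.0, 0.0
for _ in range(n_steps):
    sigma += -0.5 * sigma * dt + np.sqrt(dt) * rng.normal()   # drift + noise
    acc += sigma**2
print("Langevin estimate of <sigma^2>:", acc / n_steps)        # ~ 1.0
```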
Let's try the "Langevin approach" to calculating averages on the example integral \(\langle \sigma^2\rangle_N\) above. The action we have to use is
$$S=\frac12 \sigma^2-N\ln [\cos(\sigma z)]$$ so that \(e^{-S}\) gives exactly the integrand we are looking for. Remember, all expectation values are calculated as
$$\langle O\rangle=\frac{\int O(\sigma) e^{-S(\sigma)}d\sigma}{\int e^{-S(\sigma)}d\sigma}.$$

With that action, the Langevin equation is
$$\dot \sigma=-\frac12(\sigma+Nz\tan(\sigma z))+\eta \ \ \ \      (1)$$
This update rule creates a sequence of \(\sigma\)'s that can be used to calculate the integral in question.
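Discretized, the update rule looks like this (again a sketch of my own, using the same Euler-Maruyama step as above but with the drift of equation (1)); the tan in the drift blows up whenever \(\cos(\sigma z)\) comes close to zero, and you can watch what that does to the running average.

```python
import numpy as np

def real_langevin_average(z, N=1, dt=0.001, n_steps=2_000_000, seed=2):
    """Real Langevin update, eq. (1), for S = sigma^2/2 - N ln cos(sigma z).
    The tan() drift diverges whenever cos(sigma z) approaches zero."""
    rng = np.random.default_rng(seed)
    sigma, acc = 0.0, 0.0
    for _ in range(n_steps):
        drift = -0.5 * (sigma + N * z * np.tan(sigma * z))
        sigma += drift * dt + np.sqrt(dt) * rng.normal()
        acc += sigma**2
    return acc / n_steps

print(real_langevin_average(z=2.0))   # nowhere near the exact 1 - z^2 = -3
```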

And the result is ..... a catastrophe! 

The average does not converge, mainly because in the differential equation (1) I ignored a drift term that goes like \(\pm i\delta(\cos(z\sigma))\). That it's there is not entirely trivial (it comes from the logarithm of the cosine, which changes branch whenever the cosine changes sign), but if you sit with that equation for a little while you'll realize that weird stuff happens whenever the cosine is zero. That term throws the trajectory all over the place every once in a while, giving rise to an average that simply will not converge.

In the end, this is the sign problem raising its ugly head again. You do one thing, you do another, and it comes back to haunt you. Is there no escape?

You've been reading patiently so far, so you must have suspected that there is an escape. There is indeed, and I'll show it to you now.

This simple integral that we are trying to calculate
$$\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty d\sigma e^{-(1/2)\sigma^2}\cos(\sigma z),$$
we could really write it also as
$$\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty d\sigma e^{-(1/2)\sigma^2}e^{i\sigma z},$$
because the latter integral really has no imaginary part: the integration range is symmetric, and the imaginary part of the integrand is antisymmetric in \(\sigma\).

This is the part that you have to understand to appreciate this article, and as a consequence this blog post. If you do, skip the next paragraph. It is only there for those people who are still scratching their heads.

OK: here's what you learn in school: \(e^{iz}=\cos(z)+i\sin(z)\). This formula is so famous, it even has its own name. It is called Euler's formula. And \(\cos(z)\) is a symmetric function (it remains the same if you change \(z\to-z\)), while \(\sin(z)\) is antisymmetric (\(\sin(-z)=-\sin(z)\)). An integral from \(-\infty\) to \(\infty\) will render any antisymmetric function zero: only the symmetric parts remain. Therefore, \(\int_{-\infty}^\infty e^{iz}dz= \int_{-\infty}^\infty \cos(z)dz\).
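If you would rather check this numerically than take Euler's word for it, here is a quick sanity check (a sketch assuming scipy is available; the integration cutoff at \(\pm 10\) is my own harmless choice, since the Gaussian has long died off by then):

```python
import numpy as np
from scipy.integrate import quad

z = 2.5
gauss = lambda s: np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)

# real and imaginary parts of the integrand with exp(i s z) in place of cos(s z)
cos_part, _ = quad(lambda s: gauss(s) * np.cos(s * z), -10, 10)
sin_part, _ = quad(lambda s: gauss(s) * np.sin(s * z), -10, 10)

print(cos_part)              # the original cos-version of the integral
print(sin_part)              # ~ 0: the antisymmetric sin piece integrates away
print(np.exp(-0.5 * z**2))   # the exact answer, matching cos_part
```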

This is the one flash of brilliance in the entire paper: that you can replace a cos by a complex exponential if you are dealing with symmetric integrals. Because this changes everything for the Langevin equation (it doesn't do that much for the Monte Carlo approach). The rest was showing that this worked also for more complicated shell models of nuclei, rather than the trivial integral I showed you. Well, you also have to figure out how to replace oscillating functions that are not just a cosine (that is, how to extend actions whose weight \(e^{-S}\) is not positive into the complex plane), but in the end it turns out that this can be done if necessary.

But let's first see how this changes the Langevin equation. 

Let's first look at the case \(N=1\). The action for the Langevin equation was 
$$S=\frac12\sigma^2-\ln\cos(\sigma z).$$
If you replace the cos by the complex exponential, the action instead becomes
$$S=\frac12\sigma^2\pm i\sigma z .$$ The fixed point of the differential equation (1), which used to lie on the real line (where the trajectory could run into the singularities at the zeros of \(\cos(z\sigma)\)), has now moved into the complex plane.

And in the complex plane there are no singularities! Because they are all on the real line! As a consequence, the averages based on the complex action should converge! The sign problem can be vanquished just by moving to the complex Langevin equation!
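Here is the same Langevin sketch as before, but with the complexified action \(S=\frac12\sigma^2-i\sigma z\) for the \(N=1\) case (either sign of the \(\pm i\sigma z\) works): \(\sigma\) is now evolved as a complex number while the noise stays real, and the fictitious-time average of \(\sigma^2\) settles onto the exact \(1-z^2\) even at values of \(z\) where the Monte Carlo sign has long since vanished. Again, this is my own minimal illustration, not the code behind the figures in the paper.

```python
import numpy as np

def complex_langevin_average(z, dt=0.01, n_steps=1_000_000, seed=3):
    """Complex Langevin for N = 1: S = sigma^2/2 - i sigma z, so the drift is
    -1/2 (sigma - i z). sigma is complex, the noise stays real."""
    rng = np.random.default_rng(seed)
    sigma = 0.0 + 0.0j
    acc = 0.0 + 0.0j
    for _ in range(n_steps):
        drift = -0.5 * (sigma - 1j * z)
        sigma += drift * dt + np.sqrt(dt) * rng.normal()
        acc += sigma**2
    return acc / n_steps

for z in (1.0, 3.0, 5.0):
    est = complex_langevin_average(z)
    print(f"z={z}: CL estimate {est.real:+.2f}   exact {1 - z**2:+.2f}")
```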

And that explains the title of the paper. Sort of. In the figure below, I show you how the complex Langevin equation fares in calculating that integral that, scrolling up all the way, gave rise to such bad error bars when using the Monte Carlo approach. The triangles in that plot show the result of using a real Langevin equation. That's the catastrophe I mentioned: not only is the result wrong, it doesn't even have large error bars, so it is wrong with conviction!

The squares (and the solid line) come from using the extended (complex) action in the Langevin equation. It reproduces the exact result precisely.


Average calculated with the real Langevin equation (triangles) and the complex Langevin equation (squares), as a function of the variable \(z\). The inset shows the "sign" of the integral, which still vanishes at large \(z\) even as the complex Langevin equation remains accurate.
The rest of the paper is somewhat anti-climactic. First I show that the same trick works in a quantum-mechanical toy model of rotating nuclei (as opposed to the trivial example integral). I offer the plot below from the paper as proof:
Solid line is exact theory, symbols are my numerical estimates. You've got to hand it to me: Complex Langevin rules.

But if you want to convince the nuclear physicists, you have to do a little bit more than solve a quantum mechanical toy model. Short of solving the entire beast of the nuclear shell model, I decided to tackle something in between: the Lipkin model (sometimes called the Lipkin-Meshkov-Glick model), which is a schematic nuclear shell model that is able to describe collective effects in nuclei. And the advantage is that exact analytic solutions to the model exist, which I can use to compare my numerical estimates to.

The math for this model is far more complicated, and I spare you the exposition for the sake of sanity here. (Mine, not yours.) A lot of path integrals to calculate. The only thing I want to say here is that in this more realistic model, the complex plane is not entirely free of singularities: there are in fact infinitely many of them. But they naturally lie off the real line, in the complex plane, so a random trajectory will avoid them almost all of the time, whereas you are guaranteed to run into them if they are on the real line and the dynamics return you to the real line without fail. That is, in a nutshell, the discovery of this paper.

So, this is obviously not a well-known contribution. This is a bit of a bummer, because the sign problem still very much exists, in particular in lattice gauge theory calculations of matter at finite chemical potential (meaning, at finite density). Indeed, a paper came out just recently (see the arXiv link in case you ended up behind a paywall) where the authors try to circumvent the sign problem in lattice QCD at finite density by doing the calculations explicitly at high temperature using the old trick of doubling your degrees of freedom. Incidentally, this is the same trick that gives you black holes at Hawking temperature, because the event horizon naturally doubles degrees of freedom. I used this trick a lot when calculating Feynman diagrams in QCD at finite temperature. But that's a fairly well-known paper, so I can't discuss it here. 

Well, maybe some brave soul one day rediscovers this work, and  writes a "big code" that solves the problem once and for all using this trick. I think the biggest reason why this paper never got any attention is that I don't write big code. I couldn't apply this to a real-world problem, because to do that you need mad software engineering skills. And I don't have those, as anybody who knows me will be happy to tell you. 

So there this work lingers. Undiscovered. Lonely. Unappreciated. Like sooo many other papers by sooo many other researchers over time. If only there was a way that old papers like that could get a second chance! If only :-)

[1] H. Gausterer, Complex Langevin: a numerical method? Nuclear Physics A 642 (1998) 239c-250c.
[2] C. Adami and S.E. Koonin, Complex Langevin equation and the many-fermion problem. Physical Review C 63 (2001) 034319. 

Friday, October 3, 2014

Nifty papers I wrote that nobody knows about: (Part 3: Non-equilibrium Quantum Statistical Mechanics)

This is the third part of the "Nifty Papers" series. Link to Part 1. Link to Part 2.

In 1999, I was in the middle of writing about quantum information theory with my colleague Nicolas Cerf. We had discovered that quantum conditional entropy can be negative, discussed this finding with respect to the problem of quantum measurement, separability, Bell inequalities, as well as the capacity of quantum channels. Heady stuff, you might think. But we were still haunted by Hans Bethe's statement to us that the discovery of negative conditional entropies would change the way we perceive quantum statistical physics. We had an opportunity to write an invited article for a special issue on Quantum Computation in the journal "Chaos, Solitons, and Fractals", and so we decided to take a shot at the "Quantum Statistical Mechanics" angle.

Because I'm writing this blog post in the series of "Articles I wrote that nobody knows about", you already know that this didn't work out as planned. 

Maybe this was in part because of the title. Here it is, in all its ingloriousness:
C. Adami & N.J. Cerf, "Prolegomena to a non-equilibrium quantum statistical mechanics," Chaos, Solitons & Fractals 10 (1999) 1637-1650
There are many things that, in my view, conspired to get this paper summarily ignored. The paper has two citations, for what it's worth, and one of them is a self-citation!

There is, of course, the reputation of the journal to blame. While this was a special issue that put together papers that were presented at a conference (and which were altogether quite good), the journal itself was terrible as it was being ruled autocratically by its editor Mohammed El Naschie, who penned and published a sizable fraction of the papers appearing in the journal (several hundred, in fact). A cursory look at any of these papers shows him to be an incurable (but certainly self-assured) crackpot, and he was ultimately fired from his position by the publisher, Elsevier. He's probably going to try to sue me just for writing this, but I'm trusting MSU has adequate legal protection for my views. 

There is, also, the fairly poor choice of a title. "Prolegomena?" Since nobody ever heard of this article, I never found anyone who would, after a round of beers, poke me in the sides and exclaim "Oh you prankster, choosing this title in homage to the one by Tom Banks!" Because there is indeed a paper by Tom Banks (a string theorist) entitled: "Prolegomena to a theory of bifurcating universes: A nonlocal solution to the cosmological constant problem or little lambda goes back to the future".  Seriously, it's a real paper, take a look:


For a reason that I can't easily reconstruct, at the time I thought this was a really cool paper. In hindsight it probably wasn't, but it certainly has been cited a LOT more often than my own Prolegomena. That word, by the way, is a very innocent Greek word meaning "an introduction at the start of a book". So I meant to say: "This is not a theory, it is the introduction to something that I would hope could one day become a theory".

There is also the undeniable fact that I violated the consistency of singular/plural usage, as "a" is singular, and "Statistical Mechanics" is technically plural, even though it is never used in the singular.

Maybe this constitutes three strikes already. Need I go on?

The paper begins with a discussion of the second law of thermodynamics, and my smattering of faithful readers has read my opinions about this topic before. My thoughts on the matter were born around that time, and this is indeed the first time that these arguments were put in print. It even has the "perfume bottle" picture that also appears in the aforementioned blog post.

Now, the arguments outlined in this paper concerning the second law are entirely classical (not quantum), but I used them to introduce the quantum information-theoretic considerations that followed, because the main point was that for the second law, it is a conditional entropy that increases. And it is precisely the conditional entropy that is peculiar in quantum mechanics, because it can be negative. So in the paper I'm writing about, I first review that fact, and then show that the negativity of conditional quantum entropy has interesting consequences for measurements on Bell states. The two figures of Venn diagrams for same-spin as opposed to orthogonal-spin measurements are reproduced here:

What these quantum Venn diagrams show is that the choice of measurement made on a fully entangled quantum state \(Q_1Q_2\) determines the relative state of the measurement devices: perfect correlation in the case of same-direction spin measurements, zero correlation in the case of orthogonal measurements. The quantum reality, however, is that the measurement devices are even more strongly entangled with the quantum system in the case of the orthogonal measurement, even though they are not correlated with each other at all. Which goes to show you that quantum and physical reality can be two entirely different things altogether.
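If you would like to see the negativity of the conditional entropy with your own eyes rather than take our word for it, here is a minimal numerical illustration (mine, written for this post, assuming numpy): for a fully entangled Bell state the joint entropy vanishes while each marginal carries a full bit, so \(S(Q_1|Q_2)=S(Q_1Q_2)-S(Q_2)=-1\).

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits (zero eigenvalues dropped)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)   # (|00> + |11>)/sqrt(2)
rho12 = np.outer(bell, bell)                            # joint density matrix (pure)
rho2 = np.trace(rho12.reshape(2, 2, 2, 2), axis1=0, axis2=2)  # trace out Q1

S12, S2 = entropy_bits(rho12), entropy_bits(rho2)
print("S(Q1Q2) =", S12, "  S(Q2) =", S2, "  S(Q1|Q2) =", S12 - S2)   # 0, 1, -1
```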

I assure you these results are profound, and because this paper is essentially unknown, you might even try to make a name for yourself! By, umm, citing this one? (I didn't encourage you to plagiarize, obviously!)

So what comes after that? After that come the Prolegomena of using quantum information theory to solve the black hole information paradox!

This is indeed the first time that any of my thoughts on black hole quantum thermodynamics appear in print. And if you compare what's in this paper with the later papers that appeared first in preprint form in 2004, and finally in print in 2014, the formalism in this manuscript seems fairly distinct from these calculations.

But if you look closer, you will see that the basic idea was already present there.

The way I approach the problem is clearly rooted in quantum information theory. For example, people often start by saying "Suppose a black hole forms from a pure state". But what this really means is that the joint state between the matter and radiation forming the black hole, as well as the radiation that is being produced at the same time (which does not ultimately become the black hole) is in a pure state. So you have to describe the pure state in terms of a quantum Venn diagram, and it would look like this:
Entropy Venn diagram between the forming black hole ("proto-black hole" PBH) and a radiation field R. The black hole will ultimately have entropy \(\Sigma\), the entanglement entropy.
Including this radiation field R entangled with the forming black hole is precisely the idea of stimulated emission of radiation that ultimately would solve all the black hole information paradoxes: it was clear to me that you could not form a black hole without leaving an entangled signature behind. I didn't know at the time that R was stimulated emission, but I knew something had to be there. 

Once the black hole is formed, it evaporates by the process of Hawking radiation. During evaporation, the black hole becomes entangled with the radiation field R' via the system R:
Entropy Venn diagram between radiation-of-formation R, the black hole BH, and the Hawking radiation R'. Note that the entropy of the black hole \(S_{\rm BH}\) is smaller than the entropy-of-formation \(\Sigma\) by \(\Delta S\), the entropy of the Hawking radiation. 
The quantum entropy diagram of three systems is characterized by three (and exactly three) variables, and the above diagram was our best bet at this diagram. Note how the entire system has zero entropy and is highly entangled, but when tracing out the radiation-of-formation, the black hole is completely uncorrelated with the Hawking radiation as it should be. 

Now keep in mind, this diagram was drawn up without any calculation whatsoever. And as such, it is prone to be dismissed as a speculation, and it was without doubt a speculation at the time. Five years later I had a calculation, but its acceptance would have to wait for a while.

In hindsight, I'm still proud of this paper. In part because I was bold enough to pronounce the death of the second law as we know it in print, and in part because it documents my first feeble attempts to make sense of the black hole information morass. This was before I had made any calculations in curved space quantum field theory, and my ruminations can therefore easily be dismissed as naive. They were naive (for sure), but not necessarily stupid.

Next week, be prepared for the last installment of the "Nifty Papers" series. The one where I single-handedly take on the bugaboo of computational physics: the "Sign Problem". That paper has my postdoctoral advisor Steve Koonin as a co-author, and he did provide encouragement and helped edit the manuscript. But by and large, this was my first single-author publication in theoretical/computational physics. And the crickets are still chirping....