Monday, January 9, 2017

Are quanta particles or waves?

The title of this post is an age-old question isn't it? Particle or wave? Wave or particle? Many have rightly argued that the so-called "wave-particle duality" is at the very heart of quantum weirdness, and hence, of all of quantum mechanics. Einstein said it. Bohr said it. Feynman said it. Two out of those three are physics heroes of mine, so that's a majority right there.

Feynman, when talking about what we now call the wave-particle duality, was referring to the famous "double-slit experiment". He wrote (in his famous Feynman Lectures, Chapter 37 of Volume 1, to be precise):
Richard Feynman (1918-1988)
Source: Wikimedia
"We choose to examine a phenomenon which is impossible, absolutely impossible, to explain in any classical way, and which has in it the heart of quantum mechanics. In reality, it contains the only mystery. We cannot make the mystery go away by “explaining” how it works. We will just tell you how it works. In telling you how it works we will have told you about the basic peculiarities of all quantum mechanics."
So what is Feynman talking about here? I could launch into a lengthy exposition of the double-slit experiment, but as luck would have it I've already done that, in a blog post about the quantum eraser. That post, incidentally, was No. 6 in the "Quantum measurement" series that starts here. You don't necessarily have to have read all those posts to follow this one, but believe me, it would help a lot. At the minimum, start at No. 6 if you're not already familiar with the double-slit experiment. But you'll get a succinct introduction to the double-slit experiment below anyway.

Alright, back to quantum mechanics. Actually, step back a little bit more, to classical mechanics. In classical physics, there is no duality between waves and particles. Waves are waves, and they would never behave like particles. For example, you can't kick a wave, really, no matter what the surfer types tell you. Particles on the other hand, do not interfere with each other as waves do. You can kick particles (kinda), and you can count them. You can't count waves. 

What Bohr, Einstein, and Feynman are trying to tell you is that in quantum mechanics (meaning the real world, because as I have told you before, classical mechanics is an illusion, it does not exist) the same stuff can be either particle OR wave. Not both, mind you. Here's what Einstein said about this, and to tell you the truth, this statement sounds like he's been hanging out with Bohr far too much:
A. Einstein (1879-1955)
Source: Wikimedia

"It seems as though we must use sometimes the one theory and sometimes the other, while at times we may use either. We are faced with a new kind of difficulty. We have two contradictory pictures of reality; separately neither of them fully explains the phenomena of light, but together they do".
I've used a picture of Einstein in 1904 here, because you've seen far too many pics of him sticking out his tongue and hair disheveled. He wasn't like that most of the time when he made his most important contributions.

Lest you think that the troubles these 20th century physicists had with quantum mechanics are the stuff of history, think again. In 2012, a mere 5 years ago, experimenters from Germany (in the lab of the very eminent Wolfgang Schleich) claimed that they had collected evidence that a quantum system can be both particle and wave at the same time. Such an observation, if true, would run afoul of Bohr's "duality principle", which declared that a quantum system can only be one or the other, depending on the type of experiment used to examine the system. One or the other, but never both.

Rest assured though, analyzing results of the Schleich experiment in a different way reveals that all is well with complementarity after all, as was pointed out by a team at the University of Ottawa, led by the equally eminent Robert Boyd. (You can read an excellent summary of that controversy in Tom Siegfried's piece here.) What all this fighting about duality should teach you is that this is not at all a solved problem. As recently as a few days ago, Steven Weinberg (who, full disclosure, has also been in my pantheon of physicists ever since I read his "First Three Minutes" at a very tender age) wrote about the particle-wave duality in the New York Review of Books. I hope that he reads this post, because it may alleviate some of his troubles.

In this piece, entitled "The Trouble with Quantum Mechanics", Weinberg admits to being as puzzled as his predecessors Einstein, Bohr, and Feynman, about the true nature of quantum physics. How can we understand, he muses, that quantum dynamics is governed by a deterministic equation (the Schrödinger equation), yet when we try to measure something, then all we can muster is probabilities? "So we still have to ask", Weinberg writes, "how do probabilities get into quantum mechanics?"

How indeed. You know of course, from reading my diatribes, that this is a question I am interested in myself. I have obliquely hinted that I think I know where the probabilities are coming from (if you can find the relevant post) and that one day I'll write a detailed account of that idea (it's 3/4 written already, actually). But today is not that day. Having convinced you that the particle-wave duality is still a very hot topic in quantum physics, let me take on that particular subject first. 

What I want to do in this blog post is to make you think differently about the complementarity principle. What I'm going to tell you is that you should stop thinking in terms of "particle or wave". It is a false dichotomy. It is a false dilemma because quantum systems are neither particle nor wave. Those two are classical concepts, after all. Strictly speaking, quantum systems are quantum fields. But this is not the time to delve into quantum field theory, so instead I will try to marshal the tools of quantum information theory to tell you what is really complementary in quantum measurement, what it is that you can have "only one of", and what it is that is being "traded-off". You don't exchange a bit of particle for a bit of wave, this much I can tell you right here. 

To do this, I have to introduce you to some very counter-intuitive quantum stuff. Now, you might argue: "All quantum stuff is counter-intuitive", and I'd have to agree with you if all your intuition is classical. What I am going to tell you is stuff that even baffles seasoned quantum physicists. I'm going to tell you about quantum experiments where the "nature" of the quantum experiment that you perform can be changed after you've already completed the experiment!

Let me remind you right here that the (also very eminent) Niels Bohr tried to teach us that whether a quantum system appears as a particle or as a wave depends on the type of experiment you subject it to. Here I'm telling you that this is a bunch of hogwash, because I'll show you that when you do an experiment, you can change whether it is a "particle"- or a "wave"-experiment long after the data have been collected!

I know you're not shocked at my dissing Bohr as I have a habit of doing so. But I'm in good company, by the way, if you read what Feynman wrote about Bohr in his "Surely You're Joking" series. 

"Alright I bite", one of you readers exclaimed just now, "how do you retroactively change the type of experiment you make?" 

Glad you asked. Because now I can talk about John Archibald Wheeler. Wheeler was not a conventional physicist: Even though his early career as a nuclear physicist led to several important contributions to the Manhattan project, he was also interested in many other areas of physics. Indeed, he was a central figure in the "revival" of general relativity theory. (That theory had gone a bit out of fashion when people realized that many predictions of the theory were difficult to measure.) Wheeler co-authored what many (including myself) think is the best book on the topic: "Gravitation" (with Charles Misner and Kip Thorne). That book is often just referred to as "MTW".

John Archibald Wheeler (1911-2008).
Source: University of Texas
I never got to meet Wheeler, perhaps because I entered the field of quantum gravity too late. While Wheeler has been influential in the field of quantum information, it really was his gravity work that had the most lasting impact. He invented the terms "black hole" and "wormhole", after all. His most influential contribution to quantum information science is, undoubtedly, the "delayed choice" gedankenexperiment. Let me explain that to you. 

Wheeler's thought experiment examines the question of whether a photon, say, takes on wave or particle nature before it interacts with the experiment, sensing (in a way) what kind of experiment is going to be performed on it. In the simplest version of the delayed choice experiment, the nature of the experiment would be changed "after the photon had made up its mind" whether it was going to play the role of particle, or whether it would make an appearance as a wave. Needless to say, this is of course not how quantum mechanics works, and Wheeler was fully aware of it. His interpretation was that a photon is neither wave nor particle, and that it takes on one of the two "coats" only when it is being observed. I'm going to tell you that I agree with the first part (the photon is neither wave nor particle), but I disagree with the second part: it does not in fact take on either particle or wave nature after it is observed. It never ever takes on such a role.

If you think about it, the idea that a system only "comes into being by being observed" is preposterous (however, such a thought was quite in line with some other of Wheeler's philosophies). Measurements are interactions with other systems just as much as any other interactions are: there is nothing special about measurement. This is, in essence, what I'm going to try to convince you of. 

Even though the reasoning behind the delayed-choice experiment is preposterous, it has generated an enormous amount of work. Let's first look at how we may set up such an experiment. Below is an illustration of a double-slit experiment from Feynman's famous lecture, where he replaced photons by electrons shot out of an electron gun (such devices are perfectly reasonable and feasible). Note that Caltech, where Feynman spent the majority of his career, has made these lectures freely available. The particular chapter can be accessed here.

Fig. 1: An interference experiment with electrons. (Source: Feynman Lectures on Physics)
Later on, we're going to be using photons instead of electrons for the quantum system, because experiments are much easier with photon beams as opposed to electron beams.  In that case, we are going to assume that any light is going to be so faint that it can't be thought of as the classical light waves that give rise to Young's interference fringes. Then, at any point in time, there will be at most one photon between the double-slit and the detector, so you have to think about single photons either taking one or the other, or both paths, through the double-slit experiment. 

Quantum mechanics predicts that a single electron takes both paths to create the interference pattern in the figure above at (c). Thus, it must somehow interfere with itself, which is difficult to imagine if you think of the electron as a particle. (Which of course it is not). Can we force it to behave as a particle? Suppose you put a particle detector between the wall and the backstop: one behind slit 1, and one behind slit 2. If you get a "hit" on either detector, then you know which path the electron travelled. (You can do this experiment without actually removing the electron, so that you can still get patterns on the screen.) When you obtain this "which-path" information, the interference pattern disappears: you've forced the electron to behave as a particle. 

Wheeler's idea was this: Suppose the distance between the wall and the backstop is very, very large. If you do not put the contraption that will measure which path the electron took (the "which-path detector") into the experiment, the electron would have no choice but to go along both paths, ready to interfere with itself and create the interference pattern on the screen. But suppose you bring in the "which-path" apparatus after the electron has passed the slit, but before it is going to hit the screen. Is the electron wave function that is on the "other path" going to "change its mind", or go backwards? What would happen? The thought experiment very nicely illustrates how preposterous the idea is that the experiment itself determines "what the quantum system is", as changing the experiment mid-flight cannot possibly change the nature of the electron.

The experiment I'm going to describe to you (the delayed-choice quantum eraser experiment) has in fact been carried out several times now, and drives Wheeler's idea to the extreme. The choice of experiment (insert the "which-path" detector or not), can be made after the electron has hit the screen! If you are a reader for whom this is immediately obvious, then congratulations (and consider a career in quantum physics, if this is not already your career). It is indeed completely obvious if you understand quantum mechanics, but let me walk you through it anyway. 

First, if it was the experiment that determines the nature of the quantum system (particle or wave), how can you change the experiment after it already has occurred? That this is possible is also due to the peculiarities of quantum physics, and it is also the hardest to explain. I'll do it with photons rather than electrons, as this is the experiment that was carried out, and it is also the description I used in the paper that I'm really writing about. You knew this was coming, didn't you?

We can do double-slit experiments with photons just as with electrons: we just have to turn down the intensity of light such that individual photons can be registered on a phosphorescent screen. When you see the screen light up at a particular spot (or, in more modern times, a pixel on a CCD detector lights up), you interpret it that a photon has hit there. Often, the double-slit is replaced by a Mach-Zehnder interferometer, but you shouldn't worry about such technicalities: you can in fact use either. 

To pull off this feat of changing the experiment after the fact, you have to create an entangled pair of photons first. You already know what an entangled pair (a "Bell-state") is, because I wrote about it several times: for example in the context of black holes here, and in the context of quantum teleportation and superdense coding here. This pair of photons is also sometimes called an Einstein-Podolsky-Rosen (EPR) pair, because that trio first described a similar entangled state in a very famous paper in 1935. 

Let's create such a pair by entangling the "polarization" degree of freedom of the photon. This is the part that is a bit more complicated: to understand it, you have to understand polarization. 

Every photon can come in two different polarization states, but what these states are depends on how you decide to measure them. This will be crucial, because this is in fact how you change the measurement after the fact. The thing to know about an entangled pair is that it is in a superposition of those two states. Suppose we use as basis for the photon polarization the "horizontal/vertical" basis. That means that if a photon is polarized horizontally, and you put a filter in front of it that only allows vertical polarization to go through, then out comes nothing. Polarization is, if you will, a photon's way of wiggling. Below is a picture which shows the photon wiggling in the "vertical" and in the "horizontal" way. But it can also wiggle in the "circular-left" and "circular-right" way. In fact, it can wiggle in an infinite number of "opposing ways", and these are related to each other by a unitary transformation.

Fig. 2: One way of depicting photon polarization. 
The way a photon is polarized can be changed by an optical element (a "wave plate"), and this ability will be key in the experiment. Suppose we begin with a pair of photons A and B in a Bell-state, written in terms of the horizontal $|h\rangle$ and vertical $|v\rangle$ polarization eigenstates:

$|\Psi\rangle_{AB}=\frac1{\sqrt2}(|h\rangle_A|v\rangle_B+|v\rangle_A|h\rangle_B)$          (1)    

You notice that neither of the photons has a defined state, but if I measure one of them (say A) and find that my detector says it is in an $|h\rangle$ state, then I can be sure that measuring B will give you "v", no matter whether you do the measurement now, or a year later with a detector placed a light year away. This is precisely what Einstein could not stomach, calling this mysterious bond "spooky action at a distance", but a careful analysis reveals that there is no "action" at all: signals cannot be sent using this bond. 
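If you'd like to check the perfect anti-correlation in Eq. (1) for yourself, here is a minimal NumPy sketch. The vector representation of $|h\rangle$ and $|v\rangle$ and the helper name `joint_probability` are choices made for this illustration only.

```python
import numpy as np

# Single-photon polarization basis states (illustrative representation).
h = np.array([1.0, 0.0])  # horizontal
v = np.array([0.0, 1.0])  # vertical

# Bell state of Eq. (1): (|h>_A |v>_B + |v>_A |h>_B) / sqrt(2).
psi = (np.kron(h, v) + np.kron(v, h)) / np.sqrt(2)


def joint_probability(a, b, state):
    """Probability of finding photon A in state a and photon B in state b."""
    amplitude = np.vdot(np.kron(a, b), state)
    return abs(amplitude) ** 2


p_hv = joint_probability(h, v, psi)  # A says h, B says v
p_hh = joint_probability(h, h, psi)  # A says h, B says h
p_A_h = p_hv + p_hh                  # A says h, whatever B says

print("P(A=h)        =", p_A_h)
print("P(B=v | A=h)  =", p_hv / p_A_h)
print("P(B=h | A=h)  =", p_hh / p_A_h)
```

Photon A comes out "h" half of the time, which is what "neither of the photons has a defined state" means in numbers. But once A says "h", the conditional probability for B to say "v" is one, up to floating-point rounding.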

But here's the thing: I can measure photon B either in the $h,v$ coordinate system, or in another one. This will become crucial, so keep this in mind. But for the moment let's forget that a "copy" of photon A (the entangled partner) is flying out there, possibly to a measurement device a light-year away. Actually, there is nothing a light year away from us, so let's say we are far in the future and the detector is on Proxima Centauri, about 4 and a quarter light years away. It'll just be a longer experiment. 

Photon A now goes through a double-slit, just as the electrons in Figure 1. Now we'll do the "are you a particle or a wave" measurement. We do this by putting so-called "quarter-wave plates" in the path of the photons. When you do this, you entangle the polarization of the photon with the spatial degree of freedom (namely "left slit" or "right slit"). Once you've done this, you only have to measure the polarization of photon A to know whether it went through the left or right slit. In a way, you've tagged the photon's path by the polarization. After doing this, you will lose the interference pattern. You can either have an interference pattern (and we say that the photon wavefunction is "coherent"), or you can have "which-path" information, which makes the wavefunction incoherent. Or so people thought for a long time. It turns out that you can also have a little bit of both, but you can't have both full which-path information, and full coherence: there is a tradeoff. And that tradeoff depends on the angle by which you rotate the polarization basis. In the description above, we used "quarter-wave" plates, which give you full information, and zero coherence. Choose something other than 45 degrees (that's the quarter wave), and you can get a little bit of both.

It turns out that there is a simple relationship that quantifies this tradeoff in terms of the angle you choose to do the tagging with. Let's call this angle $\phi$. We can then define the "distinguishability" $D$ and the "visibility" $V$, where $D^2$ measures how well you can distinguish the photon paths (a measure of which-path information), while $V^2$ quantifies the visibility of the interference fringes (a measure of the coherence of the wavefunction). A celebrated inequality (due to Greenberger and Yasin [1]) states that
$D^2+V^2\leq1$     (2)

Now, according to what I just wrote, choosing the angle of the wave plate when performing the which-path entangling operation chooses the experiment for you: Set it at 0 degrees and you do not entangle at all, so that no which-path information is obtained (then $D^2=0$ and $V^2=1$). Set it at $\phi=\pi/4$, and you get perfect which-path information, and no visibility. How can you choose the experiment after the fact, when you have to choose the angle when setting up the experiment? How?
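To make the tradeoff in Eq. (2) concrete, here is a small sketch of a toy tagging model. The parameterization is an assumption made for illustration, not the optics of any particular experiment: the photon on path 1 keeps the tag $|h\rangle$, while the photon on path 2 gets the tag $\cos(2\phi)|h\rangle+\sin(2\phi)|v\rangle$, so that $\phi=0$ means no tagging and $\phi=\pi/4$ means orthogonal tags. For two equally likely pure tag states, the visibility is the magnitude of their overlap and the distinguishability is $\sqrt{1-|{\rm overlap}|^2}$.

```python
import numpy as np


def tradeoff(phi):
    """Return (D, V) for a toy tagging model with angle phi (in radians).

    Assumption (hypothetical parameterization): the path-1 tag is |h>,
    the path-2 tag is cos(2 phi)|h> + sin(2 phi)|v>.
    """
    tag1 = np.array([1.0, 0.0])
    tag2 = np.array([np.cos(2 * phi), np.sin(2 * phi)])
    overlap = abs(np.vdot(tag1, tag2))
    V = overlap                                  # fringe visibility
    D = np.sqrt(max(0.0, 1.0 - overlap ** 2))    # distinguishability of the tags
    return D, V


for phi in (0.0, np.pi / 8, np.pi / 4):
    D, V = tradeoff(phi)
    print(f"phi={phi:.3f}  D^2={D**2:.3f}  V^2={V**2:.3f}  sum={D**2 + V**2:.3f}")
```

In this pure-state toy model the inequality is saturated, so $D^2+V^2=1$ at every angle; the "less than" part of Eq. (2) only shows up once noise or mixed states enter.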

So the following is what makes quantum mechanics so beautiful. You can actually do this because when I described the experiment to you, I did not (it turns out) use an entangled EPR pair as the input, I used a photon in a defined polarization state, such as $|h\rangle$. I did not tell you about this because it would have confused you. I needed you to understand how to extract which-path information first, and how doing it gradually will gradually destroy coherence. 

Now take a deep breath, and read very slowly.

If the input to the two-slits (and therefore to the "which-path" detector that entangles polarization and path) is the EPR state Eq. (1), you actually do not get any which-path information using the quarter-wave plate. This is because when the photon "comes in", it is not in a defined polarization state. If it was not in a defined state, you extract nothing. So for that setup, $V^2=1$ even though $\phi=\pi/4$. 

Now one more deep breath after you digested this bit. Maybe take two, just to be safe. 

Whether the state that comes in to the two slits is indeed Eq. (1) is up to the person at Proxima Centauri, more than four years after the data were recorded on the CCD screen on Earth. This is because what is $|h\rangle$ and what is $|v\rangle$ is determined by how you measure it. A quantum system does not have a state until you say how you measure it. It will be in the $h,v$ basis if that is the basis of your measurement device. It will be in the $R,L$ (right-circular, left-circular) basis, if that is instead what you will choose to examine it with. Or it could be anything in between.

I wrote about this at length in the blog post about the collapse of the wavefunction, within the "On quantum measurement" series. (Rightfully, the present post really should be "On quantum measurement. Part 8", but I decided to make it stand alone.) Please go back to that if the two breaths did not help. There is also an intriguing parallel to how Shannon entropy is not defined until you determine how you will be measuring it, as I wrote about in "What is Information-Part 1". The deeper reason for this is that all of physics is about the relative state of measurement devices. Mark my words.

The reason our person at Proxima Centauri handling photon B actually prepares the state is that photon A is not "projected" at any point of the experiment. This could be done, of course, but that is a different experiment. So now we can see how the delayed-choice experiment works: If Proxima Centauri person (PCP, for short) measures at an angle $\theta=0$ with respect to the preparation Eq. (1), then the photon is in a defined state (no matter whether the outcome is $h$ or $v$) and only then do you actually extract which-path information. In that case, visibility $V^2=0$. If PCP measures at $\theta=\pi/4$ on the other hand, the entanglement operation (the "tagging") does not work: it is as if the measurement by PCP "erased" the tagging, and $V^2=1$ instead. So indeed, a measurement far in the future (well, here more than four years in the future) will determine what kind of an experiment is done on the photon. The event far in the future will determine whether the photon appeared as a particle, or a wave. Weird, right?

What is that you ask? How can an event far in the future affect the data that are stored on a device far in the past? 

I didn't say it did, did I? Of course it does not. The truth is much more magical. Without going into all the details here (but which you can read about in any paper about the Bell-state quantum eraser, or indeed my own paper referenced below), the result of the measurement by PCP in the future contains crucial information about how to decode the data in the past, information that is akin to the key in a cryptography procedure. 

Yes, cryptographic. That is indeed what I wrote. You will only be able to decipher $D^2$ and $V^2$ when the measurement in the future (which is really a state preparation in the past) is available to you. That is the true magic of quantum mechanics. Without it, you won't be able to see any fringes in the data. But with it, you may be able to reconstruct them to full visibility, if that is how the photon was measured at Proxima Centauri. 
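Here is a minimal NumPy sketch of why PCP's result acts like a key. It is a toy model under stated assumptions, not the calculation in [2]: the two quarter-wave plates are modeled by the hypothetical operators `U1` and `U2` (diagonal in the $\pm45°$ basis, with their fast axes swapped), the detector screen does not resolve polarization, and the fringe visibility is taken to be the magnitude of the overlap between the two tagged polarization states.

```python
import numpy as np

s = 1 / np.sqrt(2)
h = np.array([1.0, 0.0], dtype=complex)
v = np.array([0.0, 1.0], dtype=complex)
plus, minus = s * (h + v), s * (h - v)   # the +45 and -45 degree states

# Assumed quarter-wave plates, one per slit, with fast axes swapped.
U1 = np.outer(plus, plus.conj()) + 1j * np.outer(minus, minus.conj())
U2 = 1j * np.outer(plus, plus.conj()) + np.outer(minus, minus.conj())

# Bell state of Eq. (1) as a matrix: psi[a, b] is the amplitude for
# photon A in polarization a and photon B in polarization b.
psi = s * (np.outer(h, v) + np.outer(v, h))


def conditional_visibility(theta):
    """Visibility of A's fringes, keeping only the screen hits whose
    partner B was found in cos(theta)|h> + sin(theta)|v>."""
    b = np.cos(theta) * h + np.sin(theta) * v
    a = psi @ b.conj()            # A's (unnormalized) polarization given B's outcome
    a = a / np.linalg.norm(a)
    return abs(np.vdot(U1 @ a, U2 @ a))


def unconditional_visibility():
    """Visibility of A's fringes when B's results are not available."""
    rho_A = psi @ psi.conj().T    # reduced polarization state of A (maximally mixed)
    return abs(np.trace(U1.conj().T @ U2 @ rho_A))


print("no key:          V =", round(unconditional_visibility(), 6))
for theta in (0.0, np.pi / 8, np.pi / 4):
    print(f"theta={theta:.3f}:     V =", round(conditional_visibility(theta), 6))
```

Without B's results the fringes wash out completely. Sorting the very same screen data by B's outcome gives no fringes for $\theta=0$ and full visibility for $\theta=\pi/4$, with (in this toy model) $V=|\sin 2\theta|$ in between.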

How do I know any of this is true? Because we (my student Jennifer Glick and I) analyzed the entire experiment in terms of quantum information theory, and ultimately were able to write down the equations that describe discrimination and visibility (coherence) entirely in terms of entropies and information, in [2] (Jennifer did all the calculations and wrote the first draft of the manuscript). Clearly, "which-path information" should have an obvious information-theoretic rendering, but it turns out that this is actually a little bit tricky because it really is a "conditional information". But it turns out that "coherence" (or "visibility") can also be measured information-theoretically. And lo and behold, the two are related. In our description, they are related by a common information-theoretic identity: the chain rule for entropies. According to that identity, information $I$ and coherence $C$ (as a function of the PCP angle $\theta$) are related so that 
$I(\theta)+C(\theta)=1$        (3)
In a simple qubit model, the information and coherence take on extremely simple forms, namely $I(\theta)=H[\sin^2(\theta+\pi/4)]$ with $C(\theta)=1-H[\sin^2(\theta+\pi/4)]$, where $H[p]$ is the standard Shannon entropy function $H[p]=-p\log(p)-(1-p)\log(1-p)$. And take a look at how our information-theoretic quantities compare to the quantum optical measures of discrimination and visibility in Fig. 3 below. It almost looks as if discrimination and visibility (coherence) should have been defined information-theoretically from the outset, doesn't it?
Fig. 3: Top: Which-path information (solid line) and coherence (dashed line) in terms of quantum information theory. Bottom: Discrimination (solid) and visibility (dashed) in quantum optics. $Q$ refers to the quantum state at the beam-splitter, and $D_A$ and $D_B$ refer to polarization detectors. From [2].
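If you want to play with the qubit-model formulas yourself, here is a short sketch. One assumption to flag: the logarithm is taken to base 2, so that entropies are measured in bits and both $I$ and $C$ range between 0 and 1. The function names are made up for this illustration.

```python
import numpy as np


def binary_entropy(p):
    """Shannon entropy H[p] in bits (log base 2 assumed), with 0*log(0) = 0."""
    p = float(np.clip(p, 0.0, 1.0))
    return -sum(x * np.log2(x) for x in (p, 1.0 - p) if x > 0.0)


def information(theta):
    """Which-path information I(theta) = H[sin^2(theta + pi/4)]."""
    return binary_entropy(np.sin(theta + np.pi / 4) ** 2)


def coherence(theta):
    """Coherence C(theta) = 1 - H[sin^2(theta + pi/4)]."""
    return 1.0 - information(theta)


for theta in (0.0, np.pi / 8, np.pi / 4):
    I, C = information(theta), coherence(theta)
    print(f"theta={theta:.3f}  I={I:.3f}  C={C:.3f}  I+C={I + C:.3f}")
```

At $\theta=0$ you get full which-path information and no coherence, at $\theta=\pi/4$ it is the other way around, and in between you trade one for the other with the sum pinned to one, as in Eq. (3).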
So what does all this teach us about quantum mechanics in the end (besides, of course, that quantum mechanics is awesome)? We have learned at least two things. Quantum systems are not either particle or wave. They are in fact neither because both concepts are classical in nature. This, to some extent, I stipulate we knew already. Wheeler knew it. (Bohr, I contend, not so much). But what I've shown you is that quantum systems don't "change their colors" after measurement either, as Wheeler had advocated. They remain "neither", even when we think we pinned them down, because what I've shown you is that you can have them take on this coat or that, or any in between, years after the ink has dried (I mean, after the data were recorded). They (the photons, electrons, etc.) are not one or the other. They appear to you the way you choose to see them, when you interrogate a quantum state with classical devices.

Those devices cannot reveal to you the reality of the quantum state, because the devices are classical. Don't hate them because of their limitations. Instead, use them wisely, because what I just showed you is that, if used in a clever manner, they enable you to learn something about the true nature of quantum physics after all. As, for example, the experiment in [3] does.


[1] D.M. Greenberger and A. Yasin, "Simultaneous wave and particle knowledge in a neutron interferometer". Physics Letters A 128 (1988) 391-394.
[2] J.R. Glick and C. Adami, "Quantum information theory of the Bell-state quantum eraser". Phys. Rev. A 95 (2017) 012105. Full text also on arXiv
Note: Jennifer Glick is first author on this paper because she performed all calculations in it and wrote the first draft. 
[3] Y.H. Kim, R. Yu, S.P. Kulik, Y.H. Shih, and M.O. Scully, “Delayed “choice” quantum eraser,” Phys Rev Lett 84 (2000) 1-5.

Tuesday, December 6, 2016

Can Life emerge spontaneously?

It would be nice if we knew where we came from. Sure, Darwin's insight that we are the product of an ongoing process that creates new and meaningful solutions to surviving in complex and unpredictable environments is great and all. But it requires three sine qua non ingredients: inheritance, variation, and differential selection. Three does not seem like much, and the last two are really stipulated semper ibi: There is going to be variation in a noisy world, and differences will make a difference in worlds where differences matter. Like all the worlds you and I know. So it is kind of the first ingredient that is a big deal: Inheritance.

Inheritance is indeed a bit more tricky. Actually, a lot more tricky. Inheritance means that an offspring carries the characters of the parent. Not an Earth-shattering concept per se, but in the land of statistical physics, inheritance is not exactly a given. Mark the "offspring" part of that statement. Is making offspring such a common thing?

Depends on how you define "offspring". The term has many meanings. Icebergs "calve" other icebergs, but the "daughter" icebergs are not really the same as the parent in any meaningful way. Crystals grow, and the "daughter" crystals do indeed have the same structure as the "parent" crystals. But this process (while not without interest to those interested in the origins of life), actually occurs while liberating energy (it is a first-order phase transition).

The replication of cells (or people, for that matter) is very different from the point of view of statistical physics, thermodynamics, and indeed probability theory. Here we are going to look at this process entirely from the point of view of the replication of the information inherent in the cell (or the person). The replication of this information (assuming it is stored in polymers of a particular alphabet) is not energetically favorable. Instead, it requires energy, which explains why cells only grow if there is some kind of food around.

Look, the energetics of molecular replication are complicated, messy, and depend crucially on what molecules are available in what environment, at what temperature, pressure, salt concentrations, etc. etc. My goal for this blog post is to evade all that. Instead, I'm just going to ask how likely it is in general for a molecule that encodes a specific amount of information to arise by chance. Unless the information stored in the sequence is specifically about how to speed up the formation of another such molecule, however unlikely the formation of the first molecule was, the formation of two of them would be twice as unlikely (actually, exponentially so, but we'll get to that).

So this is the trick then: We are not interested in the formation of any old information by chance: we need the spontaneous formation of information about how to make another one of those sequences. Because, if you think a little bit about it, you realize that it is the power of copying that renders the ridiculously rare ... conspicuously commonplace. Need some proof for that? Perhaps the most valuable postage stamp on Earth is the famed "Blue Mauritius", a stamp that has inspired legendary tales and shortened the breath of many a collector, as there are (most likely) only two handfuls of those stamps left in the universe today.

Blue (left) and Red (right) Mauritius of 1847.  (Wikimedia).
But the original plate from which this stamp was printed still exists. Should someone endeavor to print a million of those, I doubt that they each would be worth the millions currently shelled out for one of those "most coveted scraps of paper in existence". (Of course experts would be able to tell apart the copies from the originals because of the sophistication of forensic methods deployed on such works and their forgeries.) But my point still stands: copying makes the rare and valuable ... cheaply ordinary.

When the printing press (the molecular kind) has not yet been invented, what does it cost to obtain a piece of information? This blog post will provide the answer, and most importantly, provide pointers to how you could cheat your way to a copy of a piece of information that would be rare not just in this universe, but a billion billion trillion more. Well, in principle.

How do you quantify rarity? Generally speaking, it is the number of things that you want, divided by the number of things there are. For the origin of life, let's imagine for a moment that replicators are sequences of linear heteropolymers. This just means that they are sequences of "letters" on a string, really. They don't have to self-replicate by themselves, but they have to encode the information necessary to ensure that they get replicated somehow. For the moment, let us restrict ourselves to sequences of a fixed length $L$. Trust me here, this is for your own good. I can write down a more general theory for arbitrary length sequences that does nothing to help you understand. On the contrary. It's not a big deal, so just go with it.

How many sequences are there of length $L$? Exactly $D^L$, of course (where $D$ is the size of the alphabet). How many self-replicators are there among those sequences? That is the big question, we all understand. It could be zero, of course. Let's imagine it is not, and that the number is $N_e$, where $N_e$ is not zero. If there is a process that randomly assembles polymers of length $L$, the likelihood $P$ that you get a replicator in that case is
$P=\frac{N_e}{D^L}$       (1)
So far so good. What we are going to do now is relate that probability to the amount of information contained in the self-replicating sequence. 

That we should be able to do this is fairly obvious, right? If there is no information in a sequence, well then that sequence must be random. This means any sequence is just as good as any other, and $N_e=N$, where $N=D^L$ is the total number of sequences (all sequences are functional at the same level, namely not functional at all). And in that case, $P=1$ obviously. But now suppose that every single bit in the sequence is functional. That means you can't change anything in that sequence without destroying that function, and implies that there is only one such sequence. (If there were two, you could make at a minimum one change and still retain function.) In that case, $N_e=1$ and $P=1/N$.

What is a good formula for information content that gives you $P=1$ for zero information, and $1/N$ for full information? If $I$ is the amount of information (measured in units of monomers of the polymer), the answer is
$P=D^{-I}.$      (2)
Let's quickly check that. No information is $I=0$, and $D^0=1$ indeed.  Maximal information is $I=L$ (every monomer in the length $L$ sequence is information). And $D^{-L}=1/N$ indeed. (Scroll up to the sentence "How many sequences are there of length $L$", if this is not immediately obvious to you.)

The formula (2) can actually be derived, but let's not do this here. Let's just say we guessed it correctly. But this formula, at first sight, is a monstrosity. If it was true, it should shake you to the bones. 

Not shaken yet? Let me help you out. Let us imagine for a moment that $D=4$ (yeah, nucleotides!). Things will not get any better, by the way, if you use any other base. How much information is necessary (in that base) to self-replicate? Actually, this question does not have an unambiguous answer. But there are some very good guesses at the lower bound. In the lab of Gerry Joyce at the Scripps Research Institute in San Diego, for example, hand-designed self-replicating RNAs can evolve [1]. How much information is contained in them?
Prof. Gerald Joyce, Scripps Research Institute
We can only give an upper bound, because while it takes 84 bits to specify this particular RNA sequence, only 24 of those bits are actually evolvable. The 60 un-evolvable bits (they are un-evolvable because that is how the team set up the system) could, in principle, represent far less information than 60 bits. This may not be clear to you after reading this. But explaining this now would be distracting. I'll explain it further below instead.

Let's take this number (84 bits) at face value for the moment. How likely is it that such a piece of information emerged by chance? According to our formula (2), it is about
$P\approx7.7\times 10^{-25} $
That's a soberingly small likelihood. If you wanted to have a decent chance to find this sequence in a pool of RNA molecules of that length, you'd have to have about 27 kilograms of RNA. That's almost 60 pounds, for those of you that... Never mind.

The point is, wherever linear heteropolymers are assembled by chance, you're not gonna get 27 kilograms of that stuff. You might get significantly smaller amounts (billions of times smaller), but then you would have to wait a billion times longer. On Earth, there wasn't that much time (as Life apparently arose within half a billion years of the Earth's formation). Now, as I alluded to above, the Lincoln-Joyce self-replicator may actually code for fewer than the 84 bits it took to make it. But at the origin of this replicator was intelligent design. A randomly generated one may require fewer bits. We are left with the problem: can self-replicators emerge by chance at all?

This blog post is, really, about these two words: "by chance". What does this even mean?

When writing down formula (2), "by chance" has a very specific meaning. It means that every polymer to be "tried out" has an equal chance of occurring. "Occurring", in chemistry, also has a specific meaning. It means "to be assembled from existing monomers", and if each polymer has an equal chance to be found, then that means that the likelihood to produce any monomer is also equal.

For us, this is self-evident. If I want to calculate the likelihood that a random coin toss creates 10 heads in a row by chance, I take the likelihood of "heads" and take it to the power of ten. But what if your coin is biased? What if it is a coin that lands on heads 60% of the time? Well then: in that case, the likelihood to get ten heads in a row is not 1 in 1,024 anymore but rather $(0.6)^{10}$, a factor of about 6.2 larger. This is quite a gain given such a small change in likelihood for a single toss (from 0.5 to 0.6). But imagine that you are looking for 100 heads in a row. The same change in bias now buys you a factor of almost 83 million! And for a sequence of 1,000 heads in a row, you are looking at an enhancement factor of .... about $10^{79}$.
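If you want to check these numbers yourself, here is a minimal Python sketch. The function name `enhancement` is just something I made up for illustration; the only thing it does is take the ratio of per-toss probabilities to the power of the number of tosses.

```python
def enhancement(p_biased: float, p_fair: float, n: int) -> float:
    """Factor by which the chance of n heads in a row grows when the
    per-toss probability of heads rises from p_fair to p_biased."""
    return (p_biased / p_fair) ** n


# Compare a 60% coin against a fair coin for runs of increasing length.
for n in (10, 100, 1000):
    print(f"{n:>5} heads in a row: enhancement factor ~ {enhancement(0.6, 0.5, n):.3g}")
```

The gain per toss is a modest factor of 1.2, but it compounds with every additional toss, which is why the enhancement explodes for long runs.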

That is the power of bias on events with small probabilities. Mind you, getting 100 heads in a row is still a small probability, but gaining almost eight orders of magnitude is not peanuts. It might be the difference between impossible and... maybe-after-all-possible. Now, how can this be of use in the origin of life?

As I explained, formula (2) relies on assuming that all monomers are created equally likely, with probability $1/D$. When we think about the origin of life in terms of biochemistry, we begin by imagining a process that creates monomers, which are assembled into those linear heteropolymers, and then copied somehow. (In biochemical life on Earth, assembly is done in a template-directed manner, which means that assembly and copying are one and the same thing). But whether assembly is template-directed or not, how likely is it that all monomers occur spontaneously at the same rate? Any biochemist will tell you: extremely unlikely. Instead, some of the monomers are produced spontaneously at one rate, and others at a different rate. And these rates depend on local circumstances, like temperature, pH level, abundance of minerals, abundance of just about any element as it turns out. So, depending on where you are on a pre-biotic Earth, you might be faced with wildly different monomer production rates.

This uneven-ness of production can be viewed as a D-sided "coin" where each of the D sides has a different probability of occurring. We can quantify this uneven-ness by the entropy that a sequence of such "coin" tosses produces. (I put "coin" in quotes because a D-sided coin isn't a coin unless D=2. I'm just trying to avoid saying "random variable" here.) This entropy (as you can glean from the Information Theory tutorial that I've helpfully created for you, starting here) is equal to the length of the sequence if each monomer indeed occurs at rate 1/D (and we take logs to base D), but is smaller than the length if the probability distribution is biased. Let's call $H(a)$ the average entropy per monomer, as determined by the local biochemical constraints. And let's remember that if all monomers are created at the same exact rate, $H(a)=1$ (its maximal value), and Eq. (2) holds. If the distribution is uneven, then $H(a)<1$. The entropy of a spontaneously created sequence is then $L\times H(a)$, which is smaller than $L$. In a sense, it is not random anymore, if by random we understand "each sequence equally likely". How could this help increase the likelihood of spontaneous emergence of life?
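To make the per-monomer entropy concrete, here is a small Python sketch that computes $H(a)$ with logarithms taken to base $D$, so that a uniform distribution gives exactly 1 mer. The function name and the biased production rates below are made up for illustration; they are not measured values for any real chemistry.

```python
import math


def entropy_per_monomer(probs):
    """Shannon entropy of a monomer distribution, in mers (log base D),
    where D is the number of monomer types."""
    D = len(probs)
    return -sum(p * math.log(p, D) for p in probs if p > 0)


# Uniform production of D = 4 monomers: the maximal value, H(a) = 1.
uniform = [0.25, 0.25, 0.25, 0.25]
# Hypothetical uneven production rates (an assumption, for illustration only).
biased = [0.4, 0.3, 0.2, 0.1]

print("H(a), uniform:", round(entropy_per_monomer(uniform), 3))
print("H(a), biased: ", round(entropy_per_monomer(biased), 3))
```

Any deviation from uniformity pushes $H(a)$ below 1, and the entropy of a length-$L$ sequence is then $L\times H(a)$.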

Well, let's take a closer look at the exponent in Eq. (2), the information $I$. Under certain conditions that I won't get into here, this information is given by the difference between sequence length $L$ and entropy $H$:
$I=L-H.$   (3)
That such a formula must hold is not very surprising. Let's look at the extreme cases. If a sequence is completely random, then $H(a)=1$, and therefore $H=L$, and therefore $I=0$. Thus, a random sequence has no information. On the opposite end, suppose there is only one sequence that can do the job, and any change to the sequence leads to the death of that sequence. Then, the entropy of the sequence (which is the logarithm of the number of ways you can do the job), must be zero. And thus in that case the sequence is all information: $I=L$.  While the correct formula (3) has plenty more terms that become important if there are correlations between sites, we are going to ignore them here.

So remember that the probability for spontaneous emergence of life is so small because $I$ is large, and it is in the exponent. But now we realize that the $L$ in (3) is really the entropy of a spontaneously created sequence, and if $H(a)<1$, then the first term is $L\times H(a)<L$. This can help a lot because it makes $I$ smaller. It helps a lot because the change is in the exponent. Let's look at some examples.

We could first look at English text. The linear heteropolymers of English are strings of the letters a-z (let's just stick with lower case letters and no punctuation for simplicity). What is the likelihood to find the word ${\tt origins}$ by chance? If we use an unbiased typewriter (our 26-sided coin), the likelihood is $26^{-7}$ (about 1 in 8 billion), as ${\tt origins}$ is a 7-mer, and each mer is information (there is only one way to spell the word ${\tt origins}$). Can we do better if our typewriter is biased towards English? Let's find out. If you analyze English text, you quickly notice that letters occur at different frequencies: e more often than t, which occurs more often than a, and so forth. The plot below is the distribution of letters that you would find.

Letter distribution of English text
The entropy-per-letter of this distribution is 0.89 mers. Not very different from 1, but let's see how it changes the 1 in 8 billion odds. The biased-search chance is, according to this theory, $P_\star=26^{-7\times 0.89}$, which comes out to about 1.5 per billion: an enhancement of more than a factor 12. Obviously, the enhancement is going to be more pronounced the longer the sequence. We can test this theory in a more appropriate system: self-replicating computer programs.
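Here is the arithmetic for the ${\tt origins}$ example as a short Python sketch. It uses the numbers quoted above (26 letters, a 7-mer, 0.89 mers of entropy per letter); the variable names are my own.

```python
D = 26    # alphabet size: lower-case letters a-z
L = 7     # length of the word "origins"
H = 0.89  # entropy per letter of English text, in mers (value quoted above)

p_unbiased = D ** (-L)      # every letter equally likely
p_biased = D ** (-L * H)    # typewriter biased towards English letter frequencies

print(f"unbiased typewriter: {p_unbiased:.3g}")
print(f"biased typewriter:   {p_biased:.3g}")
print(f"enhancement factor:  {p_biased / p_unbiased:.1f}")
```

Because the enhancement factor is $D^{L(1-H)}$, it grows exponentially with the length of the sequence you are looking for.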

That you can breed computer programs inside a computer is nothing new to those who have been following the field of Artificial Life. The form of Artificial Life that involves self-replicating programs is called "digital life" (I have written about the history of digital life on this blog), and I will focus in particular on the program Avida. For those who can't be bothered to look up what kind of life Avida makes, let's just focus on the fact that avidians are computer programs written in a language that has 26 instructions (conveniently abbreviated by the letters a-z), executed on a virtual CPU (you don't want digital critters to wreak havoc on your real CPU, do you?). The letters of these linear heteropolymers have specific meanings on that virtual CPU. For example the letter 'x' stands for ${\tt divide}$, which when executed will split the code into two pieces.

Here's a sketch of what this virtual CPU looks like (with a piece of code on it, being executed)
Avidian CPU and code (from [2]). 
When we use Avida to study evolution experimentally, we seed a population with a hand-written ancestral program. The reason we do this is because self-replicators are rare within the avidian "chemistry": you can't just make a random program and hope that it self-replicates! And that is, as I'm sure has dawned on the reader a while ago, where Avida's importance for studying the origin of life comes from. How rare is such a program?

The standard hand-written replicator is a 15-mer, but we are sure that not all 15 mers are information. If they were, then its likelihood would be $26^{-15}\approx 6\times 10^{-22}$, and it would be utterly hopeless to find it via a random (unbiased) search. It would take about 50,000 years if we tested a million strings a second, on one thousand computers in parallel. We can estimate the information content by sampling the ratio $\frac{N_e}{26^{15}}$, that is, instead of trying out all possible sequences, we try out a billion, and take the fraction of self-replicators to be representative of the overall fraction. (If we don't find any, try ten billion, and so forth).
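The back-of-the-envelope estimate of the search time can be reproduced in a few lines of Python. The testing rate (a million strings per second on a thousand computers) is the one quoted above; the 365.25-day year is my own assumption.

```python
D, L = 26, 15

n_sequences = D ** L                  # size of the 15-mer sequence space
p_single = 1 / n_sequences            # chance of hitting one specific 15-mer

strings_per_second = 1_000_000 * 1_000        # a million per second, on 1,000 machines
seconds = n_sequences / strings_per_second    # time to try every sequence once
years = seconds / (365.25 * 24 * 3600)        # assuming a 365.25-day year

print(f"probability of one specific 15-mer: {p_single:.2g}")
print(f"exhaustive search time: about {years:,.0f} years")
```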

When we created 1 billion 15-mers using an unbiased distribution, we found 58 self-replicators. That was unexpectedly high, but it pins down the information content to be about
$I(15)=-\log_D(58\times 10^{-9})\approx 5.11 \pm 0.04 $ mers.
The 15 in $I(15)$ reminds us that we were searching within 15 mer space only. But wait: about 5 mers encoded in a 15 mer? Could you write a self-replicator that is as short as 5 mers?

Sadly, no. We tried all 11,881,376 5-mers, and they are all as dead as doornails. (We test those sequences for life by dropping them into an empty world, and then checking whether they can form a colony.) 

Perhaps 6-mers, then? Nope. We checked all 308,915,776 of them. No sign of life. We even checked all 7-mers (over 8 billion of them). No colonies. No life. 

We did find life among 8-mers, though. We first sampled one billion of them, and found 6 unique sequences that would spontaneously form colonies [2]. That number immediately allows us to estimate the information content as 
                       $I(8)=-\log_D(6\times 10^{-9})\approx 5.81 \pm 0.13 $ mers,
which is curious. 

It is curious because according to formula (2) waaay above, the likelihood of finding a self-replicator should only depend on the amount of information in it. How can that information depend on the length of sequence that this information is embedded in? Well it can, and you'll have to read the original reference [2] to find out how. 
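For those who want to see where the two information estimates come from, here is a minimal Python sketch. The information is just minus the logarithm (base 26) of the sampled fraction of replicators. For the uncertainty I assume simple Poisson counting statistics on the number of replicators found; that is my assumption here, although it does happen to reproduce the error bars quoted above. The function name is made up for illustration.

```python
import math


def information_from_sample(n_found: int, n_tried: int, D: int = 26):
    """Estimate the information content (in mers) from a random sample.

    Returns (information, uncertainty). The uncertainty assumes Poisson
    counting noise on n_found, propagated through the logarithm.
    """
    fraction = n_found / n_tried
    info = -math.log(fraction, D)
    err = 1.0 / (math.sqrt(n_found) * math.log(D))
    return info, err


i15, e15 = information_from_sample(58, 10**9)  # 58 replicators among a billion 15-mers
i8, e8 = information_from_sample(6, 10**9)     # 6 replicators among a billion 8-mers

print(f"I(15) = {i15:.2f} +/- {e15:.2f} mers")
print(f"I(8)  = {i8:.2f} +/- {e8:.2f} mers")
```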

By the way, we later tested all sequences of length 8 [3], giving us the exact information content of 8-mer replicators as 5.91 mers. We even know the exact information content of 9-mer replicators, but I won't reveal that here. It took over 3 months of compute time to get this, and I'm saving it for a different post.

But what about using a biased typewriter? Will this help in finding self-replicators? Let's find out! 
We can start by using the measly 58 replicators found by scanning a billion 15-mers, and making a probability distribution out of it. It looks like this:
Probability distribution of avidian instructions among 58 replicators of L=15. The vertical line is the unbiased expectation.
It's clear that some instructions are used a lot (b,f,g,v,w,x). If you look up what their function is, they are not exactly surprising. You may remember that 'x' means ${\tt divide}$. Obviously, without that instruction you're not going to form colonies. 

The distribution has an entropy of 0.91 mers. Not fantastically smaller than 1, but we saw earlier that small changes in the exponent can have large consequences. When we searched the space of 15 mers with this distribution instead of the uniform one, we found 14,495 replicators among a billion tried, an enhancement by a factor of about 250. Certainly not bad, and a solid piece of evidence that the "theory of the biased typewriter" actually works.  In fact, the theory underestimates the enhancement, as it predicts (based on the entropy 0.91 mers) an enhancement of about 80 [2].
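The comparison between predicted and observed enhancement can be sketched in Python as follows. The prediction assumes that the only change is replacing $L$ by $L\times H(a)$ in the exponent, so that the enhancement is $D^{L(1-H(a))}$; this is my reading of the argument in this post, and the full treatment is in [2].

```python
D = 26    # number of avidian instructions
L = 15    # sequence length searched
H = 0.91  # entropy per instruction of the biased distribution, in mers

# Predicted enhancement if only the first term of the exponent changes (assumption).
predicted = D ** (L * (1 - H))

# Observed enhancement: replicators per billion, biased versus unbiased search.
observed = 14_495 / 58

print(f"predicted enhancement: about {predicted:.0f}")
print(f"observed enhancement:  about {observed:.0f}")
```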

We even tested whether taking the distribution generated by the 14,495 replicators, which certainly is a better estimate of a "good distribution", will net even more replicators. And it does indeed. Continuing like this allows your search to zero in on the "interesting" parts of genetic space in a more laser-like fashion, but the returns are, understandably, diminishing.

What we learn from all this is the following: do not be fooled by naive estimates of the likelihood of spontaneous emergence of life, even if they are based on information theory (and thus vastly superior to those that would claim that $P=D^{-L}$). Real biological systems search with a biased distribution. The bias will probably go "in the wrong direction" in most environments. (Imagine an avidian environment where 'x' is never made.) But among the zillions of environments that may exist on a prebiotic Earth, a handful might have a distribution that is close to the one we need. And in that case, life suddenly becomes possible. 

How possible? We still don't know. But at the very least, the likelihood does not have to be astronomically small, as long as nature will use that one little trick: whip out that biased typewriter, to help you mumble more coherently. 

[1] T. A. Lincoln and G. F. Joyce, Self-sustained replication of an RNA enzyme, Science 323, 1229–1232, 2009.
[2] C. Adami and T. LaBar, From entropy to information: Biased typewriters and the origin of life. In: “From Matter to Life: Information and Causality” (S.I. Walker, P.C.W. Davies, and G. Ellis, eds.) Cambridge University Press (2017), pp. 95-113. Also on arXiv
[3] Nitash C.G., T. LaBar, A. Hintze, and C. Adami, Origin of life in a digital microcosm. To appear. 

Wednesday, March 30, 2016

Ten Years (give or take) in the Evolution of a Protein

How do proteins evolve? Generally the answer is "Very slowly!". But sometimes, protein evolution can be blazingly fast. How fast, you ask? Ask instead the lizards of the South Adriatic Sea!

OK, where is the South Adriatic Sea? you ask. You should really be asking "What about those lizards?", but here we go. The Adriatic Sea separates Italy from the Balkan peninsula, as in the picture below (upper left corner). So in 1971, researchers decided to take a species of lizards (known as Podarcis sicula, the Italian wall lizard) found on the small island Kopiste, and transplant them to the neighboring small island Mrcaru. 
Adriatic Sea (top left). Pod Kopiste is the tiny island on the left, and Pod Mrcaru is to its right. The larger island is the inhabited Lastovo (credit: Google World)
I don't know why they did it. They transplanted five adult breeding pairs, so they were intent on creating havoc, no doubt. Or an experiment, perhaps? But the Croatian War of Independence intervened, and the lizards were all but forgotten until a team returned in 2004 to Mrcaru to look at the local lizards there. And they found that the offspring of the ten had essentially overrun the island, and changed in profound ways. On Kopiste, the lizards ate mostly insects. On Mrcaru, instead, there was an abundance of plants for food, and comparatively fewer insects. The insect-eating lizards, however, were not adapted to digest plants, something that requires a different stomach structure that ensures that the plants stay in the intestine long enough to digest the plant cellulose. If it does not stay in the stomach, you can't get the energy from it. It turns out that the lineage on Mrcaru evolved so-called cecal valves, something that does not usually occur in lizards. The cecal valves close off parts of the stomach, so that some types of bacteria could ferment the cellulose in there. This is stunning only because this adaptation took just over thirty years. It turns out that other body characteristics had changed too: longer, wider, and taller heads that translate into larger forces to bite down on the tough fibrous plants. The lizards needed to survive: this is how they did it.

Can proteins really evolve that fast? It seems that the answer is: "If you really really have to, then yes". What a pity that we haven't been able to sample the sequences of the proteins involved over the thirty some years. Wouldn't that give us a fantastic window on protein evolution? But how can you know that a protein is about to undergo fundamental changes?

It turns out that you can, if you modify the environment in such a way that it becomes unlivable for the organism involved, and you then look for those types that survive the slaughter. Sounds immoral? But we do it all the time, when we give drugs to fight viral infections! The example I will use is the evolution of drug resistance in a protein of the Human Immunodeficiency Virus (HIV), the virus that causes AIDS.

AIDS broke out into the Western population in 1981, but it took fourteen years to develop the first effective anti-viral treatment: a drug that inhibits a crucial piece of the HIV machinery: the protease. To understand the drug and what the protease does, we have to spend some time with the somewhat unusual lifecycle of HIV. It is a retrovirus, which means that its genetic material is RNA, not DNA. The virus infects cells that are crucial in people's ability to fight infections, which explains to a large extent why it is so deadly: it attacks precisely the system that is supposed to save you. The figure below gives you an idea of the virus's life cycle.

HIV life cycle. Source: Wikimedia
After the virus capsid (the shell that encapsulates the virus RNA along with a few necessary molecules) binds to the cell (here, a T-cell, which is a type of white blood cell that plays a central role in the immune system), the virus injects the capsid's material into the cell. Along with the RNA in the capsid comes an enzyme called the "reverse transcriptase", which is able to make a DNA copy from the RNA material, and this DNA copy is subsequently inserted ("integrated") into the host cell's DNA. Now, the DNA of every cell is constantly transcribed and then translated into proteins, and the same is going to happen to the foreign DNA that was inserted into the host cell. Willy-nilly, the cell makes proteins from the virus's information: it is making virus parts. But it turns out that unlike your own proteins that have stop codons to indicate where a protein ends, the foreign DNA (made from virus RNA) does not have those. As a consequence, the cellular machinery produces one long long protein, called a "polyprotein". It is, of course, totally unusable in this form. It must be cleaved (meaning "cut") into the functional pieces with a knife. Where can the virus find such a knife? Well, it makes it itself, and it carries a copy with it in the capsid. Armed with this knife, the polyprotein is cut into all the pieces that are needed to assemble another functional capsid (including the protease and the reverse transcriptase) and packaged with copies of the RNA genetic code (which the cell helpfully made for free) into new capsids. The action of the knife (called a "protease") is shown in the lower left corner of the life cycle diagram above.

"If I could just blunt this knife", is what HIV researchers were asking themselves, and they found just the way to do it. Take a look at the molecular structure of the protease in the figure below. 

The HIV protease is a dimer (meaning it is made out of two copies of the same protein that bind to each other, here in cyan and green). Two particular amino acids that are important in the activity of the molecule are colored red and purple.
See the hole in the middle, surrounded by the red and purple amino acids? That's where the polyprotein fits in, and the protease cuts it like a cigar cutter at specific points that are recognized by the red and purple residues. How do you inactivate the cigar cutter? You stick something in there to block the hole! Indeed, this is how all protease inhibitors (that is, drugs that inhibit the activity of the protease) work. 

When these drugs hit the market, they were replacing older drugs that had nasty side effects. And these new drugs worked like magic! The only trouble was that the virus was not going to capitulate that easily. Indeed, researchers had created just the scenario that we were calling for above: change the environment in such a manner that makes it unlivable for an organism, and see how it can cope. 

HIV protease inhibitors work really well (in particular if associated with another drug, the reverse transcriptase inhibitor), which means that the virus population all but goes extinct. The important modifier here is "all but". Instead of going extinct, it goes into hiding, and researchers don't really know where. As you can imagine, finding this hiding spot (and how to coerce the virus to leave it) is a major effort of HIV research today. A problem arises if a patient forgets to take their antiviral drugs. The virus comes out, starts replicating (slowly), and the high mutation rate of the virus creates the opportunity to evolve quickly. HIV can evolve resistance to a protease inhibitor within two weeks. This is not altogether surprising, as when unchecked the virus creates an enormous number of copies (correct and flawed) of the virus every day, so that every single mutation of the nearly 10,000 nucleotide genome is tried multiple times every day, and every pair of mutations a few times. This is enough to cause rapid evolution, and if a single virus finds a way to survive the massacre the drug unleashes, that virus will grow in numbers and create the seeds of a new destructive force that the inhibitor is unprepared for. When resistance emerges, researchers go back to the lab to develop a new type of protease inhibitor, a new way to dull the knife. While it is effective for a while, evolution ultimately keeps up, and finds a way to evade it. How do we stop this maddening race?

The history of this fight between the virus and the drugs that attempt to keep it at bay is documented, as it occurred after we had figured out how to sequence stuff. Every paper that relied on patient data, and every drug trial, was asked to deposit their sequence data (namely the sequence of the virus they extracted from their patients) in publicly accessible databases. This sequence data became the "fossils" of this evolutionary history, and it is made from the viral RNA of patients that fought this fight, on the frontline. Many of those did not survive the fight, but they bequeathed their virus's sequence data to us for posterity so that we can, perhaps, save the next generation.

Patients that were enrolled in a multitude of drug trials would have the virus's information sequenced, and these records ultimately found their way into Stanford University's HIV resistance database (HIVdb). All sequence data is usually deposited in central repositories such as Genbank, but Stanford's HIVdb provides an enormous service by curating the HIV data on a single site, and developing tools and algorithms to investigate that data. In my lab, I decided that we should mine this "fossil record" to understand how HIV is adapting to, and attempting to evade, the drugs thrown at it. The evolution of drug resistance in HIV can thus be seen as a long-term evolution experiment (LTEE), only compared to the LTEE it is short, and we do not have frozen isolates. The Stanford database is a compendium that allows users to query all sorts of information about sequence, type, and resistance profile. For our purposes, namely to study how the sequence evolves, we need only two things: sequences, and whether the patients who donated the sequence were receiving anti-viral drugs. 

To understand how evolution is affecting a protein, we have to discuss the concept of the "fitness landscape". Entire series of blog posts can be written about this concept, but we don't have that kind of space here. Broadly speaking, a fitness landscape is an idealized picture of how the fitness of an organism depends on either the traits or the genome that determine the organism. Here, we will focus on the mapping between sequence and fitness, not traits and fitness. In such a picture, the fitness is the "elevation", and the sequence is the coordinate. If you search for "fitness landscape" you will almost invariably end up with a picture that originates from my lab. Give it a try! You might for example find this: 

A rugged fitness landscape with different evolutionary paths. Credit: Randal S. Olson
This is a rendering of a rugged fitness landscape that my student (at the time) Randy Olson created for a manuscript that we ended up not finishing.  The general idea depicted there is that mutation-by-mutation you could move peak-to-peak, or if this is not possible, you might choose a path that tries to maximize fitness, even though you may have to walk in the valleys between peaks for a (short) while.

If you consider a protein landscape (the z-axis values in the landscape represent how well a protein is doing its job) then most proteins occupy a peak, because if they did not, then mutations would move them closer to the peak until there are no more ways to improve the protein. Drugs that attack the function of a protein (such as the protease inhibitor blunting the protease as described above) change the landscape profoundly: you can imagine that they simply erase the peak. You might think that this would kill the organism (if the protein is essential). But due to the high mutation rate of HIV, there are actually a lot of variants that exist in the population. Many of them are completely defective, but some of them "live" at the edges of the fitness peak that the un-mutated protein occupies. Because they are barely functional they usually do not play a role. But when the main peak is eliminated, the sequences at the fringes may be the only ones to survive. They make a virus that replicates very slowly, but replicate it does. And thus evolution can continue: if there is any way to improve the function of the protein, that path will be taken. The protein will find a distant peak to climb, and the virus is resurrected: it has evolved resistance to the drug.

Even though research has discovered more and more potent anti-viral drugs, which attack different proteins and are thus more effective than any single drug can be, the virus ultimately will evade them, in particular if the patient forgets to take the drug so that the virus can replicate faster and thus accumulate mutations faster. Is there no way to stop this?

In research that has just appeared in the journal PLoS Genetics, my colleague Aditi Gupta (now a postdoctoral researcher at the New Jersey Medical School of Rutgers University)  and I studied how the virus adapts to more and more complex drug environments over a span of almost 10 years. We studied the evolution of the HIV protease (the molecule you encountered above) using sequences deposited in the Stanford database. We found two things: First: in patients that did not receive drugs, the protease molecule was not evolving. Second, in patients that did receive drugs, the protease molecule was evolving quickly, but it evolved in a peculiar way: by storing information in epistatic interactions, rather than in residue changes.

Ok Ok, I realize that this was a mouthful. First, what was that bit about information? You see, for a protein (as well as all life, in the end) everything is about information. A protein that "does its job" has information about the environment within which it is active. Its sequence encodes that information, but it is information about that environment. You change the environment, and what used to be information may not be information anymore. Information is contextual (as I argue in a series of blog posts that starts here). The evolution of drug resistance, in the light of information theory, is then just the quest to "learn" (that is acquire information) about that new world, the new context. 

And it so happens that you can store information in different ways in a sequence. You can certainly store it in the individual symbols that make up the sequence. That is how we usually think of storing information. It is less well-known that you can also store information in the correlations between symbols. I don't know of a good way to make this intuitive. Information is something that allows you to make predictions (as I argue in the above-mentioned series). A single site being an 'A' (instead of a 'C', 'G', or 'T') might be predictive of a particular environmental state. But you can imagine that a site being an 'A' as long as a very particular other site is a 'G' can also be predictive, as long as the only pairs that are allowed are 'AG' and 'GA'. This kind of "dependence" between sites is known as "epistasis" in genetics. There is an enormous amount of literature about epistasis in genetics (as there should be, as I believe it to be the central concept in evolutionary biology) but this post is already too long, so I must refer you to the wiki pages to learn more.
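For readers who prefer to see the 'AG'/'GA' idea in numbers, here is a minimal sketch. It assumes a made-up toy population in which only the pairs 'AG' and 'GA' occur, equally often, and it uses the standard Shannon definitions of entropy and mutual information. The function names and the toy data are mine, purely for illustration. Note that it measures how much one site tells you about the other, not how much either tells you about the environment.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of the observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical toy population: only the pairs 'AG' and 'GA' are allowed.
population = ["AG", "GA", "AG", "GA"]
site1 = [seq[0] for seq in population]
site2 = [seq[1] for seq in population]

print(entropy(site1))                    # 1.0 bit: site 1 alone looks like a coin flip
print(entropy(site2))                    # 1.0 bit: so does site 2
print(mutual_information(site1, site2))  # 1.0 bit: knowing one site fixes the other
```

Each site on its own varies freely between 'A' and 'G', yet the pair is completely constrained: the one bit that matters sits in the correlation, not in either symbol.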

What I argue thus, in a nutshell, is that you can store information in substitutions (of residues) or you can store it in epistatic interactions between residues. What Aditi Gupta and I found by analyzing the "fossil record" of almost ten years of protein evolution is that the protease mostly stored information in the linkages between residues. 

I know what you are asking: "Why would a protein do that, and what are the consequences?" These are good questions. Let's investigate them one by one. 

Storing information in "correlated changes" (epistatic interactions) is a necessity if you are rushed. The reason is technical, and you are forgiven if you don't grasp the entirety of the argument. Single substitutions (the "simple" way to store information) have serious repercussions for a protein, as substitutions (on average) destabilize the protein. Yes, you do remember that a protein has to fold into its structural conformation, and it doesn't just do that willy-nilly (that's the second time I used that construction, isn't it?). This fold has to be energetically favored, and changes in the residues usually make things worse for those energetics. This isn't a problem if a substitution makes it just a little harder to fold, and if at the same time you have enough time to correct for that problem, by making a compensating substitution somewhere else, later. But if time is of the essence (as when the protein just found its peak utterly annihilated) you can't just substitute a residue, because you probably have to substitute another too, and that would make the protein not fold. A non-folded protein is a dead protein. It cannot wait for a substitution that will save it.

But as I pointed out, there is another way to "learn" (that is, acquire information): by changing the way residues interact. Such changes affect the folding free energy of the protein very little, and as a consequence this is the favored mode of information acquisition if time is of the essence. What we find in the fossil record is that, indeed, this is how evolution proceeds.

What are the consequences? Well, they are likely to be profound. If a protein evolves to store information in linkages between residues, that implies that the protein becomes more and more constrained. After doing this for a while, there aren't that many residues anymore that are free to vary, as there are so many relative states that need to be satisfied. In theory, this means that the protein is evolving itself into a corner from which there may be no escape. Put differently, the protein inhabits a fitness landscape that becomes more and more rugged the more interactions are being locked in between residues.

Let me show you some of the technical evidence that appears in the paper. In the figure below, you see something we call "sum of pairwise MI", where MI stands for "mutual information". You can think of that measure as representing the amount of information stored in the linkages between residues in the protein. As a matter of fact, you shouldn't just think of it in those terms, it is precisely that. This measure is increasing in patients that respond to drug treatment (blue triangles), but does not change in patients that are not receiving those drugs (but really are wishing they would).
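To make "sum of pairwise MI" concrete, here is a minimal sketch of how such a quantity can be computed from a set of aligned sequences. It is not the pipeline used in the paper: it is a plain plug-in estimate with no correction for finite sample size, and the function names and the three-letter toy "alignments" are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of the observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def sum_pairwise_mi(alignment):
    """Sum of I(i;j) over all pairs of columns i < j.

    `alignment` is a list of equal-length sequences (one per sampled virus).
    """
    columns = list(zip(*alignment))          # one tuple of symbols per site
    h = [entropy(col) for col in columns]    # per-site entropies
    total = 0.0
    for i, j in combinations(range(len(columns)), 2):
        joint = entropy(list(zip(columns[i], columns[j])))
        total += h[i] + h[j] - joint         # I(i;j) = H(i) + H(j) - H(i,j)
    return total

# Hypothetical toy data: the first two sites vary, the third is fixed.
unlinked = ["AAC", "AGC", "GAC", "GGC"]  # sites 1 and 2 vary independently
linked   = ["AGC", "GAC", "AGC", "GAC"]  # sites 1 and 2 vary together

print(sum_pairwise_mi(unlinked))  # no information in the linkages
print(sum_pairwise_mi(linked))    # one bit stored in the 1-2 linkage
```

Both toy populations show the same amount of variation at each individual site; only the second one has locked that variation into a linkage, and only there is the sum non-zero.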

Pairwise epistasis, measured in terms of mutual information, as a function of time in the HIV-1 protease. Triangles: patients taking anti-viral drugs. Circles: patients not taking any anti-viral drugs.

What this plot shows is that the proteins that are adapting to drugs do so by creating functional links between residues, and this evolution persists as more and more sophisticated drugs are introduced. But the trend seems to be stalling within the last three years. Could it be that the virus is becoming so constrained that further adaptation is impossible?

I wish I knew the answer to this question, but I don't. At least from the time course we investigated in this paper, there is no evidence that the protein has slowed its evolution. But I must caution that we only investigated the evolution of the HIV protease for the years 1998-2006. There is sequence data for the years after 2006, of course, but our study was explicitly comparing the response of patients that took anti-viral drugs to those that did not. And after 2006, you could not find enough sequences from patients not taking anti-viral drugs in the database to make statements that were statistically sound. We understand the reason for this, of course, as the anti-viral drugs had become so potent that it would be morally reprehensible to withhold them from a control group. 

It is possible that a slow-down of evolution can be discerned in the sequences of patients that were exposed to anti-viral drugs post 2006. That would be a stunning development, which would have profound implications for the evolution of drug resistance in HIV. The data is there. Who wants to analyze it?

The study I discuss was published as:

A. Gupta and C. Adami, "Strong selection significantly increases epistatic interactions in the long-term evolution of a protein". PLoS Genetics 12 (2016) 1005960.