Wednesday, March 30, 2016

Ten Years (give or take) in the Evolution of a Protein

How do proteins evolve? Generally the answer is "Very slowly!". But sometimes, protein evolution can be blazingly fast. How fast, you ask? Ask instead the lizards of the South Adriatic Sea!

OK, where is the South Adriatic Sea? you ask. You should really be asking "What about those lizards?", but here we go. The Adriatic Sea separates Italy from the Balkan peninsula, as in the picture below (upper left corner). So in 1971, researchers decided to take a species of lizards (known as Podarcis sicula, the Italian wall lizard) found on the small island Kopiste, and transplant them to the neighboring small island Mrcaru. 
Adriatic Sea (top left). Pod Kopiste is the tiny island on the left, and Pod Mrcaru is to its right. The larger island is the inhabited Lastovo (credit: Google World)
I don't know why they did it. They transplanted five adult breeding pairs, so they were intent on creating havoc, no doubt. Or an experiment, perhaps? But the Croatian War of Independence intervened, and the lizards were all but forgotten until a team returned in 2004 to Mrcaru to look at the local lizards there. And they found that the offspring of the ten had essentially overrun the island, and changed in profound ways. On Kopiste, the lizards ate mostly insects. On Mrcaru, instead, there was an abundance of plants for food, and comparatively fewer insects. The insect-eating lizards, however, were not adapted to digest plants, something that requires a different stomach structure that ensures that the plants stay in the intestine long enough to digest the plant cellulose. If it does not stay in the stomach, you can't get the energy from it. It turns out that the lineage on Mrcaru evolved so-called cecal valves, something that does not usually occur in lizards. The cecal valves close off parts of the stomach, so that some types of bacteria can ferment the cellulose in there. This is stunning only because this adaptation took just over thirty years. It turns out that other body characteristics had changed too: longer, wider, and taller heads that translate into larger forces to bite down on the tough fibrous plants. The lizards needed to survive: this is how they did it.

Can proteins really evolve that fast? It seems that the answer is: "If you really really have to, then yes". What a pity that we haven't been able to sample the sequences of the proteins involved over the thirty some years. Wouldn't that give us a fantastic window on protein evolution? But how can you know that a protein is about to undergo fundamental changes?

It turns out that you can, if you modify the environment in such a way that it becomes unlivable for the organism involved, and you then look for those types that survive the slaughter. Sounds immoral? But we do it all the time, when we give drugs to fight viral infections! The example I will use is the evolution of drug resistance in a protein of the Human Immunodeficiency Virus (HIV), the virus that causes AIDS.

AIDS broke out into the Western population in 1981, but it took fourteen years to develop the first effective anti-viral treatment: a drug that inhibits a crucial piece of the HIV machinery: the protease. To understand the drug and what the protease does, we have to spend some time with the somewhat unusual lifecycle of HIV. It is a retrovirus, which means that its genetic material is RNA, not DNA. The virus infects cells that are crucial in people's ability to fight infections, which explains to a large extent why it is so deadly: it attacks precisely the system that is supposed to save you. The figure below gives you an idea of the virus's life cycle.

HIV life cycle. Source: Wikimedia
After the virus capsid (the shell that encapsulates the virus RNA along with a few necessary molecules) binds to the cell (here, a T-cell, which is a type of white blood cell that plays a central role in the immune system), the virus injects the capsid's material into the cell. Along with the RNA in the capsid comes an enzyme called the "reverse transcriptase", which is able to make a DNA copy from the RNA material, and this DNA copy is subsequently inserted ("integrated") into the host cell's DNA. Now, the DNA of every cell is constantly transcribed and then translated into proteins, and the same is going to happen to the foreign DNA that was inserted into the host cell. Willy-nilly, the cell makes proteins from the virus's information: it is making virus parts. But it turns out that unlike your own proteins that have stop codons to indicate where a protein ends, the foreign DNA (made from virus RNA) does not have those. As a consequence, the cellular machinery produces one long long protein, called a "polyprotein". It is, of course, totally unusable in this form. It must be cleaved (meaning "cut") into the functional pieces with a knife. Where can the virus find such a knife? Well, it makes it itself, and it carries a copy with it in the capsid. Armed with this knife, the polyprotein is cut into all the pieces that are needed to assemble another functional capsid (including the protease and the reverse transcriptase) and packaged with copies of the RNA genetic code (which the cell helpfully made for free) into new capsids. The action of the knife (called a "protease") is shown in the lower left corner of the life cycle diagram above.

"If I could just blunt this knife", is what HIV researchers were asking themselves, and they found just the way to do it. Take a look at the molecular structure of the protease in the figure below. 

The HIV protease is a dimer (meaning it is made out of two copies of the same protein that bind to each other, here in cyan and green). Two particular amino acids that are important in the activity of the molecule are colored red and purple.
See the hole in the middle, surrounded by the red and purple amino acids? That's where the polyprotein fits in, and the protease cuts it like a cigar cutter at specific points that are recognized by the red and purple residues. How do you inactivate the cigar cutter? You stick something in there to block the hole! Indeed, this is how all protease inhibitors (that is, drugs that inhibit the activity of the protease) work.

When these drugs hit the market, they were replacing older drugs that had nasty side effects. And these new drugs worked like magic! The only trouble was that the virus was not going to capitulate that easily. Indeed, researchers had created just the scenario that we were calling for above: change the environment in such a manner that makes it unlivable for an organism, and see how it can cope. 

HIV protease inhibitors work really well (in particular if associated with another drug, the reverse transcriptase inhibitor), which means that the virus population all but goes extinct. The important modifier here is "all but". Instead of going extinct, it goes into hiding, and researchers don't really know where. As you can imagine, finding this hiding spot (and how to coerce the virus to leave it) is a major effort of HIV research today. A problem arises if a patient forgets to take their antiviral drugs. The virus comes out, starts replicating (slowly), and the high mutation rate of the virus creates the opportunity to evolve quickly. HIV can evolve resistance to a protease inhibitor within two weeks. This is not altogether surprising, as when unchecked the virus creates an enormous number of copies (correct and flawed) of itself every day, so that every single mutation of the nearly 10,000 nucleotide genome is tried multiple times every day, and every pair of mutations a few times. This is enough to cause rapid evolution, and if a single virus finds a way to survive the massacre the drug unleashes, that virus will grow in numbers and create the seeds of a new destructive force that the inhibitor is unprepared for. When resistance emerges, researchers go back to the lab to develop a new type of protease inhibitor, a new way to dull the knife. While it is effective for a while, evolution ultimately keeps up, and finds a way to evade it. How do we stop this maddening race?
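To get a feeling for why single mutations are tried constantly and pairs only occasionally, here is a back-of-the-envelope sketch. The two rates in it are order-of-magnitude assumptions chosen for illustration (they do not come from this post), so treat the printed numbers as rough scales only.

```python
# Illustrative order-of-magnitude assumptions (NOT taken from the post):
MUTATION_RATE_PER_SITE = 3e-5   # assumed copying errors per nucleotide per replication
GENOMES_COPIED_PER_DAY = 1e10   # assumed number of viral genomes produced per day

# A site can change into any of 3 other nucleotides, so one *specific*
# substitution happens with roughly a third of the per-site error rate.
p_specific = MUTATION_RATE_PER_SITE / 3

# Expected number of times per day that one particular single mutation,
# or one particular pair of mutations in the same genome, is produced.
single_per_day = GENOMES_COPIED_PER_DAY * p_specific
double_per_day = GENOMES_COPIED_PER_DAY * p_specific ** 2

print(f"one specific single mutation: ~{single_per_day:.0e} times per day")
print(f"one specific pair of mutations: ~{double_per_day:.0e} times per day")
```

Under these assumptions a given single mutation shows up very many times a day, while a given pair shows up on the order of once a day. The point is the scaling: each additional required mutation costs another factor of the (tiny) per-site probability.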

The history of this fight between the virus and the drugs that attempt to keep it at bay is documented, as it occurred after we had figured out how to sequence stuff. Every paper that relied on patient data, and every drug trial, was asked to deposit its sequence data (namely the sequence of the virus extracted from the patients) in publicly accessible databases. This sequence data became the "fossils" of this evolutionary history, and it is made from the viral RNA of patients that fought this fight, on the frontline. Many of those did not survive the fight, but they bequeathed their virus's sequence data to us for posterity so that we can, perhaps, save the next generation.

Patients that were enrolled in a multitude of drug trials would have the virus's information sequenced, and these records ultimately found their way into Stanford University's HIV resistance database (HIVdb). All sequence data is usually deposited in central repositories such as Genbank, but Stanford's HIVdb provides an enormous service by curating the HIV data on a single site, and developing tools and algorithms to investigate that data. In my lab, I decided that we should mine this "fossil record" to understand how HIV is adapting to, and attempting to evade, the drugs thrown at it. The evolution of drug resistance in HIV can thus be seen as a long-term evolution experiment (LTEE), only compared to the LTEE it is short, and we do not have frozen isolates. The Stanford database is a compendium that allows users to query all sorts of information about sequence, type, and resistance profile. For our purposes, namely to study how the sequence evolves, we need only two things: sequences, and whether the patients who donated the sequence were receiving anti-viral drugs.

To understand how evolution is affecting a protein, we have to discuss the concept of the "fitness landscape". Entire series of blog posts can be written about this concept, but we don't have that kind of space here. Broadly speaking, a fitness landscape is an idealized picture of how the fitness of an organism depends on either the traits or the genome that determine the organism. Here, we will focus on the mapping between sequence and fitness, not traits and fitness. In such a picture, the fitness is the "elevation", and the sequence is the coordinate. If you search for "fitness landscape" you will almost invariably end up with a picture that originates from my lab. Give it a try! You might for example find this: 

A rugged fitness landscape with different evolutionary paths. Credit: Randal S. Olson
This is a rendering of a rugged fitness landscape that my student (at the time) Randy Olson created for a manuscript that we ended up not finishing.  The general idea depicted there is that mutation-by-mutation you could move peak-to-peak, or if this is not possible, you might choose a path that tries to maximize fitness, even though you may have to walk in the valleys between peaks for a (short) while.

If you consider a protein landscape (the z-axis values in the landscape represent how well a protein is doing its job) then most proteins occupy a peak, because if they did not, then mutations would move them closer to the peak until there are no more ways to improve the protein. Drugs that attack the function of a protein (such as the protease inhibitor blunting the protease as described above) change the landscape profoundly: you can imagine that they simply erase the peak. You might think that this would kill the organism (if the protein is essential). Due to the high mutation rate of the HIV virus, there are actually a lot of variants that exist in the population. Many of them are completely defective, but some of them "live" at the edges of the fitness peak that the un-mutated protein occupies. Because they are barely functional they usually do not play a role. But when the main peak is eliminated, the sequences at the fringes may be the only ones to survive. They make a virus that replicates very slowly, but replicate it does. And thus evolution can continue: if there is any way to improve the function of the protein, that path will be taken. The protein will find a distant peak to climb, and the virus is resurrected: it has evolved resistance to the drug.

Even though research has discovered more and more potent anti-viral drugs, which attack different proteins and are thus more effective than any single drug can be, the virus ultimately will evade them, in particular if the patient forgets to take the drug so that the virus can replicate faster and thus accumulate mutations faster. Is there no way to stop this?

In research that has just appeared in the journal PLoS Genetics, my colleague Aditi Gupta (now a postdoctoral researcher at the New Jersey Medical School of Rutgers University)  and I studied how the virus adapts to more and more complex drug environments over a span of almost 10 years. We studied the evolution of the HIV protease (the molecule you encountered above) using sequences deposited in the Stanford database. We found two things: First: in patients that did not receive drugs, the protease molecule was not evolving. Second, in patients that did receive drugs, the protease molecule was evolving quickly, but it evolved in a peculiar way: by storing information in epistatic interactions, rather than in residue changes.

Ok Ok, I realize that this was a mouthful. First, what was that bit about information? You see, for a protein (as well as all life, in the end) everything is about information. A protein that "does its job" has information about the environment within which it is active. Its sequence encodes that information, but it is information about that environment. You change the environment, and what used to be information may not be information anymore. Information is contextual (as I argue in a series of blog posts that starts here). The evolution of drug resistance, in the light of information theory, is then just the quest to "learn" (that is acquire information) about that new world, the new context. 

And it so happens that you can store information in different ways in a sequence. You can certainly store it in the individual symbols that make up the sequence. That is how we usually think of storing information. It is less well-known that you can also store information in the correlations between symbols. I don't know of a good way to make this intuitive. Information is something that allows you to make predictions (as I argue in the above-mentioned series). A single site being an 'A' (instead of a 'C', 'G', or 'T') might be predictive of a particular environmental state. But you can imagine that a site being an 'A' as long as a very particular other site is a 'G' can also be predictive, as long as the only pairs that are allowed are 'AG' and 'GA'. This kind of "dependence" between sites is known as "epistasis" in genetics. There is an enormous amount of literature about epistasis in genetics (as there should be, as I believe it to be the central concept in evolutionary biology) but this post is already too long, so I must refer you to the wiki pages to learn more.
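A tiny numerical sketch may help with the 'AG'/'GA' example. It assumes a toy population in which only those two pairs occur, equally often (the population and the helper names `entropy` and `mutual_information` are made up for illustration). Each site on its own then looks like a fair coin flip between 'A' and 'G', yet the two sites share one full bit: knowing one tells you the other.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(site_x, site_y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned columns."""
    joint = list(zip(site_x, site_y))
    return entropy(site_x) + entropy(site_y) - entropy(joint)


# Hypothetical toy population: only the pairs 'AG' and 'GA' are allowed.
population = ["AG", "GA", "AG", "GA"]
site1 = [seq[0] for seq in population]
site2 = [seq[1] for seq in population]

h1 = entropy(site1)                    # variability of site 1 alone
h2 = entropy(site2)                    # variability of site 2 alone
mi = mutual_information(site1, site2)  # what the two sites share

print(f"H(site1) = {h1} bits, H(site2) = {h2} bits, I(site1; site2) = {mi} bits")
```

Nothing about the order in which 'A' and 'G' appear at a single site is fixed here; all of the structure lives in the correlation between the two sites.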

What I argue thus, in a nutshell, is that you can store information in substitutions (of residues) or you can store it in epistatic interactions between residues. What Aditi Gupta and I found by analyzing the "fossil record" of almost ten years of protein evolution is that the protease mostly stored information in the linkages between residues. 

I know what you are asking: "Why would a protein do that, and what are the consequences?" These are good questions. Let's investigate them one by one. 

Storing information in "correlated changes" (epistatic interactions) is a necessity if you are rushed. The reason is technical, and you are forgiven if you don't grasp the entirety of the argument. Single substitutions (the "simple" way to store information) have serious repercussions for a protein, as substitutions (on average) destabilize the protein. Yes, you do remember that a protein has to fold into its structural conformation, and it doesn't just do that willy-nilly (that's the second time I used that construction, isn't it?). This fold has to be energetically favored, and changes in the residue usually make things worse for those energetics. This isn't a problem if a substitution makes it just a little harder to fold, and if at the same time you have enough time to correct for that problem, by making a compensating substitution somewhere else, later. But if time is of the essence (as when the protein just found its peak utterly annihilated) you can't just substitute a residue, because you probably have to substitute another too, and that would make the protein not fold. A non-folded protein is a dead protein. It cannot wait for a substitution that will save it.

But as I pointed out, there is another way to "learn" (that is, acquire information) by changing the way residues interact. Such changes affect the folding free energy of the protein very little, and as a consequence this is the favored mode of information acquisition if time is of the essence. What we find in the fossil record is that, indeed, this is how evolution proceeds.

What are the consequences? Well, they are likely to be profound. If a protein evolves to store information in linkages between residues, that implies that the protein becomes more and more constrained. After doing this for a while, there aren't that many residues anymore that are free to vary, as there are so many relative states that need to be satisfied. In theory, this means that the protein is evolving itself into a corner from which there may be no escape. What it means is that the protein inhabits a fitness landscape that becomes more and more rugged the more interactions are being locked in between residues.

Let me show you some of the technical evidence that appears in the paper. In the figure below, you see something we call "sum of pairwise MI", where MI stands for "mutual information". You can think of that measure as representing the amount of information stored in the linkages between residues in the protein. As a matter of fact, you shouldn't just think of it in those terms, it is precisely that. This measure is increasing in patients that respond to drug treatment (blue triangles), but does not change in patients that are not receiving those drugs (but really are wishing they would).
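For readers who like to see a measure spelled out, here is a minimal sketch of a "sum of pairwise MI" computed from a set of aligned sequences. It is a plain plug-in estimate on made-up two-residue "proteins"; whether the paper applies further corrections (for finite sample size, for example) is not covered here, and the function name `sum_pairwise_mi` is my own.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def sum_pairwise_mi(alignment):
    """Sum of I(X_a; X_b) over all pairs of positions a < b in an alignment."""
    columns = list(zip(*alignment))  # one tuple of residues per position
    total = 0.0
    for col_a, col_b in combinations(columns, 2):
        joint = list(zip(col_a, col_b))
        total += entropy(col_a) + entropy(col_b) - entropy(joint)
    return total


# Hypothetical toy alignments (not real protease sequences).
independent = ["LK", "LR", "MK", "MR"]  # position 2 varies freely of position 1
linked = ["LK", "LK", "MR", "MR"]       # position 2 is dictated by position 1

print("independent sites:", sum_pairwise_mi(independent), "bits")
print("linked sites:     ", sum_pairwise_mi(linked), "bits")
```

Both toy alignments have exactly the same variability at each individual position. Only the second one has any information in the linkage between positions, and that is what the sum picks up.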

Pairwise epistasis, measured in terms of mutual information, as a function of time in the HIV-1 protease. Triangles: patients taking anti-viral drugs. Circles: patients not taking any anti-viral drugs.
What this plot shows is that the proteins that are adapting to drugs do so by creating functional links between residues, and this evolution persists as more and more sophisticated drugs are introduced. Could it be that the virus eventually becomes so constrained that further adaptation is impossible?

I wish I knew the answer to this question, but I don't. At least from the time course we investigated in this paper, there is no evidence that the protein has slowed its evolution. But I must caution that we only investigated the evolution of the HIV protease for the years 1998-2006. There is sequence data for the years after 2006, of course, but our study was explicitly comparing the response of patients that took anti-viral drugs to those that did not. And after 2006, you could not find enough sequences from patients not taking anti-viral drugs in the database to make statements that were statistically sound. We understand the reason for this, of course, as the anti-viral drugs had become so potent that it would be morally reprehensible to withhold them from a control group. 

It is possible that a slow-down of evolution can be discerned in the sequences of patients that were exposed to anti-viral drugs post 2006. That would be a stunning development, which would have profound implications for the evolution of drug resistance in HIV. The data is there. Who wants to analyze it?

The study I discuss was published as:

A. Gupta and C. Adami, "Strong selection significantly increases epistatic interactions in the long-term evolution of a protein". PLoS Genetics 12 (2016) 1005960.

Friday, March 4, 2016

On quantum measurement (Part 7: There goes the Copenhagen Interpretation)

So this is the final installment of the "On quantum measurement" series. You may have arrived here by reading all previous parts in one sitting (I've heard of such feats in the comments). This is the apotheosis: what all these posts have been gearing up to. If, for some reason that only the Internets know, you have arrived here without the benefit of the first six installments, I'll provide you with the link to the very first installment, but I won't summarize all the posts, out of deference to all the readers who got here the conventional way. 

The Copenhagen Interpretation of quantum mechanics, as I'm sure all of you who have arrived at Part 7 are aware, is a view of the meaning of quantum mechanics promulgated mostly by the Danish physicist Niels Bohr, and codified in the 1920s, that is, the "heydays" of quantum physics. Quantum mechanics can be baffling to be sure, and there are multiple attempts to square what we observe experimentally with our common sense. The Copenhagen Interpretation is an extreme view (in my opinion) of how to make sense of the reflection of the quantum world in our classical measurement devices. So, at its very core, the Copenhagen Interpretation muses about the relationship of the classical to the quantum world.

As a young student of quantum mechanics in the early eighties, I was a bit baffled by this right away. If the true underlying physics is quantum (I mused), and the classical world is therefore just an approximation of the quantum, how can we have "theorems" that codify the relationship between quantum and classical systems?

I won't write a treatise here about the Copenhagen Interpretation. I've already linked the Wikipedia article about it, which should get those of you who are not yet groaning up to speed. I'll just list the two central "things" that are taught just about everywhere quantum mechanics is taught, and that can be squarely traced back to Bohr's school. 

1. Physical systems  do not have definite properties prior to being measured, but instead should be described by a set of probabilities
2. The act of measurement changes the quantum system, so that it takes on only one of the previous possibilities (wave function collapse, or reduction)

Yes, the general understanding of the Copenhagen Interpretation is more multi-faceted, but for the purpose of this post I will focus on the collapse of the wave function. When I first fully understood what that meant, it was immediately clear to me that this was just a load of crap. I knew of no law of physics that could engender such a collapse, and it violated everything I believed in (such as conservation of probabilities). You who read this blog so ardently already know this: it makes no sense from the point of view of information theory.

Now, quantum information theory did not exist around the time of Bohr (and Heisenberg, who must carry some of the blame for the Copenhagen Interpretation). And maybe the two should get a pass for this simple reason, except for the fact that John von Neumann (as I have pointed out in another post) had the foundations of quantum information theory already worked out in 1932, two years after the first "definitive" treatise on the "Copenhagen spirit" was published by Heisenberg.

So you, faithful reader, come to this post well prepared. You already know that Hans Bethe told me and my colleague Nicolas Cerf that we showed that wave functions don't collapse, you know that John von Neumann almost discovered quantum information theory in the 30s, that quantum measurement is very different from its classical counterpart because copying is not allowed in the quantum world. You know where Born's rule comes from, and you pondered the utility of quantum Venn diagrams. You were promised a discussion of Schrödinger's cat, but that never materialized. Instead, you were given a discussion of the quantum eraser. Arguably, that is a more interesting system, but I understand if you are miffed. But to make it up, now we get to the quantum grand-daddy of them all. I will show you that the Copenhagen interpretation is not only toast theoretically, but that it is possible to design experiments that will show this. Or they will show that I'm full of the aforementioned crap. Either way, it is going to be exciting. 

In this post, I will reveal to you the mathematical beauty and elegance of consecutive measurements performed on the same quantum system.  I will also show you how looking at three measurements in a row (but not two), will reveal to you that the Copenhagen Interpretation is now history, ripe for the trash heap of ill-conceived concepts in theoretical physics. All of what I'm going to tell you is an extension of the picture that Nicolas Cerf and I wrote about in 1996, and which Bethe understood immediately after we showed him our results, while it took us six months to understand what he told us. But it is an extension that took some time to clarify, so that the indictment of Bohr (and implicitly Heisenberg) and the collapse picture of measurement is  unambiguous, and most importantly, experimentally verifiable. 

Let's get right into the thick of things. But getting started may really be the hardest thing here. Say you want to measure a quantum system. But you know absolutely nothing about it. How do you write such a quantum system?

In general, people write arbitrary quantum states like this: $|Q\rangle=\sum_i\alpha_i |i\rangle$, with complex coefficients $\alpha_i$ that satisfy $\sum_i|\alpha_i|^2=1$. But you may ask, "Who told you what basis to write this quantum state in? The basis states $|i\rangle$, I mean". After all, the amplitudes $\alpha_i$ only make sense with respect to a particular basis system: if you transform this basis to another (as we will do a lot in this post), the coefficients change. "So haven't you already assumed a lot by writing the quantum state like that?" (You may remember questions like that from a blog post on classical information, and this is no accident).

If you think about this problem for a little while, you realize that indeed the coefficients and the basis you choose are crucial. Just as in classical information theory where I told you that the entropy of a system was undefined, and determined only by the measurement device that you were about to use to learn about it, the state of an arbitrary quantum system only makes sense relative to the quantum states of the detector that you are about to use to measure it. This is, essentially, what is at the heart of the "relative state" formalism of quantum mechanics, due to Everett, of course. That fellow Hugh Everett does not get as much recognition as he deserves, so I'll let you gaze at him for a little while.
H. Everett III (1930-1982) Source: Wikimedia
He cooked up his theory as a graduate student, but as nobody believed his theory at the time, he left quantum physics and became a defense analyst. 

You may expect me to launch into a description and discussion of the "many-worlds" interpretation of quantum mechanics, which became a fad in the 1970s, but I won't. It is silly to call the relative-state picture a "many-worlds" interpretation, because it does not propose at all that at every quantum measurement event the universe splits into as many worlds as there are orthogonal states. This is beyond silly in fact (it was also not at all advocated by Everett), and the people who did coin these terms should be ashamed of themselves (but I won't name them here). My re-statement of Everett's theory in the modern language of quantum information theory can be read here, and in any case Zeh (in 1973) and Deutsch (in 1985) before me had understood much about Everett's theory without imagining some many-worlds voodoo.

So let us indeed talk about a quantum state by writing it in terms of the basis states of the measurement device we are about to examine it with. Because that is all we can do, ever. Just as we have learned in the first six installments of this series, we will measure the quantum state using an ancilla A, with orthogonal basis states $|i\rangle_A$. I wrote the 'A' as a subscript to distinguish it from the quantum states, but later I will drop the subscript once you are used to the notation. 

Now look what happens if I measure $|Q\rangle=\sum_i\alpha_i |a_i\rangle$ with A (to distinguish the quantum states, written in terms of A's basis from the A Hilbert space, we simply write them as $|a_i\rangle$). The probability to observe the quantum state in state $i$ is (you remember of course Part 4)
$$p_i=|\langle a_i|Q\rangle|^2=|\alpha_i|^2.$$
Now get this: You're supposed to measure a random state, but the probability distribution you obtain is not random at all, but given by the probability distribution $p_i$, which is not uniform. This makes no sense at all. If $|Q\rangle$ was truly arbitrary, then on average you should see $p_i=1/d$ (the uniform distribution), where $d$ is the dimension of the Hilbert space. So an arbitrary unknown quantum state, written in terms of the basis states of the apparatus that we are going to measure it in, should be (and must be) written as
$$|Q\rangle=\sum_i^d\frac1{\sqrt d} |a_i\rangle.$$
Now, each outcome $i$ is equally likely, as it should be if you are measuring a state that nobody prepared beforehand. A random state. With maximum entropy. 

So now we got this out of the way: We know how to write the to-be-measured state. Except that we assumed that the system $Q$ had never interacted with anything (or was measured by anything) before. This also is a nonsense assumption. All quantum states are entangled: there is no such thing as a "pristine" quantum system. Fortunately, we know exactly how to describe that: we can write the quantum wavefunction so that it is entangled with an arbitrary "reference" state R:
$$|QR\rangle=\frac1{\sqrt d}\sum_i |a_i\rangle|r_i\rangle_R.$$
You can think of R as all the measurement devices that Q has interacted with in the past: who are we to say that A is really the first? Now we don't really know what all these R states are, so we just trace them out, so that the Q density matrix is the familiar
$$\rho_Q=\frac1d\sum_i |a_i\rangle\langle a_i|.$$
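If you want to see the trace done explicitly, here is a short worked step. It takes the entangled state to be $|QR\rangle=\frac1{\sqrt d}\sum_i|a_i\rangle|r_i\rangle_R$ (the form implied by Eq. (1) below) and assumes, for the purpose of this illustration, that the reference states are orthonormal, $\langle r_{i'}|r_i\rangle=\delta_{ii'}$:
$$\rho_Q={\rm Tr}_R\,|QR\rangle\langle QR|=\frac1d\sum_{i,i'}|a_i\rangle\langle a_{i'}|\,\langle r_{i'}|r_i\rangle=\frac1d\sum_i|a_i\rangle\langle a_i|.$$
The off-diagonal terms $i\neq i'$ are killed by the orthogonality of the reference states, which is why the unknown past of Q leaves it looking maximally mixed.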
After we measured the state with A, the joint state QRA is now (the previous posts tell you how to do this)
$$|QRA\rangle=\frac1{\sqrt d}\sum_i |a_i\rangle|r_i\rangle_R|i\rangle_A.\qquad(1)$$
Don't worry about the R system too much: the Q density matrix is still the same as above, and I have to skip the reason for that here. You can read about it in the paper. Oh yes, there is a paper. Read on.

This is, after all, the post about consecutive measurements, so we will measure Q again, but this time with ancilla B, which is not in the same basis as A. (If it was, then the result would be trivial: you'd just get the same result over and over again: it is like all the pieces of the measurement device A all agreeing on the result). 

So we will say that the B eigenstates are at an angle with the A eigenstates:
$$\langle b_j|a_i\rangle=U_{ij}$$
This just means that what is a zero or one in one of the measurement devices (if we are measuring qubits) is going to be a superposition in the other's basis. $U$ is a unitary matrix. For qubits, a typical $U$ will look like this: 
$$U=\begin{pmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta)& \cos(\theta)\\ \end{pmatrix}$$
where $\theta$ is the angle between the bases. (Yes, it is a special case, but it will suffice.)

To measure $Q$ with B (after we measured it with A, of course) we have to write $Q$ in terms of B's eigenstates, and then measure. What you get is a wave function that has Q entangled not only with its past (R), but both A and B as well:
$$|QRAB\rangle=\frac1{\sqrt d}\sum_{ij}U_{ij}|b_j\rangle|r_i\rangle_R|i\rangle_A|j\rangle_B.       (2)$$
You might think that this looks crazy complicated, but the result is really quite simple. And it agrees with everything that has been written about consecutive measurements so far, whether they advocated a collapse picture or a unitary "relative state" picture. For example, the joint density matrix of just the two detectors, $\rho_{AB}$, is just
$$\rho_{AB}=\frac1d\sum_i|i\rangle\langle i|\otimes\sum_j|U_{ij}|^2|j\rangle\langle j |.$$
That this is the "standard" result will dawn on you when you notice that  $|U_{ij}|^2$ is the conditional probability to measure outcome $j$ with B given that the previous measurement (with A) gave you outcome $i$ (with probability $1/d$, of course).
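If you would rather check this numerically than squint at indices, here is a minimal sketch for qubits. It assumes the real rotation matrix $U$ given above and builds the amplitudes of Eq. (2) as an array; the function names (`rotation`, `rho_ab`) and the index layout are my own choices for illustration, not anything taken from the paper.

```python
import numpy as np

def rotation(theta):
    """Qubit rotation with U[i, j] = <b_j|a_i>, the special case used in the text."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rho_ab(theta):
    """Build the amplitudes of Eq. (2) and trace out Q and R."""
    d = 2
    U = rotation(theta)
    # psi[q, r, a, b]: Q is written in the B basis (q = j), R and A carry i, B carries j.
    psi = np.zeros((d, d, d, d))
    for i in range(d):
        for j in range(d):
            psi[j, i, i, j] = U[i, j] / np.sqrt(d)
    # Partial trace over the first two indices (Q and R).
    rho = np.einsum("qrab,qrcd->abcd", psi, psi.conj())
    return rho.reshape(d * d, d * d), U

rho, U = rho_ab(np.pi / 4)
print(np.round(rho, 3))  # diagonal, with entries |U_ij|^2 / d
```

The detector pair comes out diagonal for any angle, with the conditional probabilities $|U_{ij}|^2$ weighted by $1/d$ on the diagonal.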

It is fair warning that if you have not understood this result, you should probably not go on reading. Go on if you must, but remember to go back to this result.

Also, keep in mind that I will from now on use the index $i$ for the system A, the index $j$ for system B, and later on I will use $k$ for system C. And I won't continually indicate the state with a bothersome subscript like $|i\rangle_A$. Because that is how I roll.

So here is what we have achieved. We have written the physics of consecutive quantum measurements performed on the same system in a manifestly unitary formalism, where wavefunctions do not collapse, and the joint wavefunction of the quantum system, entangled with all the measurements that have preceded our measurements, along with our recent attempts with A and B, exists in a superposition, with all the possibilities (realized or not) still present. And the resulting density matrix along with all the probabilities agree precisely with what has been known since Bohr, give or take.

And the whispers of "Chris, what other ways do you know to waste your time, besides I mean, blogging?" are getting louder.

But wait. There is the measurement with C that I advertised. You might think (possibly with anybody who has ever contemplated this calculation) "Why would things change?" But they will. The third measurement will show a dramatic difference, and once we're done you'll know why.

First, we do the boring math. You could do this yourself (given that you followed enough to be able to derive Eqs. (1) and (2)). You just use a unitary $U'$ to encode the angle between the measurement system C and the system B (just like $U$ described the rotation between systems A and B), and the result (after tracing out the quantum system Q and the reference system R, since no one is looking at those) looks innocuous enough:
$$\rho_{ABC}=\frac1d\sum_i|i\rangle\langle i|\otimes\sum_{jj'}U_{ij}U^{*}_{ij'}|j\rangle\langle j'|\otimes \sum_k U^{'}_{jk}U^{'*}_{j'k}|k\rangle\langle k|.         (3)$$
Except after looking this formula over a couple of times, you squint. And then you go "Hold on, hold on".

"The B measurement!", you exhale. After measuring with B the device was diagonal in the measurement basis (this means that the density matrix was like $|j\rangle \langle j|$). But now you measured Q again, and now B is not diagonal anymore (now it's like $|j\rangle \langle j'|$). How is that possible?

Well, it is the law, is all I can tell you. Quantum mechanics requires it. Density matrices, after all, only tell us part of the story (since you are tracing out the entire history of measurements). That story could be full of lies, and here it turns out it actually is.

It is the last measurement that gives a density matrix that is diagonal in the measurement basis, always. Oh, and the first one, if you measure an arbitrary unknown state. That's two. To see that things can be different, you need a third. The one in between.

To see that Eq. (3) is nothing like what you are used to, let's see what a collapse picture would give you. A detailed calculation using the conventional formalism will lead to the following (the superscript "coll" is to remind you that this is NOT the result of a unitary calculation):

$$\rho_{ABC}^{{\rm coll}}=\frac1d\sum_i|i\rangle\langle i|\otimes \sum_j|U_{ij}|^2|j\rangle\langle j|\otimes \sum_k|U_{jk}^{'}|^2|k\rangle\langle k|.       (4)$$

The difference between (3) and (4) should be immediately obvious to you. You get (4) from (3) if you set $j=j'$, that is, if you remove the off-diagonal terms that exist in (3). But, you see, there is no law of physics that allows you to just grab some off-diagonal terms and yank them out of the matrix. That means that (3) is a consequence of quantum mechanics, and (4) is not derived from anything. It is really just wishful thinking.
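To make the comparison concrete, here is a small numerical sketch for qubits. It assumes that the state after the third measurement follows the same pattern as Eq. (2), with Q rewritten in the C basis, and that both $U$ and $U'$ are the real rotation matrix from above. The function names are mine, and the whole thing is an illustration rather than the calculation from the paper.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rho_abc_unitary(theta_ab, theta_bc):
    """Eq. (3): build the joint state, then trace out Q and R."""
    d = 2
    U, Up = rotation(theta_ab), rotation(theta_bc)
    # psi[q, r, a, b, c]: Q sits in the C basis (q = k); R and A carry i; B carries j.
    psi = np.zeros((d,) * 5)
    for i in range(d):
        for j in range(d):
            for k in range(d):
                psi[k, i, i, j, k] = U[i, j] * Up[j, k] / np.sqrt(d)
    rho = np.einsum("qrabc,qrdef->abcdef", psi, psi.conj())
    return rho.reshape(d**3, d**3)

def rho_abc_collapse(theta_ab, theta_bc):
    """Eq. (4): only the probabilities |U_ij|^2 |U'_jk|^2 / d on the diagonal."""
    d = 2
    U, Up = rotation(theta_ab), rotation(theta_bc)
    probs = np.einsum("ij,jk->ijk", U**2, Up**2) / d
    return np.diag(probs.ravel())

unitary = rho_abc_unitary(np.pi / 4, np.pi / 4)
collapse = rho_abc_collapse(np.pi / 4, np.pi / 4)
off_diagonal = unitary - np.diag(np.diag(unitary))
print(np.allclose(np.diag(unitary), np.diag(collapse)))  # same counting statistics
print(np.abs(off_diagonal).max())                         # but these terms do not vanish
```

The diagonals agree, so the counting statistics are identical. Everything that separates the two pictures lives in the $j\neq j'$ terms.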

"So", I can hear you mutter from a distance, "can you make a measurement that supports one or the other of the approaches?  Can experiments tell the difference between the two ways to understand quantum measurement?"

That, Detective,  is the right question. 

How do we tell the difference between two density matrices? Let us focus on qubits here ($d=2$). And, just to make things more tangible, let's fix the angles between the consecutive measurements. 

Measurement A is the first measurement, so there is no angle. In fact, A sets the stage and all subsequent measurements will be relative to that. We will take B at 45 degrees to A. This means that B will have a 50/50 chance to record 0 or 1, no matter whether A registered 0 or 1. Note that A also will record 0 or 1 half the time, as it should if the initial state is random and unknown.

We will take C to measure at an angle of 45 degrees to B also, so that C's entropy will be one bit as well. Thus, the entropy of each of the three detectors should be one bit.  This will be true, by the way, both in the unitary, and in the collapse picture. The relative states between the three detectors are, however, quite different between the two descriptions. Below you can see the quantum Venn diagram for the unitary picture on the left, and the collapse picture on the right. 
Quantum Entropy Venn diagram for the joint and  relative state of three detectors A, B, and C. Detector B measures Q at an angle θ = π/4 relative to the basis of A, and C measures at θ = π/4 relative to the basis of B (from [2]).  

You can first convince yourself that the entropy of each detector is 1 bit in both pictures. You can further convince yourself that the pairwise entropy diagram between any two detectors (tracing out the third) is the same in both pictures. Ordinarily I would leave it to the reader to check this, but here is the result anyway: the pairwise diagram has the entries (1,0,1), meaning that no two detectors share any entropy. 

We kinda knew that it had to be like that, on account of the $\pi/4$ angles and all. Yet, the two diagrams look very different. For example, look at detector B. If I give you A and C, the state of B is perfectly known, as $S(B|AC)=0$. That's not true in the collapse picture: giving A or C does nothing for B. 
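The conditional entropy $S(B|AC)=S(ABC)-S(AC)$ can be checked with a few lines of code. The sketch below fills in $\rho_{ABC}$ directly from Eq. (3), or from Eq. (4) when `collapse=True`, assuming real rotation matrices at $\pi/4$. The helper names are made up for this illustration.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rho_abc(U, Up, collapse=False):
    """rho[i, j, k, i', j', k'] from Eq. (3); Eq. (4) if collapse is True. Real U assumed."""
    d = U.shape[0]
    rho = np.zeros((d,) * 6)
    for i in range(d):
        for k in range(d):
            for j in range(d):
                for jp in range(d):
                    if collapse and j != jp:
                        continue  # the collapse picture drops these terms by hand
                    rho[i, j, k, i, jp, k] = U[i, j] * U[i, jp] * Up[j, k] * Up[jp, k] / d
    return rho

def entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def entropy_b_given_ac(rho6):
    d = rho6.shape[0]
    s_abc = entropy_bits(rho6.reshape(d**3, d**3))
    rho_ac = np.einsum("ijkljm->iklm", rho6).reshape(d**2, d**2)  # trace out B
    return s_abc - entropy_bits(rho_ac)

U = Up = rotation(np.pi / 4)
print(entropy_b_given_ac(rho_abc(U, Up)))                 # unitary picture: about 0 bits
print(entropy_b_given_ac(rho_abc(U, Up, collapse=True)))  # collapse picture: about 1 bit
```

Both pictures give the same $\rho_{AC}$, since tracing out B only keeps the $j=j'$ terms. The difference is entirely in $S(ABC)$.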

That in itself looks like a death knell for the unitary picture: How could it be that a past and a future experiment can fully determine the quantum state in the present? It turns out that such questions have been asked before! Aharonov, Bergmann, and Lebowitz (ABL) showed in 1964 that it is possible to set up a measurement so that knowing the results from A and C will allow you to predict with certainty what B would have recorded [1]. As you can tell from the title of their paper, ABL were concerned about the apparent asymmetry in quantum measurement. 

Of course there is an asymmetry! A measurement can tell you about the past, but it cannot tell you about the future! What an asymmetry! 

Slow down, there. That's not a fair comparison. Causality is, after all, ruling over us all: what hasn't happened is different from that which has happened.  The real question is whether, after all things are said and done, there is an asymmetry between what was, and what could have been. In the language of quantum measurement, we should instead ask the question: If the past measurements influence what I can record in the future, do the future measurements constrain what once was, in an equal manner? Or put another way, can the measurements today tell me as much about the state on which they were performed, as knowing the state today tells me about future measurements? 

To some extent, ABL answered this question in the affirmative. For a fairly contrived measurement scenario, they showed that if you give me the measurement record of the past, as well as what was measured in the future, I can tell you what it is you must have measured in the present. In other words, they said that the past and future, taken together, will predict the present perfectly.

I don't think everybody who read that paper in 1964 was aware of the ramifications of this discovery. I don't think people are now. What we show in our paper is that what ABL showed to hold in a fairly contrived situation in fact holds true universally, all the time. 

"Which paper?", you ask. "Come clean already!"

Can't you wait just a little longer? I promise it will be at the end of the blog. You can scroll ahead if you must. 

In fact, we show that the ABL result is just a special case of a result that holds quite generally. For any sequence of measurements of the same quantum system, Jennifer Glick and I prove that only the very first and the very last measurements are uncertain. All those measurements in between are perfectly predictable. (This holds for the case of measuring unprepared quantum states only.) This makes sense from the point of view I just advocated: you cannot fully know the last measurement because the future did not yet happen. And you cannot know the first measurement because there is nothing in its past. Everything else is perfectly knowable. 

Now, "knowable" does not mean "known", because in general you cannot use the results of the individual measurements to make the predictions about the intermediate detectors: you need some of the off-diagonal terms of the density matrix, which means that you have to perform more complex, joint measurements. But you only need the measurement devices, nothing else. 

We show a number of other fairly uncommon things for sequences of quantum measurements in the paper aptly entitled "Quantum mechanics of consecutive measurements", which you can read on arXiv here. For example, we show that the sequence of measurements does not form a Markov chain, as is expected for a collapse picture. We also show that the density matrix of any pair of detectors in that sequential chain is "classical", which we here identify with "diagonal in the detector product basis". There are several more general results in there: be sure to read the Supplementary Material, where all the proofs reside. 

"So your math says that wavefunctions don't collapse. Can you prove it experimentally?"

That too is an excellent question. Math, after all, is just a surrogate that helps us understand the laws of nature. What we are saying is that the laws of nature are not as you thought they were. And if you make a statement like that, then it should be falsifiable. If your theory truly goes beyond the accepted canon, then there must be an experiment that will support the new theory (it cannot prove it, mind you) by sending the old theory to where.... old theories go to die. 

What is that experiment? It turns out it is not an easy experiment. Or, at least, for this particular scenario (three consecutive measurements of the same quantum system) the experiment is not easy. The statistics of counts of the three measurement devices are predicted by the diagonal of the joint density matrix $\rho_{ABC}$, and this is the same in the unitary relative state picture and the collapse picture. The difference is in the off-diagonal elements of the density matrix. Now, there are methods that allow you to measure off-diagonal elements of a quantum state, using so-called "quantum-state tomography" methods. Because the density matrix in question is large (an 8x8 matrix for qubit measurements), this is a very involved measurement. Fortunately, there are shortcuts. It turns out that for the case at hand, every single moment of the density matrix is different. The $n$th moment of a density matrix is defined by ${\rm Tr}\,\rho^n$, and it turns out that already the second moment, that is ${\rm Tr}\,\rho^2$, is different. Measuring the second moment of the density matrix is far simpler than measuring the entire matrix via quantum state tomography, but given that it is a three-qubit system, it is still not a simple endeavor. But it is one that I hope someone will be convinced is worth undertaking. Because it will be the experiment that will send the Copenhagen interpretation packing, for all time.
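As a rough check on the second moment, assuming Eqs. (3) and (4) exactly as written above (this is a sketch for orientation, not a result quoted from the paper): for fixed $i$ and $k$, the block of (3) over $j,j'$ is a rank-one matrix, so its only nonzero eigenvalue is its trace, $p_{ik}=\frac1d\sum_j|U_{ij}|^2|U'_{jk}|^2$. The collapse matrix (4) is diagonal, so its eigenvalues are just its entries. Then
$${\rm Tr}\,\rho_{ABC}^2=\sum_{ik}p_{ik}^2,\qquad {\rm Tr}\,\big(\rho_{ABC}^{\rm coll}\big)^2=\frac1{d^2}\sum_{ijk}|U_{ij}|^4|U'_{jk}|^4.$$
For qubits with both angles at $\pi/4$, every $|U_{ij}|^2$ and $|U'_{jk}|^2$ equals $1/2$, so $p_{ik}=1/4$ and
$${\rm Tr}\,\rho_{ABC}^2=4\cdot\left(\tfrac14\right)^2=\tfrac14,\qquad {\rm Tr}\,\big(\rho_{ABC}^{\rm coll}\big)^2=8\cdot\left(\tfrac18\right)^2=\tfrac18.$$
A factor of two is the size of the effect an experiment would be looking for in this particular configuration.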

So I asked myself, "How do I close such a long series about quantum measurement, and this interminable last post?" I hope to have brought quantum measurement a little bit out of the obscure corner where it is sometimes relegated to. Much about quantum measurement can be readily understood, and what mysteries there still are can, I am confident, be resolved as well. Collapse never made any physical sense to begin with, but neither did a branching of the universe. We know that quantum mechanics is unitary, and we now know that the chain of measurements is too. What remains to be solved, really, is just the randomness that we experience in the last measurement, when the future is still uncertain.

Where does this randomness come from? What do these probabilities mean?  I have some ideas about that, but this will have to wait for another blog post. Or series.

[1] Y. Aharonov, P. G. Bergmann and J. L. Lebowitz, “Time symmetry in the quantum process of measurement,” Phys. Rev. 134, B1410–B1416 (1964).

Saturday, December 26, 2015

Evolving Intelligence ... With a Little Help

The year 2015 may go down in history for a lot of things. Just this December saw a number of firsts: A movie about armed conflict among celestial bodies breaks all records, a rocket delivers a payload of satellites and returns back to Earth vertically, not to mention the politics of the election cycle. But just maybe, 2015 will also be remembered as the year that people started warning about the dangers of Artificial Intelligence (AI). In fact, none other than Elon Musk, the founder of SpaceX who accomplished the improbable feat of landing a rocket, is one of the more prominent voices warning of AI. After giving $10 million to the "Future of Life" Institute (whose mission is "safeguarding life and developing optimistic visions of the future", but mostly warns about the dangers of AI), he co-founded OpenAI, a non-profit research company that aims to promote and develop open-source "friendly AI". 

I wrote about my thoughts on this issue--that is, the dangers of AI--in a piece for the Huffington Post that you can read here. The synopsis (for those of you on a tight reading schedule) is that while I think that it is a reasonable thing to worry about such questions, the fears of a rising army of killing robots are almost certainly naive. The gist is that we are nowhere near creating the kind of intelligence that we should be afraid of. We cannot even design the intelligence displayed by small rodents, let alone creatures that think and plan our demise. 

When discussing AI, I often make the distinction between "Type 1" and "Type 2" intelligence. "Type 1" is the kind we are good at designing today: the Roomba, Deep Blue, Watson, and the algorithm driving the Google self-driving car. Even the Deep Neural Nets that have been one of the harbingers, it seems, of the newfound fears, belong squarely in this group. These machine intelligences are of Type 1 (those that you do not need to fear), because they aren't really intelligent. They don't actually have any concept of what it is they are doing: they are reacting appropriately to the input they are presented with. You know not to fear them, because you will not worry about Deep Blue driving Google's car, or Jeopardy-beating Watson recognizing cat videos on the internet. 

Type 2 intelligence is different. Type 2 has representations about the world in which it exists, and uses these representations (abstractions, toy models) to make decisions, plans, and to think about thinking. I have written about the importance of representations in another blog post, so I won't repeat this here. 

If you could design Type 2 intelligence, I would be scared too. But you can't. That is, essentially, my point when I tell people that their fears are naive. The reasons for the failure of the design approach are complex, and detailed in another blog post. You want a synopsis of that one too? Fine, here it is: That stuff ain't modular, and we can only design modular stuff. Type 2 intelligence integrates information at an unheard-of level, and this kind of non-modular integration is beyond our design capabilities, perhaps forever.

I advocate that you cannot design Type 2 intelligence, but you can evolve it. After all, it worked once, didn't it? And that is what my lab (as well as Jeff Clune's lab and now Arend Hintze's lab also) is trying to achieve.  

I know, I know. You are asking: "Why do you think that evolving AI should be less dangerous than designed AI?" This is precisely the question I will try to answer in this post. Along the way, I will shamelessly plug a recent publication where we introduce a new tool that will help us achieve the goal. The goal that we all are looking for--some with trepidation, some with determination and conviction. 

The answer to this question lies in the "How" of the evolutionary approach. To those not already familiar with the evolutionary approach (if this is you: my hat off to you for reading this far), this approach is firmly rooted in emulating the Darwinian process that has given rise to all the biological complexity you can see on our planet. The emulation is called the "Genetic Algorithm".

Here's the "Genetic Algorithm" (GA, for short) for you in a nutshell. Mind you, a nutshell is only large enough to hold the most basic stuff about GAs. But here we go. In a GA, a population of candidate "solutions" to a given problem is maintained. The candidates encode the solution in terms of symbolic strings (often called "genotypes"). The strings encode the solution in a way so that small changes to the string give rise to small changes to the solution (mostly). Changes (called mutations) are made to the genotypes randomly, and often strings are recombined by taking pieces of two different strings and merging them. After all the changes are done, the sequences in the new population are tested, and each is assigned a fitness. Those with high fitness are given relatively more offspring to place into the next generation, and those with less fitness ... well, less so. Because those types with "good genes" get more representation in the next generation (and those with bad genes barely leave any), over time fitness increases, and complex stuff ensues.
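In code, the nutshell might look something like the sketch below. It is a bare-bones illustration, not the software used in the lab: the bit-string genotype, the fitness-proportional selection, the one-point recombination, and all parameter values are assumptions picked to keep the example short. The toy fitness simply counts the 1s in the string.

```python
import random

def evolve(fitness, genome_len=20, pop_size=50, generations=60,
           mut_rate=0.01, seed=0):
    """Minimal generational GA on bit strings (all defaults are illustrative)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    mean_history = []
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        mean_history.append(sum(scores) / pop_size)
        # Fitter genotypes get proportionally more offspring;
        # the tiny offset avoids an all-zero weight vector.
        weights = [s + 1e-9 for s in scores]
        new_pop = []
        for _ in range(pop_size):
            mom, dad = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, genome_len)  # one-point recombination
            child = mom[:cut] + dad[cut:]
            # Point mutations: flip each bit with probability mut_rate.
            child = [1 - b if rng.random() < mut_rate else b for b in child]
            new_pop.append(child)
        pop = new_pop
    return pop, mean_history

count_ones = sum  # toy fitness: number of 1s in the genotype
pop, history = evolve(count_ones)
print(history[0], history[-1])  # mean fitness at the start and at the end
```

Note that nothing in the loop looks ahead: offspring are handed out purely on the basis of current fitness.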

Clearly (we can all attest to that), this is some powerful algorithm. You would not be here reading this without it, because it is the algorithm behind Darwinian evolution, which made you. But as powerful as it is, it also has an Achilles heel. The algorithm preferentially selects types that have higher fitness than the prevailing type. That's the whole point, of course, but what if the highest type is far away, and the path towards it must go through less fit types? In that case, the preference for fitter things is actually an impediment, because you can't tell the algorithm to "stop wanting higher fitness for a little while". 

This problem is known as the "valley-crossing" problem. Consider the fitness landscape in the figure below.
A schematic fitness landscape where elevation is synonymous with fitness.  Credit: Bjørn Østman.
This is known as a "rugged" fitness landscape (for obvious reasons). You are to think of the x and y coordinates of this picture as the genotype, and the z-axis as the fitness of that type. Of course, in realistic landscapes the type is specified by far more than two numbers, but it would not be as easily depicted. Think of the x and y coordinates as the most important numbers to characterize the type. In evolutionary biology, such "important characters" are called "traits". 

If a population occupies one of these peaks, an evolutionary process will have a hard time to make it to another (higher) peak, as the series of changes that the type has to undergo to move to the new peak must lead through valleys. While it is in a valley, it is outcompeted by those types that are not attempting the "trip" to higher ground. Those types that are left behind and stick to the "old ways" of doing things, they are like reactionaries actively opposing progress. And in evolution, these forces are very strong.

What can you do to help the evolutionary algorithm see that it is OK to creep along at lower fitness for a little while? There are in fact many things that can be done, and there are literally hundreds, if not thousands of papers that have been written to address this problem, both in the world of evolutionary computation and in evolutionary biology. It is one of the hottest research fields in evolution.

I cannot here describe the different approaches that have been taken to increase evolvability in the computational realm, or to understand evolvability in the biological realm. There are books about this topic. I will describe here one way to address this problem, in the context of our attempts to evolve intelligent behavior. The trick is to exploit the fact that the landscape really has many more dimensions than the one you are either visualizing, or even the one you are using to calculate the fitness. Let me explain.

In evolutionary computation, you generally specify a way to calculate fitness from the genotype. This could be as simple as "count the number of 1s in the binary string". Such a fitness landscape is simple, non-deceptive (because all paths that lead upwards actually lead to the highest peak) and smooth (there is only one peak). Evolution stops once the string "1111...1111" is found. In the evolution of intelligence, it takes much more to calculate fitness. This is because the sequence, when interpreted, literally makes a brain. That brain must then be loaded onto an agent, who then has to "do stuff" in its simulated world. The better it "does stuff", the higher its score. The higher its score, the higher its fitness. The higher its fitness, the more offspring it will leave in the next generation. And because the offspring inherit the type, the more types in the next generation that can "do stuff". Which is a good thing, as now each one of those has a chance to find out (I mean, via mutations) how to "do even more stuff".

In one of the examples of the paper that I'm actually blogging about, the agent has to catch some types of blocks that are raining down, and avoid others. Here's a picture of what that world looks like:

The agent's world. Credit: the authors.
The agent is the rectangular block on the bottom, and it can move left or right. It looks upwards using the four red triangles. Using these "eyes" it must determine whether the block raining down (diagonally, either left or right) is small or large. If it is small it should catch it, but if it is large it should avoid it. The problem is, the agent's vision is poor: it has a big blind spot between the sensors, so a small and a large block may look exactly the same, unless you move around, that is. That is why this classic task is called "active categorical perception": in order to perceive and classify the shape (which you do by either catching or avoiding), you have to move actively.

This is a difficult problem for the agent, as it takes a little while to determine what the object even is. Once you know what it is, you have to plan your move in such a way that the object will touch you (if it is small) or not touch you (if it is large). This means that you have to predict where it is going to land, and make your moves accordingly. And all that before the brick has hit the floor. You do need memory to pull this off, as without it you will not be able to determine the trajectory.

We have previously shown that you can evolve brains that can do this task perfectly. But this does not mean that every evolutionary trajectory reaches that point. Quite to the contrary: most of the time you get stuck at these intermediate peaks of decent,  but not perfect, performance. We looked for ways to increase those odds, and here's what we came up with. What you want to do is reward things other than the actual performance. Things that you think might make a better brain, but that might not, just at this moment, make you better at the block-catching task. We call these things "neuro-correlates": characters that are correlated with good neurological processing in general. It is like selecting for good math ability when the task at hand is survival from being hunted by predators. Being good at math may not save you right then and there (while being fast would), but in the long run, being good at math will be huge because for example you can calculate the odds of any evasion strategy, and thus select the right one. Math could help you in a myriad of ways. Later on, in another hunt.

After all, the problem with the evolutionary algorithm is its short-sightedness: it cannot "see" the far-off peaks. Selecting for traits that you, the investigator, trust are "good for thinking in general" (the neuro-correlates) is like correcting for the short-sightedness of evolution. The mutations that increase the neuro-correlate traits would ordinarily not be rewarded (until they become important later on). By rewarding them early, you may be able to jump start evolution.

So that is what we tried, in the paper that I'm blogging about, and that appeared on Christmas Day 2015. We tried a litany of neuro-correlates (eight, to be exact). The neuro-correlates that we tested can roughly be divided into two categories: network-theory based, and information-theory based. Since the Markov brains that we evolve are networks of neurons, network-theory based measures make sense. As brains also process information, we should test information-processing measures as well.

The network-based measures are mostly your usual suspects: density of connection (in the expert parlance: mean degree), sparsity, length of longest shortest path, and a not so obvious one: length of genome encoding the network. The information-theoretic ones are perhaps less obvious: we chose information integration, representation, and two types of predictive information. If I would attempt to describe these measures in detail (and why we chose them) I might as well repeat the paper. For the present purpose, let's just assume that they are well defined, and that they may or may not aid evolution.
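For readers who have not met the network measures before, here is a rough sketch of three of them on a toy graph. It assumes an undirected, connected graph stored as an adjacency dictionary, and it takes "sparseness" to be one minus the edge density. Both are simplifications chosen for illustration, and the precise definitions used for Markov brains in the paper may differ.

```python
from collections import deque

def mean_degree(adj):
    """Average number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)

def diameter(adj):
    """Longest of all shortest paths, via breadth-first search from every node."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        longest = max(longest, max(dist.values()))
    return longest

# A four-node chain 0 - 1 - 2 - 3 as a toy "brain".
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(mean_degree(chain))   # 1.5
print(1 - density(chain))   # 0.5 (sparseness under the assumption above)
print(diameter(chain))      # 3
```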

Which is exactly what we found empirically. Suppose, for example, that you reward (aside from rewarding the catching of the blocks) a measure that quantifies how well you integrate information. There is indeed such a measure: it is called $\Phi$ (Phi), and I blogged about that before.  You can imagine that information integration might be important for this task: the agent has to integrate the visual information from different time points along with other memories to make the decision. So the trick is that any mutation that increases information integration will have an increased chance of making it into the next generation, even though it may not be useful at that moment. So, in other words, we are helping evolution to look forward in time, by keeping certain mutations around even if they are not useful at the time that they occur. Doing this, what may have looked like a valley in the future, may not be a valley after all (because of the presence of a mutation that was integrated into the genome ahead of time).

So what should we reward? Easy, right? Reward those mutations that help the brain work well! Oh wait, we don't know how the brain works. So, we have to make guesses as to what things might make the brain work well. And then test which of these, as a matter of fact, do help in hindsight. Here are the eight that we tested:

Network-theory  based:

1. Minimum Description Length (MDL) (which here you can think of as a proxy for "brain size")
2. Graph Diameter (Longest of all shortest paths between node pairs)
3. Connectivity (the mean degree per node)
4. Sparseness (kind of the opposite of connectivity)

Information-theory based:

5. Representation (having internal models of the world)
6. Information Integration (Phi, the "atomic variant")
7. Predictive information (between sensor states)
8. Predictive Information (between sensor and actuator states)

Here's what we found: Graph diameter, Phi, and connectivity all three significantly help the evolutionary algorithm when the overall rewarded function is the fitness times the neuro-correlate. Sparseness, as well as the two predictive information measures, made things worse. This finding reinforces the suspicion that we really don't know what makes brains work well. In neuroscience, sparse coding is considered a cornerstone of neural coding theory, after all. But we should keep in mind that these findings can very well depend significantly on the type of task investigated, and that for other tasks the findings for what works might be reversed. For example, the block-catching task requires memory, and predictive information is maximized for purely reactive machines. If the task did not require memory, it is likely that predictive information is a good neuro-correlate.
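To see what "fitness times the neuro-correlate" does to selection, consider the toy comparison below. The two candidates and their scores are made-up numbers for illustration only, and any normalization of the neuro-correlate that the paper may apply is left out.

```python
def combined_reward(task_fitness, neuro_correlate):
    """Overall rewarded function: task fitness multiplied by the neuro-correlate."""
    return task_fitness * neuro_correlate

# Hypothetical candidates: one slightly better at the task right now,
# one that integrates more information (higher Phi).
candidates = {
    "specialist": {"task": 0.80, "phi": 0.10},
    "integrator": {"task": 0.70, "phi": 0.50},
}

best_by_task = max(candidates, key=lambda name: candidates[name]["task"])
best_by_product = max(
    candidates,
    key=lambda name: combined_reward(candidates[name]["task"], candidates[name]["phi"]),
)
print(best_by_task)     # specialist
print(best_by_product)  # integrator
```

Under the product, the candidate that is slightly worse at the task today but integrates more information is the one that leaves more offspring. Whether that helps in the long run is exactly the empirical question.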

To check how much the value of the neuro-correlate depends on the task chosen, we repeated the entire analysis for a very different task: one that does not even require the agent to have a body.

The alternate task we tested is the ability to generate random numbers using only deterministic rules. That this is a cognitively complex task has been known for some time: the ability to generate random (or, I should say, random-ish) numbers is often used to assess cognitive deficiencies in people. Indeed, if you (a person) were asked to do this, you would need to keep track of not only the last 5-7 numbers you generated (which you can do using short-term memory), but also of how often you have produced doubles, and triples, etc, and of what numbers. The more you think about this task, the more you appreciate its complexity. And you can easily imagine that different cognitive impairments might lead to different signature departures from randomness.

Of course this task is easy if you have access to a random number generator, but the Markov brains had none. So they had to figure out an algorithm to produce the numbers (which is also what we do in computers to produce pseudo-random numbers).
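For comparison, here is what a hand-designed deterministic rule for random-ish numbers looks like on a computer: a linear congruential generator. This is a textbook recipe (the constants are a commonly used set for a 32-bit generator), shown only to illustrate the idea. It is not the algorithm the Markov brains evolved.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: state -> (a * state + c) mod m."""
    state = seed
    out = []
    for _ in range(n):
        state = (a * state + c) % m
        out.append(state)
    return out

# Keep only the top 4 bits of each 32-bit state to get numbers from 0 to 15.
numbers = [x >> 28 for x in lcg(seed=1, n=10)]
print(numbers)
```

The rule is completely deterministic: the same seed always produces the same sequence. All of the apparent randomness comes from the internal state, which is exactly why a brain without memory cannot do this task.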

The results with the random number generation (RNG) task were roughly the same as with the block-catching task: Graph diameter, Phi, and connectivity scored well, while predictive information and sparseness scored negatively. Representation cannot be used as a neuro-correlate for this task, as there is no external world that the brain can create representations of. So while the individual results differ somewhat, there seems to be some universality in the results we found.

Of course, there are very likely better neuro-correlates out there that can boost the performance of the evolutionary algorithm much more. We don't know what these are, as we don't know what it is that makes brains work better. There are many suggestions, and we hope to try some in the future. We can think of

1. Other graph-based measures such as modularity
2. Novelty search (rewarding brains that see or do things they haven't seen or done before)
3. Conditional mutual information over different time intervals
4. Measures of information transfer
5. Dual total correlation

Of course, the list is endless. It is our intuition about what matters in computation in the brain that should guide our search for measures, and whether or not they matter is then found empirically. In this manner, evolutionary algorithms might also give us a clue about how all brains work, not just those in silico.

But I have not yet answered the question that I posed at the very beginning of this post. Most of you are forgiven for forgetting it, as it was figuratively eons ago (or 36 paragraphs, which in blogging land is considered almost synonymous with eons). The straw man reader asked: "What makes you think that evolution (as opposed to design) will produce 'nice' intelligences, that is, the kind that will not be bent on the destruction of all of humanity?"

The answer is that we firmly believe that we cannot evolve intelligence in a vacuum. The world in which intelligence evolves must be complex, and difficult to predict. Thus, it must change in subtle ways, ways that take intelligence to forecast. The best world to achieve this is a world in which there are other agents, with complex brains. Then, prediction requires the prediction of behaviors of others, which is best achieved by understanding the other. Perhaps, by realizing that the other thinks like you think. When doing this, you generally also evolve empathy. In other words, as we evolve our agents to survive in groups of other agents, cooperative behavior should ultimately evolve at the same time.

Our robots, when they first open their eyes to the real world, will already know what cooperation and empathy are. These are not traits that human programmers are thinking of, but evolution has stumbled upon these adaptive traits over and over again. That is why we are optimistic: we will be evolving robots with empathic brains. And if they show signs of psychopathology in their adolescence? Well, we know where the off switch is.

The publication this blog post is based on is open access (gold):

J. Schossau, C. Adami, and A. Hintze, Information-Theoretic Neuro-Correlates Boost Evolution of Cognitive Systems. Entropy 18 (2016) e18010006.