Sunday, May 19, 2013

Where do thinking machines come from?

We've been waiting for these thinking machines for a long time now. We've read about them, and seen them in countless movies. They are just technology, right? And we've gotten really good at this technology thing! But where are the machines?

In a previous post I've hinted at the big problem in serious Artificial Intelligence (AI) research: if the theory of consciousness based on the concept of integrated information is right, then thinking machines are essentially undesignable.

Mind you, we do have smart machines. We have machines that outperform humans at chess, we have self-driving cars that process close to 1 Gbit of data per second, and we have machines that can beat pretty much anybody at Jeopardy! But neither you nor I would call these smart machines intelligent. We do not take that word lightly: if you're just good at doing one particular job, then you're smart at that job, but you are not intelligent. Google's car cannot play chess (nor can Watson), and neither Deep Blue nor Watson should be allowed behind the wheel of a car.

What's going on here?

Here's the most important thing you need to know about what it takes to be intelligent. You have to be able to create worlds inside your brain. Literally. You have to be able to imagine worlds, and you have to be able to examine these worlds. Walk around in them, linger.

This is important because you live in this world, the one you are also imagining. This world is complex, it is dangerous, and it is often unpredictable. It is precisely this unpredictability that is dangerous: you can be lunch if you don't understand the tell-tale signs of the lurking tiger.

Yes I know, your chances of being eaten by a tiger are fairly low, but I'm not talking about today: I'm talking about the time when we (as a species) "grew up", that is, when we came down from the trees and ventured into the open fields of the savannah. To survive in this world, we have to make accurate predictions about the future state of the world. (Not just in the next five minutes, but also on the scale of months, seasons, years.)

How do we make these predictions? Why, we imagine the world, and in our minds imagine what happens. These imaginings, juxtaposed with the things that really do happen, allow us to hone a very important skill: we can represent an abstract version of the world in our heads, and use it to understand it. Understanding means removing surprises, the things that usually kill you.

Thinking about an object thus means creating an abstract representation of this object in your head, and playing around with it. If you can't do that, then you cannot think. You cannot be intelligent.

Are workers in the field of Artificial Intelligence oblivious to this absolutely crucial, essential aspect of intelligence?

Absolutely not. They are perfectly aware of it. In the heyday of AI research, that's pretty much all people did: they tried to cram as many facts about the real world into a computer's memory as they could. This, by the way, is still pretty much the way Watson is programmed, but he has a smarter retrieval system than what was possible in those days, based on Bayesian inference.

But in the end, the programmers had to give up. No matter how much information they crammed into these brains, this information was not integrated: it did not produce an impression of the object that allowed the machine to make new inferences about the object that were not already programmed in. But that is precisely what is needed: your model of the world has to be good enough so that (when thinking about it) you can make predictions about things you didn't already know.

So what did AI researchers do? Some gave up, and left the field. Others decided that they could do without these pesky imagined worlds; that you could create intelligence without representation. (The linked article could be found beyond the paywall all over the internet, for example here, at least until recently. Either way, that tells you something about paywalls.)

Given all that I just told you, you ought to at least be baffled. It all seemed so convincing! You can do without internal models? How?

The idea that you could do away with representations for the purpose of Artificial Intelligence is due to Rodney Brooks, then Professor of Robotics at MIT. Brooks is no slouch, mind you. His work has influenced a generation of roboticists. But he decided that robots should not make plans, because, well, the best laid plans, you know....

Rather, Brooks argued that robots should react to the world. The world already contains all the complexity! Why not use that? Why program something that you have direct access to?

Why indeed? Brooks was quite successful with this approach, creating reactive robots with a subsumption architecture. Reactive robots are indeed robust: they can act appropriately given the current state of the world, because they take the world seriously: the world is all they have.

But I think we can all agree that these robots, agile as they are, won't ever be intelligent. They won't be able to make plans. Because plans require good internal models, which we don't know how to program.

So where will our intelligent machines come from?

The avid reader of Spherical Harmonics (should such a person actually exist) already knows the answer to this question. Evolution is the tool to create the undesignable! If you can't program it, evolve it! After all, that's where we came from.

Now, I've hinted at this before: evolve it! Can you actually evolve representations?

Yeah, we can, and we've shown it. And there is a paper that just came out in the journal Neural Computation that documents it. That's right, you've been reading a blog post that is an advertisement for a journal article that is behind a paywall!

Relax, there is a version of the article on the AdamiLab web site. Or go get it from arxiv.org here.

Now back to the specifics: "You've evolved representations, you say? Prove it!"

Ah! Now, a can of worms opens. How can you show that any evolved anything actually represents the world inside its.... bits? What are representations anyway? Can you measure them?

Now here's a good question. It's the question the empiricist asks, when he is entangled in a philosophical discussion. And lo and behold, the concept of representation is a big one in the field of philosophy. Countless articles have been written about it. I'm not going to review them here. I have this sneaking suspicion that I am, again, engaged in writing an overly long blog post. If you're into this sort of thing (reading about philosophy, as opposed to writing overly long blog posts), you can read about philosophers talking about representation here, for example. Or here. I could go on.

Philosophers have defined "representation" as something that "stands in" for the real thing. Something we would call a model. So we're all on the same wavelength here. But can you measure it? What we have done in the article I'm blogging about, is to propose an information-theoretic measure for how much a brain represents. And then we evolve a brain that must represent to win, and measure that thing we call representation. But then we go one better: we also measure what it is that these brains represent.

We literally measure what these brains are thinking about when they make their predictions.

How do we do that? So, first of all, we understand that when you represent something, then this something must be information. Your model of the world is a compressed representation of the world, compressed into the essential bits only. But importantly, you're not allowed to get those bits from looking at the world. Staring at it, if you will. If you have a model of the world, you can have that model with your eyes closed. And ears. All sensors. Because if you could not, you would just be a reactive machine. So, a representation is those bits of the world that you can't see in your sensors. Can you measure that?

Hell yes! Claude Shannon, that genius of geniuses, taught us how! Here is the informational Venn diagram between the world (W), the sensors (S) that see the world (they represent it, albeit in a trivial manner),  and the Brain (B):

What we call "representation" (R) is the information that the brain knows about the world (information shared between W and B) given the sensor states (S). "Given", in the language of information theory, means that these states (the sensor states) do not contribute to your uncertainty. It also means that the "given" states do not contribute to the information (shared entropy) between W and B. That's why the "intersection triangle" between W, B, and S does not contribute to R: we have to subtract it because it also belongs to S. (I will talk about these concepts in more detail in part 2 of my "What is Information?" series.) So, R is what the brain knows about the world without sneaking a peek at what the world currently looks like in the sensors. It is what you truly know.
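For concreteness, here is a minimal sketch of estimating that quantity, the conditional mutual information I(W;B|S), from joint samples of world, brain, and sensor states. The function name and the toy data are my own illustration, not the paper's actual pipeline:

```python
from collections import Counter
from math import log2

def cond_mutual_info(triples):
    """Estimate R = I(W;B|S) from a list of (w, b, s) samples.

    I(W;B|S) = sum over (w,b,s) of p(w,b,s) * log2[ p(w,b,s) p(s) / (p(w,s) p(b,s)) ].
    With empirical counts, the sample size n cancels inside the logarithm.
    """
    n = len(triples)
    c_wbs = Counter(triples)
    c_ws = Counter((w, s) for w, b, s in triples)
    c_bs = Counter((b, s) for w, b, s in triples)
    c_s = Counter(s for w, b, s in triples)
    return sum((c / n) * log2(c * c_s[s] / (c_ws[(w, s)] * c_bs[(b, s)]))
               for (w, b, s), c in c_wbs.items())
```

With a brain that perfectly mirrors a binary world while the sensor is independent noise, this returns 1 bit; with a brain that merely copies its sensor (a purely reactive machine), it returns 0 bits, exactly as the definition demands.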

Now that we have defined representation quantitatively (so that we can measure it), how does it evolve?

Splendidly, as you may have surmised. To test this, we designed a task (that a simulated agent must solve) that requires building a model of the task in your brain. This task is relatively simple: you are a machine that catches blocks. Blocks rain down from the sky (falling diagonally), but there are two kinds of blocks in the world: small ones (that you should definitely catch) and large ones (that you should definitely avoid). To make things interesting, your vision is kind of shoddy. You have a blind spot in the middle of your retina, so that a big block may look like a small block (and vice versa), for a while.

In this image, a large block is falling diagonally to the left. This is a tough nut to crack for our agent, because he hasn't even seen it yet. He is moving in the right direction (perhaps by chance), but once the block appears in the agent's sensors, he has to make a decision quickly. You have to determine size, direction of motion, and relative location (is the block to my left? right above me? to my right?). You have to integrate several informational streams in order to "recognize" what you are dealing with. And the agent's actions will tell us whether he has "understood" what it is he is dealing with. That's what makes this task cool.

We can in fact evolve agents that solve this task perfectly, that is, they determine the right move for each of the 80 possible scenarios. Why 80? Well, the falling block can be in 20 different positions on the top row. It can be small or large. It can fall to the left or to the right: 20 x 2 x 2 = 80. You say that I'm neglecting the 20 possible relative positions of the catcher? No I'm not, because the game "wraps" in the horizontal direction. That is, if the block falls off the screen on the left, it reappears, as if by magic, on the right. The agent likewise reappears on the left/right if he disappears on the right/left. As a consequence, we only have to count the 20 relative positions between falling block and catching agent.
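The bookkeeping above can be sketched in a few lines; the names and encoding here are my own illustration (the paper's exact representation may differ), but the count and the modulo arithmetic that implements the wrap-around are as described:

```python
WIDTH = 20  # positions along the top row; also the number of relative positions, thanks to the wrap

# 20 relative positions x 2 block sizes x 2 fall directions = 80 scenarios
scenarios = [(pos, size, dx)
             for pos in range(WIDTH)
             for size in ("small", "large")
             for dx in (-1, +1)]

def wrap(pos, dx):
    """One horizontal step of the diagonal fall; modulo arithmetic makes the world wrap."""
    return (pos + dx) % WIDTH
```

Falling off one edge (position 0 moving left, or 19 moving right) lands you on the opposite edge, which is why only the relative position of block and agent matters.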

As the agents become more proficient at catching (and avoiding) blocks, our measure R increases steadily. But not only can we measure how much of this world is represented in the agent's brain, we can literally figure out what they are thinking about!

Is this magic?

Not at all, it is information theory. The way we do this, is by defining a few (binary) concepts that we think may be important for the agent, such as:

Is the block to my left or to my right?
Is the block moving left or right?
Is the block currently triggering one of my sensors?
Is the block large or small?

Granted, the world itself can be in 1,600 different possible states. (Yes, we counted). These 4 concepts only cover two to the power of 4, or 16 possible states. But we believe that the agent may want to think about these four concepts in order to come to a decision; that these are essential concepts in this task.

Of course, we may be wrong.

But we can measure which of the twelve neurons encode each of the four concepts, and we can even determine the time when they have become adapted to this feature. So, do the agents pay attention to these four concepts as they learn how to make a living in this world?

Not exactly, actually. That would be too simple. These concepts are important to a bunch of neurons, to be sure. But it is not as if a single neuron evolves to pay attention to "big or small" while another tells the agent whether the block is moving left or right. Rather, these concepts are "smeared" across a bunch of neurons, and there is synergy between concepts. Synergy means that if two (or more) neurons encode a concept together synergistically, then together they have more information about it than the sum of the information that each one has by itself.
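Synergy of this kind is easiest to see in a toy example: XOR. The sketch below is my own illustration (the evolved neurons are of course not literal XOR gates): each "neuron" alone carries zero information about the concept, but the pair together pins it down completely.

```python
from collections import Counter
from math import log2

def mutual_info(pairs):
    """I(X;Y) from a list of (x, y) samples."""
    n = len(pairs)
    c_xy = Counter(pairs)
    c_x = Counter(x for x, _ in pairs)
    c_y = Counter(y for _, y in pairs)
    return sum((c / n) * log2(c * n / (c_x[x] * c_y[y]))
               for (x, y), c in c_xy.items())

# Toy "neurons": the concept is the XOR of the two neuron states.
samples = [(n1, n2, n1 ^ n2) for n1 in (0, 1) for n2 in (0, 1)]
i_joint = mutual_info([((n1, n2), c) for n1, n2, c in samples])  # both neurons together
i_solo1 = mutual_info([(n1, c) for n1, n2, c in samples])        # neuron 1 alone: 0 bits
i_solo2 = mutual_info([(n2, c) for n1, n2, c in samples])        # neuron 2 alone: 0 bits
synergy = i_joint - (i_solo1 + i_solo2)                          # 1 bit of pure synergy
```

Here the whole (1 bit) exceeds the sum of the parts (0 bits), which is exactly what "smeared across a bunch of neurons" means in information-theoretic terms.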

So what does all of this teach us?

It means (and of course I'm biased here) that we have learned a great deal about representation. We can measure how much a brain represents about its world within its states information-theoretically, and we can (with some astute guessing) even spy on what concepts the brain uses to make decisions. We can even see these concepts form as the brain is processing the information. At the first time step, the brain is pretty much clueless: what it sees can lead to anything. After the second time step, it can rule out a bunch of different scenarios, and as time goes by, the idea of what the agent is looking at forms. It is a hazy picture at first, for sure. But as more and more information is integrated, the point in time arrives where the agent's mental image is crystal clear: THIS is what I'm dealing with, and this is why I move THAT way.

It is but a small step, for sure. Do brains really work like this? Can we measure representation in real biological brains? Figure out what an organism thinks about, and how decisions are made?

If any of our information theory is correct, it is just a matter of technology to get the kind of data that will provide answers to these questions. That technology is far from trivial. In order to determine what we know about the brains that we evolve, we have to have the time series of neuronal firing (000010100010 etc.) for all neurons, for a considerable amount of time (such as the entire history of experiencing all 80 experimental conditions). That's fine for our simple little world, but it is not at all OK for any realistic system. Obtaining this type of resolution for animals is almost completely unheard of. Daniel Wagenaar (formerly at Caltech and now at the University of Cincinnati) can do this for 400 neurons in the ganglion of the medicinal leech. Yes, the thing seen on the left. Don't judge, it has very big neurons!

And we are hoping to use Daniel's data to peer into the leech's brain and see what it is thinking about. We expect that food and mating are the variables we find. Not very original, I know. But wouldn't that be a new world? Not only can we measure how much a brain represents, we can also see what it is representing! As long as we have any idea about what the concepts could be that the animals are thinking about, that is.

I do understand, from watching current politics, that this may be impossible for humans. But yet, we are undeterred!

Article reference: L. Marstaller, A. Hintze, and C. Adami. (2013). The evolution of representation in simple cognitive networks. Neural Computation 25.

Thursday, April 25, 2013

What is Information? (Part I: The Eye of the Beholder)

Information is a central concept in our daily life. We rely on information in order to make sense of the world: to make "informed" decisions. We use information technology in our daily interactions with people and machines. Even though most people are perfectly comfortable with their day-to-day understanding of information, the precise definition of information, along with its properties and consequences, is not always as well understood. I want to argue in this series of blog posts that a precise understanding of the concept of information is crucial to a number of scientific disciplines. Conversely, a vague understanding of the concept can lead to profound misunderstandings, within daily life and within the technical scientific literature. My purpose is to introduce the concept of information—mathematically defined—to a broader audience, with the express intent of eliminating a number of common misconceptions that have plagued the progress of information science in different fields.

What is information? Simply put, information is that which allows you (who is in possession of that information) to make predictions with accuracy better than chance. Even though that sentence appears glib, it captures the concept of information fairly succinctly. But the concepts introduced in this sentence need to be clarified. What do I mean by prediction? What is "accuracy better than chance"? Predictions of what?

We all understand that information is useful. When is the last time that you found information to be counterproductive? Perhaps it was the last time you watched the news. I will argue that, when you thought that the information you were given was not useful, then what you were exposed to was most likely not information. That stuff, instead, was mostly entropy (with a little bit of information thrown in here or there). Entropy, in case you have not yet come across the term, is just a word we use to quantify how much you don't know. Actually, how much anybody doesn't know. (I'm not just picking on you.)

But, isn't entropy the same as information?

One of the objectives of these posts is to make the distinction between the two as clear as I can. Information and entropy are two very different objects. They may have been used synonymously (even by Claude Shannon—the father of information theory—thus being responsible in part for a persistent myth) but they are fundamentally different. If the only thing you will take away from this article is your appreciation of the difference between entropy and information, then I will have succeeded.

"Why on Earth introduce that complication?", you ask.

Well, think of it this way. Let's quantify your uncertainty (that is, how much you don't know) about a system (System One) by the number of states it can be in. Say this is $N_1$. Imagine that there is another system (System Two), and that one can be in $N_2$ different states. How many states can the joint system (System One And Two Combined) be in? Well, for each state of System One, there can be $N_2$ number of states. So the total number of states of the joint system must be $N_1\times N_2$. But our uncertainty about the joint system is not $N_1\times N_2$. Our uncertainty adds, it does not multiply. And fortunately the logarithm is that one function where the log of a product of elements is the sum of the logs of the elements. So, the uncertainty about the system $N_1\times N_2$ is the logarithm of the number of states
$$H(N_1N_2)=\log(N_1N_2)=\log(N_1) + \log(N_2).$$
I had to assume here that you knew about the properties of the log function. If this is a problem for you, please consult Wikipedia and continue after you digested that content.
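The additivity claim is easy to check numerically. A minimal sketch, with toy state counts of my own choosing:

```python
from math import log2

N1, N2 = 6, 4                 # states of two independent systems (toy numbers)
joint = N1 * N2               # for each state of System One, N2 states of System Two
h_joint = log2(joint)         # uncertainty of the joint system
h_sum = log2(N1) + log2(N2)   # sum of the individual uncertainties
# uncertainty adds even though the state counts multiply: log2(24) = log2(6) + log2(4)
```

The same identity holds in any base, which is precisely why the logarithm (and no other function) is the right measure of uncertainty here.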

Phew, I'm glad we got this out of the way. Now, let's talk about a six-sided die. You know, the type you've known all your life. What you don't know about the state of this die (your uncertainty) before throwing it is $\log 6$. When you peek at the number that came up, you have reduced your uncertainty (about the outcome of this throw) to zero. This is because you made a perfect measurement. (In an imperfect measurement, you only got a glimpse of the surface that rules out a "1" and a "2", say.)

What if the die wasn't fair? Well that complicates things. Let us for the sake of the argument assume that the die is so unfair that one of the six sides (say, the "six") can never be up. You might argue that the a priori uncertainty of the die (the uncertainty before measurement) should now be $\log 5$, because only five of the states can be the outcome of the measurement. But how are you supposed to know this? You were not told that the die is unfair in this manner, so as far as you are concerned, your uncertainty is still $\log 6$.

Absurd, you say? You say that the entropy of the die is whatever it is, and does not depend on the state of the observer? Well I'm here to say that if you think that, then you are mistaken. Physical objects do not have an intrinsic uncertainty. I can easily convince you of that. You say the fair die has an entropy of $\log 6$? Let's look at an even simpler object: the fair coin. Its entropy is $\log 2$, right? What if I told you that I'm playing a somewhat different game, one where I'm not just counting whether the coin comes up heads or tails, but am also counting the angle that the face has made with a line that points towards True North. And in my game, I allow four different quadrants, like so:

Suddenly, the coin has $2\times 4$ possible states, just because I told you that in my game the angle that the face makes with respect to a circle divided into 4 quadrants is interesting to me. It's the same coin, but I decided to measure something that is actually measurable (because the coin's faces can be in different orientations, as opposed to, say, a coin with a plain face but two differently colored sides). And you immediately realize that I could have divided the circle into as many sectors as I can possibly resolve by eye.

Alright fine, you say, so the entropy is $\log(2\times N)$ where $N$ is the number of resolvable angles. But you know, what is resolvable really depends on the measurement device you are going to use. If you use a microscope instead of your eyes, you could probably resolve many more states. Actually, let's follow this train of thought. Let's imagine I have a very sensitive thermometer that can sense the temperature of the coin. When the coin is thrown high, the energy it absorbs when hitting the surface will raise its temperature slightly, compared to one that was tossed gently. If I so choose, I could include this temperature as another characteristic, and now the entropy is $\log(2\times N\times M)$, where $M$ is the number of different temperatures that can be reliably measured by the device. And you know that I can drive this to the absurd, by deciding to consider the excitation states of the molecules that compose the coin, or of the atoms composing the molecules, or the nuclei, the nucleons, the quarks and gluons.

The entropy of a physical object, it dawns on you, is not defined unless you tell me which degrees of freedom are important to you. In other words, it is defined by the number of states that can be resolved by the measurement that you are going to be using to determine the state of the physical object. If it is heads or tails that counts for you, then $\log 2$ is your uncertainty. If you play the "4-quadrant" game, the entropy of the coin is $\log 8$, and so on. Which brings us back to the six-sided die that has been mysteriously manipulated to never land on "six". You (who do not know about this mischievous machination) expect six possible states, so this dictates your uncertainty. Incidentally, how do you even know the die has six sides it can land on? You know this from experience with dice, and from having looked at the die you are about to throw. This knowledge allowed you to quantify your a priori uncertainty in the first place.

Now, you start throwing this weighted die, and after about twenty throws or so without a "six" turning up, you start to become suspicious. You write down the results of a longer set of trials, and note this curious pattern of "six" never showing up, but the other five outcomes occurring with roughly equal frequency. What happens now is that you adjust your expectation. You now hypothesize that it is a weighted die with five equally likely outcomes, and one that never occurs. Now your expected uncertainty is $\log 5$. (Of course, you can't be 100% sure.)

But you did learn something through all these measurements. You gained information. How much? Easy! It's the difference between your uncertainty before you started to be suspicious and the uncertainty after it dawned on you. The information you gained is just $\log 6-\log 5$. How much is that? Well, you can calculate it yourself. I didn't give you the base of the logarithm, you say?

Well, that's true. Without specifying the logarithm's base, the information gained is not specified. It does not matter which base you choose: each base just gives units to your information gain. It's kind of like asking how much you weigh. Well, my weight is one thing. The number I give you depends on whether you want to know it in kilograms, or pounds. Or stones, for all it matters.

If you choose the base of the logarithm to be 2, then your units will be called "bits" (which is what we all use in information theory land). But you may choose Euler's number $e$ as your base. That makes your logarithms "natural", but your units of information (or entropy, for that matter) will be called "nats". You can define other units (and we may get to that), but we'll keep it at that for the moment.

So, if you choose base 2 (bits), your information gain is $\log_2(6/5)\approx 0.263$ bits. That may not sound like much, but in a Vegas-type setting this gain of information might be worth, well, a lot. Information that you have (and those you play with do not) can be moderately valuable (for example, in a stock market setting), or it could mean the difference between life and death (in a predator/prey setting). In any case, we should value information.
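You can check the arithmetic, and the unit conversion, in a couple of lines:

```python
from math import log, log2

gain_bits = log2(6) - log2(5)   # uncertainty before minus uncertainty after, in bits
gain_nats = log(6) - log(5)     # the very same gain, measured in nats
# the two differ only by a unit conversion: 1 bit = ln(2) nats
```

The value in bits comes out near 0.263, and dividing it by the value in nats gives you back $1/\ln 2$, just as converting kilograms to pounds always involves the same fixed factor.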

As an aside, this little example, in which a series of experiments "informed" us that one of the six sides of the die will in all likelihood never show up, should have convinced you that we can never know the actual uncertainty we have about any physical object, unless the statistics of the possible measurement outcomes of that object are for some reason known with infinite precision (which cannot be attained in a finite lifetime). It is for that reason that I suggest the reader give up thinking about the uncertainty of any physical object, and be concerned only with differences between uncertainties (before and after a measurement, for example).

The uncertainties themselves we call entropy. Differences between entropies (for example before and after a measurement) are called information. Information, you see, is real. Entropy on the other hand: in the eye of the beholder.

In this series on the nature of information, I expect the next posts to feature more conventional definitions of entropy and information (meaning those that Claude Shannon introduced), with some examples from physics and biology, then to move on to communication and the concept of channel capacity.

Stay tuned!

Sunday, April 7, 2013

The evolution of the circle of empathy

What is the circle of empathy? Empathy, as we all know, is the capacity to feel (or at least recognize) emotions in other entities that have emotions. Many people believe that this capacity is in fact shared by many types of animals. The "circle of empathy" is a boundary within which each individual places the things he or she empathizes with. Usually, this only includes people and possibly certain animals, but is unlikely to include inanimate objects, and very rarely plants or microbes. This circle is intensely personal, however. (Psychopaths, for example, seem to have no circle of empathy whatsoever.) Incidentally, I thought I had invented the term, but it turns out that Jaron Lanier has used it before me in a similar fashion, as has the bioethicist Peter Singer. What I would like to discuss here is the evolution of our circle of empathy over time, what this trend says about us, and where this might lead us in the long run.

When we go way, way back in time, life was different. There wasn't what we now call "society", or even "civilization". There were people, animals, and plants. And there was the sun rising predictably in the morning, and setting in the evening just as expected. But everything else was less predictable. Life was "fraught with perils" (as a lazy writer would write). Life was uncertain. What is the best mode of survival in this world?

"Trust no-one", The X-Files may exhort you, but in truth, you've got to trust somebody. The life (and survival) of the Lone Ranger is not predicated on loneliness; he too must rely on the kindness of strangers and companions. Life is more predictable when you can trust. But whom do you trust, then? Of course, you trust family first: this is the primal empathic circle: you feel for your family, and expect they feel for you. Emotions are almost sure-fire guarantors of behavior. From this point of view, emotions protect, and make life a little more predictable.

As we evolve, we learn that expanding the circle of empathy is beneficial. When it comes to protecting the family, as well as the things we have gained, it is beneficial to gang up with those that have an equal amount to lose. "Let us forge a band of brothers that is not strictly limited to brothers and sisters; we who defend the same stake, let us stand as one against those that strive to tear us down".

Thus, through ongoing conflicts, new bonds are forged. We may not be related in the familial manner, but we are alike, and our costs and benefits are aligned. The circle of empathy has widened.

Time, relentlessly, goes on. And the circle of empathy inevitably widens (on average). Yes, don't get me wrong, I fully understand that human history is nothing but a wildly careening battle between the forces that compel us to love our fellow man, and the urge to destroy those who are perceived to interfere with our plans of advancement. Throughout history, the circle of empathy may widen for a while, then restrict. People perceived to be different (often, in fact, perceived as inferior) may be admitted to the circle for some (sometimes even most), but just as often dismissed. Yet, over time, the circle appears to inexorably widen.

There is no doubt about this trend, really.  From the family, the circle expanded to encompass the clans that were probably closely related. From those, the circle expanded to cities, city states, and finally countries. At this point it was just a matter of time until humans expressed their empathy with respect to all humankind. "We are all one", the idealist would invariably exclaim (mindful that not everyone on Earth has evolved to be quite as magnificent, or magnanimous).  Our many differences aside, the widening of the circle of empathy is palpable. The tragedy of September 11th 2001, for example, was genuinely felt to be a tragedy by the majority of people on the globe.

It is also clear that the evolution of the circle's radius proceeds by a widening in a few individuals first, who then spend a good portion of their lives convincing their fellow humans that they ought to widen their circles just as much. Civil rights struggles and equal rights campaigns can be subsumed this way. Anti-abortion crusaders would like everyone to include the unborn fetus into their circle of empathy. Many vegetarians have chosen not to eat meat for the simple reason that they have included all animals within their circle of empathy.

Given that the dynamics of the widening of the circle on average is driven by a few pioneers who widened theirs ahead of everyone else, how far should we expect to widen our own circles? For example, I am not a vegetarian. I do empathize with animals, but like most people I know, my empathy has its limits. I generally do not kill animals, but when insects find their way into my house I consider that a territorial transgression. Given the nervous system of most insects, it is unlikely that they perceive pain in any manner comparable with how we perceive it. And this is probably the line of empathy that will most likely be drawn by the majority of people at some point in the future: if animals can perceive pain just as we do, then we are likely to include them into our circle. The more complex they are cognitively, the more likely we would have them in our circle. The trouble is, the cognitive complexity of animals isn't easily accessible to us. We empathize with the great apes (the group of primates that, besides the gorilla, chimpanzee, and orangutan, also includes us) in part because they are so similar to us. But cetaceans (the group of animals that includes whales and dolphins) have at least as complex a cognitive system as the great apes, yet appear on far fewer people's radar.

Bottlenose dolphin. (Source: Wikimedia)

The neuroscientist Lori Marino, for example (who together with Diana Reiss first published evidence that bottlenose dolphins can recognize themselves in a mirror), has been pushing for the ethical treatment of cetaceans (and therefore for a widening of our circle of empathy to include cetaceans) using scientific arguments based on behavioral as well as neuroanatomical evidence. She, along with people like the lawyer Steven Wise, has been pushing for "non-human legal rights" for certain groups of animals, thus enshrining the widened circle into law. From this point of view, the recent analysis of the methods used by Japanese dolphin hunters to round up and kill dolphins is another stark reminder of how different the radius of the circle can be among fellow humans (and how culture and ethnic heritage affect it).

All this leads me back to a thought I have touched upon in a previous post: if higher cognitive capacities are associated with things we call "consciousness" and "self-awareness", perhaps we need to be able to better capture them mathematically, and therefore ultimately make them measurable. If we were to achieve this, then we may end up with a scale that gives us clear guidelines on what the radius of our circle of empathy should be, rather than waiting for more enlightened people to show us the path. It is unlikely that this circle will encompass plants, microbes, or even insects. But there are surely animals out there who, on account of their not being able to talk to us, have been suffering tremendously. Looking at this from the vantage point of our future, more enlightened selves, we should really figure out how to draw the line, somehow, sooner rather than later. I don't know where that line is, but I'm pretty sure that my line will evolve in time, and yours will too.

Tuesday, March 26, 2013

Darwinian civilization

"Nature is red in tooth and claw", Charles Darwin is often quoted as writing (even though this turn of phrase actually stems from a poem by Alfred Tennyson, mourning the loss of his friend Arthur Hallam, a poem that appeared before the publication of Darwin's "Origin of Species"). Be that as it may, the phrase is designed to make us appreciate nature's cruelty, that thousands have to die for the rare variant to ascend to greater fitness, that progress must be purchased at the expense of unfathomable suffering. Evolution, we learn to understand,  is a bleak process, devoid of compassion, in a winner-take-all world.

What does this say about us, the product of this process? Our genes were shaped for eons by teeth and claws; our purpose in life is to reproduce faster than the other guy (and prevent him from doing the same), is it not? Should we not capitulate to these tendencies bred into us via the eternal survival of the fittest, and realize that the meek and the poor are very unlikely to inherit the earth?

My view is that this is an altogether unevolved view of humanity. What is it about us humans that is remarkable, that is worth a pause? Is it our ability to think, plan, use tools, to create art? I'm afraid that if you think that this is purely our domain, you should think again: animals can do this too, they can even paint portraits.

This is not what makes us special. What makes us special is precisely our ability to resist the Darwinian drive. People rose above animals the moment they created civilizations. You may feel like debating what I mean by civilization, but my meaning is really quite pedestrian: a civilization is any organization or group of humans that engages in a division of labor, and protects its members from outside groups. From this point of view, a civilization is more than an extension of the "empathic circle" beyond the close-knit kin group. Civilization also encompasses cooperation (division of labor is a form of cooperation) and in particular protection. One of the distinguishing features of a civilization is, in my view, its anti-Darwinian tendencies: to protect from elimination those that can't protect themselves. I believe that if there is any nobility in humankind it is that: empathy for fellow humans we are only distantly related to; to care for people without asking for a return, simply because it is the human thing to do.

I also understand that not everybody shares my views concerning the value of civilization. I realize that there are fellow humans that think we ought to return to a more Darwinian society where the strong rule the weak, and where the meek (by virtue of having chosen to be meek) should reap the genetic consequences of defeat. In this view of life, there is no place for losers.

But may I offer a counterargument to this view, which I shall call "The Tea Party Reads Darwin for the First Time"?

What follows may become a bit technical, so I'm afraid I may lose some of my Tea Party readers. But I do encourage you all to hang on.

"Nature, red in tooth and claw" emphasizes the strength and brutality of selection, but by itself selection cannot create progress. Progress, defined here as "increasing the fit" of an organism to its environment, requires variation. You immediately see that this is true: if all organisms are identical (genetically and phenotypically, that is, in their appearance), then no amount of selection will help you if, for example, the environment changes. We would all be doomed (identically so) if the new world were inhospitable to us, because we would all be screwed in the same manner. Progress, in the light of a changing environment, can only come about if there is diversity. What if you and I are different enough so that I cannot cope with the changed environment, but you (as it turns out) can? Then you will found the lineage that will inherit the Earth. Because you were different. And I was not.
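The point can be made with a deliberately silly toy model (entirely my own construction, with made-up numbers, not a real population-genetics simulation): a monoculture has no variant to fall back on when the environment flips, while even a little diversity saves the lineage.

```python
def survivors(population, environment):
    # Only individuals whose genotype matches the new environment survive.
    return [g for g in population if g == environment]

monoculture = [0] * 1000          # everyone genetically identical
diverse = [0] * 990 + [1] * 10    # mostly identical, with a few variants

new_environment = 1               # the world changes

print(len(survivors(monoculture, new_environment)))  # 0 -- doomed, identically so
print(len(survivors(diverse, new_environment)))      # 10 -- the variants found the new lineage
```

The numbers are arbitrary, but the logic is exactly the argument above: selection can only choose among what is already there.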

That this diversity, or variation as it is more properly called within the field of population genetics (the mathematical formulation of Darwinism), is important for adaptation is old hat (so to speak), commonly going by the moniker of Fisher's "Fundamental Theorem of Natural Selection". What is less well known is how populations go about maintaining the variation necessary to assure that they can adapt when times are a-changin'. Because here it is: if what makes us who we are is determined largely by our genes, then maintaining diversity requires maintaining a diversity of genes. But what if an individual possesses a gene that is just one change away from fantastic, but in the absence of that change is rather dull, or worse, inferior? This individual, one change away from greatness (and from founding the other lineage that will inherit the Earth), is vulnerable. It is meek. It is unprotected. The tooth and the claw will likely eliminate it, so that its (potential) greatness will never be revealed. Darwinian dynamics is cruel for sure, but also sometimes shortsighted. Couldn't we do better? Is it possible that by protecting the weak we actually foster the type of "valley-crossing" events that Darwinian evolution has a hard time effecting, but relies on for the occasional fundamental change?

Perhaps the answer is "Yes, we can". But what is the cost of keeping around the meek? There is a cost, for sure. The meek are plentiful, because there are more ways to diminish genes than there are to improve them. The potential, however, is boundless. Within the field of evolutionary biology, we practitioners spend countless hours trying to understand the mechanisms that molecular biology uses to increase and maintain variation: recombination, negative frequency-dependent selection, increased mutation rates, linkage, and much more. But we compassionate humans can transcend molecular mechanisms: we can maintain diversity simply because we believe in giving people a chance, because we believe that every person on Earth has the right to attempt to realize their dreams and passions, to "live a healthy and productive life". And just perhaps, keeping around the "tired, the poor, the huddled masses yearning to breathe free" may be the thing that ensures the survival of the truly evolved species: one that understands that the "wretched refuse of your teeming shores" may very well constitute the genetic key to survival in tomorrow's changed world.

So, could it be that a compassionate and altruistic civilization could actually transcend Darwinian dynamics by outwitting the demon of selection, allowing an unheard-of level of valley crossings? It would be fitting, wouldn't it, given that the segment of the (U.S.) population that argues most loudly for promoting the survival of only the fittest in human civilization is precisely the one that has the most problems with the science of evolution.

Disclaimer: This blog post may or may not have been influenced by the Adami Lab's emphasis on understanding the features of fitness landscapes that make valley crossings a fundamental feature of Darwinian adaptation.

Friday, March 22, 2013

Oh these rascally black holes! (Part 3)

Fortunately, at that point I already know how to calculate the capacity of quantum channels, because I was involved in this endeavor during what is now known as the "heydays" of quantum information theory. I knew that in order to calculate the capacity for the black hole to transmit classical information, I had to calculate the shared entropy between a "preparer" and the radiation observed at future infinity. The  preparer creates the physical quantum states that are to be used as signals (our particles and anti-particles) according to a list of symbols. So, if you want to send "0010011", then the preparer sends "ppappaa", where "p" stands for particle and "a" for anti-particle. The entropy of the preparer is just the entropy of the symbols she sends. Fairly quickly, I realize that the shared entropy has just the form that appears in Holevo's theorem. At that point, I see that it is all over, because the capacity of the black hole channel is just the Holevo capacity, as it should be. And it is also clear that if there is no stimulated emission, then the capacity is exactly zero and we have to look for someone or something to sue.

But now (back in 2004) we hear that Hawking is going to give a talk in Ireland where he will announce that he solved the black hole information paradox. He will announce (we hear), that he was wrong all along, that information is preserved in black holes. Greg and I are dumbfounded. Has he figured this out at the same time as we did? I start writing up our results like there is no tomorrow, but can't finish until a day after Hawking gives his talk. And we read the reports, and exhale. His "solution" has nothing to do with ours, and many physicists are very skeptical whether it is a solution at all.

At almost exactly this point, Charles Seife from Science Magazine calls me to comment on Hawking's "discovery", and I explain to him my thoughts, but can't hold back my excitement about what we found. That's the history of  my comments in the article he wrote here.

So what happens now? Now comes a period where we submit our paper to Physical Review Letters, and fight with referees for two years.

But we also realize that what the black hole is doing by stimulating the emission of radiation is acting like a quantum cloning machine, and that we should calculate the cloning fidelity. This we do, and the results are incredible. First, we notice that the mathematics of cloning is exactly like that at work in stimulated emission in quantum optics, and that just as in the case of quantum optics, the fidelity of cloning is nearly optimal! Well, it is if the black hole can reflect a little bit of radiation. If it absorbs everything (black holes do not necessarily absorb all radiation; it depends on the angular momentum of the incoming particle), then the fidelity of cloning is equal to the best you can do classically, namely classical state estimation. (In this case, classical state estimation comes down to Laplace's "rule of succession".) What this means is that if your initial quantum state has N particles in it, then you can reconstruct the initial state (using the particles that are emitted via stimulated emission, but without error correction!) with probability (N+1)/(N+2). This is also the estimated probability that the sun will rise tomorrow, given that you have observed it to rise N times in the past! Go figure.
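The (N+1)/(N+2) formula is just Laplace's rule of succession: the Bayesian estimate, under a uniform prior, that the next trial succeeds after observing N successes in N trials. A quick illustration (my own, not taken from the paper):

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's estimate that the next trial succeeds,
    given `successes` observed successes in `trials` trials."""
    return Fraction(successes + 1, trials + 2)

# Reconstructing an N-particle state (or betting on the sun rising
# after N observed sunrises) succeeds with probability (N+1)/(N+2):
for N in (1, 10, 1000):
    print(N, rule_of_succession(N, N))

print(float(rule_of_succession(3, 3)))  # 4/6+... i.e. 4/5 = 0.8
```

Note how the estimate approaches, but never reaches, certainty as N grows, matching the near-optimal (but not perfect) classical reconstruction described above.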

Then we submit that paper to PRL, and things go from bad to worse. After another endless series of reviews (which admittedly are difficult, because how many quantum gravity experts are also experts in quantum information theory and quantum cloning?) we finally give up, after receiving an (unsigned) Divisional Associate Editor (DAE) report that is, well, how can I put it, angry. The report is 11 pages long, and I'm pretty sure who wrote it. But I'm determined to remain a gentleman.

I decide to lay low for a while, in particular because I have other papers to write. For about six years I lay low, give or take. But my interest is renewed when I see a paper by Kamil Bradler on the capacity of the Unruh channel. The Unruh channel, as you can imagine, is kind of like the "Hawking channel", where the noise is not Hawking radiation but rather Unruh radiation. You guessed it. And Bradler flat-out calculates this capacity while acknowledging that we had derived the same exact result for black holes earlier! (You can get his paper from arxiv here). And he also notices that these are all cloning channels!

So I decide to take my four-page article and turn it into a long article for Physical Review D, acknowledging that it is simply too much for PRL. Well, and then I register for the APS March Meeting to talk about the paper.

My talk at the meeting was, well, eventful: the device that switches the display from one laptop to another froze my computer. I started to talk while the session chair tried to reboot my computer and log back in; then the computer quit and showed a black screen. I decided to give the talk entirely without slides, which was just as well: the point I was trying to make can be conveyed with flailing arms alone.

After the talk, I am asked whether stimulated emission by any chance sheds light on the AMPS controversy. This discussion, also known as the "firewall controversy", is about another paradox engendered by black holes. Without being too technical, the paradox involves the impossibility of being maximally entangled with two different systems (see John Preskill's description of the paradox).
(Illustration: Courtesy of John Preskill)

I am reminded of the logical inference that if you start out with a statement that is false, you can derive any number of falsehoods from it. In the same manner, if you begin with a paradox (neglecting stimulated emission of radiation) you can generate an infinite number of other paradoxes from it.

I'm perfectly aware that I may be wrong. But let us first agree that:

1.) we should do calculations

Then let the chips fall where they may.

Thursday, March 21, 2013

Oh these rascally black holes! (Part 2)

Now I wrote the word information. For the first time in this blog, actually. People working in the field of quantum gravity use this word a lot, but not always precisely. It has a precise meaning both in classical and in quantum physics. Let me convince you that serious problems may already exist with classical information when paired with black holes, so that I can talk about quantum information in another blog post.

Classical information is the shared entropy between two systems. It has never been anything other than that, and never will be. If you are talking about a set of states and their probability distribution, you are talking about entropy. If you think you have information but you don't know what it predicts, you don't have information, you have entropy. In particular, imagine I have 4 bits of information (which allow me to reduce the entropy of system X, say, by 4 bits). Suppose I encode these 4 bits in a string 4 million bits long, and the channel scrambles 2 million, say, of these bits. If the receiver of this string can reconstruct the 4 bits of information (via decoding), no information was lost. She can also reduce the entropy of X by 4 bits, and thus make exactly the same prediction that I, the sender, was able to make. The stuff that was lost was entropy, not information. The 3,999,996 bits that were used to encode the signal aren't predicting anything. Information is about prediction, after all, nothing else.
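The claim that the receiver can recover the few bits of information even though a large fraction of the transmitted bits were scrambled is routine coding theory. A crude sketch using a repetition code and majority voting (the numbers are scaled down and invented by me; real codes are vastly more efficient):

```python
import random

random.seed(2013)

def encode(bits, R):
    # Repetition code: write each information bit R times.
    return [b for b in bits for _ in range(R)]

def channel(codeword, p):
    # Binary symmetric channel: scramble (flip) each bit with probability p.
    return [b ^ (random.random() < p) for b in codeword]

def decode(received, R):
    # Majority vote inside each block of R copies.
    return [int(sum(received[i * R:(i + 1) * R]) > R // 2)
            for i in range(len(received) // R)]

message = [0, 0, 1, 0]                       # the 4 bits of actual information
R = 1001                                     # blow them up to 4004 bits
noisy = channel(encode(message, R), p=0.4)   # scramble roughly 40% of them

print(decode(noisy, R) == message)           # True: the information was never lost
```

The flipped bits carried entropy, not information; the majority vote throws the entropy away and keeps the prediction-relevant 4 bits intact.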

But I'll assume you know all this already, and if not you can read a little bit about it in a review I wrote.

Before I go on, there is one last thing you have to know about black holes (those that have no charge, and don't spin crazily on their axis). They can only be distinguished by their mass. That's it. No color, no smell, no weird shape. With this in mind, you may already be carrying out devilish thought experiments in your head. What if I throw two different things of equal mass into the black hole? After they are swallowed, can you tell me which one I threw in? The answer appears to be no, and we would have to think seriously about accusing the black hole of treacherous villainy. But first things first: let's burn some books.

OK, burning books is generally frowned upon (particularly by me), but let's keep the analogy for a moment. Let's imagine I have two different books. They weigh exactly the same. But one is, say, Shakespeare's "Hamlet", and the other, oh, Darwin's "Origin". (Two books I like a lot, by the way).

Let's say I throw one or the other into the fire. They have exactly the same mass, and the cover and pages are exactly the same, except for the text (unlike in the pics above). And let's imagine that after they have burned up, the ashes are just ashes. Was information lost? In practice, yes. In principle, no. In terms of classical communication theory, we are dealing with a noisy channel. The receiver cannot access the book, but only watch the flames. And you may think that the flames cannot possibly tell us about the identity of the book I just incinerated. But indeed they could, in principle. When "Hamlet" burns up, the flames and smoke are just a tiny bit different from what happens when "Origin" burns up. It may be imperceptible to your eye because it is lost in the natural variability of fire, but it is there. It must be there, otherwise the laws of physics would be violated. You can imagine an ultra-sensitive measurement device that can distinguish the two, or you can take a page from the book written by Shannon, and make your life a ton easier. You see, it is really quite normal for noise in a channel to overwhelm the signal. But if there is any signal at all, then it can be protected from the noise via a process known as "encoding". This process makes the signal state identifiable, and you can imagine doing this by coating the books with some sort of phosphorescent substance before you throw them into the fire: red for "Hamlet", green for "Origin". Now you just sit back, watch the color of the flame, and then you know which book was just burned.

The thing you have to understand here is that coating the books in this manner is not cheating, because information was never lost in principle, only in practice. We can make things more practical using coding, and this way we will be able to recover information with arbitrary accuracy.

Now let's throw the books into a black hole. You may think: "Oh, the Hawking radiation is just like the fire, we can encode the information in some way and just watch the 'color' of the Hawking radiation". Only this does not work at all. The Hawking radiation is not burning the books. The stuff that is emitted has absolutely nothing to do with what falls in: for all we know, the books were thrown in eons ago, while the radiation is only being created now. There is no causal connection whatsoever between the books and the vacuum fluctuations. In fact, Hawking himself acknowledged this right away: the radiation is completely and utterly thermal, which means that it depends on absolutely nothing except the temperature. And the temperature of the black hole is set precisely by the mass, and the mass of each book is the same. I don't know about you, but I find such a situation absolutely untenable, because if this were all true, we would have broken the law of "you can reverse anything". When I first read about this, I decided that it could not possibly be right, and embarked on figuring out why.

First, I replace the two books by just two particles, identifiable in some way. You can think of a particle or its anti-particle (of equal mass, of course), or of a photon with one or the other polarization. Then I mentally throw them into the black hole. And nothing coming out, with the black hole just sitting there, almost makes me physically sick, so I realize that just before the particle disappears behind the horizon, it must emit something, it simply must. Then I start reading. It's 2003, so I can't just Google around. And I quickly happen upon the literature on the quantum theory of radiation, which describes how a black body responds to radiation. And I read a superb article by Einstein from 1917, where he describes how he derived Planck's radiation law using only what now look like common-sense assumptions, but which at the time must have looked like pure magic. In this ground-breaking paper, Einstein shows that when radiation is incident on a black body, three things happen: absorption, the spontaneous emission of radiation, and the stimulated emission of radiation. Stimulated emission is what gives you a laser: a particle comes in, two (identical ones) come out. Put a mirror on one side, and two particles re-enter, and four come out. Put a mirror on the other side... and you get my drift. (To make a real laser, you have to make one of the mirrors a little permeable, so that the beam can finally get out.)
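Einstein's detailed-balance argument can be checked numerically in a few lines. In equilibrium, the absorption rate must equal the spontaneous-plus-stimulated emission rate, with the two level populations in the Boltzmann ratio exp(-x), where x = hν/kT. Solving that balance for the photon occupation number n recovers Planck's 1/(e^x - 1). A sketch of my own (with the Einstein A and B coefficients set to 1 in units of my choosing):

```python
import math

def balance_residual(n, x):
    # Absorption minus (spontaneous + stimulated) emission, with A = B = 1
    # and the level populations in the Boltzmann ratio N2/N1 = exp(-x).
    N1, N2 = 1.0, math.exp(-x)
    return n * N1 - (1.0 + n) * N2

def solve_balance(x):
    # Bisect for the occupation number n at which the two rates balance
    # (the residual is monotonically increasing in n).
    lo, hi = 0.0, 1.0e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if balance_residual(mid, x) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.5, 1.0, 3.0):
    planck = 1.0 / (math.exp(x) - 1.0)   # Planck's mean occupation number
    assert abs(solve_balance(x) - planck) < 1e-9

print("detailed balance reproduces Planck's law")
```

The point, as in the text: drop the stimulated "(1 + n)" term and the balance no longer yields Planck's spectrum, which is why a thermal Hawking spectrum without stimulated emission should raise eyebrows.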

Now, Hawking radiation has precisely Planck's form, but in Hawking's paper you only read about spontaneous emission. What happened to the stimulated part? In fact, I then realize, that emitting stimulated particles is precisely what I need to get rid of that queasy feeling in my stomach! So I read Hawking's paper again and again, and there is no stimulated emission. Zilch, nada.

So then I sit down and redo Hawking's calculation, but I take care not to throw out the bath water when, umm, there's still something in it. The calculation usually goes like this: You write down the vacuum in flat space time (far away from the black hole), and then you transform it into another basis, namely the one in the far future, in the presence of a black hole. This transformation is called a "Bogoliubov transformation" and it creates the future vacuum in which there are particles, from a past vacuum where there are none. Except that if any particles are actually forming the black hole, there should be some particles in the past too! So I just take the past vacuum with a single particle present and evolve it into the future, then I take the vacuum with a single anti-particle, and evolve it into the future. And lo and behold, everything changes! Suddenly, the radiation outside of the black hole at future infinity depends on what I threw in! Of course it has to, because the particle stimulated the emission of another particle before it went down the rabbit hole. Stimulated emission is just like making xerox copies. It's as if physics strips off the information from the particle (which is still falling into the hole) to make sure that the laws of physics are upheld. And I don't feel so terrible.

Then I read some more, and I find that I'm not the only one who has noticed this. In fact, Jacob Bekenstein (working with his student Meisels) wrote a beautiful paper just a year after Hawking wrote his, where he essentially writes: "Hold your horses, Mr. Hawking, you... kinda... forgot something". Using just statistical arguments of the form Einstein used in his <looking for adjectives> really swell 1917 paper <that's a fail>, Bekenstein shows that if you have absorption, reflection, and spontaneous emission of radiation, then you must have stimulated emission. If not, you might get some, umm, paradoxes. Then my student Greg ver Steeg (who is helping me re-derive all the known results, and deriving our new ones alongside me) and I discover that Panangaden and Wald had derived Bekenstein's result in quantum field theory less than a year later. But both expressions look very different from the result that we derived. First we worry that we have nothing new; then we worry that our calculation, which uses methods completely different from those of Bekenstein and Meisels, as well as Panangaden and Wald, may be wrong. The expressions look utterly different. The first thing we notice is that Bekenstein's result can be simplified enormously using some of the things we discovered. Then Greg codes both expressions into Mathematica and evaluates them numerically. And they agree exactly!

To make a long story slightly shorter, it took us another year to actually prove that the two expressions (ours and that of Bekenstein & Meisels, which was the same as that of Panangaden & Wald) can be turned into each other analytically, but there it was. Now all we had to do was prove that including stimulated emission leads to a non-vanishing capacity of the information transmission channel. Because if you can do that, then black holes are exonerated, proven innocent, free to go! Well, perhaps we still have to show that the whole initial state of the black hole can be reconstructed from the final state in principle, but one step at a time! Let's first convince ourselves that the most basic laws aren't fractured. So that we can sleep again, and not tiptoe downstairs in the middle of the night to check a calculation that is too hard to do in your head. Really!

Part 3 to appear in due time. Stay tuned!

Wednesday, March 20, 2013

Oh these rascally black holes! (Part I)

People are fascinated by black holes. You can't see them directly, they can be supermassive, and they are mysterious. Kind of like dinosaurs, which explains the attraction black holes hold for (some) kids. Among physics people, however, black holes seem to create heated arguments rather than child-like wonder, more so than any other topic. Black holes appear to violate some of our most sacred laws, and people cannot agree on whether those laws are truly violated, whether we should just go on with our merry lives in the light of such larceny, or what the universe is really doing to prevent this deplorable malfeasance.

So what evil thing are black holes accused of? One of the laws they are purportedly breaking is the law that all dynamics must be time-reversible (barring, perhaps, CP-violating processes). One way in which time reversal invariance can be broken is by processes that lead to a coalescence of trajectories (in phase space, to be precise). If trajectories coalesce (two or more turn into one) then I cannot run time backward unambiguously ("Which branch should I take?") The coalescence of phase space trajectories implies that knowing the future does not allow us to predict the past. It is truly an abomination, and we have to insist that black holes stop it (if in fact they are guilty).

Another law that we believe in is that wave functions evolve forward in time in a unitary manner, which implies that the entropy of a known state is and remains zero for all time. The latter implies that the (quantum) state is and remains predictable at all times. There is a direct relationship with our law of time-reversal invariance as you see immediately, because if all quantum trajectories can be reversed uniquely, then this means they never coalesce. Two trajectories that have coalesced cannot be time-reversed unambiguously: hence the relationship between predictability and time reversal.  Such a coalescence of trajectories has many outrageous consequences: for example the vanishing entropy of the initial state, upon evaporation of the black hole (something I will explain below), would turn into the non-vanishing entropy of the radiation field left behind. Unitarity would be lost, and with that our conviction that the universe is and remains pure. It is like the loss of innocence.
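The connection between unitarity and predictability can be made concrete in a few lines: a fully known (pure) quantum state has zero von Neumann entropy, and no unitary evolution can change that. A minimal numerical sketch (my own illustration; the particular state and unitary are arbitrary):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]   # by convention, 0 * log 0 = 0
    return float(-np.sum(evals * np.log2(evals)))

# A pure, fully known qubit state |0><0|.
psi = np.array([1.0, 0.0], dtype=complex)
rho = np.outer(psi, psi.conj())

# An arbitrary unitary (from the QR decomposition of a random complex matrix).
rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(M)

rho_later = U @ rho @ U.conj().T   # unitary time evolution

assert von_neumann_entropy(rho) < 1e-9        # zero entropy now...
assert von_neumann_entropy(rho_later) < 1e-9  # ...and zero entropy forever
```

If black hole evaporation really took a zero-entropy state to a nonzero-entropy radiation field, no unitary could describe it, which is exactly the offense described above.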

I will try to convince you here that it is the accused that is innocent, that black holes are just ordinary participants within cosmology. They are quantum, and they are heavy, they are black bodies, but they are not evil and they certainly do not violate any laws.

First, what is the evidence for this violation? It goes back to a 1975 paper by Stephen Hawking, which introduced the world to his eponymous radiation. The paper is not an easy read, but I still encourage everyone who wants to enter the field to read it, and to replicate the calculation as far as he or she can. In my view, nothing replaces actually doing a calculation and re-deriving results. However, there are now much more succinct ways to derive the same result (I think I can do it on a single page), and I'll sketch those here (without equations, though). My simplification relies on ignoring the red shift (the lengthening of wavelengths when light moves within a gravitational field). This may appear problematic, but we can restore the red shift at the end of the calculation and consider its effect separately; it does not change any of the arguments I give here. This is something practitioners do frequently when only the informational aspect is of concern.

The central result of Hawking comes from understanding what a vacuum is. In ordinary language, a vacuum is the absence of anything, but not in quantum field theory. In quantum field theory, the vacuum teems with fluctuations: particles and their respective anti-particles are constantly created in pairs, only to decay again (lest they violate our most sacred of laws: energy conservation). In fact, any time a pair is produced, it must borrow a little bit of energy (from the infinite bank of the universe), which may allow the two to travel apart from each other for a little bit. But of course, this attempt at separation between the twin particles must be fleeting, because energy bills must be paid. The pair annihilates in a flash, returning the borrowed goods to the bank. Now suppose a system is accelerated, like, a lot. The pairs are still being produced. Now imagine that one pair borrows a lot of energy, and manages to move apart appreciably. Because the system is so strongly accelerated, it can happen that the pair can never be re-united (unless one partner travels faster than light). One of the particles has disappeared behind a "causal wall", and if you are part of the accelerated system, you will see only one of the two particles, which is now all alone. Now, this looks like radiation. There are now physical particles in a system where there were none when the system was at rest. This curious fact was discovered independently by Stephen Fulling, Paul Davies, and William Unruh, but the effect is usually just abbreviated as the "Unruh effect". If you think about it, Unruh radiation makes a lot of sense.

Now let's imagine that the pairs are formed (and de-formed) not in an accelerated system, but instead near the horizon of a large black hole. If you paid attention in whatever class taught you general relativity (or whatever book or blog you read to replace said class), you know that Einstein's path to understanding gravity was precisely from seeing the analogy between accelerated systems and gravitational fields. At the edge of the horizon, the same thing can happen to the excited twin-pair as what happened to the accelerated twin pair: one may venture towards the horizon, and one may move the opposite way. But if the daring one goes beyond the point of no return, there will be no happy reunion: the twin moving away from the horizon looks like a particle: he is Hawking radiation. So: Hawking radiation is just like Unruh radiation, only near black hole horizons. Fine, but so what?

Credit: Science Magazine (2004)

Well, there is more. I told you somebody had to pay the energy bill. In this case it will be the black hole that has to pay: there is nobody else around. (In the case of the accelerated observer, it is the observer/detector who will lose mass.) If this process happens often enough, the black hole will lose all of its mass: it is said to have evaporated. So what? A big deal, as I demonstrate now.

The stuff  that made the black hole isn't just stuff: it's particles and radiation. So yes, particles and radiation turn into particles and radiation, but the stuff that made the star can be seen as special: quantum mechanically, we can say that it is completely known. But after evaporation, nothing of that knowledge remains. After all, according to this picture, all there is to the black hole is mass: the details of how this mass was formed are completely gone. Trajectories have merged in a most heinous way. Information is lost. Or is it?

To Be Continued [Parts 2 and 3 will appear in due time. Stay tuned!]