In the first part of this post I talked to you mostly about entropy: how the entropy of a physical system (such as a die, a coin, or a book) depends on the measurement device that you use to query that system, and that, come to think of it, the uncertainty (or entropy) of any physical object really is infinite, and made finite only by the finiteness of our measurement devices. If you start to think about it, of course the number of things you could possibly know about any physical object is infinite! Think about it! Look at any object near you. OK, the screen in front of you. Just imagine a microscope zooming in on the area framing the screen, revealing the intricate details of the material: the variations that the manufacturing process left behind, making each and every computer screen (or iPad or iPhone) essentially unique.
If this were another blog, I would now launch into a discussion of how there is a precise parallel (really!) to renormalization theory in quantum field theory... but it isn't. So, let's instead delve head first into the matter, and finally discuss the concept of information.
What does it even mean to have information? Yes, of course, it means that you know something. About something. Let's make this more precise. I'll conjure up the old "urn". The urn has things in it. You have to tell me what they are.
[Image credit: www.dystopiafunction.com]
So, now imagine that.....
"Hold on, hold on. Who told you that the urn has things in it? Isn't that information already? Who told you that?"
OK, fine, good point. But you know, the urn is really just a stand-in for what we call "random variables" in probability theory. A random variable is a "thing" that can take on different states. Kind of like the urn, that you draw something from? When I draw a blue ball, say, then the "state of the urn" is blue. If I draw a red ball, then the "state of the urn" is red. So, "urn=random variable". OK?
"OK, fine, but you haven't answered my question. Who told you that there are blue and red balls in it? Who?"
You really are interrupting my explanations here. Who are you anyway? Never mind. Let me think about this. Here's the thing. When a mathematician defines a random variable, they tell you which states it can take on, and with what probability. Like: "A fair coin is a random variable with two states. Each state can be taken on with equal probability one-half." When they give you an urn, they also tell you how likely it is to get a blue or a red ball from it. They just don't tell you what you will actually get when you pull one out.
"But is this how real systems are? That you know the alternatives before asking questions?"
All right, all right. I'm trying to teach you information theory, the way it is taught in any school you would set your foot in. I concede, when I define a random variable, then I tell you how many states it can take on, and what the probability is that you will see each of these states, when you "reach into the random variable". Let's say that this info is magically conferred upon you. Happy now?
"Not really."
OK, let's just imagine that you spend a long time with this urn, and after a while of messing with it, you do realize that:
A) This urn has balls in it.
B) From what you can tell, they are blue and red.
C) Reds occur more frequently than blues, but you're still working on what the ratio is.
Is this enough?
"At least now we're talking. Do you know that you assume a lot when you say 'random variable'?"
I wanted to tell you about information, and we got bogged down in this discussion about random variables instead. Really, you're getting in the way of some valuable instruction here. Could you just go away?
"You want to tell me what it means to 'know something', and you use urns, which you say are just code for random variables, and I find out that there is all this hidden information in there! Who is getting in the way of instruction here??? Just sayin'!"
....
OK.
....
All right, you're making this more difficult than I intended it to be. According to standard lore, it appears that you're allowed to assume that you know something about the things you know nothing about. Let's just call these things "common sense". And the things you don't know about the random variable are the things that go beyond common sense. Common sense covers the things that you kinda know even without having performed dedicated experiments to ascertain the state of the variable. Like, that a coin has two sides. That's common knowledge, right?
"And urns have red and blue balls in them? What about red and green?"
You're kinda pushing it now. Shut up.
Soooo. Here we are. Excuse this outburst. Moving on.
We have this urn. It's got red and blue balls in it. (This is common knowledge.) They could be any pair of colors, you do realize. How much don't you know about it?
Easily answered using our good buddy Shannon's insight. How much you don't know is quantified by the "entropy" of the urn. That's calculated from the fraction of blue balls known to be in the urn, and the fraction of red balls in the urn. You know, these fractions that are common knowledge. So, let's say that the fraction of blue is \(p\). The fraction of red then is of course (you do the math) \(1-p\). And the entropy of the urn is
\(H(X)=-p\log p-(1-p)\log(1-p)\) (1)
Now you're gonna ask me about the logarithm, aren't you? Like, what base are you using?
You should. The mathematical logarithm function needs a base. Without it, its value is undefined. But given the base, the entropy function defined above gets more than just a value: it gets units. So, for example, if the base is 2, then the units are "bits". If the base is e, then the units are "nats". We are mostly going to be using bits, so base 2 it is.
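If you'd like to play with formula (1) yourself, here is a minimal sketch in Python. The function name `binary_entropy` and its `base` argument are made up for this illustration, and the rule that a state with probability zero contributes nothing is the usual convention, assumed here rather than stated above.

```python
import math

def binary_entropy(p, base=2.0):
    """Entropy of a two-state variable with probabilities p and 1 - p, as in Eq. (1)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    total = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # assumed convention: 0 * log(0) counts as 0
            total -= q * math.log(q, base)
    return total

# Same urn (equal numbers of blue and red), two choices of base.
h_bits = binary_entropy(0.5)           # base 2, so the unit is bits
h_nats = binary_entropy(0.5, math.e)   # base e, so the unit is nats
print(f"{h_bits:.4f} bits = {h_nats:.4f} nats")
```

The two numbers describe the same uncertainty. Changing the base only rescales the result by a constant factor, which is why the base acts like a choice of units.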
"In part 1 you wrote that the entropy is $\log N$, where $N$ is the number of states of the system. Are you changing definitions on me?"
I'm not, actually. I just used a special case of the entropy to get across the point that the uncertainty/entropy is additive. It was the special case where each possible state is equally likely. In that case, the probability $p$ of each state is equal to $1/N$, and the above formula (1), extended from two states to $N$ states, turns into the first one.
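For anyone who wants to see that step spelled out, here is a sketch. It assumes the standard \(N\)-state form of Shannon's entropy, with \(p_i\) the probability of state \(i\) (notation that formula (1) does not use, since it only covers two states):
\(H(X)=-\sum_{i=1}^{N}p_i\log p_i\)
Setting every \(p_i=1/N\) makes all \(N\) terms identical:
\(H(X)=-\sum_{i=1}^{N}\frac{1}{N}\log\frac{1}{N}=-\log\frac{1}{N}=\log N\)
For \(N=2\) this is formula (1) evaluated at \(p=1/2\), which gives \(\log 2\).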
But let's get back to our urn. I mean random variable. And let's try to answer the question:
"How much is there to know (about it)?"
Assuming that we know the common knowledge stuff that the urn only has red and blue balls in it, then what we don't know is the identity of the next ball that we will draw. This drawing of balls is our experiment. We would love to be able to predict the outcome of this experiment exactly, but in order to pull off this feat, we would have to have some information about the urn. I mean, the contents of the urn.
If we know nothing else about this urn, then the uncertainty is equal to the log of the number of possible states, as I wrote before. Because there are only red and blue balls, that would be log 2. And if the base of the log is two, then the result is $\log_2 2=1$ bit. So, if there are red and blue balls only in an urn, then I can predict the outcome of an experiment (pulling a ball from the urn) just as well as I can predict whether a fair coin lands on heads or tails. If I correctly predict the outcome (I will be able to do this about half the time, on average) I am correct purely by chance. Information is that which allows you to make a correct prediction with accuracy better than chance, which in this case means, more than half of the time.
"How can you do this, for the case of the fair coin, or the urn with equal numbers of red and blue balls?"
Well, you can't unless you cheat. I should say, the case of the urn and of the fair coin are somewhat different. For the fair coin, I could use the knowledge of the state of the coin before flipping, and the forces acting on it during the flip, to calculate how it is going to land, at least approximately. This is a sophisticated way to use extra information to make predictions (the information here is the initial condition of the coin), but something akin to that was used by a bunch of physics grad students to predict the outcome of casino roulette in the late 70s. (And incidentally I know a bunch of them!)
The coin is different from the urn because for the urn, you won't be able to get any "extraneous" information. But suppose the urn has blue and red balls in unequal proportions. If you knew what these proportions were [the \(p\) and \(1-p\) in Eq. (1) above] then you could reduce the uncertainty of 1 bit to \(H(X)\). A priori (that is, before performing any measurements on the probability distribution of blue and red balls), the distribution is of course given by \(p=1/2\), which is what you have to assume in the absence of information. That means your uncertainty is 1 bit. But keep in mind (from part 1: The Eye of the Beholder) that it is only one bit because you have decided that the color of the ball (blue or red) is what you are interested in predicting.
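To put numbers on that reduction, here is a rough Python sketch. The helper names `entropy_bits` and `reduction_bits` are invented for this example, and the values of \(p\) in the loop are arbitrary illustrations, not measurements of any actual urn.

```python
from math import log2

def entropy_bits(p):
    """Eq. (1) with base-2 logarithms; zero-probability terms contribute nothing."""
    return -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0.0)

# The no-information assumption p = 1/2 gives the 1-bit starting point.
PRIOR_BITS = entropy_bits(0.5)

def reduction_bits(p):
    """How far knowing p lowers the uncertainty below the 1-bit prior."""
    return PRIOR_BITS - entropy_bits(p)

# Illustrative blue fractions only.
for p in (0.5, 0.25, 0.1, 0.01):
    print(f"p = {p:<4}  H(X) = {entropy_bits(p):.3f} bits  "
          f"reduction = {reduction_bits(p):.3f} bits")
```

The further \(p\) is from one half, the smaller \(H(X)\) gets, and the more you gain by knowing the proportions.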
If you start drawing balls from the urn (and then replacing them, and noting down the result, of course) you would be able to estimate \(p\) from the frequencies of blue and red balls. So, for example, if you end up seeing 9 times as many red balls as blue balls, you should adjust your prediction strategy to "The next one will be red". And you would likely be right about 90% of the time, quite a bit better than the 50/50 prior.
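You can try this experiment without an actual urn. The sketch below simulates it in Python under assumptions of my choosing: the function name `estimate_and_predict`, the true blue fraction of 0.1, the sample sizes, and the fixed random seed are all hypothetical, picked only so the run is repeatable.

```python
import random

def estimate_and_predict(p_blue, n_observe, n_test, seed=0):
    """Estimate the blue fraction from draws, then always guess the majority color."""
    rng = random.Random(seed)  # seeded so the run is repeatable

    def draw():
        # Drawing with replacement: every draw sees the same proportions.
        return "blue" if rng.random() < p_blue else "red"

    observed = [draw() for _ in range(n_observe)]
    blue_estimate = observed.count("blue") / n_observe

    # Strategy from the text: always predict the more frequent color.
    guess = "blue" if blue_estimate > 0.5 else "red"
    hits = sum(draw() == guess for _ in range(n_test))
    return blue_estimate, guess, hits / n_test

# Hypothetical urn with 9 times as many red balls as blue ones (p = 0.1).
blue_estimate, guess, accuracy = estimate_and_predict(0.1, n_observe=1000, n_test=10000)
print(f"estimated p = {blue_estimate:.3f}, always guess {guess}, accuracy = {accuracy:.3f}")
```

With these settings the estimate should land close to 0.1 and the accuracy close to 0.9, though the exact digits depend on the seed.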
"So what you are telling me, is that the entropy formula (1) assumes a whole lot of things, such as that you already know to expect a bunch of things, namely what the possible alternatives of the measurement are, and even what the frequency distribution is, which you can really only know if you have divine inspiration, or else made a ton of measurements!"
Yes, dear reader, that's what I'm telling you. You already come equipped with some information (your common sense) and if you can predict with accuracy better than chance (because somebody told you the \(p\) and it is not one half), then you have some more info. And yes, most people won't tell you that. But if you want to know about information, you first need to know.... what it is that you already know.
Part 3: Everything is Conditional