Information theory
Information theory is a branch of mathematics dealing with the representation, storage, and transmission of information. It has found important applications in biology, electrical engineering, linguistics, and computer science.
This learning resource is intended to give an intuitive introduction to the ideas of information theory. We encourage you to follow the links above and below to other online resources to broaden the understanding you may begin to build here.
Information
Have you ever played the parlor game called Twenty Questions?
It's both easy and fun: one person (Answerer) thinks of some specific object (for example, the Earth),
and the other (Questioner) asks questions to figure out which object Answerer has chosen. Questioner
can ask any question about the object, so long as it can be answered simply "yes" or "no". One question
to ask first might be "Is it alive?" Since the Earth itself isn't alive, Answerer says "no" (Answerer must
tell the truth). This answered question gives Questioner information about what the object is
and helps Questioner decide which question to ask next. If Questioner guesses the object by the
twentieth question, Questioner wins; otherwise Answerer wins.
Information theory is simply a way to think about quantifying, representing,
and communicating information of the sort the answered question above provides. To be more precise,
information theory is relevant to any setting in which there is a set of events along
with one or more ideas of how likely each event is to occur (or to have occurred). A particular idea of how
likely each event is to occur is a probability function $P$, where $P(e)$
is the probability that event $e$ will occur. In Twenty Questions,
the events refer to Answerer's choice of an object: "Answerer chose a tiger", "Answerer chose the Earth", or
"Answerer chose his left thumb" are all events that might have occurred at the start of the game. In general
we can think of event $e_i$ as referring to "Answerer chose the $i$th object from
a master list of all objects that exist".
At the start of a game of Twenty Questions, Questioner doesn't know anything about the
object Answerer has chosen. That is, as far as Questioner knows any object is equally likely to be the one
Answerer has thought of: a bacterium, Answerer's left thumb, the entire universe. In mathematical terms,
$P(e_i) = P(e_j)$ for any two events $e_i$ and $e_j$. After the first question is answered, Questioner forms a new probability
function from the new information. In information theory we assume that Questioner understands the full importance
of the information the answered question provided, because the information is there to be used whether Questioner
is a perfect player or not. Most of the questions asked in Twenty Questions will rule out some set
of objects entirely; in other settings, however, new information will change the probability function in other
ways, making some events seem more likely and others less likely. The amount of information the answered question
provided is the difference between the best probability function that could have been formed before the question
was asked, taking full advantage of everything known to that point, and the best probability function available
after the question was answered.
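The update just described, ruling out some objects entirely and spreading their probability over the objects that remain, can be sketched in a few lines of Python. This is only an illustration; the tiny object list and the `update` helper are our own invention, not part of the game's rules:

```python
from fractions import Fraction

def update(dist, ruled_out):
    """Zero out ruled-out events and renormalize what remains.

    dist: dict mapping event name -> probability
    ruled_out: set of events the answer eliminated
    """
    kept = {e: p for e, p in dist.items() if e not in ruled_out}
    total = sum(kept.values())
    return {e: p / total for e, p in kept.items()}

# Four equally likely objects before any question is asked.
objects = ["tiger", "the Earth", "left thumb", "bacterium"]
prior = {obj: Fraction(1, len(objects)) for obj in objects}

# "Is it alive?" -- Answerer says "no", ruling out the living objects.
posterior = update(prior, {"tiger", "bacterium"})
print(posterior)  # the Earth and left thumb now each have probability 1/2
```

A real game has far more than four objects, but the shape of the update is the same: each answered question replaces one probability function with another.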
Quantities of information
Quantities of information are relevant to a lot of different questions, such as:
- How many yes/no questions does it take to guess a number between 1 and 3 billion?
- How many songs can my MP3 player hold?
- NASA needs to send a message to a spacecraft orbiting Mars, but up to 10% of the message may get garbled along the way. How can NASA ensure the spacecraft gets the full message?
- Would a book written in German be shorter than the same book written in Spanish?
- Which parts of the human genome are genes?
The standard unit of information is the bit: one bit is the amount of information provided by choosing one of two equally likely possibilities. One toss of a fair coin provides one bit of information: before tossing the coin either heads or tails is equally likely to be the result, but afterwards we know whether the coin landed heads up or tails up.
We can now answer the first question above: how many yes/no questions does it take to guess a number between 1 and
3 billion? The very best we can do is to ask if the number is in the upper half or lower half of the range, until the range is just a single number. Doing so eliminates half the possibilities on each question, no matter whether the answer is "upper half" or "lower half". The number of questions is therefore the number of times one can divide the total range in half before getting to a range of just one number; i.e. what is the smallest $n$
such that $2^n$ is at least 3 billion? Since $2^{31}$ is about 2.1 billion and $2^{32}$ is about 4.3 billion, 32 questions suffice. If you were served exactly one of the first 3
billion hamburgers served by McDonald's, knowing exactly which one you were served gives you about
$\log_2(3 \times 10^9) \approx 31.5$ bits of information, because 3 billion
must be halved about 31.5 times to reach a range of only 1
number. In general, guessing a number between 1
and $N$ requires $\log_2 N$
bits of information.
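The halving argument can be checked numerically. The sketch below (the function name `questions_needed` is ours) computes both quantities for the 3 billion example:

```python
import math

# Number of yes/no questions needed to pin down one number in 1..N:
# the smallest n with 2**n >= N, i.e. the ceiling of log2(N).
def questions_needed(n_possibilities):
    return math.ceil(math.log2(n_possibilities))

print(questions_needed(3_000_000_000))  # 32 questions suffice
print(math.log2(3_000_000_000))         # about 31.48 bits of information
```

The question count is a whole number (you cannot ask half a question), which is why it is the information content rounded up.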
Representing information
TBD
Communicating information
TBD
Entropy and Information
Let $E = \{e_1, e_2, \ldots, e_n\}$ be a finite set of possible events (for example, possible event $e_i$
corresponds to "Answerer chose the $i$th object from a known list of all objects Answerer is allowed to choose" in a game of Twenty Questions). We can represent the full information we have about which of these possibilities is the truth (which object Answerer chose) as a discrete probability distribution $P = (p_1, p_2, \ldots, p_n)$, where $p_i$ is the probability that $e_i$ is the event that occurred.
As a step toward determining the amount of information we have, we define the uncertainty or entropy of $P$
as
$$H(P) = -\sum_{i=1}^{n} p_i \log_2 p_i.$$
Note that if $e_i$ is one of $1/p_i$
equally likely possibilities, the amount of information needed to determine that $e_i$
is the event that occurred is $\log_2(1/p_i) = -\log_2 p_i$
bits. Thus we can think of $-\log_2 p_i$
as the amount of information required to identify that $e_i$
occurred, and by weighting each $-\log_2 p_i$
by its probability $p_i$, the
entropy of a probability distribution can be thought of as the average information required to identify which event occurred, under the assumption that the events occur according to the probability distribution in question (i.e. according to the information the probability distribution represents). Entropy is thus a measure of how much doubt remains in determining which event occurred, after having taken into account the information represented by
$P$. It is therefore appropriate to consider entropy as defined above a measure of uncertainty.
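The entropy formula translates directly into code. A minimal sketch (the convention that zero-probability terms contribute nothing follows from the limit $p \log_2 p \to 0$ as $p \to 0$):

```python
import math

def entropy(dist):
    """Shannon entropy H(P) = -sum of p_i * log2(p_i), in bits.

    Terms with p_i == 0 are skipped, by the usual convention.
    """
    return -sum(p * math.log2(p) for p in dist if p > 0)

# A fair coin: two equally likely outcomes -> exactly 1 bit of uncertainty.
print(entropy([0.5, 0.5]))                # 1.0

# Four equally likely objects in Twenty Questions -> 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# A certain outcome -> no uncertainty at all.
print(entropy([1.0]))                     # 0.0 (as -0.0)
```

Note that uniform distributions over $2^n$ outcomes give exactly $n$ bits, matching the halving argument from the previous section.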
Receiving the answer to a (helpful) question reduces the amount of uncertainty we have about which event occurred. Information therefore corresponds to a reduction in uncertainty. In fact, this is how information is defined in mathematical terms. Formally, if $P$ is the probability distribution over a finite set of possibilities before a question is asked and
$Q$ is the probability distribution over the same set of possibilities after the question is asked, then the information provided by the answered question is
$$\text{Information} = H(P) - H(Q).$$
The information provided by a question that changes the probability of each object from $p_i$ to $q_i$
is therefore defined as the reduction in entropy from $H(P)$
to $H(Q)$.
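Putting the pieces together: a small Python sketch of the definition above, using a made-up example in which a question's answer rules out six of eight equally likely objects:

```python
import math

def entropy(dist):
    # Shannon entropy in bits; zero-probability terms contribute nothing.
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Eight equally likely objects before the question is asked...
before = [1/8] * 8

# ...and a question whose answer rules out six of them, leaving two.
after = [1/2, 1/2, 0, 0, 0, 0, 0, 0]

information = entropy(before) - entropy(after)
print(information)  # 2.0 bits: the answer did the work of two halvings
```

A perfect halving question would have provided exactly 1 bit; this (unusually lucky) answer provided 2, because it cut the possibilities down by a factor of four.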
Textbooks
Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, 2nd Ed., Wiley-Interscience, 2006. A modern textbook.
Robert B. Ash, Information Theory, Dover Publications, 1990.
John R. Pierce, An Introduction to Information Theory, Dover Publications, 1980. A very readable introduction.
Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication, University of Illinois Press, 1963.
On-line tutorials
David P. Hornacek, A Brief Tutorial on Information Theory, Excess Entropy and Statistical Complexity: Discovering and Quantifying Statistical Structure, Lecture notes downloadable in several formats, 1997.