
CHAPTER 3. PROBABILITY AND INFORMATION THEORY
then the discretization makes the robot immediately become uncertain about
the precise position of objects: each object could be anywhere within the
discrete cell that it was observed to occupy.
In many cases, it is more practical to use a simple but uncertain rule rather
than a complex but certain one, even if the true rule is deterministic and our
modeling system has the fidelity to accommodate a complex rule. For example, the
simple rule “Most birds fly” is cheap to develop and is broadly useful, while a rule
of the form, “Birds fly, except for very young birds that have not yet learned to
fly, sick or injured birds that have lost the ability to fly, flightless species of birds
including the cassowary, ostrich and kiwi...” is expensive to develop, maintain
and communicate and, after all this effort, is still brittle and prone to failure.
While it should be clear that we need a means of representing and reasoning
about uncertainty, it is not immediately obvious that probability theory can provide
all the tools we want for artificial intelligence applications. Probability theory
was originally developed to analyze the frequencies of events. It is easy to see
how probability theory can be used to study events like drawing a certain hand of
cards in a poker game. These kinds of events are often repeatable. When we say
that an outcome has a probability p of occurring, it means that if we repeated the
experiment (e.g., drawing a hand of cards) infinitely many times, then a proportion
p of the repetitions would result in that outcome. This kind of reasoning does not
seem immediately applicable to propositions that are not repeatable. If a doctor
analyzes a patient and says that the patient has a 40 percent chance of having
the flu, this means something very different—we cannot make infinitely many
replicas of the patient, nor is there any reason to believe that different replicas of
the patient would present with the same symptoms yet have varying underlying
conditions. In the case of the doctor diagnosing the patient, we use probability
to represent a degree of belief, with 1 indicating absolute certainty that the
patient has the flu and 0 indicating absolute certainty that the patient does not
have the flu. The former kind of probability, related directly to the rates at which
events occur, is known as frequentist probability, while the latter, related to
qualitative levels of certainty, is known as Bayesian probability.
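To make the frequentist reading concrete, the following minimal sketch simulates the repeated experiment of drawing a five-card hand and compares the observed proportion of hands containing at least one ace with the exact value. The choice of event, the function names, and the encoding of cards as the integers 0 to 51 (with card % 13 == 0 marking an ace) are assumptions made for illustration only; they do not come from the text.

```python
import random
from math import comb


def exact_prob_at_least_one_ace():
    """Exact probability that a 5-card hand holds at least one ace."""
    # Complement rule: 1 minus the probability that all 5 cards
    # come from the 48 non-ace cards.
    return 1 - comb(48, 5) / comb(52, 5)


def estimate_prob_at_least_one_ace(num_trials, seed=0):
    """Proportion of simulated 5-card hands that hold at least one ace."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(num_trials):
        # Cards are the integers 0..51 (illustrative encoding);
        # sampling without replacement deals one hand.
        hand = rng.sample(range(52), 5)
        if any(card % 13 == 0 for card in hand):
            hits += 1
    return hits / num_trials


if __name__ == "__main__":
    exact = exact_prob_at_least_one_ace()
    print(f"exact: {exact:.4f}")
    for n in (100, 10_000, 100_000):
        estimate = estimate_prob_at_least_one_ace(n)
        print(f"{n:>7} trials: {estimate:.4f}")
```

As the number of trials grows, the simulated proportion tends to settle near the exact value, which is the sense in which a frequentist probability describes a long-run rate. No comparable simulation is available for the doctor's statement about a single patient, which is why that case calls for the degree-of-belief interpretation.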
If we list several properties that we expect common sense reasoning about
uncertainty to have, then the only way to satisfy those properties is to treat
Bayesian probabilities as behaving exactly the same as frequentist probabilities.
For example, if we want to compute the probability that a player will win a poker
game given that she has a certain set of cards, we use exactly the same formulas
as when we compute the probability that a patient has a disease given that she
has certain symptoms. For more details about why a small set of common sense