probability distribution over images, text strings, and sounds that occur in real life is
highly concentrated. Uniform noise essentially never resembles structured inputs
from these domains. Figure 5.12 shows how, instead, uniformly sampled points
look like the patterns of static that appear on analog television sets when no signal
is available. Similarly, if you generate a document by picking letters uniformly at
random, what is the probability that you will get a meaningful English-language
text? Almost zero, again, because most long sequences of letters do not
correspond to a natural language sequence: the distribution over natural language
sequences occupies only a very small volume in the total space of sequences of letters.
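The following is a minimal sketch of this thought experiment, not taken from the text: it samples short strings with each letter drawn uniformly at random and estimates how often they happen to be English words. The tiny word set below is a hypothetical stand-in for a real dictionary, and the string length and sample count are arbitrary choices for illustration.

```python
# Monte Carlo sketch of the "random letters" thought experiment.
import random
import string

# Hypothetical stand-in for a real English dictionary.
english_words = {"the", "cat", "dog", "and", "for", "are", "was", "you"}

def random_string(length):
    """Draw each letter uniformly and independently from a-z."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

num_samples = 100_000
length = 3
hits = sum(random_string(length) in english_words for _ in range(num_samples))

# Even for length-3 strings, only a tiny fraction of the 26**3 = 17,576
# possibilities are words; for sentence-length sequences the fraction of
# meaningful text is astronomically smaller.
print(f"fraction of uniform strings that are words: {hits / num_samples:.6f}")
```

For longer sequences the estimated fraction quickly becomes indistinguishable from zero, which is the sense in which natural language occupies a negligible volume of the space of letter sequences.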
Of course, concentrated probability distributions are not sufficient to show that
the data lies on a reasonably small number of manifolds. We must also establish
that the examples we encounter are connected to each other by other examples,
with each example surrounded by other highly similar examples that can be reached
by applying transformations to traverse the manifold. The second argument in
favor of the manifold hypothesis is that we can imagine such neighborhoods and
transformations, at least informally. In the case of images, we can certainly think
of many possible transformations that allow us to trace out a manifold in image
space: we can gradually dim or brighten the lights, gradually move or rotate
objects in the image, gradually alter the colors on the surfaces of objects, and so
forth. Multiple manifolds are likely involved in most applications. For example,
the manifold of human face images may not be connected to the manifold of cat
face images.
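As a rough illustration of such transformations, the sketch below applies small brightness and translation changes to an image array, producing a sequence of nearby, highly similar images. The image here is a synthetic gradient rather than a natural photograph, and the transformation functions are hypothetical helpers written for this example.

```python
# Sketch: small, gradual transformations trace out a path along an image manifold.
import numpy as np

# Synthetic grayscale "image" with values in [0, 1].
image = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)

def brighten(img, delta):
    """Move along the 'lighting' direction of the manifold."""
    return np.clip(img + delta, 0.0, 1.0)

def shift_right(img, pixels):
    """Move along the 'object position' direction of the manifold."""
    return np.roll(img, shift=pixels, axis=1)

# Applying many small transformations yields a discrete walk along the manifold:
# each image differs only slightly from its neighbors.
trajectory = [image]
for step in range(10):
    trajectory.append(shift_right(brighten(trajectory[-1], 0.01), 1))

print(np.abs(trajectory[1] - trajectory[0]).mean())  # small neighbor-to-neighbor change
```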
These thought experiments convey some intuitive reasons supporting the mani-
fold hypothesis. More rigorous experiments (Cayton, 2005; Narayanan and Mitter,
2010; Schölkopf et al., 1998; Roweis and Saul, 2000; Tenenbaum et al., 2000; Brand,
2003; Belkin and Niyogi, 2003; Donoho and Grimes, 2003; Weinberger and Saul,
2004) clearly support the hypothesis for a large class of datasets of interest in AI.
When the data lies on a low-dimensional manifold, it can be most natural for
machine learning algorithms to represent the data in terms of coordinates on the
manifold, rather than in terms of coordinates in $\mathbb{R}^n$. In everyday life, we can think
of roads as 1-D manifolds embedded in 3-D space. We give directions to specific
addresses in terms of address numbers along these 1-D roads, not in terms of
coordinates in 3-D space. Extracting these manifold coordinates is challenging but
holds the promise of improving many machine learning algorithms. This general
principle is applied in many contexts. Figure 5.13 shows the manifold structure of
a dataset consisting of faces. By the end of this book, we will have developed the
methods necessary to learn such a manifold structure. In figure 20.6, we will see
how a machine learning algorithm can successfully accomplish this goal.
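As a small, self-contained illustration of extracting manifold coordinates, the sketch below runs Isomap (Tenenbaum et al., 2000), one of the algorithms cited above, on a synthetic "swiss roll" dataset using scikit-learn. This is an assumption-laden example of the general principle with a standard library implementation, not the method developed later in the book.

```python
# Sketch: recover 2-D manifold coordinates from 3-D ambient coordinates.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-D points that actually lie on a 2-D manifold rolled up in R^3.
X, color = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

# Represent each point by coordinates on the manifold rather than in R^3.
embedding = Isomap(n_neighbors=10, n_components=2)
X_manifold = embedding.fit_transform(X)

print(X.shape)           # (2000, 3)  ambient coordinates
print(X_manifold.shape)  # (2000, 2)  manifold coordinates
```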