cars, trucks, and birds, and these objects can each be red, green, or blue. One way
of representing these inputs would be to have a separate neuron or hidden unit
that activates for each of the nine possible combinations: red truck, red car, red
bird, green truck, and so on. This requires nine different neurons, and each neuron
must independently learn the concept of color and object identity. One way to
improve on this situation is to use a distributed representation, with three neurons
describing the color and three neurons describing the object identity. This requires
only six neurons total instead of nine, and the neuron describing redness is able to
learn about redness from images of cars, trucks, and birds, not just from images
of one specific category of objects. The concept of distributed representation is
central to this book and is described in greater detail in chapter 15.
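To make the counting argument concrete, here is a minimal sketch in Python (our own illustration; the helper names joint_code and distributed_code are assumptions, not notation from the text) that encodes the same toy input both ways:

import numpy as np

colors = ["red", "green", "blue"]
objects = ["car", "truck", "bird"]

def joint_code(color, obj):
    # Non-distributed code: one unit per (color, object) combination,
    # 3 x 3 = 9 units, exactly one of which is active.
    code = np.zeros(len(colors) * len(objects))
    code[colors.index(color) * len(objects) + objects.index(obj)] = 1.0
    return code

def distributed_code(color, obj):
    # Distributed code: three units describe color and three describe
    # object identity, 3 + 3 = 6 units, one active in each group.
    code = np.zeros(len(colors) + len(objects))
    code[colors.index(color)] = 1.0
    code[len(colors) + objects.index(obj)] = 1.0
    return code

print(joint_code("red", "truck"))        # 9-dimensional, a single 1
print(distributed_code("red", "truck"))  # 6-dimensional, one 1 per factor

In the first encoding, the unit for "red truck" is trained only on red trucks; in the second, the redness unit receives a training signal from every red input, whatever the object.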
Another major accomplishment of the connectionist movement was the successful
use of back-propagation to train deep neural networks with internal
representations and the popularization of the back-propagation algorithm (Rumelhart
et al., 1986a; LeCun, 1987). This algorithm has waxed and waned in popularity
but, as of this writing, is the dominant approach to training deep models.
During the 1990s, researchers made important advances in modeling sequences
with neural networks. Hochreiter (1991) and Bengio et al. (1994) identified some of
the fundamental mathematical difficulties in modeling long sequences, described in
section 10.7. Hochreiter and Schmidhuber (1997) introduced the long short-term
memory (LSTM) network to resolve some of these difficulties. Today, the LSTM is
widely used for many sequence modeling tasks, including many natural language
processing tasks at Google.
The second wave of neural networks research lasted until the mid-1990s. Ventures
based on neural networks and other AI technologies began to make unrealistically
ambitious claims while seeking investments. When AI research did not fulfill
these unreasonable expectations, investors were disappointed. Simultaneously,
other fields of machine learning made advances. Kernel machines (Boser et al.,
1992; Cortes and Vapnik, 1995; Schölkopf et al., 1999) and graphical models
(Jordan, 1998) both achieved good results on many important tasks. These two factors
led to a decline in the popularity of neural networks that lasted until 2007.
During this time, neural networks continued to obtain impressive performance
on some tasks (LeCun et al., 1998b; Bengio et al., 2001). The Canadian Institute
for Advanced Research (CIFAR) helped to keep neural networks research alive
via its Neural Computation and Adaptive Perception (NCAP) research initiative.
This program united machine learning research groups led by Geoffrey Hinton at
the University of Toronto, Yoshua Bengio at the University of Montreal, and Yann LeCun
at New York University. The multidisciplinary CIFAR NCAP research initiative