
Unfortunately, if the encoder and decoder are allowed too much capacity, the autoencoder
can learn to perform the copying task without extracting useful information about
the distribution of the data. Theoretically, one could imagine that an autoencoder
with a one-dimensional code but a very powerful nonlinear encoder could learn to
represent each training example $\boldsymbol{x}^{(i)}$ with the code $i$. The decoder could learn to
map these integer indices back to the values of specific training examples. This
specific scenario does not occur in practice, but it illustrates clearly that an autoen-
coder trained to perform the copying task can fail to learn anything useful about
the dataset if the capacity of the autoencoder is allowed to become too great.
14.2 Regularized Autoencoders
Undercomplete autoencoders, with code dimension less than the input dimension,
can learn the most salient features of the data distribution. We have seen that
these autoencoders fail to learn anything useful if the encoder and decoder are
given too much capacity.
A similar problem occurs if the hidden code is allowed to have dimension
equal to the input, and in the
overcomplete
case in which the hidden code has
dimension greater than the input. In these cases, even a linear encoder and a linear
decoder can learn to copy the input to the output without learning anything useful
about the data distribution.
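
As a minimal numeric illustration of this point, the following sketch (in NumPy; the sizes and variable names are illustrative assumptions, not from the text) constructs a linear encoder and decoder whose code dimension exceeds the input dimension. Choosing the weights appropriately makes the pair copy the input exactly, achieving zero reconstruction error without the code capturing anything about the data distribution.

```python
import numpy as np

# Hypothetical sizes: 3-dimensional inputs, 5-dimensional (overcomplete) code.
input_dim, code_dim = 3, 5

# Linear encoder: embed the input into the first input_dim coordinates of the code.
W_enc = np.zeros((code_dim, input_dim))
W_enc[:input_dim, :] = np.eye(input_dim)

# Linear decoder: read those coordinates back out.
W_dec = np.zeros((input_dim, code_dim))
W_dec[:, :input_dim] = np.eye(input_dim)

x = np.random.randn(input_dim)        # an arbitrary input
h = W_enc @ x                         # overcomplete code
x_reconstructed = W_dec @ h           # exact copy of x

assert np.allclose(x, x_reconstructed)  # zero reconstruction error, nothing learned
```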
Ideally, one could train any architecture of autoencoder successfully, choosing
the code dimension and the capacity of the encoder and decoder based on the
complexity of the distribution to be modeled. Regularized autoencoders provide the
ability to do so. Rather than limiting the model capacity by keeping the encoder
and decoder shallow and the code size small, regularized autoencoders use a loss
function that encourages the model to have other properties besides the ability
to copy its input to its output. These other properties include sparsity of the
representation, smallness of the derivative of the representation, and robustness
to noise or to missing inputs. A regularized autoencoder can be nonlinear and
overcomplete but still learn something useful about the data distribution, even if
the model capacity is great enough to learn a trivial identity function.
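
The sparsity penalty mentioned above can be made concrete with a short sketch. The snippet below (a PyTorch sketch; the architecture, penalty coefficient, and variable names are illustrative assumptions, not prescribed by the text) trains an overcomplete autoencoder whose loss combines reconstruction error with an L1 penalty on the code, discouraging the model from simply copying its input.

```python
import torch
import torch.nn as nn

# Illustrative sizes: the code is larger than the input (overcomplete).
input_dim, code_dim = 32, 64
sparsity_weight = 1e-3  # assumed penalty coefficient

encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
decoder = nn.Linear(code_dim, input_dim)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(128, input_dim)  # a stand-in batch of training data

for step in range(100):
    h = encoder(x)                      # code h = f(x)
    x_hat = decoder(h)                  # reconstruction g(f(x))
    recon_loss = ((x_hat - x) ** 2).mean()
    sparsity_penalty = h.abs().mean()   # Omega(h): encourages sparse codes
    loss = recon_loss + sparsity_weight * sparsity_penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Replacing the L1 term with a penalty on the derivatives of the code with respect to the input would correspond to the "smallness of the derivative" property, and corrupting the input before encoding would correspond to robustness to noise, both mentioned in the list of properties above.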
In addition to the methods described here, which are most naturally interpreted
as regularized autoencoders, nearly any generative model with latent variables
and equipped with an inference procedure (for computing latent representations
given input) may be viewed as a particular form of autoencoder. Two generative
modeling approaches that emphasize this connection with autoencoders are the
descendants of the Helmholtz machine (Hinton et al., 1995b), such as the variational