y is a waveform, and the entire waveform must sound like a coherent utterance.
A natural way to represent the relationships between the entries in y is to use a probability distribution p(y | x). Boltzmann machines, extended to model conditional distributions, can supply this probabilistic model.
The same tool of conditional modeling with a Boltzmann machine can be used not just for structured output tasks, but also for sequence modeling. In the latter case, rather than mapping an input x to an output y, the model must estimate a probability distribution over a sequence of variables, p(x^(1), . . . , x^(τ)). Conditional Boltzmann machines can represent factors of the form p(x^(t) | x^(1), . . . , x^(t−1)) in order to accomplish this task.
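To make the role of these factors explicit, the chain rule of probability expresses the joint distribution over the sequence as a product of such conditionals:

p\big(x^{(1)}, \dots, x^{(\tau)}\big) = p\big(x^{(1)}\big) \prod_{t=2}^{\tau} p\big(x^{(t)} \mid x^{(1)}, \dots, x^{(t-1)}\big),

so a model that can represent each conditional factor can represent the distribution over the entire sequence.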
An important sequence modeling task for the video game and film industry
is modeling sequences of joint angles of skeletons used to render 3-D characters.
These sequences are often collected using motion capture systems to record the
movements of actors. A probabilistic model of a character’s movement allows
the generation of new, previously unseen, but realistic animations. To solve
this sequence modeling task, Taylor et al. (2007) introduced a conditional RBM modeling p(x^(t) | x^(t−1), . . . , x^(t−m)) for small m. The model is an RBM over p(x^(t)) whose bias parameters are a linear function of the preceding m values of x.
When we condition on different values of x^(t−1) and earlier variables, we get a new RBM over x. The weights in the RBM over x never change, but by conditioning on different past values, we can change the probability of different hidden units in the RBM being active. By activating and deactivating different subsets of hidden units, we can make large changes to the probability distribution induced on x. Other variants of the conditional RBM (Mnih et al., 2011) exist, as do other approaches to sequence modeling with conditional RBMs (Taylor and Hinton, 2009; Sutskever et al., 2009; Boulanger-Lewandowski et al., 2012).
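As a rough illustration (not code from Taylor et al., 2007), the following Python/NumPy sketch shows one way such history-dependent biases could be computed. The conditioning matrices A and B, and all dimensions, are hypothetical placeholders.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n_v visible units per frame, n_h hidden units,
# m past frames used for conditioning.
n_v, n_h, m = 10, 20, 3

# Static RBM parameters: the weights never change with the history.
W = 0.01 * rng.standard_normal((n_v, n_h))
b = np.zeros(n_v)  # static part of the visible bias
c = np.zeros(n_h)  # static part of the hidden bias

# Hypothetical conditioning matrices: A maps the m past frames to an
# offset on the visible bias, B to an offset on the hidden bias.
A = 0.01 * rng.standard_normal((m * n_v, n_v))
B = 0.01 * rng.standard_normal((m * n_v, n_h))

def dynamic_biases(history):
    # history has shape (m, n_v) and holds x^(t-1), ..., x^(t-m).
    past = history.reshape(-1)  # concatenate the past frames
    b_t = b + past @ A          # visible bias of the RBM over x^(t)
    c_t = c + past @ B          # hidden bias of the RBM over x^(t)
    return b_t, c_t

# Conditioning on different past frames yields a new RBM over x^(t)
# with the same weights W but different biases.
b_t, c_t = dynamic_biases(rng.standard_normal((m, n_v)))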
Another sequence modeling task is to model the distribution over sequences
of musical notes used to compose songs. Boulanger-Lewandowski et al. (2012)
introduced the RNN-RBM sequence model and applied it to this task. The RNN-RBM is a generative model of a sequence of frames x^(t) consisting of an RNN
that emits the RBM parameters for each time step. Unlike previous approaches in
which only the bias parameters of the RBM varied from one time step to the next,
the RNN-RBM uses the RNN to emit all the parameters of the RBM, including
the weights. To train the model, we need to be able to back-propagate the gradient
of the loss function through the RNN. The loss function is not applied directly to
the RNN outputs. Instead, it is applied to the RBM. This means that we must
approximately differentiate the loss with respect to the RBM parameters using
contrastive divergence or a related algorithm. This approximate gradient may then be back-propagated through the RNN.
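The following is a minimal, hypothetical Python/NumPy sketch of this idea, not the implementation of Boulanger-Lewandowski et al. (2012): an RNN state is advanced over the frames, all RBM parameters for each step are emitted from that state, and a single contrastive divergence step provides an approximate gradient (shown here only for the visible bias). All names and sizes are placeholders.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: n_v visible units per frame, n_h RBM hidden units,
# n_r RNN hidden units.
n_v, n_h, n_r = 8, 12, 16

# RNN parameters (placeholder names, not from the paper).
W_vr = 0.01 * rng.standard_normal((n_v, n_r))  # frame -> RNN state
W_rr = 0.01 * rng.standard_normal((n_r, n_r))  # RNN state -> RNN state
# The RNN output determines every RBM parameter, including the weights.
W_rb = 0.01 * rng.standard_normal((n_r, n_v))        # RNN state -> visible bias
W_rc = 0.01 * rng.standard_normal((n_r, n_h))        # RNN state -> hidden bias
W_rW = 0.01 * rng.standard_normal((n_r, n_v * n_h))  # RNN state -> RBM weights

def rbm_params(r):
    # Emit the RBM parameters for the current time step from the RNN state r.
    return r @ W_rb, r @ W_rc, (r @ W_rW).reshape(n_v, n_h)

def cd1_visible_bias_grad(x, b_t, c_t, W_t):
    # One step of contrastive divergence: an approximate gradient of the
    # log-likelihood with respect to this step's visible bias.
    h_prob = sigmoid(x @ W_t + c_t)
    h_samp = (rng.random(n_h) < h_prob).astype(float)
    v_prob = sigmoid(h_samp @ W_t.T + b_t)
    return x - v_prob  # positive phase minus approximate negative phase

# Unroll over a short binary sequence of frames.
T = 5
frames = (rng.random((T, n_v)) < 0.5).astype(float)
r = np.zeros(n_r)
for t in range(T):
    b_t, c_t, W_t = rbm_params(r)                    # RBM over x^(t)
    grad_b = cd1_visible_bias_grad(frames[t], b_t, c_t, W_t)
    # grad_b would then be back-propagated through W_rb and the RNN.
    r = np.tanh(frames[t] @ W_vr + r @ W_rr)         # advance the RNN state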