CONTENTS
Functions
f : A → B The function f with domain A and range B
f ◦ g Composition of the functions f and g
f(x; θ) A function of x parametrized by θ. (Sometimes we write f(x) and omit the argument θ to lighten notation.)
log x Natural logarithm of x
σ(x) Logistic sigmoid, 1 / (1 + exp(−x))
ζ(x) Softplus, log(1 + exp(x))
||x||_p L^p norm of x
||x|| L^2 norm of x
x^+ Positive part of x, i.e., max(0, x)
1_condition is 1 if the condition is true, 0 otherwise
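As an informal illustration of the definitions above, the following minimal Python sketch implements each function for scalars and plain lists. The function names (`sigmoid`, `softplus`, `lp_norm`, `positive_part`, `indicator`) are chosen here for illustration and are not part of the notation. The sketch ignores numerical-stability concerns such as overflow in exp for large inputs.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """Softplus: log(1 + exp(x)), with log the natural logarithm."""
    return math.log(1.0 + math.exp(x))

def lp_norm(x, p=2):
    """L^p norm of a vector given as a list of numbers (p = 2 gives the L^2 norm)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def positive_part(x):
    """Positive part of x: max(0, x)."""
    return max(0.0, x)

def indicator(condition):
    """1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0

print(sigmoid(0.0))         # 0.5
print(positive_part(-2.0))  # 0.0
print(indicator(3 > 2))     # 1
```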
Sometimes we use a function f whose argument is a scalar but apply it to a
vector, matrix, or tensor: f(x), f(X), or f(X). This denotes the application of
f to the array element-wise. For example, if C = σ(X), then
C_{i,j,k} = σ(X_{i,j,k}) for all valid values of i, j and k.
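The element-wise convention can be sketched in plain Python, assuming the array is stored as nested lists. The helper `apply_elementwise` is a hypothetical name introduced for this example. Note that Python indexes from 0, which need not match the indexing used in the text.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def apply_elementwise(f, array):
    """Apply the scalar function f to every entry of a nested list of any depth."""
    if isinstance(array, list):
        return [apply_elementwise(f, item) for item in array]
    return f(array)

# A small 2 x 2 x 2 tensor stored as nested lists.
X = [[[0.0, 1.0], [-1.0, 2.0]],
     [[3.0, -3.0], [0.5, -0.5]]]

C = apply_elementwise(sigmoid, X)

# Every entry satisfies C[i][j][k] == sigmoid(X[i][j][k]).
for i in range(2):
    for j in range(2):
        for k in range(2):
            assert C[i][j][k] == sigmoid(X[i][j][k])
```

The same helper works unchanged for a vector (one level of nesting) or a matrix (two levels), which mirrors how the notation f(·) is reused across array ranks.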
Datasets and Distributions
p_data The data generating distribution
p̂_data The empirical distribution defined by the training set
X A set of training examples
x^(i) The i-th example (input) from a dataset
y^(i) or y^(i) The target associated with x^(i) for supervised learning
X The m × n matrix with input example x^(i) in row X_{i,:}
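To make the dataset notation concrete, here is a small Python sketch under stated assumptions: the training set, its targets, and the helper `empirical_distribution` are all hypothetical and invented for illustration. It assumes the usual reading of the empirical distribution, in which each of the m training examples receives probability mass 1/m, so duplicated examples accumulate mass.

```python
from collections import Counter

# Hypothetical training set: m = 4 examples, each with n = 2 features.
examples = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 2.0)]
targets = [1, 0, 1, 0]  # targets[i] plays the role of y^(i) for examples[i]

# Matrix X: row i holds example x^(i) (Python rows are indexed from 0).
X = [list(x) for x in examples]
m, n = len(X), len(X[0])

def empirical_distribution(points):
    """Put probability mass 1/m on each of the m points, merging duplicates."""
    counts = Counter(points)
    total = len(points)
    return {point: count / total for point, count in counts.items()}

p_hat = empirical_distribution(examples)

print(m, n)               # 4 2
print(X[3])               # [2.0, 2.0]
print(p_hat[(1.0, 0.0)])  # 0.5
```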