Objective:
$$ \max_\theta \sum_{i=1}^N \log p_\theta(x^{(i)}) $$
where
$$ \log p_\theta(x) = \log \int p_\theta(x|z)\, p(z)\, \mathrm{d}z $$
and $z$ follows the prior distribution
$$ z \sim \mathcal N (0, \sigma^2I) $$
Because this marginal likelihood is intractable, we use variational inference, introducing an approximate posterior $q_\phi(z|x)$:
$$ \begin{aligned} \log p_\theta(x) &= \log \int p_\theta(x|z)\,p(z)\,\mathrm{d}z \\ &= \log \int q_\phi(z|x)\,\frac{p_\theta(x|z)\,p(z)}{q_\phi(z|x)}\,\mathrm{d}z \\ &= \log \mathbb E_{z \sim q_\phi(z|x)}\left[\frac{p_\theta(x|z)\,p(z)}{q_\phi(z|x)}\right] \\ &\ge \mathbb E_{z \sim q_\phi(z|x)}\left[\log \frac{p_\theta(x|z)\,p(z)}{q_\phi(z|x)}\right] \quad \text{(Jensen's inequality)} \\ &= \mathbb E_{z \sim q_\phi(z|x)}\left[\log p_\theta(x|z) + \log p(z) - \log q_\phi(z|x)\right] \\ &= \mathbb E_{z \sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}(q_\phi(z|x) \| p(z)) \end{aligned} $$
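One step worth making explicit, given here as a standard identity rather than something taken from these notes: the slack in Jensen's inequality is exactly the KL divergence from $q_\phi(z|x)$ to the true posterior $p_\theta(z|x)$. The bound is therefore tight when the two coincide.
$$ \log p_\theta(x) - \Big( \mathbb E_{z \sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}(q_\phi(z|x) \| p(z)) \Big) = D_{KL}(q_\phi(z|x) \| p_\theta(z|x)) \ge 0 $$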
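Returning to the original objective:
$$ \max_\theta \sum_{i=1}^N \log p_\theta(x^{(i)}) \ge \max_{\theta,\phi} \sum_{i=1}^N \left( \mathbb E_{z \sim q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)] - D_{KL}(q_\phi(z|x^{(i)}) \| p(z))\right) $$
Here the gradient with respect to $\phi$ must be backpropagated through the reparameterization trick, since $z$ is sampled from $q_\phi$. The first term of the objective is the data reconstruction term and the second is the KL divergence term.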
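The bound and the reparameterization step can be checked numerically on a toy model. The following is a minimal sketch, not the setup of these notes: it assumes a one-dimensional latent with prior $\mathcal N(0, 1)$ (that is, $\sigma = 1$), a hypothetical decoder $p(x|z) = \mathcal N(z, 1)$, and a Gaussian $q(z|x) = \mathcal N(\mu, s^2)$. Under these assumptions $p(x) = \mathcal N(0, 2)$ is known in closed form, so the lower bound can be compared against the exact log-likelihood. All function names are illustrative.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def kl_to_standard_normal(mu, var):
    """Closed-form KL( N(mu, var) || N(0, 1) )."""
    return 0.5 * (var + mu ** 2 - 1.0 - np.log(var))

def elbo_estimate(x, mu, var, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the bound for the assumed toy model."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)   # noise independent of (mu, var)
    z = mu + np.sqrt(var) * eps            # reparameterization: z ~ N(mu, var)
    recon = log_normal(x, z, 1.0).mean()   # estimates E_q[log p(x|z)]
    return recon - kl_to_standard_normal(mu, var)

x = 1.5
log_px = log_normal(x, 0.0, 2.0)                    # exact log p(x) for this toy model
elbo_loose = elbo_estimate(x, mu=0.0, var=1.0)      # q equal to the prior
elbo_tight = elbo_estimate(x, mu=x / 2.0, var=0.5)  # q equal to the true posterior

print(f"log p(x)            = {log_px:.4f}")
print(f"bound, q = prior    = {elbo_loose:.4f}")
print(f"bound, q = posterior = {elbo_tight:.4f}")
```

Because `z` is written as a deterministic function of `mu`, `var`, and the parameter-free noise `eps`, an autodiff framework could differentiate the estimate with respect to the parameters of $q$. Plain NumPy is used here only to keep the sketch dependency-free.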
Bayes' rule with multiple conditions:
$$ P(A,B|C) = P(B|C) P(A|B,C)=P(A|C)P(B|A,C) $$
Therefore
$$ P(A|B,C) = \frac{P(A|C) P(B|A,C)}{P(B|C)} $$
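As a quick sanity check, the conditional form of Bayes' rule can be verified on an arbitrary discrete joint distribution. The sketch below uses a randomly generated table for $P(A,B,C)$; the table sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))   # unnormalized table indexed by (A, B, C)
joint /= joint.sum()            # normalize to a valid P(A, B, C)

p_c = joint.sum(axis=(0, 1))    # P(C),    shape (4,)
p_ac = joint.sum(axis=1)        # P(A, C), shape (2, 4)
p_bc = joint.sum(axis=0)        # P(B, C), shape (3, 4)

p_a_given_c = p_ac / p_c                   # P(A | C)
p_b_given_c = p_bc / p_c                   # P(B | C)
p_b_given_ac = joint / p_ac[:, None, :]    # P(B | A, C)
p_a_given_bc = joint / p_bc[None, :, :]    # P(A | B, C), computed directly

# Right-hand side: P(A|C) P(B|A,C) / P(B|C)
rhs = p_a_given_c[:, None, :] * p_b_given_ac / p_b_given_c[None, :, :]
bayes_holds = np.allclose(p_a_given_bc, rhs)
print(bayes_holds)
```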
Writing the single-step noising/denoising in multi-step form: