VAE

Objective:

$$ \max_\theta \sum_{i=1}^N \log p_\theta(x^{(i)}) $$

where

$$ \log p_\theta(x) = \log \int p_\theta(x|z) p(z) \operatorname dz $$

and the latent variable $z$ follows the prior distribution

$$ z \sim \mathcal N (0, \sigma^2I) $$

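Because this marginal likelihood is intractable, we use variational inference: introduce an approximate posterior $q_\phi(z|x)$ and derive a lower bound on the log-likelihood.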

$$ \begin{aligned} \log p_\theta(x) &= \log \int p_\theta(x|z)p(z) \operatorname dz \\ &= \log \int q_\phi(z|x) \frac{p_\theta(x|z)p(z)}{q_\phi(z|x)} \operatorname dz \\ &= \log \mathbb E_{z \sim q_\phi(z|x)}\left[\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \\ &\ge \mathbb E_{z \sim q_\phi(z|x)}\left[\log \frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \quad \text{(Jensen's inequality)} \\ &= \mathbb E_{z \sim q_\phi(z|x)}\left[\log p_\theta(x|z) + \log p(z) - \log q_\phi (z|x)\right] \\ &= \mathbb E_{z \sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}(q_\phi(z|x) \| p(z)) \end{aligned} $$
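
As a sanity check on this bound, here is a minimal sketch on a toy model where $\log p_\theta(x)$ happens to be tractable. The model is an assumption made purely for illustration and is not part of the notes: a scalar latent $z \sim \mathcal N(0, \sigma^2)$ with likelihood $x|z \sim \mathcal N(z, 1)$ and a Gaussian $q(z|x)$. The function names `log_marginal`, `elbo`, and `exact_posterior` are hypothetical.

```python
import math

def log_marginal(x, prior_std=1.0):
    """Exact log p(x) for the toy model z ~ N(0, prior_std^2), x | z ~ N(z, 1)."""
    var = prior_std ** 2 + 1.0  # marginally, x ~ N(0, prior_std^2 + 1)
    return -0.5 * math.log(2.0 * math.pi * var) - x ** 2 / (2.0 * var)

def elbo(x, q_mean, q_std, prior_std=1.0):
    """Closed-form lower bound for q(z|x) = N(q_mean, q_std^2) in the toy model."""
    # Reconstruction term: E_q[log N(x; z, 1)]
    recon = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - q_mean) ** 2 + q_std ** 2)
    # KL(N(q_mean, q_std^2) || N(0, prior_std^2))
    ratio = (q_std / prior_std) ** 2
    kl = 0.5 * (ratio + (q_mean / prior_std) ** 2 - 1.0 - math.log(ratio))
    return recon - kl

def exact_posterior(x, prior_std=1.0):
    """Mean and std of the true posterior p(z|x) for the toy model."""
    var = prior_std ** 2 / (prior_std ** 2 + 1.0)
    return x * var, math.sqrt(var)

x = 1.0
print("log p(x):                ", log_marginal(x))
print("bound with q = N(0, 1):  ", elbo(x, 0.0, 1.0))
print("bound with q = posterior:", elbo(x, *exact_posterior(x)))
```

In this toy setting the bound is tight exactly when $q$ equals the true posterior, and strictly smaller otherwise, which is what Jensen's inequality predicts.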

Returning to the original objective:

$$ \max_\theta \sum_{i=1}^N \log p_\theta(x^{(i)}) \ge \max_{\theta, \phi} \sum_{i=1}^N \left( \mathbb E_{z \sim q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)] - D_{KL}(q_\phi(z|x^{(i)}) \| p(z))\right) $$

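Here the gradient with respect to $\phi$ must be backpropagated through the sampling step using the reparameterization trick. The first term of the objective is the data reconstruction term, and the second is the KL divergence term.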
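
The reparameterization idea can be illustrated with a minimal sketch, assuming a scalar Gaussian $q = \mathcal N(\mu, s^2)$ and a stand-in test function $f(z) = z^2$ in place of the actual VAE objective (both are assumptions for illustration, as is the hypothetical name `reparam_grads`). Writing $z = \mu + s\,\epsilon$ with $\epsilon \sim \mathcal N(0, 1)$ moves the randomness out of the parameters, so the gradient can pass through the sample by the chain rule. For this $f$, the analytic gradients of $\mathbb E[z^2] = \mu^2 + s^2$ are $2\mu$ and $2s$, which the Monte Carlo estimates should approach.

```python
import random

def reparam_grads(mean, std, n_samples=100_000, seed=0):
    """Pathwise Monte Carlo gradients of E_{z ~ N(mean, std^2)}[z^2] w.r.t. mean and std."""
    rng = random.Random(seed)
    grad_mean = 0.0
    grad_std = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on the parameters
        z = mean + std * eps        # reparameterized sample
        df_dz = 2.0 * z             # derivative of the test function f(z) = z^2
        grad_mean += df_dz          # chain rule with dz/dmean = 1
        grad_std += df_dz * eps     # chain rule with dz/dstd = eps
    return grad_mean / n_samples, grad_std / n_samples

g_mean, g_std = reparam_grads(mean=1.0, std=0.5)
print("estimated:", g_mean, g_std)
print("analytic: ", 2 * 1.0, 2 * 0.5)
```

In a real VAE the same chain rule is normally applied by an automatic differentiation framework, with the encoder producing the mean and standard deviation.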

Diffusion

DDPM Basics

Bayes' rule with multiple conditions

$$ P(A,B|C) = P(B|C) P(A|B,C)=P(A|C)P(B|A,C) $$

Therefore

$$ P(A|B,C) = \frac{P(A|C) P(B|A,C)}{P(B|C)} $$
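
The identity can be checked numerically with a small sketch, assuming an arbitrary random joint distribution over three binary variables (the table and the helper names `random_joint`, `prob`, and `conditional_bayes_gap` are made up for illustration):

```python
import itertools
import random

def random_joint(seed=0):
    """Random joint distribution P(A, B, C) over three binary variables."""
    rng = random.Random(seed)
    weights = {abc: rng.random() + 0.1 for abc in itertools.product((0, 1), repeat=3)}
    total = sum(weights.values())
    return {abc: w / total for abc, w in weights.items()}

def prob(joint, a=None, b=None, c=None):
    """Probability of the given values; a variable left as None is summed out."""
    return sum(
        p for (ja, jb, jc), p in joint.items()
        if (a is None or ja == a) and (b is None or jb == b) and (c is None or jc == c)
    )

def conditional_bayes_gap(joint, a, b, c):
    """Difference between the two sides of P(A|B,C) = P(A|C) P(B|A,C) / P(B|C)."""
    lhs = prob(joint, a, b, c) / prob(joint, b=b, c=c)
    p_a_given_c = prob(joint, a=a, c=c) / prob(joint, c=c)
    p_b_given_ac = prob(joint, a, b, c) / prob(joint, a=a, c=c)
    p_b_given_c = prob(joint, b=b, c=c) / prob(joint, c=c)
    return lhs - p_a_given_c * p_b_given_ac / p_b_given_c

joint = random_joint()
for a, b, c in itertools.product((0, 1), repeat=3):
    print((a, b, c), conditional_bayes_gap(joint, a, b, c))
```

Every printed gap should be zero up to floating-point rounding, since the identity is ordinary Bayes' rule with $C$ carried along as an extra condition in every term.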

Main derivation

Rewrite the single-step noising/denoising process in multi-step form: