A paper by Yang Song. I chose this paper because it is cited in many diffusion-model papers, so I thought understanding its principles would be important. Even when reading other theoretical papers, this one always comes up, so I believe it is necessary to understand it.

Yang Song has since published many more papers related to diffusion models (and score-based methods), including CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation, Score-Based Generative Classifiers, Score-Based Generative Modeling through Stochastic Differential Equations, Maximum Likelihood Training of Score-Based Diffusion Models, How to Train Your Energy-Based Models, Learning Energy-Based Models by Diffusion Recovery Likelihood, and Training Deep Energy-Based Models with f-Divergence Minimization.

The core idea: sample by running Langevin dynamics using gradients of the data distribution estimated with score matching.
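As a minimal sketch of the sampling side, here is Langevin dynamics on a toy 1D target N(3, 1) whose score, ∇ₓ log p(x) = −(x − 3), is known analytically; in the paper this gradient would instead come from a learned score network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score of a toy target N(3, 1): grad_x log p(x) = -(x - 3).
# Stand-in for the learned score network in the paper.
def score(x):
    return -(x - 3.0)

# Langevin update: x <- x + (eps/2) * score(x) + sqrt(eps) * z,  z ~ N(0, I).
eps, n_steps = 0.1, 1_000
x = rng.normal(0.0, 1.0, size=5_000)  # random initial samples
for _ in range(n_steps):
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.normal(size=x.shape)

print(x.mean(), x.std())  # both should be close to the target's mean 3 and std 1
```

With a small step size the chain's stationary distribution is close to the target; the residual bias from discretizing the Langevin SDE shrinks as eps → 0.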

Introduction

Generative models: GANs (trained by minimizing an f-divergence (phi-divergence) or an integral probability metric, as in WGAN) vs. likelihood-based methods (NICE (flow), VAE, PixelRNN).

“A new principle for generative modeling based on estimating and sampling from the (Stein) score”

A neural network is trained with score matching (to learn the vector field of scores from data) → samples are produced using Langevin dynamics (which works by gradually moving a random initial sample toward high-density regions along the (estimated) vector field of scores).
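The training side can be sketched with denoising score matching (one practical variant of score matching): perturb each data point with Gaussian noise and regress the score model onto −(x̃ − x)/σ². Below is a toy version with NumPy on 1D Gaussian data, using a linear score model fit by least squares instead of a neural network; the minimizer is the score of the noise-perturbed density N(2, 1 + σ²), so the recovered coefficients are ≈ −1/1.25 and ≈ 2/1.25 for σ = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from N(mu=2, 1); the score of the clean density is -(x - 2).
mu, sigma, n = 2.0, 0.5, 200_000
x = rng.normal(mu, 1.0, n)

# Denoising score matching: perturb x, then regress the model
# onto the target -(x_tilde - x) / sigma^2 = -noise / sigma.
noise = rng.normal(0.0, 1.0, n)
x_tilde = x + sigma * noise
target = -noise / sigma

# Linear score model s(x) = a*x + b, fit in closed form by least squares
# (a neural network trained by SGD plays this role in the paper).
A = np.stack([x_tilde, np.ones_like(x_tilde)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, target, rcond=None)
print(a, b)  # a ≈ -0.8, b ≈ 1.6, the score of N(2, 1 + 0.25)
```

The key point illustrated here is that the regression target never requires the unknown normalizing constant of the data density, which is exactly why score matching is attractive for training.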

Challenges with this approach

How can they be overcome?

Desirable Properties