DSM in the view of Tweedie’s formula

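$$\frac{1}{2}\,\mathbb{E}_{p_{\text{data}}(x)}\,\mathbb{E}_{q_\sigma(\tilde x \mid x)}\left[\left\| \mathbf{\tilde s}_1(\tilde x;\theta) + \frac{1}{\sigma^2}(\tilde x - x) \right\|_2^2\right], \qquad q_\sigma(\tilde x \mid x) = \mathcal{N}(\tilde x;\, x,\, \sigma^2 I) \tag{3}$$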

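The denoising score matching (DSM) objective is shown above, where the first-order derivative, i.e. the score, is written as $s_1$. Minimizing this objective is equivalent to minimizing the distance between $\mathbf {\tilde s}_1(\tilde x; \theta)$ and the true score $\mathbf {\tilde s}_1(\tilde x)$ of the noise-perturbed distribution, which approximates the data score when $\sigma$ is small. There are several ways to derive this; here we look at the one based on Tweedie’s formula. First, consider the following objective.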

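$$\min_\theta\; \mathbb{E}_{p_{\text{data}}(x)}\,\mathbb{E}_{q_\sigma(\tilde x \mid x)}\left[\left\| \mathbf{h}(\tilde x;\theta) - x \right\|_2^2\right] \tag{5}$$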

$\mathbf {h}$ is an arbitrary vector-valued neural net. Minimizing eq 5 amounts to solving a least-squares problem that predicts $x$ from the corrupted sample $\tilde x$, so the optimal solution is $\mathbf {h} (\tilde x; \theta)=\mathbb{E}[x \mid \tilde x]$. In other words, $\mathbf {h}$ is trained to predict the mean of the clean images that could have produced the given noisy sample $\tilde x$. By Tweedie’s formula, this posterior expectation has the following closed-form solution.

$$\mathbb{E}[x \mid \tilde x] = \tilde x + \sigma^2\, \mathbf{\tilde s}_1(\tilde x)$$

Here $\mathbf {\tilde s}_1(\tilde x)$ is the true score. That is, at the optimum of eq 5 we have $\mathbf {h} (\tilde x; \theta)= \tilde x + \sigma^2 \mathbf {\tilde s}_1(\tilde x)$. If we therefore set $\mathbf{h}(\tilde x; \theta) = \tilde x + \sigma^2 \mathbf{\tilde s}_1(\tilde x;\theta)$, then at the optimum $\mathbf {\tilde s}_1(\tilde x; \theta)=\mathbf {\tilde s}_1(\tilde x)$, and the score is learned successfully. To summarize: 1) the solution of a certain least-squares problem is a posterior expectation, 2) by Tweedie’s formula, that expectation is a polynomial containing the first-order score, and 3) if we parameterize our $\mathbf h$ to mirror the form of this polynomial, we can learn the score. This line of reasoning is reused throughout what follows.
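The three steps above can be checked numerically in a toy setting. The sketch below assumes scalar Gaussian data $x \sim \mathcal N(\mu, s^2)$, which is not from the original notes but makes both the noisy score and the posterior mean available in closed form. All function names are hypothetical. It compares Tweedie's formula with the textbook Gaussian posterior mean, and fits a linear least-squares denoiser to show that it lands on the same function.

```python
import random


def tweedie_posterior_mean(x_tilde, mu, data_var, noise_var):
    """E[x | x_tilde] via Tweedie's formula: x_tilde + sigma^2 * score."""
    # Assumption: x ~ N(mu, data_var), so the noisy marginal is
    # N(mu, data_var + noise_var) and its score is linear in x_tilde.
    score = -(x_tilde - mu) / (data_var + noise_var)
    return x_tilde + noise_var * score


def conjugate_posterior_mean(x_tilde, mu, data_var, noise_var):
    """E[x | x_tilde] from the standard Gaussian conjugacy result."""
    return mu + data_var / (data_var + noise_var) * (x_tilde - mu)


def fit_linear_denoiser(mu, data_var, noise_var, n=200_000, seed=0):
    """Least-squares fit of h(x_tilde) = a * x_tilde + b to clean samples x."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, data_var ** 0.5) for _ in range(n)]
    xts = [x + noise_var ** 0.5 * rng.gauss(0.0, 1.0) for x in xs]
    mean_x = sum(xs) / n
    mean_xt = sum(xts) / n
    cov = sum((xt - mean_xt) * (x - mean_x) for x, xt in zip(xs, xts)) / n
    var = sum((xt - mean_xt) ** 2 for xt in xts) / n
    a = cov / var               # slope of the least-squares line
    b = mean_x - a * mean_xt    # intercept of the least-squares line
    return a, b
```

For Gaussian data the posterior mean is linear in $\tilde x$, so the linear family contains the optimum and the fitted slope should approach $s^2 / (s^2 + \sigma^2)$. For non-Gaussian data a more flexible $\mathbf h$ would be needed.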

For reference, plugging $\mathbf{h}(\tilde x; \theta) = \tilde x + \sigma^2 \mathbf{\tilde s}_1(\tilde x;\theta)$ into eq 5 makes it equivalent to eq 3 (the DSM objective) up to a positive constant factor, as follows.

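$$\mathbb{E}\left[\left\| \tilde x + \sigma^2\, \mathbf{\tilde s}_1(\tilde x;\theta) - x \right\|_2^2\right] = \sigma^4\, \mathbb{E}\left[\left\| \mathbf{\tilde s}_1(\tilde x;\theta) + \frac{1}{\sigma^2}(\tilde x - x) \right\|_2^2\right]$$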
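The equivalence is a per-sample algebraic identity, so it can be sanity-checked without any expectation. The sketch below uses scalars and hypothetical function names; `dsm_residual_sq` deliberately omits any constant prefactor of the DSM objective.

```python
import random


def denoising_loss(x, x_tilde, score, sigma):
    """Per-sample eq 5 loss with h = x_tilde + sigma^2 * score (scalar case)."""
    return (x_tilde + sigma ** 2 * score - x) ** 2


def dsm_residual_sq(x, x_tilde, score, sigma):
    """Per-sample squared DSM residual, without any constant prefactor."""
    return (score + (x_tilde - x) / sigma ** 2) ** 2


def max_identity_gap(sigma=0.5, trials=1000, seed=0):
    """Largest |denoising_loss - sigma^4 * dsm_residual_sq| over random inputs."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        x_tilde = x + sigma * rng.gauss(0.0, 1.0)
        score = rng.gauss(0.0, 1.0)  # arbitrary model output, not a true score
        lhs = denoising_loss(x, x_tilde, score, sigma)
        rhs = sigma ** 4 * dsm_residual_sq(x, x_tilde, score, sigma)
        gap = max(gap, abs(lhs - rhs))
    return gap
```

Because the two losses differ only by the factor $\sigma^4$, they share the same minimizer for any fixed $\sigma$.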

Second order denoising score matching

Now let us extend this to second order. By applying Tweedie’s formula appropriately, one reportedly arrives at the following theorem (proof omitted).

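Theorem 1. Let $\mathbf{\tilde s}_1(\tilde x)$ and $\mathbf{\tilde s}_2(\tilde x)$ be the first- and second-order scores of the noise-perturbed distribution. Then

$$\mathbb{E}[xx^T \mid \tilde x] = \mathbf{f}(\tilde x, \mathbf{\tilde s}_1, \mathbf{\tilde s}_2) \tag{7}$$

$$\mathbb{E}[xx^T - x\tilde x^T - \tilde x x^T \mid \tilde x] = \mathbf{h}(\tilde x, \mathbf{\tilde s}_1, \mathbf{\tilde s}_2) \tag{8}$$

where

$$\mathbf{f}(\tilde x, \mathbf{\tilde s}_1, \mathbf{\tilde s}_2) = \tilde x \tilde x^T + \sigma^2 \tilde x\, \mathbf{\tilde s}_1(\tilde x)^T + \sigma^2\, \mathbf{\tilde s}_1(\tilde x)\, \tilde x^T + \sigma^4\, \mathbf{\tilde s}_2(\tilde x) + \sigma^4\, \mathbf{\tilde s}_1(\tilde x)\, \mathbf{\tilde s}_1(\tilde x)^T + \sigma^2 I \tag{9}$$

$$\mathbf{h}(\tilde x, \mathbf{\tilde s}_1, \mathbf{\tilde s}_2) = -\tilde x \tilde x^T + \sigma^4\, \mathbf{\tilde s}_2(\tilde x) + \sigma^4\, \mathbf{\tilde s}_1(\tilde x)\, \mathbf{\tilde s}_1(\tilde x)^T + \sigma^2 I \tag{10}$$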
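As a sanity check, the theorem can be verified in closed form for scalar Gaussian data. The sketch below assumes $x \sim \mathcal N(0, s^2)$ and takes the polynomial forms to be $f = \tilde x^2 + 2\sigma^2 \tilde x \tilde s_1 + \sigma^4 \tilde s_2 + \sigma^4 \tilde s_1^2 + \sigma^2$ and $h = -\tilde x^2 + \sigma^4 \tilde s_2 + \sigma^4 \tilde s_1^2 + \sigma^2$, which follow from the Gaussian posterior mean and variance. The function names are hypothetical, and in one dimension $xx^T - x\tilde x^T - \tilde x x^T$ reduces to $x^2 - 2x\tilde x$.

```python
def gaussian_scores(x_tilde, data_var, noise_var):
    """First- and second-order scores of q(x_tilde) = N(0, data_var + noise_var)."""
    v = data_var + noise_var
    return -x_tilde / v, -1.0 / v


def posterior_moments(x_tilde, data_var, noise_var):
    """Mean and variance of x | x_tilde for x ~ N(0, data_var)."""
    v = data_var + noise_var
    return data_var / v * x_tilde, data_var * noise_var / v


def f_poly(x_tilde, s1, s2, noise_var):
    """Scalar version of the polynomial for E[x^2 | x_tilde]."""
    return (x_tilde ** 2 + 2 * noise_var * x_tilde * s1
            + noise_var ** 2 * s2 + noise_var ** 2 * s1 ** 2 + noise_var)


def h_poly(x_tilde, s1, s2, noise_var):
    """Scalar version of the polynomial for E[x^2 - 2 x x_tilde | x_tilde]."""
    return (-x_tilde ** 2 + noise_var ** 2 * s2
            + noise_var ** 2 * s1 ** 2 + noise_var)


def theorem_gaps(x_tilde, data_var, noise_var):
    """Differences between direct posterior moments and the two polynomials."""
    s1, s2 = gaussian_scores(x_tilde, data_var, noise_var)
    mean, var = posterior_moments(x_tilde, data_var, noise_var)
    second_moment = var + mean ** 2                      # E[x^2 | x_tilde]
    gap_f = second_moment - f_poly(x_tilde, s1, s2, noise_var)
    gap_h = (second_moment - 2 * mean * x_tilde) - h_poly(x_tilde, s1, s2, noise_var)
    return gap_f, gap_h
```

Both gaps should be zero up to floating-point error for any choice of $\tilde x$, $s^2$, and $\sigma^2$.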

The good news is that eqs 9 and 10, the closed-form solutions for the posterior expectations in eqs 7 and 8, are polynomials containing the first- and second-order scores (the rather complicated expressions involving $xx^T$ in eqs 7 and 8 seem to have been arrived at by manipulating things until eqs 9 and 10 came out as polynomials in the scores we want). This means that if we design a least-squares objective whose solution is eq 7 or eq 8, and then parameterize some function (above, $\mathbf h$) in the form of eq 9 or eq 10, we can learn the first- and second-order scores from that least-squares objective alone (here the first-order score is held fixed and only the second-order score is learned). Concretely, the objectives are as follows.

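$$\min_\theta\; \mathbb{E}_{p_{\text{data}}(x)}\,\mathbb{E}_{q_\sigma(\tilde x \mid x)}\left[\left\| xx^T - \mathbf{f}(\tilde x, \mathbf{\tilde s}_1, \mathbf{\tilde s}_2(\cdot;\theta)) \right\|_2^2\right] \tag{11}$$

$$\min_\theta\; \mathbb{E}_{p_{\text{data}}(x)}\,\mathbb{E}_{q_\sigma(\tilde x \mid x)}\left[\left\| xx^T - x\tilde x^T - \tilde x x^T - \mathbf{h}(\tilde x, \mathbf{\tilde s}_1, \mathbf{\tilde s}_2(\cdot;\theta)) \right\|_2^2\right] \tag{12}$$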

The solution of eq 11 is eq 7, and the solution of eq 12 is eq 8. Because eq 10 is simpler, the experiments used the eq 12 objective. To summarize:

Minimizing eq 12 amounts to making $\mathbf {h} (\tilde x, \mathbf{\tilde s}_1, \mathbf {\tilde s}_2(\cdot;\theta)) = \mathbb{E}[xx^T - x\tilde x^T - \tilde x x^T \mid \tilde x]$ (by the same least-squares argument as for eq 5).
By Theorem 1, this means $\mathbf {h} (\tilde x, \mathbf{\tilde s}_1, \mathbf {\tilde s}_2) = \mathbf {h} (\tilde x, \mathbf{\tilde s}_1, \mathbf {\tilde s}_2(\cdot;\theta))$.
With $\mathbf{\tilde s}_1$ fixed, this in turn means $\mathbf {\tilde s}_2(\cdot;\theta) = \mathbf {\tilde s}_2$.
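The chain of implications above can be illustrated with a toy Monte Carlo experiment. This is only a sketch under strong assumptions that are not in the original notes: scalar data $x \sim \mathcal N(0, s^2)$, the true first-order score held fixed, and the second-order score model reduced to a single constant `c` (exact for Gaussian data, where $\tilde s_2 = -1/(s^2 + \sigma^2)$). With $h = -\tilde x^2 + \sigma^4 c + \sigma^4 \tilde s_1^2 + \sigma^2$, the eq 12 style least-squares problem is linear in `c` and has a closed-form minimizer. The function name is hypothetical.

```python
import random


def fit_second_order_score(data_var, sigma, n=400_000, seed=0):
    """Least-squares estimate of a constant second-order score c (scalar case)."""
    rng = random.Random(seed)
    noise_var = sigma ** 2
    v = data_var + noise_var
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, data_var ** 0.5)
        x_tilde = x + sigma * rng.gauss(0.0, 1.0)
        s1 = -x_tilde / v                   # true first-order score, held fixed
        target = x * x - 2.0 * x * x_tilde  # scalar x x^T - x x~^T - x~ x^T
        # Residual of the target after removing every term of h except
        # sigma^4 * c; its sample mean divided by sigma^4 minimizes the
        # squared error over c.
        total += target + x_tilde ** 2 - noise_var ** 2 * s1 ** 2 - noise_var
    return total / (n * noise_var ** 2)
```

The estimate is noisy for small $\sigma$ because the residual is divided by $\sigma^4$, so a fairly large sample size is used here. A real model would replace the constant `c` with a network output that depends on $\tilde x$.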