Expectation-Maximization algorithm on the ELBO

Introduction


EM (Expectation-Maximization) is used when we do not have a complete data set of observations from X: the model involves a latent variable that we never observe, and X only has a (conditional) density given it. So the question becomes: how likely is what we did observe, once we average over the things we did not?
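To pin down that intuition (a minimal formalization; the symbol z for the unobserved part is my notation, not something the notes define), the quantity we would like to maximize is the incomplete-data log-likelihood, where z has to be marginalized out:

```latex
\[
\log p(x; \theta) \;=\; \log \int p(x, z; \theta)\, dz
\]
```

It is this integral (or sum, for discrete z) over what we never observed that makes direct maximization awkward and motivates EM.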

Again, this is not quite approximate inference; rather, the goal is to learn the model parameters while maintaining an (approximate) posterior over the latent variables.

Many algorithms take the shape of a two-step process per iteration, and they come with the drawback of mutual reliance: each step assumes the other is up to date.

Procedure


Recall that the ELBO introduces the distribution q for us to estimate on top of the set of model parameters theta, so the objective is a function L(q, theta) of both.
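For reference, the standard decomposition behind the ELBO looks like this (same assumed notation as above, with z the latent variable):

```latex
\[
\log p(x; \theta)
\;=\;
\underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z; \theta)}{q(z)}\right]}_{\mathcal{L}(q,\,\theta)\;\text{(ELBO)}}
\;+\;
\mathrm{KL}\!\left(q(z)\,\middle\|\,p(z \mid x; \theta)\right)
\]
```

Because the KL term is nonnegative, L(q, theta) lower-bounds log p(x; theta) for every q, and the bound is tight exactly when q is the true posterior.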

The EM procedure takes turns to make updates to q and theta. In summary,

E step: holding theta fixed, set q(z) = p(z | x; theta), the exact posterior under the current parameters (this makes the bound tight);
M step: holding q fixed, update theta to maximize E_q[log p(x, z; theta)];

alternated until convergence.
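To make the alternation concrete, here is a minimal numpy sketch of EM for a toy model, a 1-D mixture of two Gaussians; the model, data, and variable names are illustrative assumptions, not anything specified in these notes:

```python
# Minimal EM sketch for a 1-D mixture of two Gaussians (illustrative assumption,
# not the model discussed in these notes).
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Incomplete data: we observe x but never the component assignment z.
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

# Model parameters theta = (mixing weights, means, variances), initialized roughly.
weights = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def log_gaussian(x, mu, var):
    """Elementwise log N(x; mu, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

for it in range(100):
    # E step: hold theta fixed, set q(z) to the exact posterior p(z | x; theta).
    log_joint = np.log(weights) + log_gaussian(x[:, None], mu, var)  # shape (N, 2)
    log_evidence = logsumexp(log_joint, axis=1)                      # log p(x_i; theta)
    q = np.exp(log_joint - log_evidence[:, None])                    # responsibilities

    # After the E step the KL term vanishes, so ELBO = sum_i log p(x_i; theta).
    elbo = log_evidence.sum()

    # M step: hold q fixed, maximize E_q[log p(x, z; theta)] over theta (closed form here).
    Nk = q.sum(axis=0)
    weights = Nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / Nk
    var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print("weights:", weights, "means:", mu, "variances:", var)
```

The E step computes the responsibilities (the exact posterior over the component assignment), and the M step re-estimates the weights, means, and variances in closed form.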

EM can be thought of as a coordinate ascent procedure to maximize L: each step optimizes L with respect to one of its two arguments while holding the other fixed.
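Written as coordinate updates (again in the assumed notation), one full iteration is:

```latex
\[
q^{(t+1)} \;=\; \arg\max_{q}\; \mathcal{L}\!\left(q, \theta^{(t)}\right) \;=\; p\!\left(z \mid x; \theta^{(t)}\right),
\qquad
\theta^{(t+1)} \;=\; \arg\max_{\theta}\; \mathcal{L}\!\left(q^{(t+1)}, \theta\right)
\]
```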

More specifically, every iteration improves (or at least does not decrease) L, and under mild conditions the procedure converges to a stationary point of L, i.e. a spot with zero gradient (just like gradient ascent, which can be viewed as the special case where the M step takes only a small step instead of a full maximization).
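The monotone-improvement claim follows from a standard one-line argument (sketched with the same assumed notation):

```latex
\[
\log p\!\left(x; \theta^{(t)}\right)
\;=\; \mathcal{L}\!\left(q^{(t+1)}, \theta^{(t)}\right)
\;\le\; \mathcal{L}\!\left(q^{(t+1)}, \theta^{(t+1)}\right)
\;\le\; \log p\!\left(x; \theta^{(t+1)}\right)
\]
```

The equality is the E step making the bound tight, the first inequality is the M step, and the second holds because the ELBO lower-bounds the log-likelihood for any q.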

Remarks