Summaries, Math, and Definitions
Variational inference can be seen as an extension of E-M and MAP estimation. Recall that the ELBO can be written as
$$ L(\theta, q, \vec{v}) = \log p(\vec{v}) - D_{KL}(q(\theta) \,\|\, p(\theta|\vec{v})) $$
where $D_{KL}$ refers to the Kullback-Leibler divergence, which variational inference methods aim to minimize, so that the ELBO is as close to $\log p(\vec{v})$ as possible.
*Note that this KL-divergence direction is the opposite of the one used in maximum likelihood, but it has good computational properties. As a result, our approximation encourages q to have low probability wherever p has low probability.
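To make the decomposition concrete, here is a tiny discrete example (the joint table and q below are made-up numbers) verifying that the two standard forms of the ELBO agree, and that the ELBO lower-bounds $\log p(\vec{v})$:

```python
import math

# Toy discrete model: one binary latent theta, observed v held fixed.
# The joint table p(v, theta) below is a made-up illustration.
p_joint = {0: 0.3, 1: 0.1}                       # p(v, theta)
p_v = sum(p_joint.values())                      # evidence p(v)
p_post = {t: p_joint[t] / p_v for t in p_joint}  # posterior p(theta | v)

q = {0: 0.6, 1: 0.4}  # an arbitrary variational distribution

# Form 1: ELBO = E_q[log p(v, theta)] - E_q[log q(theta)]
elbo = sum(q[t] * (math.log(p_joint[t]) - math.log(q[t])) for t in q)

# Form 2: ELBO = log p(v) - KL(q || p(theta | v))
kl = sum(q[t] * math.log(q[t] / p_post[t]) for t in q)

assert abs(elbo - (math.log(p_v) - kl)) < 1e-12  # the two forms agree
assert elbo <= math.log(p_v)                     # ELBO is a lower bound
```

Making q equal to the true posterior would drive the KL term to zero and the ELBO up to $\log p(\vec{v})$ exactly.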
Use the Mean Field Approximation, meaning that q is assumed to be fully factorized. The individual factors need not be given an explicit parametric form, and because
$$ q(\theta) = \prod^D_{i=1} q_i(\theta_i) $$
we can also go further to structured variational inference, which retains some dependencies between the factors (see paper in resources).
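As a minimal illustration of the factorization (the factor tables are made-up numbers), a fully factorized q over D binary variables is described by O(D) numbers, yet still normalizes correctly over all $2^D$ joint states:

```python
# Mean-field q over D = 3 binary latents: each factor q_i(theta_i) is a
# tiny lookup table (the numbers are illustrative).
factors = [[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]

def q(theta):
    """q(theta) = prod_i q_i(theta_i) under the mean-field assumption."""
    p = 1.0
    for q_i, t_i in zip(factors, theta):
        p *= q_i[t_i]
    return p

# Each factor normalizes on its own, so the product automatically
# normalizes over all 2^D joint configurations:
total = sum(q((a, b, c)) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-12
```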
*I strongly recommend reading from slide 21 onwards, which goes up to sampling methods:
The beauty of the variational approach is that we do not need to specify a specific parametric form for q. We specify how it should factorize, but then the optimization problem determines the optimal probability distribution within those factorization constraints. For discrete latent variables, this just means that we use traditional optimization techniques to optimize a finite number of variables describing the q distribution. For continuous latent variables, this means that we use a branch of mathematics called calculus of variations to perform optimization over a space of functions and actually determine which function should be used to represent q.
Define q as a lookup table over discrete states, and then optimize for q's parameters.
Use a fixed-point equation to keep the optimizer fast, effectively solving for $\hat{h}$, the lookup-table values, in
$$ {\partial \over \partial \hat{h}_i} \mathbb{L} = 0 $$
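For a discrete mean-field q, setting this derivative to zero gives the standard coordinate-ascent fixed-point update $q_i(\theta_i) \propto \exp(\mathbb{E}_{q_{-i}}[\log p(\theta, \vec{v})])$. A minimal sketch with two binary latents (the joint table is a made-up illustration):

```python
import math

# Joint log p(theta1, theta2, v) for fixed v, as a made-up 2x2 table.
log_p = [[math.log(0.40), math.log(0.10)],
         [math.log(0.10), math.log(0.40)]]

# Lookup-table values (the h-hats) for each factor of q.
q1, q2 = [0.5, 0.5], [0.9, 0.1]

def normalize(u):
    z = sum(u)
    return [x / z for x in u]

# Iterate the fixed-point equations q_i(t) ∝ exp(E_{q_-i}[log p])
# until the tables stop changing.
for _ in range(100):
    q1 = normalize([math.exp(sum(q2[b] * log_p[a][b] for b in (0, 1)))
                    for a in (0, 1)])
    q2 = normalize([math.exp(sum(q1[a] * log_p[a][b] for a in (0, 1)))
                    for b in (0, 1)])
```

Each such update does not decrease the ELBO, so iterating to a fixed point is a valid optimizer over the lookup-table values.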
Binary Sparse Coding Model walkthrough; see original paper here:
Apologies if this part wasn't very clear in the talk: the math eventually leads to $\mathbb{L}$ being arithmetically computable. In the end, we can see sparse coding as an iterative autoencoder.
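To see the "iterative autoencoder" view concretely, here is a minimal ISTA-style sketch (the dictionary W, step size, and sparsity weight are all made-up numbers): each iteration "decodes" the current code h into a reconstruction W h, then "encodes" the residual back into an updated sparse code.

```python
# Hypothetical dictionary W (3 observed dims, 2 code dims), input v,
# sparsity weight lam, and gradient step size -- all made-up numbers.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
v = [1.0, 0.5, 1.5]
lam, step = 0.1, 0.2
h = [0.0, 0.0]  # sparse code, refined iteratively

def soft(x, t):
    """Soft-thresholding (shrinkage) operator."""
    return (abs(x) - t) * (1.0 if x > 0 else -1.0) if abs(x) > t else 0.0

for _ in range(200):
    # "Decode": reconstruct v from the current code.
    recon = [sum(W[i][j] * h[j] for j in range(2)) for i in range(3)]
    resid = [recon[i] - v[i] for i in range(3)]
    # "Encode": gradient step on the reconstruction error, then shrink
    # toward zero to enforce sparsity.
    grad = [sum(W[i][j] * resid[i] for i in range(3)) for j in range(2)]
    h = [soft(h[j] - step * grad[j], step * lam) for j in range(2)]
```

Each pass is one encoder refinement; unrolling a fixed number of passes gives the feed-forward "iterative autoencoder" reading of sparse coding.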