KL-D

ELBO

Entropy

Before we get to the KL divergence, we first need to understand entropy.

Surprisal = $-\log p(x)$

Entropy = $-\sum_x p(x)\log p(x)$

Cross Entropy = $-\sum_x p(x) \log q(x)$

KL-Divergence = Cross Entropy − Entropy = $-\sum_x p(x) \log q(x) - \left(-\sum_x p(x)\log p(x)\right) = \sum_x p(x)\log \frac{p(x)}{q(x)}$
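The four definitions above can be checked numerically. The following is a minimal sketch, assuming discrete distributions given as plain Python lists of probabilities over the same outcomes; the helper names (`surprisal`, `entropy`, `cross_entropy`, `kl_divergence`) and the example distributions are illustrative, not from any library. It computes the KL divergence directly from $\sum p \log \frac{p}{q}$ and again as cross entropy minus entropy, so you can confirm the two agree.

```python
import math

def surprisal(px: float) -> float:
    """Information content I(x) = -log p(x), in nats (natural log)."""
    return -math.log(px)

def entropy(p: list[float]) -> float:
    """H(p) = -sum p(x) log p(x); outcomes with p(x) = 0 contribute nothing."""
    return sum(px * surprisal(px) for px in p if px > 0)

def cross_entropy(p: list[float], q: list[float]) -> float:
    """H(p, q) = -sum p(x) log q(x); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * surprisal(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(p || q) = sum p(x) log(p(x) / q(x)), computed directly."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Two illustrative distributions over the same three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

kl_direct = kl_divergence(p, q)
kl_via_ce = cross_entropy(p, q) - entropy(p)
print(f"direct: {kl_direct:.6f}  via H(p,q) - H(p): {kl_via_ce:.6f}")
```

Both printed values should match up to floating-point rounding, and `kl_divergence(p, p)` is exactly zero, since every term is $p(x)\log 1$.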

Surprisal

The definition $I(x) = -\log p(x)$ follows from two core requirements: monotonicity and additivity.

For independent events $A$ and $B$: $p(A \cap B) = p(A) \cdot p(B)$, and $\log(p(A) \cdot p(B)) = \log p(A) + \log p(B)$.

  1. Monotonicity (inverse relationship): Surprise must decrease as probability increases; the rarer the event, the higher its information content. (Note that $-\log p$ is monotonically decreasing in $p$, not literally inversely proportional to it.)
  2. Additivity: When two independent events occur, their joint probability is multiplicative, $p(A)p(B)$, but their total surprise should be additive, $I(A \cap B) = I(A) + I(B)$. Up to the choice of base (a constant factor), the logarithm is the only continuous function that turns this product into a sum, mapping probabilities onto an additive information scale.
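Both requirements can be seen in a few lines of Python. This is a sketch assuming base-2 logarithms (so surprise is measured in bits); `surprisal_bits` is a hypothetical helper, and the coin and die probabilities are just illustrative independent events.

```python
import math

def surprisal_bits(px: float) -> float:
    """I(x) = -log2 p(x): information content in bits."""
    return -math.log2(px)

# Monotonicity: a rarer event carries more surprise.
print(surprisal_bits(0.9), surprisal_bits(0.1))

# Additivity: two independent events, e.g. a fair coin landing heads
# and a fair die showing a six (illustrative probabilities).
p_a, p_b = 1 / 2, 1 / 6
joint = p_a * p_b  # independence: p(A ∩ B) = p(A) p(B)

print(surprisal_bits(joint))                      # surprise of the joint event
print(surprisal_bits(p_a) + surprisal_bits(p_b))  # sum of individual surprises
```

The last two lines print the same value (about 3.585 bits, i.e. $1 + \log_2 6$), which is exactly the product-to-sum property the logarithm provides.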

Shannon’s Entropy (from here on, we treat $p(x)$ as a probability distribution)

$H(p) = \sum_x p(x) \underbrace{[-\log p(x)]}_{\text{Surprise}}$

Shannon entropy is simply the average amount of surprise.

It quantifies the expected information we gain from observing a random variable, i.e. $\mathbb{E}_{x \sim p}[-\log p(x)]$. In generative modeling, managing this "average surprise" is key to balancing reconstruction accuracy against latent-space continuity.
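To make "average surprise" concrete, here is a small sketch that weights each outcome's surprisal by its probability and sums the result. It assumes base-2 logarithms (bits) and uses a hypothetical helper `entropy_bits`; the three coin distributions are illustrative.

```python
import math

def entropy_bits(p: list[float]) -> float:
    """Average surprisal: each outcome's -log2 p(x), weighted by p(x)."""
    return sum(-px * math.log2(px) for px in p if px > 0)

distributions = {
    "fair coin": [0.5, 0.5],      # maximally uncertain for two outcomes
    "biased coin": [0.9, 0.1],    # usually heads, so less surprise on average
    "certain": [1.0, 0.0],        # outcome known in advance: no surprise
}
for name, dist in distributions.items():
    print(f"{name:>11}: {entropy_bits(dist):.3f} bits")
```

The fair coin comes out at exactly 1 bit, the biased coin at roughly 0.469 bits, and the certain outcome at 0: the rare tail outcome of the biased coin is very surprising, but it happens so seldom that the average stays low.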

Cross Entropy

$H(p, q) = \mathbb{E}_{x \sim p} [-\log q(x)] = -\sum p(x) \log q(x)$
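The expectation form of this formula suggests a direct experiment: draw samples from $p$ and score each one with $-\log q(x)$. The sketch below does that with Python's standard `random` module and compares the estimate to the exact sum. It assumes discrete distributions as lists; the helper names, the sample size, and the candidate models are illustrative choices. It also shows Gibbs' inequality, $H(p, q) \ge H(p)$, with equality when $q = p$.

```python
import math
import random

def entropy(p):
    """H(p) = -sum p(x) log p(x), in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log q(x); assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def cross_entropy_mc(p, q, n=100_000, seed=0):
    """Monte Carlo estimate of E_{x~p}[-log q(x)]: sample from p, score with q."""
    rng = random.Random(seed)
    samples = rng.choices(range(len(p)), weights=p, k=n)
    return sum(-math.log(q[x]) for x in samples) / n

p = [0.7, 0.2, 0.1]  # illustrative "true" distribution
models = {
    "q = p": [0.7, 0.2, 0.1],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "wrong peak": [0.1, 0.2, 0.7],
}
print(f"H(p) = {entropy(p):.4f} nats")
for name, q in models.items():
    exact = cross_entropy(p, q)
    approx = cross_entropy_mc(p, q)
    print(f"{name:>10}: H(p,q) = {exact:.4f}  (sampled ≈ {approx:.4f})")
```

The closer $q$ is to $p$, the lower the cross entropy; the gap above $H(p)$ is precisely the KL divergence.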

Cross Entropy: Explaining $p$ through $q$