Maximum Log Likelihood

$L(\theta|x)=p(x|\theta)$

$L(\theta|x)$ is not the parameter itself, but a function that evaluates how "likely" a given $\theta$ is, given the observed data $x$. It is called the likelihood.

$p(x|\theta)$ is the generative view. It treats $\theta$ as a fixed law of nature and asks for the probability of observing data $x$.

Our goal is to maximize the likelihood.

What is $p(x|\theta)$?

Let's set $\hat x$ as the model's output, and assume the error between the prediction $\hat x$ and the observed $x$ follows a Gaussian distribution $\mathcal N(0,\sigma^2)$.

$x = \hat x+\epsilon,\quad \text{where}~\epsilon\sim\mathcal{N}(0,\sigma^2)$

$x-\hat x=\epsilon\sim\mathcal{N}(0,\sigma^2)$
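The noise model above can be checked with a small simulation. This is only a sketch: the constant prediction `x_hat = 3.0`, the value `sigma = 1.0`, and the sample size are all assumptions chosen for illustration, not part of the notes.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

x_hat = 3.0   # hypothetical model output (assumed constant here)
sigma = 1.0   # assumed noise standard deviation
n = 10_000    # assumed number of simulated observations

# Draw observations x = x_hat + eps, with eps ~ N(0, sigma^2)
xs = [x_hat + random.gauss(0.0, sigma) for _ in range(n)]

# The residuals x - x_hat recover the noise samples (up to floating-point rounding)
residuals = [x - x_hat for x in xs]
mean_res = sum(residuals) / n
var_res = sum((r - mean_res) ** 2 for r in residuals) / n

print(mean_res)  # should be close to 0
print(var_res)   # should be close to sigma^2
```

If the assumption $x-\hat x\sim\mathcal{N}(0,\sigma^2)$ holds, the residual mean should sit near 0 and the residual variance near $\sigma^2$.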

Then, write this out in its mathematical form:

$p(\epsilon) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{\epsilon^2}{2\sigma^2} \right)$, $p(x - \hat{x}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \hat{x})^2}{2\sigma^2} \right)$

$p(x - \hat{x}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \hat{x})^2}{2\sigma^2} \right)=p(x|\theta)$

Here, $\hat x$ is the model's output, so for a given $x$ the residual $x-\hat x$ is determined by $\theta$. Therefore,

$p(x - \hat{x})=p(x|\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \hat{x})^2}{2\sigma^2} \right)$
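As a minimal sketch, the density above can be evaluated directly. The function name `gaussian_likelihood` and the toy model $\hat x = \theta$ (the model just outputs its parameter) are assumptions made for illustration; the notes do not specify a model.

```python
import math

def gaussian_likelihood(x, x_hat, sigma):
    """Density of N(0, sigma^2) evaluated at the residual x - x_hat."""
    residual = x - x_hat
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-residual ** 2 / (2.0 * sigma ** 2))

x_observed = 2.0  # assumed observation
sigma = 1.0       # assumed noise standard deviation

# Hypothetical model: x_hat = theta. Evaluate L(theta | x) for a few thetas.
for theta in (0.0, 1.0, 2.0, 3.0):
    x_hat = theta
    print(theta, gaussian_likelihood(x_observed, x_hat, sigma))
```

The data $x$ stays fixed while $\theta$ varies, which is the "likelihood" reading of the same formula. The value is largest when $\hat x$ equals $x$.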

The following is just a concept to aid understanding. $p(x|\theta)$ does not literally factor into this chain form; it is only a metaphor.

$p(x|\hat x)p(\hat x|z)p(z|\theta)=p(x|\theta)$

Next, take the log of both sides (this makes the calculation easier):

$\log L(\theta|x)=\log p(x|\theta)$

$\log L(\theta|x) = \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right) - \frac{(x - \hat{x})^2}{2\sigma^2}$
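The log form makes one consequence easy to see: the first term does not depend on $\hat x$, so maximizing $\log L(\theta|x)$ amounts to minimizing the squared error $(x-\hat x)^2$. Here is a rough sketch of that, again assuming the toy model $\hat x = \theta$ and a simple grid search over $\theta$ (both are illustrative assumptions, not from the notes).

```python
import math

def log_likelihood(x, x_hat, sigma):
    """log L = log(1 / sqrt(2*pi*sigma^2)) - (x - x_hat)^2 / (2*sigma^2)."""
    const = math.log(1.0 / math.sqrt(2.0 * math.pi * sigma ** 2))
    return const - (x - x_hat) ** 2 / (2.0 * sigma ** 2)

x_observed = 2.0  # assumed observation
sigma = 1.0       # assumed noise standard deviation

# Candidate parameters 0.0, 0.1, ..., 4.0 (hypothetical model: x_hat = theta)
thetas = [i / 10 for i in range(41)]

# Theta that maximizes the log-likelihood
best_by_ll = max(thetas, key=lambda t: log_likelihood(x_observed, t, sigma))

# Theta that minimizes the squared error
best_by_sq = min(thetas, key=lambda t: (x_observed - t) ** 2)

print(best_by_ll, best_by_sq)
```

Both searches pick the same $\theta$, since the two objectives differ only by a constant and a negative scale factor.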