Now that we have all the components needed to decompose MSE, we can decompose estimator MSE into bias and variance and examine the tradeoff between them.
By definition, we have:
$$ MSE(\hat{\theta}) := E_\theta[(\hat{\theta} - \theta)^2] \tag{1} $$
Again, our goal is to decompose MSE in terms of bias and variance:
$$ MSE(\hat{\theta}) = Var(\hat{\theta}) + Bias(\hat{\theta})^2 \tag{4} $$
We need to think about what MSE, as a measurement, actually tells us. MSE is the sum of two distinct sources of error: variance (sensitivity to the particular data we happened to draw) and squared bias (systematic offset from the truth). This decomposition makes the tradeoff explicit: reducing bias often increases variance, and vice versa. What does this mean in practice? It means we want to transform expression (1) into a form where we can clearly see how the variance of our estimator and its squared bias each contribute.
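To make the decomposition concrete before deriving it, here is a minimal Monte Carlo sketch (an illustration, not part of the derivation) that estimates all three quantities for a hypothetical shrinkage estimator $c\bar{X}$ of a normal mean; the values of $\theta$, $\sigma$, $n$, and $c$ below are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (hypothetical): estimate the mean theta of a normal
# population with the shrinkage estimator c * sample mean.
theta, sigma, n = 2.0, 3.0, 20
c = 0.8  # shrinkage factor: trades some bias for lower variance

trials = 200_000
samples = rng.normal(theta, sigma, size=(trials, n))
theta_hat = c * samples.mean(axis=1)  # one estimate per simulated dataset

mse = np.mean((theta_hat - theta) ** 2)      # E[(theta_hat - theta)^2], eq. (1)
var = np.var(theta_hat)                      # E[(theta_hat - E[theta_hat])^2], eq. (3)
bias_sq = (np.mean(theta_hat) - theta) ** 2  # (E[theta_hat] - theta)^2, eq. (2) squared

print(f"MSE          = {mse:.4f}")            # close to c^2*sigma^2/n + (c-1)^2*theta^2 = 0.448
print(f"Var + Bias^2 = {var + bias_sq:.4f}")  # agrees, illustrating eq. (4)
```

Shrinking toward zero ($c < 1$) lowers the variance term but introduces bias; that is exactly the tradeoff described above.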
Let’s remind ourselves of the definitions:
$$ Bias(\hat{\theta}) := E_\theta[\hat{\theta}] - \theta \tag{2} $$
$$ Var(\hat{\theta}) := E_\theta[(\hat{\theta} - E_\theta[\hat{\theta}])^2] \tag{3} $$
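As a quick illustration of these definitions (an example, not part of the derivation), take the sample mean $\bar{X}$ of $n$ i.i.d. observations with mean $\theta$ and variance $\sigma^2$:

$$ Bias(\bar{X}) = E_\theta[\bar{X}] - \theta = 0, \qquad Var(\bar{X}) = \frac{\sigma^2}{n} $$

so equation (4), once established, gives $MSE(\bar{X}) = \sigma^2/n$: an unbiased estimator's MSE is pure variance.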
In short, we want to go from equation (1) to equation (4) using the ingredients in equations (2) and (3). Let's work through the derivation step by step.
Setting the right-hand side (RHS) of equation (1) equal to the RHS of equation (4), we have:
$$ E_\theta[(\hat{\theta} - \theta)^2] = Var(\hat{\theta}) + Bias(\hat{\theta})^2 \tag{5} $$
To establish equation (5), we want to rewrite the left-hand side (LHS) so that the bias and variance of definitions (2) and (3) appear.
This means we want to reshape the LHS until it is roughly in the form of a variance plus a bias. One thing we can do is add 0, because that won't change the value. But what kind of 0 do we need? We add and subtract $E_\theta[\hat{\theta}]$ because it is the bridge between variance (spread around the estimator's own mean) and bias (the offset of that mean from the truth); no other choice of term gives us exactly those two definitions when we expand. By inserting this inside the squared difference, we don't alter the expression, but we create exactly the pieces that will become the variance and the squared bias.
$$ E_\theta[(\hat{\theta} - \theta)^2] = E_\theta[((\hat{\theta} - E_\theta[\hat{\theta}]) + ( E_\theta[\hat{\theta}] - \theta))^2] $$
Now that we have these terms, we can use algebra to expand. Recall $(a+b)^2 = a^2 + 2ab + b^2.$
Let $a = \hat{\theta} - E_\theta[\hat{\theta}]$ and $b = E_\theta[\hat{\theta}] - \theta$. Notice that we can set the outer expectation $E_\theta[\cdot]$ aside for now: we expand the inside of the square first, then apply the expectation.
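As a quick check that this substitution is valid, note that the added terms cancel:

$$ a + b = (\hat{\theta} - E_\theta[\hat{\theta}]) + (E_\theta[\hat{\theta}] - \theta) = \hat{\theta} - \theta $$

so squaring $a + b$ really does reproduce the quantity inside the expectation in equation (1).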
Expanding the square, we have:
$$ (\hat{\theta} - \theta)^2 = (\hat{\theta} - E_\theta[\hat{\theta}])^2 + 2(\hat{\theta} - E_\theta[\hat{\theta}])(E_\theta[\hat{\theta}] - \theta) + (E_\theta[\hat{\theta}] - \theta)^2 $$
Now we bring the expectation back in. Remember that expectation is linear, so we can apply it to each term separately. Also, constants can be pulled outside the expectation, and $(E_\theta[\hat{\theta}] - \theta)$ is a constant (it does not depend on the data), so in the middle term it comes out in front.
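Concretely, applying linearity term by term and pulling that constant out of the middle term gives:

$$ E_\theta[(\hat{\theta} - \theta)^2] = E_\theta[(\hat{\theta} - E_\theta[\hat{\theta}])^2] + 2(E_\theta[\hat{\theta}] - \theta)\,E_\theta[\hat{\theta} - E_\theta[\hat{\theta}]] + (E_\theta[\hat{\theta}] - \theta)^2 $$

The first term is the variance from equation (3) and the last is the squared bias from equation (2); all that remains is to handle the cross term.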