Let’s look at the definitions of bias and variance, starting with the words themselves.

Different Representations of Bias

Colloquially: Bias is about being consistently off in the same way. It means that we are being systematically prejudiced or leaning in one direction. For example, imagine shooting arrows with your sight misaligned: all your shots cluster tightly, but the whole cluster lands to the right of the bullseye. No matter how many times you shoot, you’ll keep missing in that direction. The key word is systematic: bias isn’t random error, but a consistent distortion in one direction that arises from the assumptions built into a model.

Mathematically: Bias is the difference between where our estimator is centered (its expectation across all possible samples) and the true parameter value. Bias is defined as:

$$ Bias(\hat{\theta}) := E_\theta[\hat{\theta}] - \theta \tag{2} $$

Here $\hat{\theta}$ is the estimator (like a sample mean), $E_\theta[\hat{\theta}]$ is the expected value of our estimator across all possible samples, and $\theta$ is the true parameter.
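
To make this concrete, here is a minimal Monte Carlo sketch of equation (2) in Python (the NumPy setup, the Normal(5, 2) data, and the choice of estimators are illustrative assumptions, not from the text). It approximates $E_\theta[\hat{\theta}]$ by averaging an estimator over many simulated samples and then subtracts the true value. The maximum-likelihood variance estimator (dividing by $n$) shows a systematic negative bias, while the usual sample variance (dividing by $n-1$) comes out roughly unbiased.

```python
import numpy as np

# Illustrative assumption: data ~ Normal(mu=5, sigma=2), so the true variance is 4.
rng = np.random.default_rng(0)
true_sigma2 = 4.0
n, n_trials = 10, 100_000

mle_estimates = np.empty(n_trials)
unbiased_estimates = np.empty(n_trials)
for i in range(n_trials):
    sample = rng.normal(loc=5.0, scale=2.0, size=n)
    mle_estimates[i] = sample.var(ddof=0)       # divides by n (biased MLE)
    unbiased_estimates[i] = sample.var(ddof=1)  # divides by n - 1 (unbiased)

# Bias = E[theta_hat] - theta, approximated by averaging over the trials.
print("MLE bias      ≈", mle_estimates.mean() - true_sigma2)       # ≈ -sigma^2 / n = -0.4
print("Unbiased bias ≈", unbiased_estimates.mean() - true_sigma2)  # ≈ 0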

Different Representations of Variance

Colloquially: Variance is about variation, spread, and inconsistency. It shows how sensitive our model is to the specific data we happened to see. If the model clings too tightly to that data, we probably "overfit," meaning it tracks nuances of the sample that don't generalize. For example, imagine your sight is perfectly aligned, but your hand shakes every time you release an arrow. Sometimes you hit left, sometimes right, sometimes high or low. On average, your shots center on the bullseye, but they’re scattered all over the target. That spread is variance: it shows how sensitive your estimator is to the specific data you happen to see. The key word here is sensitivity: small changes in the data lead to big changes in the model, because it depends too heavily on the specific sample we happened to observe.

Mathematically: Variance is the average squared deviation of our estimator from its own expected value across all possible samples. Variance is defined as:

$$ Var(\hat{\theta}) := E_\theta[(\hat{\theta} - E_\theta[\hat{\theta}])^2] \tag{3} $$

Again, $\hat{\theta}$ is the estimator and $E_\theta[\hat{\theta}]$ is its expectation, with the subscript $\theta$ indicating that the expectation is taken under the true parameter value. Variance measures how much $\hat{\theta}$ deviates from its own center on average, that is, how much it fluctuates around that expectation.
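
As a companion sketch for equation (3) (same illustrative Normal(5, 2) assumption, now with the sample mean as the estimator), we can approximate $Var(\hat{\theta})$ by recomputing the estimator on many independent samples and averaging the squared deviations from its own Monte Carlo mean. The spread shrinks as the sample size grows, which is exactly the sensitivity-to-data idea above.

```python
import numpy as np

# Illustrative assumption: data ~ Normal(mu=5, sigma=2); estimator = sample mean.
rng = np.random.default_rng(0)
n_trials = 10_000

for n in (5, 50, 500):
    # Draw n_trials independent samples of size n and compute the sample mean of each.
    means = rng.normal(loc=5.0, scale=2.0, size=(n_trials, n)).mean(axis=1)
    # Equation (3): average squared deviation of the estimator from its own mean.
    mc_variance = ((means - means.mean()) ** 2).mean()
    # For the sample mean the theoretical variance is sigma^2 / n = 4 / n.
    print(f"n={n:4d}  Var(sample mean) ≈ {mc_variance:.4f}  (theory: {4.0 / n:.4f})")
```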

How They Combine

These two sources of error, bias and variance, add up to explain our total miss distance. Going back to our earlier archery analogy: the offset of the whole cluster from the bullseye is bias, and the scatter of the arrows around their own center is variance.

When we measure mean squared error, we’re really measuring both at the same time: the spread of our estimator around its center (variance) and how far that center itself is from the truth (bias). MSE captures the whole picture because both effects contribute to how far off we are from reality.
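
A quick numerical check makes this concrete (reusing the illustrative biased variance estimator from the first sketch, so the same assumptions apply): compute MSE, bias, and variance from the same set of simulated estimates and confirm that squared bias plus variance accounts for the whole squared error.

```python
import numpy as np

# Illustrative assumption: data ~ Normal(mu=5, sigma=2); true variance is 4.
rng = np.random.default_rng(0)
true_sigma2 = 4.0
n, n_trials = 10, 100_000

# Biased MLE variance estimate (divide by n) computed on many independent samples.
estimates = rng.normal(loc=5.0, scale=2.0, size=(n_trials, n)).var(axis=1, ddof=0)

mse  = ((estimates - true_sigma2) ** 2).mean()       # E[(theta_hat - theta)^2]
bias = estimates.mean() - true_sigma2                # E[theta_hat] - theta
var  = ((estimates - estimates.mean()) ** 2).mean()  # E[(theta_hat - E[theta_hat])^2]

print(f"MSE          ≈ {mse:.4f}")
print(f"Bias^2 + Var ≈ {bias**2 + var:.4f}")  # agrees with MSE up to floating-point rounding
```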

Look back at the definition of MSE:

$$ MSE(\hat{\theta}) := E_\theta[(\hat{\theta} - \theta)^2] \tag{1} $$

MSE is defined in terms of squared error.

Variance already lives in this squared world — it’s defined as squared deviations from the estimator’s mean: