During backpropagation, gradients flow backward through the network, getting multiplied by each layer's weights along the way. Consider a simple 3-layer network:
Gradient flow: by the chain rule, ∂Loss/∂w₁ involves the product w₃ × w₂ × ... — each additional layer contributes another weight factor to the product.
With large weights (say w = 10 in each layer): the product is 10 × 10 = 100, and with each added layer the gradient grows by another factor of 10 — exploding gradients.
With small weights (say w = 0.5 in each layer): the product is 0.5 × 0.5 = 0.25, and with each added layer the gradient shrinks by another factor of 0.5 — vanishing gradients.
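The scaling above can be sketched numerically. A minimal sketch, assuming a linear chain with no activation functions, so the gradient magnitude is just the product of the downstream weights:

```python
# Gradient scale through a chain of layers with equal weight `w`:
# the gradient w.r.t. the first layer picks up one factor of `w`
# per downstream layer, so it scales as w**depth.

def gradient_scale(w, depth):
    """Product of `depth` downstream weight factors of value `w`."""
    scale = 1.0
    for _ in range(depth):
        scale *= w
    return scale

print(gradient_scale(10.0, 2))   # → 100.0 (large weights, 2 layers)
print(gradient_scale(0.5, 2))    # → 0.25 (small weights, 2 layers)
print(gradient_scale(10.0, 10))  # → 1e10 (exploding with depth)
print(gradient_scale(0.5, 10))   # → ~0.00098 (vanishing with depth)
```

The exponential dependence on depth is the key point: at 2 layers the effect is mild, but at 10 layers the same weights produce gradients ten orders of magnitude apart.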
When gradients explode (become very large), your parameter updates during training become erratic: a single oversized step can overshoot the minimum, make the loss oscillate, or overflow to NaN.
With stable gradients from smaller weights, updates are proportional and predictable, leading to smooth convergence.
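A hypothetical illustration of the contrast: gradient descent on f(x) = x², with the gradient artificially scaled to mimic stable versus exploded backpropagated gradients (the scale factors and learning rate are chosen for demonstration, not taken from the text):

```python
# Gradient descent on f(x) = x**2 (true gradient: 2*x), with the
# gradient multiplied by `grad_scale` to mimic what backprop
# delivers after passing through well- or badly-scaled layers.

def descend(x, grad_scale, lr=0.1, steps=5):
    history = [x]
    for _ in range(steps):
        grad = 2 * x * grad_scale   # (scaled) gradient of x**2
        x = x - lr * grad           # standard update rule
        history.append(x)
    return history

print(descend(1.0, grad_scale=1.0))
# → [1.0, 0.8, 0.64, 0.512, 0.4096, 0.32768] — smooth convergence
print(descend(1.0, grad_scale=100.0))
# → [1.0, -19.0, 361.0, -6859.0, 130321.0, -2476099.0] — oscillates and blows up
```

With the stable gradient, each step shrinks x by a constant factor; with the exploded gradient, each step overshoots zero by a growing margin, exactly the erratic behavior described above.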
Consider a simple neuron: output = w₁x₁ + w₂x₂ + ... + bias
With large weights (w₁ = 100, w₂ = 150): a tiny change in either input shifts the output by 100–150 times that amount, making the neuron's output extremely sensitive to its inputs.
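A quick sketch of that sensitivity, using the weights from the example (the bias is set to 0.0 here for simplicity; the text leaves it unspecified):

```python
# The simple neuron from above: output = w1*x1 + w2*x2 + bias.

def neuron(x1, x2, w1=100.0, w2=150.0, bias=0.0):
    return w1 * x1 + w2 * x2 + bias

base   = neuron(1.0, 1.0)    # → 250.0
nudged = neuron(1.01, 1.0)   # x1 perturbed by just 0.01
print(nudged - base)         # → 1.0: the 0.01 nudge is amplified 100×
```

The output change is always the input change times the corresponding weight, so weights of 100–150 amplify every small input perturbation by two orders of magnitude.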