deep NNs, 2 problems -
- updates to the later layers aren't very meaningful - by the time the input reaches them, so many random weight multiplications and activations have scrambled it into near-noise, so the later/output layers carry almost no signal from the actual input
- updates to the early layers also aren't very meaningful - the loss gradients flowing back are likewise scrambled by many multiplications (they vanish or explode)
we would like to -
- create a path so the input can arrive at the later layers intact, keeping their updates meaningful
- let the loss gradients arrive at the early layers intact, making their updates more meaningful
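a tiny sketch of why the skip path fixes the gradient side (toy function, my own example - not from the source): with y = f(x) + x, the derivative is dy/dx = f'(x) + 1, so even when f'(x) is nearly zero after many scrambled multiplications, the "+1" from the skip gives the loss gradient a clean route back to early layers.

```python
# Toy illustration: f stands in for a deep sub-network whose gradient is tiny.
def f(x):
    return 0.001 * x ** 2   # f'(x) = 0.002 * x, nearly zero

def y(x):
    return f(x) + x         # residual form: through-path plus skip-path

# numerical derivative of y at x0 = 3.0
eps = 1e-6
x0 = 3.0
grad = (y(x0 + eps) - y(x0 - eps)) / (2 * eps)
# grad ~ f'(3.0) + 1 = 0.006 + 1 = 1.006 -> the skip's "+1" dominates
```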
Residual Blocks - a collection of layers where data flows both through the layers and around them via skip connections
detailed - https://www.youtube.com/watch?v=Q1JCrG1bJ-A
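a minimal NumPy sketch of one residual block (shapes, names, and the post-add ReLU are my assumptions, not from the source): the input goes through two weight layers and also around them via the skip, so even if the weights contribute nothing useful, the input still reaches the output.

```python
import numpy as np

def residual_block(x, W1, W2):
    h = np.maximum(0, x @ W1)    # layer 1 + ReLU - the "through" path
    f = h @ W2                   # layer 2 (no activation before the add)
    return np.maximum(0, f + x)  # skip connection: output = ReLU(F(x) + x)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # batch of 4, width 8
W1 = rng.standard_normal((8, 8)) * 0.01
W2 = rng.standard_normal((8, 8)) * 0.01

y = residual_block(x, W1, W2)

# sanity check: with all-zero weights the block degenerates to ReLU(x) -
# the input passes through untouched instead of being scrambled into noise
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

note the skip requires the input and F(x) to have the same shape; real ResNets insert a projection on the skip path when the dimensions change.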