deep NNs, 2 problems -
- updates to the later layers aren't very meaningful - by the time the input reaches them, so many random weight multiplications and activations have scrambled it into near-noise, so the later/output layers carry almost no signal from the actual input
- updates to the early layers also aren't very meaningful - the loss gradients flowing back are likewise scrambled by many multiplications (they vanish or explode)
we would like to -
- create a path so the input can arrive at the later layers intact, keeping their updates meaningful
- let the loss gradients arrive at the early layers intact, making their updates more meaningful
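a tiny sketch of why the skip path fixes the gradient side (toy function, my own example - not from the source): with y = f(x) + x, the derivative is dy/dx = f'(x) + 1, so even when f'(x) is nearly zero after many scrambled multiplications, the "+1" from the skip gives the loss gradient a clean route back to early layers.

```python
# Toy illustration: f stands in for a deep sub-network whose gradient is tiny.
def f(x):
    return 0.001 * x ** 2   # f'(x) = 0.002 * x, nearly zero

def y(x):
    return f(x) + x         # residual form: through-path plus skip-path

# numerical derivative of y at x0 = 3.0
eps = 1e-6
x0 = 3.0
grad = (y(x0 + eps) - y(x0 - eps)) / (2 * eps)
# grad ~ f'(3.0) + 1 = 0.006 + 1 = 1.006 -> the skip's "+1" dominates
```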
Residual Blocks - a collection of layers where data flows both through the layers and around them via skip connections
detailed - https://www.youtube.com/watch?v=Q1JCrG1bJ-A
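a minimal NumPy sketch of one residual block (shapes, names, and the post-add ReLU are my assumptions, not from the source): the input goes through two weight layers and also around them via the skip, so even if the weights contribute nothing useful, the input still reaches the output.

```python
import numpy as np

def residual_block(x, W1, W2):
    h = np.maximum(0, x @ W1)    # layer 1 + ReLU - the "through" path
    f = h @ W2                   # layer 2 (no activation before the add)
    return np.maximum(0, f + x)  # skip connection: output = ReLU(F(x) + x)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # batch of 4, width 8
W1 = rng.standard_normal((8, 8)) * 0.01
W2 = rng.standard_normal((8, 8)) * 0.01

y = residual_block(x, W1, W2)

# sanity check: with all-zero weights the block degenerates to ReLU(x) -
# the input passes through untouched instead of being scrambled into noise
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

note the skip requires the input and F(x) to have the same shape; real ResNets insert a projection on the skip path when the dimensions change.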