deep NNs, 2 problems -

  1. updates to the later layers aren't very meaningful - by the time the input has passed through many random weight multiplications and activations, it has been scrambled into noise, so the later layers carry almost no signal from the actual input
  2. updates to the early layers also aren't very meaningful - the gradients get scrambled the same way, by too many multiplications on the way back
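A toy illustration of problem 2 (this is an assumed setup, not a real network): backprop multiplies one per-layer derivative factor per layer, and with a sigmoid activation each factor is at most 0.25, so the gradient reaching the early layers shrinks geometrically with depth.

```python
# Chain rule through 20 layers, taking the best case for sigmoid:
# sigmoid'(x) <= 0.25 for every x, so each layer multiplies the
# gradient by at most 0.25 on the way back.
grad = 1.0
for layer in range(20):
    grad *= 0.25

print(grad)  # ~9.1e-13 -- essentially no signal left for the early layers
```

With ReLU the factors are 0 or 1 rather than at most 0.25, but the products of many weight matrices can still shrink or explode the gradient; the skip connections below are what give it a clean path back.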

we would like to -

  1. a way for the inputs to arrive at the later layers relatively intact, so those layers see something meaningful
  2. a way for the loss gradients to arrive at the early layers intact, so their updates are more meaningful

Residual Blocks - a collection of layers where data flows both ways: through the layers and around them via a skip connection, with the two paths summed at the end
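A minimal sketch of that forward pass (the names and shapes here are my own, not from any library): the input goes *through* two weighted layers and *around* via the skip connection, and the block adds the two paths. Because the output contains `+ x` directly, the gradient of the loss with respect to `x` includes an identity term, which is exactly the clean path we wanted for both the inputs and the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # assumed feature width; through-path keeps the same shape so the sum works
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x):
    through = W2 @ relu(W1 @ x)  # the "through" path: layers as usual
    return relu(through + x)     # the "around" path: add x back in (skip connection)

x = rng.normal(size=d)
y = residual_block(x)
print(y.shape)  # (8,)
```

The one constraint the sketch surfaces: the through-path output must have the same shape as `x` for the sum to be valid (real implementations insert a projection when it doesn't).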

detailed - https://www.youtube.com/watch?v=Q1JCrG1bJ-A