Residual Connections
Skip connections that let very deep networks train.
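A residual (or skip) connection adds a layer’s input to its output, so the layer computes y = x + F(x) instead of y = F(x). The identity path contributes a direct term to the gradient, so the backward signal can reach early layers without passing through every intermediate transformation, which mitigates vanishing gradients in very deep networks. Popularized by ResNet, residual connections made networks hundreds of layers deep trainable and are now standard in Transformers.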
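
As a minimal sketch of the idea described above, the following dependency-free Python wraps a small transformation F in a skip connection. The names `relu_linear` and `residual_block`, and the choice of a linear map followed by ReLU for F, are illustrative assumptions rather than any particular library's API; real ResNet and Transformer blocks use convolutions, attention, and normalization.

```python
def relu_linear(x, weights, bias):
    """Hypothetical inner transformation F(x): a linear map followed by ReLU."""
    return [
        max(0.0, sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(weights, bias)
    ]


def residual_block(x, weights, bias):
    """Return x + F(x). F(x) must have the same length as x."""
    fx = relu_linear(x, weights, bias)
    if len(fx) != len(x):
        raise ValueError("F(x) must match the shape of x for the skip connection")
    # The skip connection: add the unchanged input to the transformed output.
    return [x_i + f_i for x_i, f_i in zip(x, fx)]


x = [1.0, -2.0, 3.0]
zero_b = [0.0] * 3

# With all-zero weights, F(x) = 0 and the block reduces to the identity.
zero_w = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(x, zero_w, zero_b)

# With identity weights, F(x) = relu(x), so the block returns x + relu(x).
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
out = residual_block(x, eye, zero_b)
```

The zero-weight case shows why stacking many such blocks is benign: a block that has learned nothing passes its input through unchanged, so added depth does not have to degrade the signal. The shape check reflects a practical constraint: when F changes the dimensionality, the skip path needs a projection before the addition.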