Architecture
Deep Residual Learning for Image Recognition
Microsoft·December 10, 2015
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
TL;DR
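Introduces residual (skip) connections: each block adds its input x to its learned output F(x), so the layers only have to fit the residual F(x) rather than the full mapping. This makes networks more than a hundred layers deep trainable.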
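As a rough illustration of the idea, here is a minimal sketch of a residual block in plain Python. It assumes two fully connected layers with a ReLU between them for F, whereas the paper uses convolutional layers with batch normalization; the names relu, linear, and residual_block are hypothetical helpers chosen for this example, not taken from the paper.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(weights, bias, v):
    """Affine map: weights is a list of rows, bias is a list of floats."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Return relu(F(x) + x), where F is linear -> ReLU -> linear.

    Assumption: F keeps the dimension of x, so the skip connection
    is a plain element-wise addition with no projection.
    """
    f = linear(w2, b2, relu(linear(w1, b1, x)))
    # Skip connection: add the unchanged input back onto F(x).
    return relu([fi + xi for fi, xi in zip(f, x)])


if __name__ == "__main__":
    zeros = [[0.0] * 3 for _ in range(3)]
    zero_bias = [0.0] * 3
    # With all-zero weights F(x) = 0, so the block passes x through.
    print(residual_block([1.0, 2.0, 3.0], zeros, zero_bias, zeros, zero_bias))
```

With all weights at zero the block reduces to the identity on non-negative inputs, which hints at why stacking many such blocks does not have to make the network harder to optimize than a shallower one: a block that has nothing useful to add can stay close to a pass-through.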
Why it matters
Residual connections are now a standard component of deep networks, including the Transformer, where a skip connection wraps each attention and feed-forward sublayer.