Architecture
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
KAIST AI / Google DeepMind / Mila · July 14, 2025
Sangmin Bae, Yujin Kim
TL;DR
Reuses a stack of shared layers recursively and adds lightweight per-token routers that assign each token its own recursion depth, spending more compute only on harder tokens.
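To make the mechanism concrete, here is a minimal sketch in plain Python, not the paper's implementation. It assumes a single residual tanh layer standing in for the shared stack and a linear argmax router that picks each token's depth up front; the names shared_block, route_depth, and mixture_of_recursions, and all weights and sizes, are hypothetical and chosen only for illustration.

```python
import math
import random


def shared_block(h, w):
    """Hypothetical shared layer: residual tanh update; w is reused at every recursion step."""
    d = len(h)
    return [h[i] + math.tanh(sum(w[i][j] * h[j] for j in range(d))) for i in range(d)]


def route_depth(h, router_w, max_depth):
    """Hypothetical lightweight router: one linear score per depth, argmax picks 1..max_depth."""
    scores = [sum(rw * x for rw, x in zip(row, h)) for row in router_w]
    return 1 + max(range(max_depth), key=lambda k: scores[k])


def mixture_of_recursions(tokens, w, router_w, max_depth):
    """Apply the same block recursively; each token stops at its own routed depth."""
    depths = [route_depth(h, router_w, max_depth) for h in tokens]
    states = [list(h) for h in tokens]
    for step in range(max_depth):
        for t, depth in enumerate(depths):
            if step < depth:  # tokens past their depth skip the block entirely
                states[t] = shared_block(states[t], w)
    return states, depths


# Toy usage with random weights (assumed sizes, for illustration only).
random.seed(0)
d, max_depth = 4, 3
w = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
router_w = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(max_depth)]
tokens = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(5)]

states, depths = mixture_of_recursions(tokens, w, router_w, max_depth)
block_calls = sum(depths)                 # compute actually spent
fixed_calls = len(tokens) * max_depth     # compute if every token ran to full depth
```

In this toy setup, block_calls never exceeds fixed_calls, and the gap between them is the compute saved by letting some tokens exit early. The parameter count stays that of one block regardless of max_depth, since w is shared across all steps.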
Why it matters
A clean unification of parameter sharing and adaptive depth that defines a new efficiency Pareto frontier (NeurIPS 2025).