Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

KAIST AI / Google DeepMind / Mila · July 14, 2025

Sangmin Bae, Yujin Kim

TL;DR

Reuses a stack of shared layers recursively and adds lightweight per-token routers that assign each token its own recursion depth, spending more compute only on harder tokens.
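
To make the idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the block (`shared_block`, a residual tanh layer), the router (`route_depth`, an argmax over one logit per candidate depth), and all sizes and weights are hypothetical stand-ins chosen for illustration. It only shows the control flow: one set of shared weights is applied repeatedly, and each token stops after the depth its router picked.

```python
import math
import random


def shared_block(x, weights):
    """One pass through the shared layer (toy stand-in): x + tanh(W x)."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def route_depth(x, router_weights):
    """Toy router: one logit per candidate depth, pick the argmax (1-based)."""
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in router_weights]
    return 1 + max(range(len(logits)), key=logits.__getitem__)


def mixture_of_recursions(tokens, weights, router_weights):
    """Apply the same block up to max_depth times, skipping finished tokens."""
    max_depth = len(router_weights)
    depths = [route_depth(x, router_weights) for x in tokens]
    states = [list(x) for x in tokens]
    block_calls = 0
    for step in range(1, max_depth + 1):
        for i, depth in enumerate(depths):
            if depth >= step:  # token is still active at this recursion step
                states[i] = shared_block(states[i], weights)
                block_calls += 1
    return states, depths, block_calls


# Hypothetical sizes and random weights, for illustration only.
random.seed(0)
dim, max_depth, n_tokens = 4, 3, 6
weights = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
router_weights = [
    [random.uniform(-1, 1) for _ in range(dim)] for _ in range(max_depth)
]
tokens = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]

states, depths, block_calls = mixture_of_recursions(tokens, weights, router_weights)
print("per-token depths:", depths)
print("block calls:", block_calls, "vs. fixed-depth baseline:", n_tokens * max_depth)
```

The number of block calls equals the sum of the assigned depths, so any token routed to a shallow depth saves compute relative to running every token through all recursion steps. In this sketch the parameter count does not grow with depth, because every step reuses the same `weights`.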

Why it matters

Unifies parameter sharing and adaptive per-token depth in a single architecture, which the authors report forms a new efficiency Pareto frontier (NeurIPS 2025).
