State Space Models
Sequence models with linear-time scaling, an alternative to attention.
State space models (such as Mamba) process sequences with a recurrent, convolution-like mechanism that scales linearly with length, unlike attention’s quadratic cost. They are a promising direction for very long contexts and efficient inference.