State Space Models

Sequence models with linear-time scaling, an alternative to attention.

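State space models (SSMs), such as Mamba, process a sequence by updating a fixed-size hidden state one step at a time; for some variants the same computation can also be written as a convolution. Their cost grows linearly with sequence length, whereas self-attention’s grows quadratically. That makes them a promising direction for very long contexts and efficient inference.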
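
To make the recurrent and convolutional views concrete, here is a minimal sketch of a discrete linear SSM with a diagonal state matrix, written in plain Python. It is an illustration under simplifying assumptions, not Mamba's actual implementation: the function names `ssm_scan` and `ssm_conv` and the parameters `a`, `b`, `c` are hypothetical, the input is a scalar sequence, and the parameters are fixed over time. Selective models such as Mamba make the parameters depend on the input, so the fixed-kernel convolution form shown in the second function no longer applies to them.

```python
def ssm_scan(a, b, c, xs):
    """Recurrent view: one pass over the sequence, O(L) steps.

    State update (elementwise, diagonal state matrix):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)  # fixed-size hidden state, independent of len(xs)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_conv(a, b, c, xs):
    """Convolutional view: same outputs from a causal kernel.

    Kernel entry k is sum_i c_i * a_i**k * b_i. This naive version
    is O(L^2); it is only here to show the equivalence.
    """
    length = len(xs)
    kernel = [
        sum(ci * (ai ** k) * bi for ai, bi, ci in zip(a, b, c))
        for k in range(length)
    ]
    return [
        sum(kernel[k] * xs[t - k] for k in range(t + 1))
        for t in range(length)
    ]
```

The recurrent form keeps only the state `h` between steps, which is why per-token inference cost does not grow with how much context has already been seen.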

Related papers