Architecture · Reinforcement Learning · Evaluation

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

MiniMax AI · June 16, 2025

TL;DR

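MiniMax-M1 is presented as the first open-weight, large-scale reasoning model built on hybrid attention. It is a 456B-parameter mixture-of-experts (MoE) model with 45.9B parameters active per token, and it interleaves linear “lightning” attention layers with standard softmax attention layers so that test-time compute over long contexts becomes far cheaper. The paper also introduces CISPO, a reinforcement learning (RL) algorithm.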
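To illustrate why linear attention is cheaper over long sequences, here is a minimal sketch in plain Python. It is not the Lightning Attention kernel or MiniMax's implementation: it assumes an identity feature map, no normalization, and no decay, and the function names are hypothetical. It only shows the general idea that causal linear attention can be computed either by summing over all earlier positions (quadratic in sequence length) or by carrying a fixed-size running state (linear in sequence length), with identical outputs.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quadratic_causal_linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference form: o_t = sum over s <= t of (q_t . k_s) * v_s.

    Every position revisits all earlier positions, so the work grows
    quadratically with sequence length.
    """
    d_v = len(v[0])
    out: Matrix = []
    for t in range(len(q)):
        o_t = [0.0] * d_v
        for s in range(t + 1):
            score = dot(q[t], k[s])  # no softmax: the raw score is the weight
            for j in range(d_v):
                o_t[j] += score * v[s][j]
        out.append(o_t)
    return out


def recurrent_causal_linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Same outputs via a running state S_t = S_{t-1} + k_t v_t^T.

    The state has fixed size d_k x d_v, so each new token costs the same
    amount of work regardless of how long the sequence already is.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out: Matrix = []
    for q_t, k_t, v_t in zip(q, k, v):
        # Accumulate the outer product k_t v_t^T into the state.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        # Read out o_t = q_t^T S_t.
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out
```

Because the two functions agree, a decoder can use the recurrent form during generation and avoid attending over a growing cache for those layers. In a hybrid design, the softmax layers that are interleaved with the linear ones still pay the usual cost that grows with context length.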

Why it matters

The paper shows that linear (lightning) attention is viable at frontier scale for reasoning models, and that a hybrid-attention design sharply cuts the cost of generating long reasoning traces.