Mixtral of Experts

Mistral AI · January 8, 2024

Albert Q. Jiang, Alexandre Sablayrolles

TL;DR

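Introduces Mixtral 8x7B, a sparse mixture-of-experts model in which a router selects 2 of 8 expert feed-forward blocks per token at each layer, so only about 13B of its 47B parameters are active for any given token. It matches or outperforms Llama 2 70B and GPT-3.5 across the benchmarks the paper reports.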
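
To make "activating only a fraction of its parameters per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration, not the paper's implementation: the class name `SparseMoELayer`, the helper functions, and the random weights are all hypothetical, and each expert is reduced to a single linear map rather than a full feed-forward block. The defaults of 8 experts with 2 selected per token, and the softmax taken over only the selected gate logits, follow the configuration the paper describes.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SparseMoELayer:
    """Toy sparse mixture-of-experts layer (hypothetical, for illustration)."""

    def __init__(self, dim, num_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one row of gate weights per expert.
        self.gate = [[rng.gauss(0, 1) for _ in range(dim)]
                     for _ in range(num_experts)]
        # Assumption: each expert is one dim x dim linear map,
        # standing in for a real feed-forward block.
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Score every expert, but keep only the top_k highest scores.
        logits = matvec(self.gate, x)
        chosen = sorted(range(len(logits)),
                        key=lambda i: logits[i], reverse=True)[:self.top_k]
        # Normalize over the selected experts only.
        weights = softmax([logits[i] for i in chosen])
        # Only the chosen experts are evaluated; the rest cost nothing.
        out = [0.0] * len(x)
        for w, i in zip(weights, chosen):
            y = matvec(self.experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, chosen, weights


layer = SparseMoELayer(dim=4)
out, chosen, weights = layer.forward([0.5, -1.0, 0.25, 2.0])
print("experts used:", chosen, "of", len(layer.experts))
print("mixing weights:", weights)
```

Because the loop in `forward` touches only the selected experts, per-token compute scales with `top_k` rather than with the total number of experts, which is the source of the inference savings described above.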

Why it matters

A landmark open-weight MoE model, released under the Apache 2.0 license, that showed sparse architectures could match much larger dense models at much lower inference cost.

Related terms