Architecture
Mixtral of Experts
Mistral AI · January 8, 2024
Albert Q. Jiang, Alexandre Sablayrolles
TL;DR
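Introduces Mixtral 8x7B, a sparse mixture-of-experts model in which a router selects 2 of 8 expert feed-forward blocks for each token at every layer, so each token uses roughly 13B of the model's 47B parameters. The paper reports that it matches or outperforms Llama 2 70B and GPT-3.5 on the benchmarks it evaluates.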
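To make "activating only a fraction of its parameters per token" concrete, here is a minimal sketch of top-k expert routing for a single token. It is not Mixtral's actual implementation: the names `top_k_route`, `moe_layer`, and `make_expert` are hypothetical, the experts are toy functions standing in for feed-forward blocks, and the choice of k=2 over 8 experts is an assumption taken from the paper's description.

```python
import math
import random


def top_k_route(logits, k=2):
    """Return (expert_index, weight) pairs for the k largest router logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, router, experts, k=2):
    """Sparse MoE forward pass for one token vector x.

    router:  one weight vector per expert (a hypothetical linear gate).
    experts: list of callables; only the k selected ones are evaluated.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)  # the unselected experts are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale):
    # Toy stand-in for an expert feed-forward block: scales its input.
    return lambda x: [scale * xi for xi in x]


if __name__ == "__main__":
    random.seed(0)
    dim, n_experts = 4, 8  # assumed: 8 experts, 2 active per token
    router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
    experts = [make_expert(s) for s in range(1, n_experts + 1)]
    token = [0.5, -1.0, 0.25, 2.0]
    print(moe_layer(token, router, experts))
```

Because only the selected experts run, per-token compute scales with k rather than with the total number of experts, which is where the inference-cost saving comes from.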
Why it matters
A landmark open MoE model that showed sparse architectures could deliver frontier quality at much lower inference cost.