Architecture · Training

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Model

DeepSeek · May 7, 2024

DeepSeek-AI


TL;DR

Introduces multi-head latent attention (MLA), which compresses the key-value cache to make inference cheap, letting DeepSeek undercut the market on price.

Why it matters

Foreshadowed the V3 and R1 breakthroughs and DeepSeek’s focus on training/inference efficiency.
