Architecture · Training
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Model
DeepSeek · May 7, 2024
DeepSeek-AI
TL;DR
Introduces multi-head latent attention (MLA), which compresses the key-value cache to make inference cheaper, letting DeepSeek undercut the market on price.
Why it matters
Foreshadowed the V3 and R1 breakthroughs and DeepSeek’s focus on training/inference efficiency.