Architecture · Training

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Model

DeepSeek · May 7, 2024

DeepSeek-AI

TL;DR

Introduces multi-head latent attention (MLA), which compresses the key-value cache to make inference cheap and let DeepSeek undercut the market on price.

Foreshadowed the V3 and R1 breakthroughs and DeepSeek’s focus on training/inference efficiency.