Architecture · Training
DeepSeek-V3 Technical Report
DeepSeek · December 27, 2024
DeepSeek-AI
TL;DR
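Describes DeepSeek-V3, a Mixture-of-Experts (MoE) model with 671B total parameters, of which 37B are activated per token. The report puts full training at about 2.788M H800 GPU hours, a fraction of typical frontier cost, and details the architecture and training innovations behind that efficiency.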
Why it matters
Its training-efficiency claims reframed the economics of frontier models and intensified debate over how much compute frontier-level capability really requires.