Pinned · Published in TDS Archive
DeepSeek-V3 Explained 1: Multi-head Latent Attention
Key architecture innovation behind DeepSeek-V2 and DeepSeek-V3 for faster inference
Jan 31

Pinned · Published in TDS Archive
Understanding the Evolution of ChatGPT: Part 1—An In-Depth Look at GPT-1 and What Inspired It
Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI’s LLMs
Jan 6

DeepSeek Explained 8: Post-Training of DeepSeek-V3
This is the last article in our DeepSeek series ([1], [2]), where we finally cover the post-training techniques of DeepSeek-V3 [2].
Apr 7

Published in Data Science Collective
DeepSeek-R1: Advancing LLM Reasoning with Reinforcement Learning
This is the seventh article in our DeepSeek series [1], where we will break down how DeepSeek-R1 is trained by exploring large-scale…
Apr 1

Published in Data Science Collective
DeepSeek Explained 6: All you need to know about Reinforcement Learning in LLM training
This is the sixth article in our DeepSeek series, where we will dive deeper into one of the key innovations in training strategies of…
Mar 18

Published in Data Science Collective
DeepSeek Explained 5: DeepSeek-V3-Base
Innovations in pre-training strategies of DeepSeek-V3.
Mar 10

Published in Data Science Collective
DeepSeek Explained 4: Multi-Token Prediction
How DeepSeek achieves better balance between efficiency and quality in text generation
Feb 20

Paper Explained 3: E5
How a simple architecture is transformed into a SOTA embedding model
Feb 18

Published in AI Advances
DeepSeek-V3 Explained 3: Auxiliary-Loss-Free Load Balancing
How DeepSeek breaks the hidden bottleneck in MoE
Feb 10