hengtao tantai

Dynamic Learning Redefining Sequence Modeling: TTT (Test-Time Training) Unleashing Power for…
The paper introduces Test-Time Training (TTT) layers as a novel class of sequence modeling layers, designed to enhance the expressive power…
6d ago

LongRAG: Mastering Complex Queries and Unlocking High-Efficiency Question Answering with Extended…
“LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs” introduces LongRAG, a new framework designed…
Jul 7

Mixture of Attention: Optimizing Large Language Models with Adaptive Attention Mechanisms
The paper introduces a novel approach called Mixture of Attention (MoA) to enhance the performance and efficiency of Large Language Models…
Jul 6

Meta Releases LLM Compiler: A Compiler Based on Large Language Models with Both Compilation and…
In a significant stride towards enhancing compiler optimization, Meta AI has introduced a groundbreaking suite of tools termed the LLM…
Jun 29

NVIDIA’s LLM: Nemotron-4 340B Trained on 98% Synthetic Data, Surpasses Rivals and Matches GPT-4
NVIDIA has launched its groundbreaking open-source model, Nemotron-4 340B, potentially revolutionizing the training of large language…
Jun 16

Mamba-2 Innovation: State Space Expanded by 8x and Training Speed Increased by 50%, Structured…
The paper titled “Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality” explores the…
Jun 10

MambaOut: Rethinking State-Space Models in the Context of Image Classification
The paper presents a theoretical analysis to determine the suitability of the Mamba architecture for vision tasks, highlighting two key…
May 28

Deep Dive into xLSTM: The Evolution of LSTM Architecture and PyTorch Code Implementation
Long Short-Term Memory (LSTM) networks have been a staple in handling sequential data due to their ability to retain information over long…
May 20

From Smooth Chatting to Precise Execution: Differences Between “chat” and “instruct” Modes in…
In the evolution of artificial intelligence, large language models (LLMs) have become an indispensable component, widely and profoundly…
May 16

Gradformer: The Graph Transformer Enhances Self-Attention with a Graph-Structure Inductive Bias
The paper “Gradformer: Graph Transformer with Exponential Decay” introduces several innovations:
May 2