hengtao tantai

Dynamic Learning Redefining Sequence Modeling: TTT (Test-Time Training) Unleashing Power for…
The paper introduces Test-Time Training (TTT) layers as a novel class of sequence modeling layers, designed to enhance the expressive power…
6d ago

LongRAG: Mastering Complex Queries and Unlocking High-Efficiency Question Answering with Extended…
“LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs” introduces LongRAG, a new framework designed…
Jul 7

Mixture of Attention: Optimizing Large Language Models with Adaptive Attention Mechanisms
The paper introduces a novel approach called Mixture of Attention (MoA) to enhance the performance and efficiency of Large Language Models…
Jul 6

Meta Releases LLM Compiler: A Compiler Based on Large Language Models with Both Compilation and…
In a significant stride towards enhancing compiler optimization, Meta AI has introduced a groundbreaking suite of tools termed the LLM…
Jun 29

NVIDIA’s LLM: Nemotron-4 340B Trained on 98% Synthetic Data, Surpasses Rivals and Matches GPT-4
NVIDIA has launched its groundbreaking open-source model, Nemotron-4 340B, potentially revolutionizing the training of large language…
Jun 16

Mamba-2 Innovation: State Space Expanded by 8x and Training Speed Increased by 50%, Structured…
The paper titled “Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality” explores the…
Jun 10

MambaOut: Rethinking State-Space Models in the Context of Image Classification
The paper presents a theoretical analysis to determine the suitability of the Mamba architecture for vision tasks, highlighting two key…
May 28

Deep Dive into xLSTM: The Evolution of LSTM Architecture and PyTorch Code Implementation
Long Short-Term Memory (LSTM) networks have been a staple in handling sequential data due to their ability to retain information over long…
May 20

From Smooth Chatting to Precise Execution: Differences Between “chat” and “instruct” Modes in…
In the evolution of artificial intelligence, large language models (LLMs) have become an indispensable component, widely and profoundly…
May 16

Gradformer: The Graph Transformer Enhances Self-Attention with a Graph-Structure Inductive Bias
The paper “Gradformer: Graph Transformer with Exponential Decay” introduces several innovations:
May 2