hengtao tantai · MambaOut: Rethinking State-Space Models in the Context of Image Classification · The paper presents a theoretical analysis to determine the suitability of Mamba architecture for vision tasks, highlighting two key… · 4 min read · 17 hours ago
hengtao tantai · Deep Dive into xLSTM: The Evolution of LSTM Architecture and PyTorch Code Implementation · Long Short-Term Memory (LSTM) networks have been a staple in handling sequential data due to their ability to retain information over long… · 14 min read · May 20, 2024
hengtao tantai · From Smooth Chatting to Precise Execution: differences between “chat” and “instruct” modes in… · In the evolution of artificial intelligence, large language models (LLMs) have become an indispensable component, widely and profoundly… · 8 min read · May 16, 2024
hengtao tantai · Gradformer: The Graph Transformer enhances self-attention by graph structure Inductive Bias · The paper “Gradformer: Graph Transformer with Exponential Decay” introduces several innovations: · 5 min read · May 2, 2024
hengtao tantai · OpenELM: Apple’s open source language models · Apple releases OpenELM, including OpenELM-270M, OpenELM-450M, OpenELM-1B, and OpenELM-3B · 3 min read · Apr 25, 2024
hengtao tantai · Microsoft’s Phi-3: 3.8 Billion Parameters, Rivaling Mixtral 8x7B and GPT-3.5 · Microsoft’s Phi-3: 3.8 Billion Parameters, Rivaling Mixtral 8x7B and GPT-3.5, Running Directly on iPhone · 6 min read · Apr 23, 2024
hengtao tantai · Optimizing Language Model Preferences Without a Reference Model: Introducing the ORPO Method · In the ever-evolving field of artificial intelligence, refining language models to align with human preferences remains a critical… · 3 min read · Apr 15, 2024
hengtao tantai · What is Safetensors and how to convert .ckpt model to .safetensors · If you often download model weight files, you will often see .safetensors files. “Safetensors” is a new file format for storing tensors… · 5 min read · Apr 7, 2023
hengtao tantai · Accelerating Model inference with TensorRT: Tips and Best Practices for PyTorch Users · TensorRT is a high-performance deep-learning inference library developed by NVIDIA. It is designed to optimize and accelerate the inference… · 10 min read · Apr 1, 2023
hengtao tantai · META’s LLaMA Released and Leaked · The leaked language model was posted to 4chan. · 2 min read · Mar 9, 2023