Peiyuan Chien (Chris), "Here We Give You All the Tips to Run LLM Inference Smoothly" (LLM Inference Optimization), 1d ago
João Paulo Figueira in Towards Data Science, "Map-Matching for Speed Prediction" (How fast will you drive?), Jan 19
Mahernaija, "The Best NVIDIA GPUs for LLM Inference: A Comprehensive Guide" (Large Language Models (LLMs) like GPT-4, BERT, and other transformer-based models have revolutionized the AI landscape. These models demand…), Aug 27
Alon Agmon in Towards Data Science, "Streamlining Serverless ML Inference: Unleashing Candle Framework's Power in Rust" (Building a lean and robust model serving layer for vector embedding and search with Hugging Face's new Candle Framework), Dec 21, 2023
Péter Harang, "Setting up AWS Bedrock for API-based text inference" (Last time I struggled with WatsonX, now let's check whether AWS is better or worse in this regard. Let's do a speed-run!), May 29
Sneha Ghantasala in Thomson Reuters Labs, "Tensor Parallel LLM Inferencing" (As models increase in size, it becomes impossible to fit them in a single GPU for inference. There are different types of model parallelism…), 3d ago
Fireworks.ai, "Fireworks Raises the Quality Bar with Function Calling Model and API Release" (Fireworks conducts alpha launch of our function calling model and API, with quality reaching GPT-4 and surpassing open-source models), Dec 20, 2023
Jingying H, "Beam Search" (Beam search is an advanced decoding algorithm used in natural language processing to generate sequences, such as sentences, from a model…), Jun 8