FriendliAI Tech & Research Blog
Supercharge building and serving generative AI
Deploying Weights & Biases Model Checkpoints on Friendli Dedicated Endpoints
In this blog post, we’ll explore our exciting new integration between Weights & Biases (W&B) and Friendli Dedicated Endpoints…
FriendliAI Tech & Research
Jul 25
Deploying Your Inference Endpoints on AWS Sagemaker with Friendli Container
This blog post will guide you through creating an Amazon SageMaker Model from model artifacts stored in an S3 bucket and leveraging…
FriendliAI Tech & Research
Jul 25
Introducing Structured Output on Friendli Engine for Building LLM Agents
Large language models (LLMs) excel at creative text generation, but we often face a case where we need LLM outputs to be more structured…
FriendliAI Tech & Research
Jun 18
Measuring LLM Serving Performance with LLMServingPerfEvaluator
Tired of Fixing LLM Serving Benchmarks?
FriendliAI Tech & Research
Jun 4
Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain
Recently, LangChain introduced support for Friendli as an LLM inference serving engine. This integration allows you to leverage Friendli…
FriendliAI Tech & Research
May 27
Meta Llama 3 now available on Friendli
At FriendliAI, we’re on a mission to democratize access to cutting-edge generative AI models. That’s why we’re thrilled to announce the…
FriendliAI Tech & Research
May 27
Easily Migrating LLM Inference Serving from vLLM to Friendli Container
vLLM is an open-source inference engine that provides a starting point for serving your large language models (LLMs). However, when it…
FriendliAI Tech & Research
May 16
Building Your RAG Application on LlamaIndex with Friendli Engine: A Step-by-Step Guide
So you’re ready to delve into the exciting world of Retrieval-Augmented Generation (RAG)? While the possibilities are endless, choosing the…
FriendliAI Tech & Research
Apr 17
Improve Latency and Throughput with Weight-Activation Quantization in FP8
Quantization is a popular technique used to reduce the size of a machine learning model by lowering the numerical precision of some of its…
FriendliAI Tech & Research
Apr 14
Running Quantized Mixtral 8x7B on a Single GPU
Building on our previous article, let’s revisit the power of Mixture of Experts (MoE). Mixtral, an MoE model from Mistral AI, enables the…
FriendliAI Tech & Research
Apr 4