FriendliAI Tech & Research Blog
Supercharge building and serving generative AI
Deploying Weights & Biases Model Checkpoints on Friendli Dedicated Endpoints
In this blog post, we’ll explore our exciting new integration between Weights & Biases (W&B) and Friendli Dedicated Endpoints…
FriendliAI Tech & Research
Jul 25
Deploying Your Inference Endpoints on AWS Sagemaker with Friendli Container
This blog post will guide you through creating an Amazon SageMaker Model from model artifacts stored in an S3 bucket and leveraging…
FriendliAI Tech & Research
Jul 25
Introducing Structured Output on Friendli Engine for Building LLM Agents
Large language models (LLMs) excel at creative text generation, but we often face a case where we need LLM outputs to be more structured…
FriendliAI Tech & Research
Jun 18
Measuring LLM Serving Performance with LLMServingPerfEvaluator
Tired of Fixing LLM Serving Benchmarks?
FriendliAI Tech & Research
Jun 4
Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain
Recently, LangChain introduced support for Friendli as an LLM inference serving engine. This integration allows you to leverage Friendli…
FriendliAI Tech & Research
May 27
Meta Llama 3 now available on Friendli
At FriendliAI, we’re on a mission to democratize access to cutting-edge generative AI models. That’s why we’re thrilled to announce the…
FriendliAI Tech & Research
May 27
Easily Migrating LLM Inference Serving from vLLM to Friendli Container
vLLM is an open-source inference engine that provides a starting point for serving your large language models (LLMs). However, when it…
FriendliAI Tech & Research
May 16
Building Your RAG Application on LlamaIndex with Friendli Engine: A Step-by-Step Guide
So you’re ready to delve into the exciting world of Retrieval-Augmented Generation (RAG)? While the possibilities are endless, choosing the…
FriendliAI Tech & Research
Apr 17
Improve Latency and Throughput with Weight-Activation Quantization in FP8
Quantization is a popular technique used to reduce the size of a machine learning model by lowering the numerical precision of some of its…
FriendliAI Tech & Research
Apr 14
Running Quantized Mixtral 8x7B on a Single GPU
Building on our previous article, let’s revisit the power of Mixture of Experts (MoE). Mixtral, an MoE model from Mistral AI, enables the…
FriendliAI Tech & Research
Apr 4