Pinned · Jeffrey Ip · LLM Evaluation Metrics: Everything You Need for LLM Evaluation · Although evaluating the outputs of Large Language Models (LLMs) is essential for anyone looking to ship robust LLM applications, LLM… · 19 min read · Jan 22, 2024
Pinned · Jeffrey Ip · A Step-By-Step Guide to Evaluating an LLM Text Summarization Task · When you imagine what a good summary for a 10-page research paper looks like, you likely picture a concise, comprehensive overview that… · 6 min read · Dec 18, 2023
Pinned · Jeffrey Ip · How to Evaluate LLM Applications · ChatGPT, the leading code generator, has exploded in popularity over the past year thanks to the seemingly omniscient GPT-4. Its ability to… · 9 min read · Nov 9, 2023
Pinned · Jeffrey Ip · Why we replaced Pinecone with PGVector · Pinecone, the leading closed-source vector database provider, is known for being fast, scalable, and easy to use. Its ability to allow… · 5 min read · Oct 31, 2023
Jeffrey Ip · How to Build an LLM Evaluation Framework, from Scratch · Let’s set the stage: I’m about to change my prompt template for the 44th time when I get a message from my manager: “Hey Jeff, I hope… · 11 min read · Apr 8, 2024
Jeffrey Ip · LLM Testing in 2024: Top Methods and Strategies · Just a week ago, I was on a call with a DeepEval user who told me she considers testing and evaluating large language models (LLMs) as… · 9 min read · Feb 26, 2024
Jeffrey Ip · The Ultimate Guide to Fine-Tune LLaMA 2, With Evaluations · Fine-tuning a Large Language Model (LLM) comes with tons of benefits when compared to relying on proprietary foundational models such as… · 10 min read · Feb 21, 2024
Jeffrey Ip · LLM Benchmarking: Evaluating LLMs in 2024 · Picture LLMs ranging from 7 billion to over 100 billion parameters, each more powerful than the last. Among them are the giants: Mistral 7… · 12 min read · Jan 8, 2024
Jeffrey Ip · Why OpenAI Assistants is a Big Win for LLM Evaluation · A week after the famous, or infamous, OpenAI Dev Day, we at Confident AI released JudgementalGPT, an LLM agent built using OpenAI’s… · 5 min read · Nov 22, 2023
Jeffrey Ip · What is Retrieval Augmented Generation (RAG)? · Large language models like ChatGPT are powerful and versatile generators of natural language, but also extremely limited by the data… · 6 min read · Oct 25, 2023