TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


The Three Essential Methods to Evaluate a New Language Model

How to check whether the newest, hottest Large Language Model (LLM) fits your needs

Heiko Hotz · Published in TDS Archive · 6 min read · Jul 3, 2023


Image by author (using Stable Diffusion)

What is this about?

New LLMs are released every week, and if you’re like me, you might ask yourself: does this one finally fit all the use cases I want to utilise an LLM for? In this tutorial, I will share the three techniques I use regularly to evaluate new LLMs. None of them are new (in fact, I will refer to blog posts I have written previously), but bringing them together saves me a significant amount of time whenever a new LLM is released. I will demonstrate these techniques on the newly released OpenChat model.

Why is this important?

When it comes to new LLMs, it’s important to understand their capabilities and limitations. Unfortunately, figuring out how to deploy a model and then testing it systematically is often a manual, time-consuming process. With a standardised approach, however, we can iterate much faster and quickly determine whether a model is worth investing more time in or whether we should discard it. So, let’s get started.
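To make “standardised” concrete: the core idea is a fixed suite of test prompts that every new candidate model is run through. Here is a minimal sketch of such a loop; the prompts below are illustrative placeholders, not taken from this article:

```python
# A minimal sketch of a standardised evaluation loop.
# The prompts are illustrative placeholders, not from the article.
TEST_PROMPTS = [
    "Summarise the plot of Romeo and Juliet in one sentence.",
    "Write a Python function that checks whether a string is a palindrome.",
    "A farmer has 17 sheep. All but 9 run away. How many sheep are left?",
]

def quick_eval(generate_fn):
    """Run every test prompt through a model's text-generation function."""
    for prompt in TEST_PROMPTS:
        print(f"PROMPT:   {prompt}")
        print(f"RESPONSE: {generate_fn(prompt)}")
        print("-" * 60)
```

Any model that exposes a simple prompt-in, text-out function can be dropped into this loop, which is what makes the approach reusable across releases.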

Getting Started
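The first step is to get the model up and running so we can send prompts to it. Below is a minimal sketch using the Hugging Face transformers library; the model ID is an assumption shown for illustration, so replace it with the checkpoint you actually want to evaluate:

```python
# Minimal sketch: load a candidate model with Hugging Face transformers
# and generate a first response. Requires `transformers` and `accelerate`.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "openchat/openchat"  # assumed Hub ID, shown for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Wrapping model.generate in a small prompt-in, text-out helper makes the model pluggable into the quick_eval loop sketched above.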
