The Three Essential Methods to Evaluate a New Language Model
How to check whether the newest, hottest Large Language Model (LLM) fits your needs
What is this about?
New LLMs are released every week, and if you’re like me, you might ask yourself: Does this one finally fit all the use cases I want to utilise an LLM for? In this tutorial, I will share the three techniques I use regularly to evaluate new LLMs. None of them are new (in fact, I will refer to blog posts I have written previously), but bringing them all together saves me a significant amount of time whenever a new LLM is released. I will demonstrate all three by testing the new OpenChat model.
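Before diving into the three techniques, it helps to see what the basic setup looks like. Below is a minimal sketch of loading a model and running a quick smoke test with the Hugging Face transformers library; the model ID openchat/openchat is an assumption here, so check the Hub for the exact repository name of the OpenChat release you want to test.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model ID; verify the exact OpenChat repository on the Hub
model_id = "openchat/openchat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the weights across available GPUs (requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A quick smoke test: generate a short completion for a simple prompt
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once a model loads and responds sensibly to a prompt like this, it is worth running it through the more systematic techniques covered in the rest of this tutorial.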
Why is this important?
When a new LLM comes out, it’s important to understand its capabilities and limitations. Unfortunately, figuring out how to deploy the model and then systematically testing it can be a bit of a drag: the process is often manual and time-consuming. With a standardised approach, however, we can iterate much faster and quickly determine whether a model is worth investing more time in or should be discarded. So, let’s get started.