Why Humor Is the Perfect Benchmark for Generative AI

Are LLMs like ChatGPT and Gemini capable of being funny?

Thomas Smith
The Generator

--

Illustration by the author via DALL-E

There are a lot of ways to test today’s powerful Large Language Models. You can benchmark their speed, measure their parameter counts, or see how they perform on a battery of tests, from image recognition tasks to bar exams.

But I’ve found there’s a quicker (and far more enjoyable) way to test new Large Language Models: see if they’re funny.

Nailing Intent

At first, humor might not seem like the best test for AI success.

Most people are more interested in using ChatGPT to write their blog posts or cajole their supervisor into a salary increase than in having the model tell them jokes. But humor is actually an excellent test for a Large Language Model.

The best LLMs succeed not just because they’re good at performing rote tasks, but because they’re good at understanding people and their intents. But figuring out intent is hard. People are complex, contradictory, and often suck at expressing themselves.

If you ask ChatGPT to “make my essay better,” for example, it has to understand what you mean by “better.” Do you want the model to proofread your writing, fixing your egregious…
