Why Humor Is the Perfect Benchmark for Generative AI

Are LLMs like ChatGPT and Gemini capable of being funny?

Thomas Smith
The Generator

--

Illustration by the author via DALL-E

There are a lot of ways to test today’s powerful Large Language Models. You can benchmark their speed, measure their parameter counts, or see how they perform on a battery of tests, from image recognition tasks to bar exams.

But I’ve found there’s a quicker (and far more enjoyable) way to test new Large Language Models: see if they’re funny.

Nailing Intent

At first, humor might not seem like the best test for AI success.

Most people are more interested in using ChatGPT to write their blog posts or cajole their supervisor into a salary increase than in having the model tell them jokes. But humor is actually an excellent test for a Large Language Model.

The best LLMs succeed not just because they’re good at performing rote tasks, but because they’re good at understanding people and their intents. But figuring out intent is hard. People are complex, contradictory, and often suck at expressing themselves.

If you ask ChatGPT to “make my essay better,” for example, it has to understand what you mean by “better.” Do you want the model to proofread your writing, fixing your egregious…
