Knowledge Distillation and the Jeopardy Phenomena

Sam Bobo
5 min read · Apr 24, 2024


Generated using Image Creator from Designer with Prompt: “A learner absorbing complex knowledge through childrens books”

Answer: He won 32 consecutive Jeopardy! games, one of the longest streaks in the show’s history, earning more than $2.4 million.

Question: Who is James Holzhauer?

On April 4, 2019, James Holzhauer took his place on the Jeopardy! stage as a new contestant. His background: professional sports gambler. He took the media by storm with his rapid responses to the show’s clues and his aggressive tactic of maximizing earnings on each Daily Double he hit, a classic poker strategy that let him outpace his competitors. Throughout his run on Jeopardy!, he landed many interviews with awe-struck media hosts trying to glean the source of his ongoing success. In those interviews, he shared that one of his secrets to acquiring knowledge was reading children’s books he checked out of the Las Vegas public library.

From Perspectives on Reading (“PoR”):

PoR: Part of your Jeopardy! strategy included reading children’s books, especially when it came to a subject you weren’t interested in and couldn’t get into with the adult reference titles. For certain subjects, there were probably multiple children’s books available — how did you decide which ones to read? Or did you read many of them to try and get a complete view?

James Holzhauer: Often it was just whichever book caught my eye first, although I kept coming back to certain series like Classics Illustrated. I did usually read more than one book per subject to ensure I wasn’t missing anything important.

Holzhauer found that children’s books were designed to engage the reader and were filled with infographics, pictures, and other elements that kept him interested. He used these books to quickly learn about a wide range of subjects. When it came to a subject he wasn’t interested in and couldn’t get into with adult reference titles, he turned to children’s books. He often read more than one book per subject to ensure he wasn’t missing anything important. This strategy helped him fill potential gaps in his knowledge base. He believed that children’s books made certain nonfiction subjects more appealing.

So why bring up Holzhauer in a blog about AI? Is there an AI angle here? Yes!

Microsoft Phi-3 and Children’s Books

On April 23, 2024, Microsoft announced the latest version of its Small Language Model (“SLM”) family, Phi-3. The smallest variant, Phi-3 Mini, boasts 3.8 billion parameters and was trained on a much smaller data set than GPT-4. Phi-3 also comes in two other varieties: Phi-3 Small at 7B parameters and Phi-3 Medium at 14B parameters. In an interview with The Verge, Eric Boyd, corporate VP of Microsoft Azure AI Platform, stated that

[…] developers trained Phi-3 with a “curriculum.” They were inspired by how children learned from bedtime stories, books with simpler words, and sentence structures that talk about larger topics.

“There aren’t enough children’s books out there, so we took a list of more than 3,000 words and asked an LLM to make ‘children’s books’ to teach Phi,” Boyd says.

Maybe I am mistaken, but that sounds remarkably similar to the approach James Holzhauer employed to build a vast foundational (pun intended) knowledge set!

This practice by Microsoft, like the children’s books themselves, parallels a critical paradigm in training AI models called transfer learning. As humans, we employ transfer learning constantly. For example (not that I am a rider, but it articulates the idea clearly), many of us learned how to ride a bicycle at some point in our lives. For those interested in learning to ride a motorcycle, our cognitive wiring lets us transfer the skills we obtained from riding a bicycle (balance, coordination, perception, safety, and so on) and accelerate our learning of the motorcycle.

In machine learning, transfer learning is taken a step further in a way that parallels both our school system and children’s books: distillation. Take the classic example of university professors who are experts in their fields, focusing on a specific niche for ongoing research. In addition to research, professors teach courses that build students’ knowledge of a subject related to their work (or broadly, for a 100-level introductory class). The expert, the professor, transfers knowledge to students by distilling it into a course curriculum: they know the material best and are skilled at designing a curriculum that transfers that learning as efficiently as possible. In machine learning terms, a large “teacher” model plays the professor, and a smaller “student” model learns from the teacher’s outputs rather than from raw data alone.
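To make the professor-and-student analogy concrete, here is a minimal pure-Python sketch of the classic knowledge-distillation loss: the student is trained against the teacher’s “softened” probability distribution (the simplified lesson) blended with the ground-truth label. The function names, logits, and hyperparameter values are illustrative, not any specific library’s API.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; higher T yields a softer distribution,
    # exposing the teacher's relative confidence across wrong answers too.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    # Soft targets: the teacher's "children's book" version of the answer.
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    # Cross-entropy of the student against the teacher's soft targets.
    soft_loss = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
    # Standard cross-entropy against the ground-truth hard label.
    hard_loss = -math.log(softmax(student_logits)[hard_label])
    # Blend the two objectives; alpha weights how much the student
    # imitates the teacher versus the raw labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = [4.0, 1.0, 0.2]   # a confident expert's logits
student = [2.0, 1.5, 0.5]   # a smaller model still learning
loss = distillation_loss(student, teacher, hard_label=0)
```

In practice the same idea runs inside a training loop with gradients, but the core is exactly this: the student minimizes a blend of “match the teacher” and “get the answer right.”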

Shifting to children’s books, I am fascinated by the Nerdy Babies series authored by Emmy Kastner. The series, which I personally own and read frequently to my children, inspires intellectual curiosity while exposing children to complex topics and vocabulary from particular fields, including rocks, dinosaurs, weather, oceanography, space, and more. Her tagline in every book, “Stay curious. There’s more to learn about everything,” is incredible at inspiring kids to ask questions and sparking curiosity to learn more. Per Holzhauer, the engaging pictures and infographics help learners absorb information quickly and simply; that is distillation and transfer learning at its finest.

The fact that Microsoft created a pseudo generative adversarial network (GAN) setup (I say pseudo because I am referencing the iterative, circular paradigm rather than a true adversarial objective) in which one model creates children’s books to teach other large (or small) language models is brilliant, especially for small language models whose core focus is operating in less powerful compute environments like smartphones.

This aspect of distillation and transfer learning builds on, if not adds to, the Council of Experts Leader-Agent Model, specifically for training agents to follow a leader in solving a complex task.

Should we all be able to solve the data integrity problem and harness the unique competitive advantage of the data we hold (following privacy best practices), then we can truly achieve more than the simple efficiencies gained via Artificial Intelligence!

If you have experienced other modalities of learning similar to distillation from children's books, please let me know!

Follow me on LinkedIn and continue to support this blog! Thank you!


Sam Bobo

Product Manager of Artificial Intelligence, Conversational AI, and Enterprise Transformation | Former IBM Watson | https://www.linkedin.com/in/sambobo/