🎲 Little Language Models: Empowering Children to be Future AI Modelers

Published in CoCo (coco.build) · MIT MEDIA LAB
12 min read · Aug 7, 2024

— by MASH (Manuj Dhariwal + Shruti Dhariwal) | PhD Candidates at MIT | Co-creators of CoCo | Winners, 2024 Global EdTech Tools Prize

We want to start by stating that…

The future will NOT be shaped by AI; it will be shaped by the modelers of AI.

If children are not empowered to develop creative fluency with the foundational ideas underlying AI, we risk raising a generation that—instead of modeling AI systems in the future—might end up being modeled by them.

We believe it’s essential for young people today to not just learn how to ‘use’ the new magical AI tools. They should also have access to tools that help make AI un-magical for them—by letting them explore and engage early on with the powerful ideas that underlie these powerful technologies.

💡 Introducing “Probabilistic Thinking” as a new foundational fluency for children in the era of AI

In any era, the most important ideas for children to explore are often the ones that have the quality of being both timeless and timely. Such ideas are not narrowly restricted to a particular concept or skill. Instead, they are foundational to children’s understanding of the world. They help them develop new ways of thinking about their own thinking and learning, widely expanding the space of possibilities they can imagine.

In the digital era, Computational Thinking (CT) emerged as one such idea and is now recognized as an essential literacy for learners globally. As we enter the AI era, it is both a critical and opportune time to reassess which ideas and fluencies will be most important for young people growing up today. Through this work, we aim to bring attention to a timeless foundational idea that has become especially timely now — the idea of Probabilistic Thinking.

CT + PT as foundational ideas for children in the era of AI

Probabilistic Thinking (PT) can be described as a decision-making approach that involves reasoning and making predictions under uncertainty. It provides a powerful framework to understand intelligence and learning processes, both artificial and human.

Just as CT underpins computer programming, the act of writing structured instructions for computers, PT is fundamental to probabilistic modeling — the process of building mathematical models that can quantify uncertainties and make predictions by learning from data.

The ideas of Probabilistic Thinking, Modeling, and Learning lie at the core of how Generative AI and Large Language Models (LLMs) like ChatGPT work.

Gif generated from this video by 3Blue1Brown explaining GPT

For instance, when you input a text prompt into ChatGPT, the model generates a response by using probabilities (derived from processing training data) to predict the most likely next word or phrase. This ability makes LLMs incredibly useful and powerful across a wide variety of tasks, including generating code. And so, while the core ideas of computational thinking will continue to hold value, it is evident that future builders and computer science researchers will not just write programs but will increasingly engage in tuning, tinkering with, and building models.
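At its heart, this next-word step can be sketched in a few lines of Python: a toy lookup of a probability distribution over possible next words. The context and probabilities below are invented for illustration, not drawn from any real model or training data.

```python
import random

# Toy next-word predictor (illustrative only): a real LLM learns these
# probabilities from massive training data; here they are made up.
next_word_probs = {
    "the sky is": {"blue": 0.7, "clear": 0.2, "falling": 0.1},
}

def generate_next(context):
    """Sample the next word in proportion to its probability."""
    dist = next_word_probs[context]
    words = list(dist)
    weights = [dist[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

print(generate_next("the sky is"))
```

Most rolls produce “blue,” but not all of them: the output is sampled, not looked up, which is why the same prompt can yield different responses.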

How can we empower children to bring this (now ubiquitous) word ‘model’ into their own worlds and imaginations so it becomes a part of their thinking and vocabulary? How can we support educators in making the foundational ideas of probabilistic thinking, modeling, and learning accessible and engaging for young people?

Meet ‘Little Language Models’ — 🎲 🔤 🎨 🎸

‘Little Language Models’ (Little Models for short) is a co-creative mathematical microworld for children (ages 8–16) within our CoCo platform (coco.build). It is especially designed to be developmentally appropriate for young learners, helping them explore sophisticated probabilistic ideas and concepts underlying AI technologies — in ways that are co-creative, playful, and personally meaningful for them.

The tool offers custom probabilistic blocks and manipulatives that let children build their own little models using their own little data. The data can be in any form, including text, images, and sounds. Children can dynamically tinker with their models and use them to collaboratively explore new kinds of creative and generative projects with their peers. For instance, they can make models for generative art, stories, or music; build adaptive multiplayer games or interactive visualizations based on real-time data from their peers; or even create advanced projects using Markov blocks, such as teaching a computer to learn how to draw — among an endless variety of other possibilities.

In the spirit of exploration before explanation — here’s a short 1-min video that gives a glimpse into the tool. Make sure to turn the sound ON to listen to our special little models song 🎙️ :)…

A little glimpse into the Little Models microworld

🎲 DICE: A playful, mathematically-rich object to make the abstract idea of a “model” concrete for children

Ideas become powerful only when they become personal.

In conceiving and designing Little Models, one of our key insights early on was to ground the abstract idea of a ‘model’ in a mathematically-rich object that is familiar, concrete, and playful for children: the dice. But unlike standard dice, which have fixed faces numbered 1 to 6, the dice in the Little Models microworld are designed to be infinitely customizable. The metaphor of physical dice makes it easier for young people to start imagining the little models (or digital dice) they’d like to build in the environment. We’ve often been delighted by the diversity of ideas children come up with when they discover that they can use the tool to make a model for anything they like — names, musical notes, or even ice cream flavors!

Children can create their own digital customizable dice in the environment.

šŸ” Building their own ā€œLittleā€ Language Models can help demystify ā€œLargeā€ Language Models for children

In our workshops with young people, we typically follow a two-part structure. The first part involves hands-on explorations with the tool. During this time, children spend time imagining, building, and tinkering with their models to co-create a wide variety of generative projects with their peers in the real-time computational environment of CoCo. In the second part, we engage in a reflective discussion with the group about the key ideas and concepts they explored in the process.

Below, we highlight some of the conceptual, mathematical, algorithmic, and ethical lenses for understanding Generative AI models that we have seen children discover and develop through their creative explorations within the Little Models microworld:

1) 📶 Behind the magic, there’s a “Probability Distribution”

“I now know more about how a model works.” — 13-year-old

Remix of an image from an NVIDIA developer blog about LLMs

The most salient shift we see in students before and after our workshops is in their understanding of the powerful idea of probability. Existing approaches to introducing probability in classrooms often reduce it to the mere calculation of fractions for narrow and impersonal events such as coin tosses, lotteries, or spinners. Even in workshops with high schoolers, we find that most students think of probability only in terms of specific formulaic descriptions. It is often not intuitive to them that probability is one of the key ideas underlying Generative AI systems.

Children can dynamically update the probabilities in their models.

In the Little Models microworld, children are not only able to create their own dice (or models), but can also dynamically update the underlying probability distribution and visualize the effect in real time in the kind of samples their model generates. They start to see probability as a playful, clay-like material that they can easily tinker with to explore a wide variety of creative possibilities. In the process, they are able to generalize the underlying mathematical ideas and see how they can be applied across different contexts.
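The tinker-and-observe loop can be sketched in a few lines of Python. The faces and weights below are made-up examples of what a child might set in the tool; changing the weights immediately shifts the sample counts.

```python
import random
from collections import Counter

# A "little model" as a customizable die (a minimal sketch; faces and
# weights are invented examples, not the tool's actual internals).
faces = ["chocolate", "vanilla", "mango"]

def roll(weights, n=1000):
    """Roll the die n times and count how often each face comes up."""
    return Counter(random.choices(faces, weights=weights, k=n))

print(roll([1, 1, 1]))  # roughly uniform samples
print(roll([8, 1, 1]))  # after updating the weights: mostly "chocolate"
```

Comparing the two printed tallies makes the abstract distribution concrete: the model’s “behavior” is nothing more than the weights the child set.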

2) 🎨 It’s not just text, AI can be “Multimodal”

“I used to think of AI as solely chatGPT, but now I see it can have many more capabilities.” — 14-year-old

Children can make little models using multiple types of data modalities

The Little Models tool is not limited to supporting only text data. It is designed to support multiple modalities including text, images, sounds, etc. Students are thus able to build generative models using music, text, colors, shapes, images, and also micro:bit sensor values as data. This helps expand their understanding of what constitutes a “language” beyond natural (text-based) languages. They start to see how their expressions made out of musical notes, drawing strokes, or even structures made from LEGO blocks can all be viewed as “languages,” each with its own elements and rules for composition. By recognizing these parallels, students are able to better grasp how modern Generative AI systems are able to produce not just coherent text, but also generate images, compose music, or create 3D designs, all by learning the patterns of the underlying “language.”

3) 📊 A little bit of “Training” goes a long way

“I had always wondered how things like chatgpt worked.” — 13-year-old

The Little Models microworld is intentionally designed to support learners across ages by providing a scaffolded experience for beginners to get started easily while also enabling sophisticated possibilities for more advanced learners. After creating and playing with simple weighted models, students can progress to building more complex models like Markov models. The intuitive interface allows learners to immediately see how the sequence of elements in their input data “trains” their Markov model and updates its underlying probabilities. This helps them draw parallels with key concepts in Generative AI as mentioned below:

1. Concept of ‘Sequence’: Children are able to make connections with how Generative AI systems also process and generate sequences of tokens one at a time, forming cohesive outputs such as essays, images, or music. For instance, using the tool, students can enter a sequence of notes and train their model to generate new musical sequences, or enter a sequence of colors to generate new patterns based on that.

Generating patterns based on input sequence of colors

2. Concept of ‘Context Length’: A defining feature of systems like ChatGPT is their ability to maintain context during a conversation, providing answers by considering a large amount of information (words/tokens) from previous interactions. In contrast, the simple Markov models children create typically have a context length of 1, meaning the model generates the next token based only on the last one. Students are able to explore how even a context length of 1 significantly improves their model’s output compared to a simple weighted model with a context length of 0. The same can be seen in this generative drawing model, which a student trained by entering a sequence of strokes as data.

Training a Markov model based on input strokes
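A context-length-1 Markov model of this kind can be sketched in a few lines of Python. The color sequence is a made-up example of a child’s input data; “training” is just recording which element follows which.

```python
import random
from collections import defaultdict

# Made-up training data: a sequence of colors a child might enter.
sequence = ["red", "blue", "red", "blue", "yellow", "red", "blue"]

# "Training": record which element follows which in the input sequence.
transitions = defaultdict(list)
for current, nxt in zip(sequence, sequence[1:]):
    transitions[current].append(nxt)

def generate(start, length=8):
    """Generate a new sequence; each next token depends only on the last one."""
    out = [start]
    while len(out) < length:
        followers = transitions.get(out[-1])
        if not followers:  # nothing observed after this token: stop early
            break
        out.append(random.choice(followers))
    return out

print(generate("red"))
```

Because repeated followers stay in the lists, common transitions are sampled more often; a weighted model with context length 0 would ignore order entirely and sample each color independently.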

4) 🟨 “Bias” in output stems from input

“I have a better sense of what decisions are being made behind the scenes and how that can be mimicked on a smaller scale.” — 15-year-old

Visualizing ‘bias’ in a model

One of the critical ideas we’ve seen students experience for themselves is how bias can creep into AI models. The tool makes it concrete and visible to children that the model draws samples according to the distribution they have set. That is, if there is more yellow in the input data, the chance of rolling yellow will be higher. This provides a rich and concrete context for educators to have conversations with children about issues of bias and fairness in AI models.
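This input-to-output link can be made explicit in a few lines of Python (the color counts are a made-up example): the model’s distribution is just the normalized counts of the training data, so whatever is over-represented in the input dominates the output.

```python
from collections import Counter

# Made-up input data a child might enter: more yellow than anything else.
data = ["yellow"] * 6 + ["blue"] * 2 + ["green"] * 2

# The model's probability distribution is simply the normalized counts,
# so any imbalance in the data carries straight through to the outputs.
counts = Counter(data)
total = sum(counts.values())
probs = {color: count / total for color, count in counts.items()}
print(probs)  # {'yellow': 0.6, 'blue': 0.2, 'green': 0.2}
```

The same arithmetic, scaled up to billions of tokens, is why skews in an AI system’s training data show up as skews in what it generates.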

Pilots with students

Through children’s feedback in the pilot workshops and our interactions with them, we’ve seen first-hand how building their own little models with the tool can help them develop interests, intuitions, and insights about how Large Language Models like ChatGPT operate. The abstract ideas and terminologies used in explaining the workings of AI systems often remain too obscure and inaccessible for young learners (and even adults!). We hope the Little Models microworld can serve as a rich playground for educators to support children in developing a language and lens for these big ideas — helping them see AI as manipulable, not magical.

💭 Envisioning a new future for AI education: Towards “Intelligetics”

In this final section, we’d like to zoom out and scribble some of our thoughts on what the future of AI education can look like. We imagine that we will soon see the emergence of a new, inclusive, and interdisciplinary field in academia, which will then also be introduced as a core subject in K-12 education, much like math or social science. We propose calling it ‘Intelligetics’ (🗣️ Intelli-ge-tics).

Intelligetics will focus on the comprehensive study of intelligence — integrating human intelligence studies with the development and analysis of artificial intelligence systems. It will synthesize powerful ideas from a diverse array of existing disciplines, including computer science, cognitive science, neuroscience, mathematics, artificial intelligence, philosophy, linguistics, social science, and the learning sciences. It will attract people from varied backgrounds with diverse interests. Practitioners in the field — let’s call them ‘Intelligicians’ (🗣️ Intelli-gi-cians) — will bridge multiple areas of expertise, developing holistic approaches for advancing our understanding and capabilities in both natural and artificial intelligence, as well as their interplay.

Teaching children to be “Intelligicians”

How will we nurture the next generation to be Intelligicians?

In 1971, Seymour Papert published a memo at the A.I. Laboratory at MIT titled — Teaching children to be mathematicians vs. Teaching children about mathematics. He argued that “Children will become more proficient in mathematics if they do mathematics rather than merely learn about mathematics.” We deeply resonate with these words and the underlying constructionist paradigms of teaching and learning they represent.

Children as ‘Little Modelers’

We see our ideas and work on Little Language Models as a little but important first step towards offering children a rich, co-creative microworld where they can actively engage in doing AI (or Intelligetics) with their peers, rather than merely learning about AI passively.

As we look ahead, we are excited to continue imagining and building new microworlds that help empower children everywhere to see themselves as future AI builders, scientists, and Intelligicians — who have the skills and will to shape the future in ways that best serve humanity.

🏆 Winners, 2024 Global EdTech Tools Prize

We are delighted to share that our work on Little Language Models in CoCo is one of the winners of the 2024 Tools Competition, supported by funders including the Gates Foundation, OpenAI, the Ballmer Group, the Jacobs Foundation, and several others. It is the largest edtech competition in the world; around 2,000 teams from 92 countries participated this year.

ā‡ļø Interested in trying ā€œLittle Language Modelsā€?

Visit coco.build to join the invite list.

Educators from 85+ countries have expressed excitement about using CoCo, and it has been wonderful to receive incredibly positive feedback and interest from schools and organizations globally. If you are reading this and have already signed up for CoCo, we’ll be in touch soon.

If our ideas and co-creative approach towards AI education resonate with you, and you would like to say hello or support this work in any way, please send us a note at hello@coco.build. We’d love to connect!

🥣 About us — MASH*

Hi, we are MASH: Manuj Dhariwal + Shruti Dhariwal. We’ve been married and working together for over a decade designing creative, collaborative, and playful learning technologies and experiences for children. Currently, we are both PhD candidates at MIT Media Lab in the Lifelong Kindergarten group. We have been recipients of the LEGO Papert Fellowship for our research, and our joint doctoral work on CoCo (coco.build) has been featured on MIT News and EdSurge, among other publications.

Previously at MIT, Manuj completed his master’s in EECS majoring in AI/ML, while Shruti completed her master’s at the Media Lab working on creative computing tools for children. Before coming to MIT, we co-designed educational and social games as part of a company (founded by Manuj) that won the Top 10 Indian Innovators award from the Science and Technology Board, Govt. of India. These games were used in more than 3,000 schools and reached over 250,000 families and 500,000 digital users.

We are deeply grateful to the MIT and Media Lab ecosystem; to our PhD advisor, Prof. Mitch Resnick, for supporting our work and efforts on CoCo; and to the Tools Competition for the award and recognition. Looking ahead, we are excited to continue working towards making CoCo available for communities globally and empowering educators everywhere to engage young people in Being. Creative. Together. — always, but especially in the era of AI.

Learn more about the core ideas and values underlying the design of CoCo as a calm, co-creative, and ‘self-less’ social platform for young people in this earlier post. Follow CoCo on LinkedIn, X (Twitter), and YouTube for more updates.
