Training AI Systems the Westworld Way
How to use pedagogical examples in the training of AI models.
I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
(Core ML concepts + groundbreaking research papers and frameworks + AI news and trends) x 5 minutes, 3 times a week =…
Westworld is one of my favorite TV series of the last few years. The HBO drama combines a stellar cast with an engaging plot that touches on some of the most controversial aspects of the future of artificial intelligence (AI). In almost every episode of the first season of Westworld, we find humans trying to understand the decisions made by the hosts (robots) in specific circumstances. Any time a human needs an explanation of a host's behavior, they can simply query a system that will explain the reasoning behind the host's decision. By simply saying "Analysis, explain X or Y," the human prompts the host to pleasantly detail the intricacies behind its behaviors or actions. If only things worked like that in real AI systems.
Explaining and interpreting knowledge is one of the hardest problems in modern deep learning systems. In supervised deep learning systems, the processes for training a model and the knowledge built into that model are almost uninterpretable. However, interpretation of knowledge is a key element in the way humans learn. Let's take a classic student-teacher setting in which the teacher is trying to convey a specific concept to the student using a series of examples. Based on the feedback from the student, the teacher will adapt their explanations and try to select the most appropriate examples to improve the knowledge of the student. That pedagogical process works brilliantly for humans but fails miserably for neural networks.
Some of the most interesting scenarios in deep learning systems require a seamless collaboration between humans and neural networks. However, in most scenarios, it's incredibly difficult to establish that collaboration because the two sides speak different protocols. A couple of years ago, OpenAI published one of the most relevant papers in this area: "Interpretable and Pedagogical Examples," which addresses this challenge by proposing a more pedagogical way to teach deep learning systems.
In the paper, the OpenAI researchers formulate an intriguing thesis about what makes understanding the knowledge of deep learning systems so difficult. In their opinion, part of the challenge is that most deep learning architectures rely on teacher and student neural networks being trained jointly, which prevents any feedback loop between the two. Instead of that model, the OpenAI team proposes a structure in which teacher and student networks are trained iteratively, which can produce more interpretable teaching strategies.
Interpretable Machine Learning
The OpenAI interpretable teaching strategy can be seen as a game dynamic between two neural networks, a student and a teacher. The goal of the game is for the student to guess a particular concept based on examples of that concept, while the goal of the teacher is to learn to select the most illustrative examples for the student. Using an image recognition scenario as an analogy, the student should try to guess the concepts in a specific image while the teacher should try to select the most appropriate images to improve the knowledge of the student.
The two-stage technique for interpretable teaching works like this: a 'student' neural network is given randomly selected input examples of concepts and is trained from those examples using traditional supervised learning methods to guess the correct concept labels. In the second step, the 'teacher' network — which has an intended concept to teach and access to labels linking concepts to examples — tests the different examples on the student and sees which concept labels the student assigns to them, eventually converging on the smallest set of examples it needs to give to let the student guess the intended concept.
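To build intuition for those two stages, here is a toy, non-neural sketch in which concepts are integer intervals and an example is a point inside the interval. All function names and the interval setup are illustrative assumptions on my part, not the paper's actual networks: the student follows a simple consistent guessing strategy, and the teacher searches for the smallest example set that makes that fixed student guess the intended concept.

```python
from itertools import combinations

# Toy stand-in for the two-stage scheme. A concept is an integer
# interval (lo, hi); an "example" is a point inside the interval.

def student_guess(examples):
    # Stage 1 stand-in: a student trained on randomly drawn examples
    # settles on a simple consistent strategy -- here, guessing the
    # tightest interval covering everything it has seen.
    return (min(examples), max(examples))

def teacher_select(concept, candidate_pool):
    # Stage 2: the teacher holds the intended concept and searches for
    # the smallest example set that makes the fixed student guess it.
    lo, hi = concept
    inside = [x for x in candidate_pool if lo <= x <= hi]
    for size in range(1, len(inside) + 1):
        for subset in combinations(inside, size):
            if student_guess(subset) == concept:
                return list(subset)
    return inside

# For the interval (2, 7), the teacher converges on just the two
# boundary points -- a compact, human-intuitive teaching strategy.
print(teacher_select((2, 7), range(10)))  # -> [2, 7]
```

The key property the sketch preserves is the ordering: the student's strategy is fixed first, and only then does the teacher optimize against it, which is what pushes the teacher toward small, interpretable example sets.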
The key to the OpenAI method is that the teacher and student networks are trained iteratively rather than jointly. In the traditional model, both neural networks are trained together and end up selecting examples that are hard for humans to interpret. The goal of the OpenAI technique is to produce more interpretable teaching strategies, but how do we really quantify interpretability? To evaluate the performance of the model, the OpenAI team centered on two fundamental metrics:
1. Evaluating how similar the selected strategies are to intuitive human-designed strategies in each task.
2. Evaluating the effectiveness of the selected strategies at teaching humans.
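To make the iterative-versus-joint distinction above concrete, here is a minimal numeric sketch in which each network is reduced to a single scalar parameter and the two training phases alternate, each side adapting while the other is held fixed. The toy objective, the learning rate, and all names are assumptions for illustration only, not the paper's setup.

```python
def train_iteratively(rounds=20, lr=0.5, target=1.0):
    # "Student" and "teacher" are single scalar parameters standing in
    # for networks. Each round alternates two phases, creating the
    # feedback loop that joint training lacks.
    student, teacher = 0.0, 0.0
    for _ in range(rounds):
        # Phase 1: the student adapts to the teacher's current output.
        student += lr * (teacher - student)
        # Phase 2: the teacher adapts so the student's guess hits the
        # target concept it wants taught.
        teacher += lr * (target - student)
    return student, teacher

student, teacher = train_iteratively()
print(round(student, 2))  # the student's guess converges toward 1.0
```

Because each phase optimizes against a fixed counterpart, each side can form a stable, readable model of the other, which is the intuition behind why the iterative scheme yields more interpretable strategies.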
The OpenAI researchers applied interpretable teaching strategies across a large variety of scenarios, producing remarkable results that vastly improve over traditional techniques. More specifically, iterative training leads the student model to learn an interpretable learning strategy, which then constrains the teacher to learn an interpretable teaching strategy.