Meet General Purpose Learners, the Next Evolution of AI

Derek Hoiem
Published in Vision of Seeing · Jul 15, 2022
Major Evolutions of AI. Tremendous progress has been achieved by learning parameters of feature and predictor models from data, rather than relying on the ingenuity of the developer. The next step is AI systems that do not rely on the developer’s task definitions but can flexibly adapt and learn a wide range of tasks.

The tremendous success so far in AI stems from taking the details of the features and predictor out of the hands of the AI developer and instead learning them from data. The resulting models are very effective, but only on the specific predefined problems for which they were designed and trained. In consequence, AI is currently mainly applicable to highly valuable, formalizable, and stable problems that can justify years and millions of dollars of development.

The next evolution of AI is to take the task specification out of the hands of the developer and create general purpose learners that can adapt and learn tasks unknown at time of design.

What is a General Purpose Learner?

As described in General Purpose Vision by Gupta et al., a general purpose learner should achieve three forms of generality:

  • Generality of architecture: Can learn and perform any task within its input/output modalities without changes to the parameter structure
  • Generality of concepts across skills: Can separately learn concepts (e.g. “hammer”) and skills (e.g. detect, answer questions) and perform combinations of them (see the sketch after this list)
  • Generality of learning: Can learn new tasks sample-efficiently with minimal loss of performance on previously learned tasks
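
To make the first two properties concrete, here is a hypothetical sketch (all names and functions below are illustrative, not from Gupta et al.): one fixed call signature serves every task, and skills and concepts are represented separately so they can compose freely.

```python
# A hypothetical sketch (all names illustrative) of a single task interface:
# skills and concepts are represented separately and compose freely, so a
# (skill, concept) pair can be requested even if never seen together in training.
SKILLS = {
    "classify": lambda concept, image: f"does {image} contain a {concept}?",
    "locate":   lambda concept, image: f"box around each {concept} in {image}",
}

def perform(skill, concept, image):
    """One call signature for every task; no change to parameter structure."""
    return SKILLS[skill](concept, image)

# A combination absent from training still has a well-defined interface:
print(perform("locate", "hammer", "workbench.jpg"))
```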

GPT-3, for example, is a general purpose learner that maps from text input to text output. The input defines the task and text to be processed, and GPT-3 generates text in response. E.g., “Translate English to Spanish: Hello” may be processed into “Hola”. Trained on billions of documents, GPT-3 can perform a wide variety of text-to-text tasks (translation, summarization, parts of speech tagging, etc.) and learn to perform new ones by prompting with examples. More than 300 applications have been created using GPT-3, including many not envisioned by the GPT-3 developers.
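
As a minimal sketch of that mechanism, the snippet below assembles a few-shot prompt; the task is defined entirely in the input text, by instruction and example, rather than in the model's design (the helper name and formatting are my own, not part of any API).

```python
# A minimal sketch of few-shot prompting: the task specification lives in
# the text input itself, so one text-to-text model can perform many tasks.
def build_prompt(instruction, examples, query):
    """Assemble a prompt that specifies a task by demonstration."""
    parts = [instruction]
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Translate English to Spanish.",
    [("Hello", "Hola"), ("Thank you", "Gracias")],
    "Good morning",
)
# Sent to a model like GPT-3, this prompt would elicit "Buenos dias".
```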

General purpose computing made it possible to create new computing applications without expertise in the computing hardware, starting a new Information Age. Similarly, general purpose learning enables creation of new AI applications without detailed knowledge of the AI model, perhaps starting a new age of truly intelligent machines.

How GPLs Will Evolve

General purpose learners (GPLs) are in their infancy and will require thousands of researcher-years (20 calendar years?) to fully mature.

Multimodal GPL: GPLs become truly powerful with more input and output modalities. Text-to-text was a natural beginning because a single modality can cover task definition, input, and output. Humans have five input modalities and speech and motor control output modalities. Recent GPL systems GPV-1, GPV-2, and Unified-IO extend to vision-language. Unified-IO can solve dozens of tasks such as recognition, question answering, text translation, and image generation. The next step is to output controls that operate software interfaces or robotic manipulators, which Gato works toward. GPT-3, Unified-IO, and Gato are all sequence-to-sequence processors, reminiscent of the Turing machine.

Unified-IO maps from text+image input to text+image output, with capabilities for a wide range of geometric, semantic, and generative image-based tasks, as well as natural language tasks.
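
A toy sketch of that shared-sequence idea follows (the vocabulary sizes and offset scheme are assumptions for illustration, not Unified-IO's exact recipe): every modality is serialized into one discrete vocabulary, so a single sequence-to-sequence model can read and emit any of them.

```python
# Toy sketch: every modality becomes tokens in one shared vocabulary.
TEXT_VOCAB = 32000   # hypothetical subword vocabulary size
IMAGE_VOCAB = 1024   # hypothetical codebook size for quantized image patches

def encode_text(subword_ids):
    # Text tokens occupy the first block of the shared vocabulary.
    return list(subword_ids)

def encode_image(patch_codes):
    # Image patch codes are offset past the text range, so one decoder
    # softmax can emit either modality.
    return [TEXT_VOCAB + c for c in patch_codes]

# A task prompt (text tokens) followed by an image, as one flat sequence:
sequence = encode_text([812, 44, 9]) + encode_image([5, 511, 88])
```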

Context-Rich GPL: The vast majority of AI models are context-free, processing sequences of images or text with no sense of environment or history. To perform more complex tasks and build relationships, GPLs will need episodic memory and the ability to accumulate knowledge of people and the environment.
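
One possible shape for such a memory, sketched minimally below (this design is an assumption, not a description of any existing system): log each interaction and retrieve the most similar past episodes as extra context for the current input.

```python
# A minimal sketch of an episodic memory: store past interactions and
# recall the most relevant ones to condition future behavior.
from difflib import SequenceMatcher

episodes = []  # chronological log of (observation, response) pairs

def remember(observation, response):
    episodes.append((observation, response))

def recall(query, k=3):
    """Return the k stored episodes most similar to the query text."""
    return sorted(
        episodes,
        key=lambda ep: SequenceMatcher(None, query, ep[0]).ratio(),
        reverse=True,
    )[:k]

remember("user asked to label objects in the kitchen photo", "sink, kettle, mug")
context = recall("label objects in another kitchen image")
```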

Foundations for Learning: A person can learn a new task from few examples and trials because she builds on a broad base of sensory, motor control, social, and conceptual learning acquired from infancy onward. Foundation models started with the realization, roughly ten years ago, that models trained on ImageNet classification produced useful features for many other classification tasks and could be fine-tuned with limited data for new ones. More recently, MAE and SimMIM, inspired by the success of masked language modeling, show effective visual learning by completing masked images. Models like CLIP, Flamingo, and Florence train on enormous web corpora and demonstrate impressive zero-shot abilities for classification and other text-image mapping tasks. Effective input models may provide a good foundation for simple perceptual tasks (perhaps akin to a newborn soaking in the world but not yet able to control her hands), but more complex GPLs will require more advanced curricula for learning.
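
A toy sketch of the masked-image objective behind MAE and SimMIM (the 75% mask ratio matches MAE's default; everything else here is simplified): hide most patches and ask the model to reconstruct them from the rest.

```python
# Toy sketch of masked-image pretraining: hide most patches, reconstruct them.
import random

def mask_patches(patches, mask_ratio=0.75):
    """Split patches into visible inputs and hidden reconstruction targets."""
    order = list(range(len(patches)))
    random.shuffle(order)
    hidden = set(order[: int(len(patches) * mask_ratio)])
    visible = [p for i, p in enumerate(patches) if i not in hidden]
    targets = [(i, patches[i]) for i in sorted(hidden)]
    return visible, targets

# A training step would encode `visible`, predict the patches at the hidden
# positions, and score the predictions against `targets` (pixel regression in MAE).
```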

Continual Learning: To continue to evolve and learn new tasks, GPLs need effective strategies to learn from targeted training exercises while retaining prior abilities and generalization. It is not clear whether continual learning awaits some latent breakthrough idea; being smart about which parameters are updated and mixing in past training episodes may suffice.
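
As a minimal sketch of the second idea, experience replay mixes stored examples from past tasks into every new-task batch so old abilities keep receiving training signal (the buffer and ratios below are illustrative assumptions).

```python
# Minimal sketch of experience replay for continual learning.
import random

replay_buffer = []  # examples retained from previously learned tasks

def make_batch(new_examples, batch_size=32, replay_fraction=0.5):
    """Blend new-task data with replayed old-task data.

    Assumes `new_examples` holds at least batch_size - n_replay items.
    """
    n_replay = min(int(batch_size * replay_fraction), len(replay_buffer))
    batch = random.sample(replay_buffer, n_replay)
    batch += random.sample(new_examples, batch_size - n_replay)
    random.shuffle(batch)
    return batch
```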

Self-Direction: Individual tasks can be specified with targets and losses, but most jobs that pay require performing collections of tasks with a large degree of self-direction. We need GPLs that can manage their own work, breaking larger jobs into smaller tasks and learning and receiving feedback in effective ways as they do so.

Benchmarking GPLs

A good benchmark is the foundation of AI research. GLUE and SuperGLUE, which benchmark NLP systems on a range of tasks, were essential to the development of GPLs in language. The GRIT benchmark was recently created to evaluate several vision and language tasks across many concepts and data sources while measuring generalization, calibration, and robustness. A larger effort, potentially a collaboration across the vision, language, and robotics communities in both academia and industry, is needed to further encourage and evaluate GPLs.
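
To make one of those axes concrete, expected calibration error (ECE) is a standard way to score calibration: it compares a model's stated confidence with its actual accuracy, bucketed by confidence (the binning below is a common choice, not necessarily GRIT's exact protocol).

```python
# Standard expected calibration error: average |confidence - accuracy| per bin.
def expected_calibration_error(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / len(confidences)) * abs(avg_conf - accuracy)
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6], [True, False, True]))
```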

Take-aways

  1. Current algorithms effectively solve tasks by learning model parameters from data, but are limited by the need to define tasks in advance.
  2. General purpose learners (GPLs) are learned models intended to adapt to yet-unspecified tasks, greatly expanding potential applications.
  3. Existing GPLs are already impressive, but their true power will come through multimodal IO, long-term memory, improved curricula, and capacity for planning.
  4. We need new benchmarks to broadly evaluate GPLs.


Derek Hoiem is a Professor at the University of Illinois at Urbana-Champaign and Chief Science Officer of Reconstruct.