PODCAST

Out-of-distribution generalization

Irina Rish on one of the toughest problems in AI

Jeremie Harris
Towards Data Science
4 min read · Mar 9, 2022


APPLE | GOOGLE | SPOTIFY | OTHERS

Editor’s note: The TDS Podcast is hosted by Jeremie Harris, who is the co-founder of Mercurius, an AI safety startup. Every week, Jeremie chats with researchers and business leaders at the forefront of the field to unpack the most pressing questions around data science, machine learning, and AI.

During training, AIs often learn to make predictions based on features that are easy to learn but misleading.

Imagine, for example, an AI that’s trained to identify cows in images. Ideally, we’d want it to learn to detect cows based on their shape and colour. But what if the cow pictures we put in the training dataset always show cows standing on grass?

In that case, we have a spurious correlation between grass and cows, and if we’re not careful, our AI might learn to become a grass detector rather than a cow detector. Even worse, we might only realize this has happened after we’ve deployed the model in the real world and it encounters a cow that isn’t standing on grass for the first time.

So how do you build AI systems that can learn robust, general concepts that remain valid outside the context of their training data?

That’s the problem of out-of-distribution generalization, and it’s a central part of the research agenda of Irina Rish, a core member of Mila (the Quebec AI Institute) and holder of the Canada Excellence Research Chair in Autonomous AI. Irina’s research explores many different strategies for overcoming the out-of-distribution problem, from empirical AI scaling efforts to more theoretical work, and she joined me on this episode of the podcast to talk about just that.

Here are some of my favourite take-homes from the conversation:

  • To Irina, GPT-3 was an “AlexNet moment” for AI alignment and AI safety research. For the first time, we had built highly capable AIs without actually understanding their behaviour, or knowing how to steer it. As a result, Irina thinks that this is a great time to get into AI alignment research.
  • Irina thinks that out-of-distribution generalization is an area where AI capabilities research starts to merge with AI alignment and AI safety research. Getting systems to learn robust concepts not only helps ensure that they have rich representations of the world (which helps with capabilities), but also helps prevent the kinds of accidents that spurious correlations can cause.
  • Irina has researched several strategies aimed at addressing the out-of-distribution generalization problem. One of them builds on the invariance principle: the idea that the features we want our AI models to learn are consistent (invariant) regardless of the environment our data come from. Consider, for example, the cow detection case I mentioned earlier: the features we want our AI to lock onto (the shape and colour of cows, for example) are consistent across different environments. A cow is still a cow whether it’s in a pasture, indoors or in the middle of a desert. Irina is exploring techniques that allow AIs to distinguish between features that are invariant and desirable (like cow shape and colour) and features that are variable and unreliable predictors (like whether or not there’s grass on the ground); one such technique is sketched just after this list.
  • Another approach Irina sees as promising is scaling. We’ve talked about scaling on the podcast before, but in a nutshell, it’s the idea that current deep learning techniques can more or less get us to AGI as-is, provided they’re used to train large enough neural networks on large enough datasets, with an equally massive amount of compute. In principle, scaling certain kinds of neural nets could allow AIs to learn so much about their training data that their performance is limited only by the irreducible noise of the data itself; the functional form behind that intuition is sketched after this list.
  • That possibility raises another question: is there too much noise in language data (which was used to train GPT-3, and the first generation of massively scaled foundation models) for AIs trained on language alone to reach human-level capabilities across the board? It’s possible, Irina thinks — and that’s why she’s excited about the trend towards multi-modal learning: the practice of training AIs on multiple data types at the same time (for example, on image, text and audio data). The hope is that by combining these input data types, an AI can learn to transcend noise limits that may exist in any one data type alone.
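
To make the invariance idea a bit more concrete: one formulation that comes up in the episode (see the 8:20 chapter) is invariant risk minimization (IRM). Below is a minimal sketch of an IRMv1-style objective, in the spirit of Arjovsky et al. (2019), written in PyTorch. It is an illustration under assumptions, not code from Irina’s own work: the model, the per-environment batches, and the penalty weight are all placeholders.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, labels):
    # IRMv1 trick: take the gradient of the loss w.r.t. a dummy scale of 1.0.
    # If the classifier were already optimal for this environment, that gradient
    # would be zero, so its squared norm penalizes environment-specific fit.
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

def irm_objective(model, environments, penalty_weight=100.0):
    # `environments` is a list of (inputs, float_labels) batches, one per
    # environment (e.g. cows-on-grass vs. cows-on-sand). The name, structure,
    # and penalty weight are illustrative choices, not a fixed recipe.
    risk, penalty = 0.0, 0.0
    for x, y in environments:
        logits = model(x).squeeze(-1)  # assumes a binary classifier with one output
        risk = risk + F.binary_cross_entropy_with_logits(logits, y)
        penalty = penalty + irm_penalty(logits, y)
    n = len(environments)
    return risk / n + penalty_weight * (penalty / n)
```

The intuition matches the cow example: a predictor that leans on grass will fit some environments much better than others, so its per-environment gradients stay large and the penalty nudges the model toward features that work everywhere.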
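
On the “irreducible noise” point: empirical scaling-law work typically models loss as a power law in model size (or data, or compute) that saturates at a noise floor. The snippet below is purely illustrative; it just evaluates that generic functional form, with placeholder constants rather than fitted values from any particular paper.

```python
def scaling_law(n_params, loss_floor=1.7, n_c=8.8e13, alpha=0.076):
    # Saturating power law: predicted loss falls toward `loss_floor` (the
    # irreducible noise of the data) as the parameter count grows.
    # All three constants are placeholder values for illustration only.
    return loss_floor + (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} parameters -> predicted loss ~ {scaling_law(n):.2f}")
```

In that picture, the multi-modal bet from the last bullet is that combining data types can push the noise floor itself lower.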

You can follow Irina on Twitter here, or me here.

Chapters:

  • 0:00 Intro
  • 2:00 Research, safety, and generalization
  • 8:20 Invariant risk minimization
  • 15:00 Importance of scaling
  • 21:35 Role of language
  • 27:40 AGI and scaling
  • 32:30 GPT versus ResNet-50
  • 37:00 Potential revolutions in architecture
  • 42:30 Inductive bias aspect
  • 46:00 New risks
  • 49:30 Wrap-up
