
AI Control

Speculations on the design of safe, efficient AI systems.

Approval-directed agents

An AI doesn’t need an explicit goal to exhibit intelligent behavior.
Paul Christiano
Dec 1, 2014

Semi-supervised reinforcement learning

A problem at the intersection of AI control and traditional RL research.
Paul Christiano
May 6

Red teams

Training AI systems to avoid catastrophic errors — without causing catastrophes.
Paul Christiano
May 28
Latest

Reliability amplification

Can redundancy increase the reliability of complex policies in the same way it can increase the reliability of computation?
Paul Christiano
17 hrs ago

ALBA on GitHub

A preliminary ALBA implementation is now on GitHub: https://github.com/paulfchristiano/alba
Paul Christiano
2 days ago

Not just learning

I’ve been focusing on aligned learning, but AI is more than just learning.
Paul Christiano
5 days ago

Imitation+RL

Imitation+RL might be a more natural model for powerful AI than either imitation or RL.
Paul Christiano
Oct 15

Security and AI control

AI control and AI security are probably more closely connected than I used to think.
Paul Christiano
Oct 14

Ignoring computational limits with reflective oracles

Reflective oracles provide a natural computational model where there is no such thing as “not enough time to find the answer.”
Paul Christiano
Oct 4

Extracting information

Can we incentivize experts to optimally gather relevant information? A clean open question relevant to AI control.
Paul Christiano
Oct 3

Capability amplification

Can we use a weak policy with a fast implementation to construct a stronger policy with a slow implementation?
Paul Christiano
Oct 2

The reward engineering problem

How can we define rewards which incentivize weak RL agents to behave in a desirable way?
Paul Christiano
May 30

Learning with catastrophes

A catastrophe is an event so bad that we are not willing to let it happen even a single time.
Paul Christiano
May 28