AI Control
Speculations on the design of safe, efficient AI systems.
Approval-directed agents
An AI doesn’t need an explicit goal to exhibit intelligent behavior.
Paul Christiano
Dec 1, 2014
Semi-supervised reinforcement learning
A problem at the intersection of AI control and traditional RL research.
Paul Christiano
May 6
Red teams
Training AI systems to avoid catastrophic errors — without causing catastrophes.
Paul Christiano
May 28
Latest
Reliability amplification
Can redundancy increase the reliability of complex policies in the same way it can increase the reliability of computation?
Paul Christiano
17 hrs ago
ALBA on GitHub
A preliminary ALBA implementation is now on GitHub: https://github.com/paulfchristiano/alba
Paul Christiano
2 days ago
Not just learning
I’ve been focusing on aligned learning, but AI is more than just learning.
Paul Christiano
5 days ago
Imitation+RL
Imitation+RL might be a more natural model for powerful AI than either imitation or RL.
Paul Christiano
Oct 15
Security and AI control
AI control and AI security are probably more closely connected than I used to think.
Paul Christiano
Oct 14
Ignoring computational limits with reflective oracles
Reflective oracles provide a natural computational model where there is no such thing as “not enough time to find the answer.”
Paul Christiano
Oct 4
Extracting information
Can we incentivize experts to optimally gather relevant information? A clean open question relevant to AI control.
Paul Christiano
Oct 3
Capability amplification
Can we use a weak policy with a fast implementation to construct a stronger policy with a slow implementation?
Paul Christiano
Oct 2
The reward engineering problem
How can we define rewards which incentivize weak RL agents to behave in a desirable way?
Paul Christiano
May 30
Learning with catastrophes
A catastrophe is an event so bad that we are not willing to let it happen even a single time.
Paul Christiano
May 28