Speculations on the design of safe, efficient AI systems.
Prosaic AI control
I argue that AI control should focus on the possibility that we build AGI without learning anything fundamentally new about intelligence.
Nov 18, 2016
Security and AI control
AI control and AI security are probably more closely connected than I used to think.
Oct 14, 2016
Directions and desiderata for AI control
I lay out three research directions in AI control, and three desiderata that I think should guide research in these areas.
Something is benign if it isn’t optimized to be bad. “Benign” is weaker than “aligned,” but I find it helpful for thinking about AI…
Nov 29, 2016
I think that discussions of AI control should aim to identify subproblems that we aren’t making progress on but are necessary.
Nov 26, 2016
AI “safety” vs “control” vs “alignment”
Defining what I mean by “AI safety,” “AI control,” and “value alignment.”
Nov 18, 2016
Handling destructive technology
Solving AI control is just “delaying the inevitable” with respect to the need for global coordination, but it seems high-impact anyway.
Nov 14, 2016
Thoughts on reward engineering
Addressing a bunch of details that come up when we try to convert our preferences into a reward function for RL.
Nov 8, 2016
Can we use agents with many vulnerabilities to implement an agent with fewer vulnerabilities?
Oct 26, 2016
Building agents out of agents.
Oct 25, 2016
Of humans and universality thresholds
I’ve suggested that HCH might be a universal deliberative process if run with humans but not if run with apes. Is that suspicious?
Oct 23, 2016
Some thoughts on training highly reliable models
A grab bag of relevant considerations, mostly pointing out that the problem is even harder than it might at first appear.
Oct 22, 2016
Powerful searches are likely to pose a distinctive challenge for AI control.
Oct 21, 2016