Felix Hofstätter – Medium

Felix Hofstätter in Towards Data Science

How Assistance Games make AI safer

And how they don’t

Oct 26, 2022

Felix Hofstätter in Towards Data Science

Spotting Unfair or Unsafe AI using Graphical Criteria

How to use causal influence diagrams to recognize the hidden incentives that shape an AI agent’s behavior

Jun 24, 2022

Felix Hofstätter in Towards Data Science

How to stop your AI agents from hacking their reward function

Using causal influence diagrams and current-RF optimisation

Apr 4, 2022

Felix Hofstätter in Towards Data Science

Counterfactuals for Reinforcement Learning II: Improving Reward Learning

Safer reward function learning using counterfactuals

Jan 15, 2022

Felix Hofstätter in Towards Data Science

Counterfactuals for Reinforcement Learning I: “What if… ?”

Introduction to the POMDP framework and counterfactuals

Dec 30, 2021

Felix Hofstätter in Towards Data Science

How learning reward functions can go wrong

An AI-safety-minded perspective on the risks of Reinforcement Learning agents learning their reward functions

Nov 16, 2021

Felix Hofstätter in Towards Data Science

Adapting Soft Actor Critic for Discrete Action Spaces

How to apply the popular algorithm to new problems by changing only two equations

Nov 16, 2021

Felix Hofstätter

Software Consultant at TNG Technology Consulting. Passionate about Reinforcement Learning, AI Alignment and Effective Altruism
