Felix Hofstätter – Medium

Felix Hofstätter in Towards Data Science

How Assistance Games make AI safer

And how they don’t

12 min read · Oct 26, 2022

Felix Hofstätter in Towards Data Science

Spotting Unfair or Unsafe AI using Graphical Criteria

How to use causal influence diagrams to recognize the hidden incentives that shape an AI agent’s behavior

12 min read · Jun 24, 2022

Felix Hofstätter in Towards Data Science

How to stop your AI agents from hacking their reward function

Using causal influence diagrams and current-RF optimisation

13 min read · Apr 4, 2022

Felix Hofstätter in Towards Data Science

Counterfactuals for Reinforcement Learning II: Improving Reward Learning

Safer reward function learning using counterfactuals

12 min read · Jan 15, 2022

Felix Hofstätter in Towards Data Science

Counterfactuals for Reinforcement Learning I: “What if… ?”

Introduction to the POMDP framework and counterfactuals

8 min read · Dec 30, 2021

Felix Hofstätter in Towards Data Science

How learning reward functions can go wrong

An AI-safety minded perspective on the risks of Reinforcement Learning agents learning their reward functions

12 min read · Nov 16, 2021

Felix Hofstätter in Towards Data Science

Adapting Soft Actor Critic for Discrete Action Spaces

How to apply the popular algorithm to new problems by changing only two equations

13 min read · Nov 16, 2021

Felix Hofstätter

Software Consultant at TNG Technology Consulting. Passionate about Reinforcement Learning, AI Alignment and Effective Altruism
