Felix Hofstätter in Towards Data Science · How Assistance Games make AI safer · And how they don’t · 12 min read · Oct 26, 2022
Felix Hofstätter in Towards Data Science · Spotting Unfair or Unsafe AI using Graphical Criteria · How to use causal influence diagrams to recognize the hidden incentives that shape an AI agent’s behavior · 12 min read · Jun 24, 2022
Felix Hofstätter in Towards Data Science · How to stop your AI agents from hacking their reward function · Using causal influence diagrams and current-RF optimisation · 13 min read · Apr 4, 2022
Felix Hofstätter in Towards Data Science · Counterfactuals for Reinforcement Learning II: Improving Reward Learning · Safer reward function learning using counterfactuals · 12 min read · Jan 15, 2022
Felix Hofstätter in Towards Data Science · Counterfactuals for Reinforcement Learning I: “What if… ?” · Introduction to the POMDP framework and counterfactuals · 8 min read · Dec 30, 2021
Felix Hofstätter in Towards Data Science · How learning reward functions can go wrong · An AI-safety minded perspective on the risks of Reinforcement Learning agents learning their reward functions · 12 min read · Nov 16, 2021
Felix Hofstätter in Towards Data Science · Adapting Soft Actor Critic for Discrete Action Spaces · How to apply the popular algorithm to new problems by changing only two equations · 13 min read · Nov 16, 2021