How Assistance Games make AI safer
And how they don’t
Felix Hofstätter in Towards Data Science, Oct 26, 2022

Spotting Unfair or Unsafe AI using Graphical Criteria
How to use causal influence diagrams to recognize the hidden incentives that shape an AI agent’s behavior
Felix Hofstätter in Towards Data Science, Jun 24, 2022

How to stop your AI agents from hacking their reward function
Using causal influence diagrams and current-RF optimisation
Felix Hofstätter in Towards Data Science, Apr 4, 2022

Counterfactuals for Reinforcement Learning II: Improving Reward Learning
Safer reward function learning using counterfactuals
Felix Hofstätter in Towards Data Science, Jan 15, 2022

Counterfactuals for Reinforcement Learning I: “What if… ?”
Introduction to the POMDP framework and counterfactuals
Felix Hofstätter in Towards Data Science, Dec 30, 2021

How learning reward functions can go wrong
An AI-safety minded perspective on the risks of Reinforcement Learning agents learning their reward functions
Felix Hofstätter in Towards Data Science, Nov 16, 2021

Adapting Soft Actor Critic for Discrete Action Spaces
How to apply the popular algorithm to new problems by changing only two equations
Felix Hofstätter in Towards Data Science, Nov 16, 2021