Trupti Bavalatti, in Towards Data Science: "Gen-AI Safety Landscape: A Guide to the Mitigation Stack for Text-to-Image Models" (No Wild West for AI: a tour of the safety components that tame T2I models)
Sonia: "Book Summary: 'The Coming Wave' by Mustafa Suleyman" (Key points on technology, power, and the 21st century's greatest dilemma)
Michael Humor, in GoPenAI: "What does the command 'rm -rf /' do and what if an AI produces it?"
Tarik Dzekman, in Towards Data Science: "Exploring the AI Alignment Problem with GridWorlds" (It's difficult to build capable AI agents without encountering orthogonal goals)
Jonathan Davis: "Understanding Anthropic's Golden Gate Claude" (Anthropic's research into monosemanticity can improve language model interpretability and safety)
Karthik Raja: "Insights from Benjamin Mann's Lecture on AI Safety at Anthropic" (I recently had the opportunity to audit a fascinating lecture by Benjamin Mann from Anthropic titled AI Safety and Scaling Governance…)
Ayyüce Kızrak, Ph.D.: "Mechanistic Interpretability in Action: Understanding Induction Heads and QK Circuits in…" (This project, created for the AI Alignment Course, AI Safety Fundamentals powered by BlueDot Impact, leverages a range of advanced…)