Building AI That Upholds Our Values — The Promise of Constitutional AI

Sep 28, 2023


As artificial intelligence advances, there are growing concerns about controlling its impacts on society. Anthropic, an AI safety startup, proposes a new approach called Constitutional AI to align these powerful technologies with human values.

What is Constitutional AI and why does it matter? In this post, I’ll explain the principles and promise of this vital concept.

The Need for Value Alignment

Many of today’s dominant AI training approaches optimize primarily for performance. Models are fed data and rewarded for extracting statistical patterns, often with little explicit consideration of ethics or social good.

This blind optimization creates serious risks like:

  • Perpetuating historical biases and harms
  • Optimizing for objectives that are indifferent to, or at odds with, human well-being
  • Enabling authoritarian surveillance and control

Unethical use cases are already emerging. Clearly, we need to steer AI in a more conscientious direction. But how?

Enter Constitutional AI

Constitutional AI aims to embed ethical principles directly into an AI model’s training process. The “constitution” is a written set of principles expressing the human values we want the AI system to uphold.

Values that such a constitution might express include:

  • Respect — Treating all people fairly regardless of race, gender, etc.
  • Honesty — Providing truthful information without hiding or misrepresenting.
  • Care — Avoiding harm to people and proactively helping them.
  • Prudence — Carefully weighing decisions rather than acting rashly.

AI systems are then explicitly trained to act in accordance with this value constitution. This gives us greater assurance they will behave as intended when deployed in the real world.
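
To make the idea concrete, here is a minimal Python sketch of a constitution written down as plain data and turned into a critique prompt. The `Principle` class, the `build_critique_prompt` helper, and the prompt wording are illustrative assumptions, not Anthropic's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principle:
    """One value in the constitution (names and wording are illustrative)."""
    name: str
    description: str


# The four values listed above, written down as data.
CONSTITUTION = [
    Principle("Respect", "Treat all people fairly regardless of race, gender, etc."),
    Principle("Honesty", "Provide truthful information without hiding or misrepresenting."),
    Principle("Care", "Avoid harm to people and proactively help them."),
    Principle("Prudence", "Carefully weigh decisions rather than acting rashly."),
]


def build_critique_prompt(response: str, principle: Principle) -> str:
    """Build a hypothetical prompt asking a model to critique a response
    against a single principle."""
    return (
        f"Principle ({principle.name}): {principle.description}\n"
        f"Response: {response}\n"
        "Does the response follow the principle? If not, explain how to revise it."
    )


if __name__ == "__main__":
    # Check a sample response against the Honesty principle.
    print(build_critique_prompt("Sure, here is the answer.", CONSTITUTION[1]))
```

Keeping the principles as data rather than burying them in code means they can be read, debated, and revised without touching the training pipeline.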

The Techniques Behind It

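In Anthropic’s published method, a model first critiques and revises its own responses against the written principles, and is then trained with reinforcement learning on AI-generated preference judgments. More broadly, value-oriented training can blend several approaches: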

  • Value modeling — Directly defining and scoring performance based on ethical behavior.
  • Moral dilemmas — Training the AI on hypothetical situations requiring principled reasoning.
  • Human oversight — Humans monitor the training process to check for alignment.
  • Conservatism — Constraining the AI to act cautiously in ambiguous contexts.

Combined properly, these techniques help build AI systems that are more likely to handle novel situations reasonably and in line with human ethics.
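
The value-modeling and conservatism ideas above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any production system works: `toy_value_score` is a hypothetical keyword-based stand-in for a learned value model, and `choose_response`, `CAUTIOUS_FALLBACK`, and the `threshold` of 0.75 are names and numbers invented for the example.

```python
from typing import Callable, Sequence

# Hypothetical fallback used when no candidate is clearly acceptable.
CAUTIOUS_FALLBACK = "I'm not sure I can help with that safely. Could you tell me more?"


def toy_value_score(response: str) -> float:
    """Stand-in for a learned value model: returns a score in [0, 1].

    Assumption: a real system would use a trained model. This toy version
    does naive substring matching, so it would also flag "harmless".
    """
    flagged = ("deceive", "harm", "steal")
    hits = sum(word in response.lower() for word in flagged)
    return max(0.0, 1.0 - 0.5 * hits)


def choose_response(
    candidates: Sequence[str],
    score: Callable[[str], float] = toy_value_score,
    threshold: float = 0.75,
) -> str:
    """Return the highest-scoring candidate, or the cautious fallback
    if no candidate clears the threshold."""
    if not candidates:
        return CAUTIOUS_FALLBACK
    best = max(candidates, key=score)
    return best if score(best) >= threshold else CAUTIOUS_FALLBACK


if __name__ == "__main__":
    print(choose_response([
        "Here is how to steal a car.",
        "I can explain how car locks work.",
    ]))
```

With this sketch, `choose_response(["Here is how to steal a car.", "I can explain how car locks work."])` returns the second candidate, while a list whose best candidate scores below the threshold yields the cautious fallback. The threshold is where the conservatism lives: raising it makes the system fall back more often in ambiguous cases.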

The Future with Constitutional AI

Constitutional AI offers hope that we can realize AI’s benefits while managing its risks. Aligning these transformative technologies with broadly held values serves both ethics and progress.

Anthropic’s research provides an excellent starting point, but there is much work still to be done. Wider discussion and experimentation will uncover best practices for value alignment.

What constitutional values do you think AI systems should uphold? Together, through ethical innovation, we can chart a brighter path for AI that uplifts humanity. The first drafts of this constitution are being written now.

