A Human Alignment Problem

Jonah Boucher
5 min read · Jun 12, 2023


Last week I thought about practical ethics for educators; this week I’m musing on helping students discover and then live in accordance with values. Next week I’ll conclude this inaugural trio of reflections, bringing both strands together by exploring concrete instantiations of these ideas.

In the field of artificial intelligence, the “alignment problem” refers to the challenge of ensuring that AI systems are aligned with human values and pursue their “objectives” (reward functions) with fidelity to the spirit of those values. Many issues arise: the lack of a universal human value system and of exceptionless rules even within a given value system, the difficulty of rigorously stating moral norms, indeterminate tradeoff calculus, and the distance between human beliefs and behaviors. One of the better-known attempts (second perhaps only to Asimov’s Three Laws of Robotics) to succinctly sketch design principles that might encourage alignment comes from Stuart Russell’s Human Compatible: Artificial Intelligence and the Problem of Control (summary here). He suggests that guidelines might go something like this:

  1. The machine’s only objective is to maximize the realization of human preferences.
  2. The machine is initially uncertain about what those preferences are.
  3. The ultimate source of information about human preferences is human behavior.

I’ll leave the deeper analysis and reflection that Russell’s work deserves to you and the internet, because I now turn to a related alignment problem that we, humans, face in education and in our own lives: How do we live in accordance with our values, and how do we trust that the values we hold are in fact good or complete?

I expect that a great deal of internalized personal anguish — and unrealized external positive impact — comes from living without a deliberate orientation towards strengthening (even if never fully achieving) this alignment. Think: driving cars, eating meat, vapid social media, fraught relationships, a neutral or net-negative job, frivolous spending… We do things (and choose to participate in systems that force us to do things) that are clearly not in alignment with our values upon even a trivial amount of reflection. We pay a steep psychosocial price when we keep playing games we’d easily denounce in moments of true clarity, but we’re up against massive incentive structures that can only profit by shrouding these insights. The world suffers, and we do too. We’re not Asimov’s or Russell’s robots, but perhaps we too need better “rules” against which to continually assess our lives and orient our alignment.

(Religions are historically quite good at establishing these frameworks, and I’d love to think more about this with readers in time.)

This is especially true when we address this human alignment problem with children and youth, who do in many ways need to engage with clearly articulated meta-values to guide their moral landscape exploration. What, then, might we ask of our students and in turn of ourselves to help us all feel alignment between the values we hold (or ought to hold) and the ways in which we act and exist in the world?

While there are certainly many possible and overlapping goals, I propose the following minimally viable trio:

  1. Be morally curious.

We must first support young people in the development of both moral imagination and moral humility. A child does not have much control over the normative cultural environment into which they are socialized, but as soon as young people are capable of metacognition we should encourage reflection on their moral beliefs and not merely indoctrinate them to accept our own.

An ancestor who was morally curious might have been willing to consider the possibility, for example, that practices we regard as atrocities today, from burning cats to holding slaves, were morally wrong even if the rest of their society held some worldview in which these actions had no moral component whatsoever. History should tell us that we’re certainly getting some things wrong today, and we should orient students towards being quite curious about what those things might be.

We should tell our kids that they may in fact be the ones to make this apparent to us, and play our part in this process by being open when they try to do so. We should all presume that our brains are investing quite a bit of energy into convincing us that something we believe is wrong is nonetheless permissible or inconsequential, because the psychological or physical cost of that acknowledgement would upset some comforting equilibrium.

This is a liberal principle, especially when working with youth who are susceptible to dangerous ideas in disguise, and will need to be balanced by a more conservative principle of caution in #3 below.

  2. Be a dissonance minimizer.

Simply snooping around for aspects of one’s life or culture that are possibly unethical is a great way to get buried in dissonance-driven depression. The implications this curiosity and imagination unearth are exactly why moral curiosity is potentially costly. We therefore also need to instill in students a strong instinct to minimize dissonance between beliefs and actions/inactions, and help them take great pride and pleasure in doing so.

Once a moral belief crosses some threshold (of credence, of importance, or of both), students must have ways to act on it, and indeed our educational system should be designed to equip them with skills and pathways for reducing the value dissonance that they feel. We don’t just want students who feel like guilty moral monsters; we need them to actually be empowered to make change too. When dissonance reduction by living in accordance with your values is given enough weight in your own calculus of fulfillment, you become more comfortable being morally imaginative, because any challenge to your lifestyle in fact provides an opportunity to minimize dissonance.

These first two principles incorporate the sensitivity/inclination/action framework that Harvard’s Project Zero uses to describe the ingredients of dispositional behavior, but raise some concerns about unilateral moral actors and individualism while exploring a complicated space. A third principle is therefore warranted to provide students with support and guidance in this process:

  3. Handrails and guardrails.

Rules 1 and 2 in isolation, combined with power or influence, could quickly incorporate as a paving company on the road to Hell, so we need this third, more conservative consideration. I imagine two forms of counterbalance: handrails to provide support and community for youth along the journey of moral investigation, and guardrails to insure against dangerous deviation too far outside the societal moral Overton window. Providing handrails could mean helping students take into consideration the ideas of trusted others, “averaging” expected-value style over multiple worldviews, or promoting a general orientation against dramatic, irreversible actions. Guardrails are institutional rather than interpersonal, supporting reasonable time delays on dramatic personal or systemic shifts. Intergenerational communities, for example, combine both kinds of support, ensuring that youth voices are considered in the context of existing wisdom and practice. I’ll be thinking more about this concrete component of the framework next week!
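The “averaging” expected-value idea can be made slightly more concrete. Below is a minimal, hypothetical Python sketch of credence-weighted averaging across worldviews; every name and number is invented for illustration, and this is a toy, not a real decision procedure:

```python
# Hypothetical example: weigh candidate actions by averaging, expected-value
# style, over several worldviews one takes at least somewhat seriously.
# All credences and scores below are made up for illustration.

# How much weight (credence) we give each worldview; weights sum to 1.
credences = {"worldview_a": 0.5, "worldview_b": 0.3, "worldview_c": 0.2}

# How good each worldview judges each candidate action to be, on a shared scale.
scores = {
    "act_now": {"worldview_a": 0.9, "worldview_b": -1.0, "worldview_c": 0.2},
    "wait":    {"worldview_a": 0.4, "worldview_b": 0.5,  "worldview_c": 0.3},
}

def weighted_score(action):
    """Credence-weighted average of the worldviews' judgments of an action."""
    return sum(credences[w] * scores[action][w] for w in credences)

# Pick the action with the highest credence-weighted score.
best = max(scores, key=weighted_score)  # → "wait"
```

Note how the dramatic option loses here even though one worldview rates it highly, because another worldview one still gives real credence to condemns it, which is precisely the cautious, handrail-like behavior the principle is after.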

Giving students space to be morally curious, skills and pathways to act in accordance with their values, and an opportunity to do this work in supportive communities can provide the initial structure in which young people can drive moral progress.

Thanks for reading! My ideas in these reflections are half-baked, so I’d love help finishing the job by communicating with you here or elsewhere : )
