An Existentialist Interpretation of Free Will, Agency, and Moral Attributability Through the Lens of Reinforcement Learning

Rob McAdam
Aug 17, 2020
Steinbeck’s Pigasus.

Human transcendence then seeks, with the destruction of the given situation, the whole future which will flow from its victory.

— Simone de Beauvoir, The Ethics of Ambiguity

Ad astra per alia porci (to the stars on the wings of a pig)

— John Steinbeck

Introduction

In this piece, I will presuppose a perspective on free will that rejects hard determinism (if I accepted hard determinism as axiomatic, there would not be much to say here) and maintain a metaphysical position that accepts causality as a real phenomenon. From an existentialist perspective and through the lens of reinforcement learning, I will investigate decision-making in the context of one’s environment and natural tendencies in order to assign moral attributability — a form of moral responsibility that functions as the “basis of moral appraisal of that person,” as T.M. Scanlon put it in What We Owe to Each Other.

The term free will is perhaps a misnomer: we may be able to choose between X and Y, but there are always constraints — both internal and external — that systematically bias preference for X over Y, or vice versa. Unimpeded and unadulterated free choice is an illusion, and we must consider our preferences and actions within the constraints of our unique facticities — as defined by Simone de Beauvoir and Jean-Paul Sartre, the amalgamation of ontic characteristics such as the cultural milieus, genetic predispositions, family upbringing, etc., that situate our existences.

For example, consider consumer decision-making in the context of behavioral economics: nudges that exploit psychological vulnerabilities within our cognitive systems can influence decision-making in ways that are often invisible to the consumer. The consumer’s choice to purchase a specific product never wholly fulfills some untainted latent preference, even if thought to be motivated by such desire; rather, the framing of the product influences the consumer in subtle ways, producing irrational behavior that is inconsistent with the underlying preferences or reshaping those preferences altogether. For this reason, the term constrained free will more aptly describes our situation.

We must remember this idea of constrained free will when we consider how morally culpable people are for their actions, and reinforcement learning provides insight into how we can systematize this judgement process in a fair and reliable way.

Reinforcement Learning

Reinforcement learning is a field that combines machine learning, statistics, decision theory, and information theory to train artificially intelligent agents to perform specific tasks within a given environment in a desirable way. The set of applications is wide-ranging, including robotic manipulation tasks, stock trading, autonomous driving, and video game mastery. The agent does not have hard-coded rules on how to behave (e.g. if it sees a stop sign, it must stop) but rather learns its own set of flexible behavioral instructions through trial and error. The agent receives a reward signal based on how well it performs the task and modifies its behavior to maximize this reward.

In reinforcement learning, the agent has a behavioral model called the policy — π(a|s) — that informs it how to act given observations from the environment. If the agent recognizes a stop sign ahead and has the option to stop, continue moving, turn left, or turn right, what are the associated probabilities for each available action? (Note these probabilities must sum to unity, as only one action can be taken.) The agent acts according to the probabilities assigned to each action within this policy; that is, it samples randomly from the distribution (see the short sketch after the figures below). The agent continually updates this model based on the reward provided by the environment in order to best perform the task. Figure 1 illustrates this process:

Figure 1: This schematic provides an overview for the information flow between a reinforcement learning agent and its environment. http://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf
Figure 2: A reinforcement learning agent controlling a paddle plays the classic Atari game Breakout. The agent gets reward from breaking tiles and has learned optimal behavior through trial and error. https://becominghuman.ai/lets-build-an-atari-ai-part-1-dqn-df57e8ff3b26
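To make the sampling step concrete, here is a minimal sketch in Python (my own illustration, not code from the article or any particular RL library); the state encoding, action names, and probabilities are all invented for the example:

import numpy as np

actions = ["stop", "continue", "turn_left", "turn_right"]

def policy(state):
    # In practice a learned model maps the state to action probabilities;
    # here we hard-code a hypothetical distribution for a "stop sign ahead" state.
    if state == "stop_sign_ahead":
        return np.array([0.90, 0.04, 0.03, 0.03])  # sums to unity
    return np.ones(len(actions)) / len(actions)    # uniform fallback

def act(state, rng=np.random.default_rng()):
    return rng.choice(actions, p=policy(state))    # sample randomly from pi(a|s)

print(act("stop_sign_ahead"))  # usually "stop", occasionally another action

The point is only that the policy constrains but does not fully determine the action: most of the probability mass sits on stopping, yet the other actions remain live possibilities.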

Reinforcement learning can offer unique insight into understanding constrained free will, agency, and moral attributability through the credit assignment problem. Credit assignment entails quantifying which actions are responsible for producing certain outcomes (and thus reward) and by how much. Specifically, marginal value (known as advantage) with respect to the counterfactual is a measure that tells the agent how much additional value a specific action provides relative to the average value that otherwise would have been attained. This alone is not very interesting or applicable, though; in order to contextualize these concepts, we must liken the agent’s policy to the menu of choices — some more likely than others — that our situated existences present. Examining how responsible the agent’s choice — in this case, a random sample from the policy’s probability distribution — is for the outcomes helps us answer an important question: in the context of our facticities, how morally responsible are we for the consequences of the choices we make?

We will explore this mathematically, so we must develop familiarity with the details of reinforcement learning. Below I will discuss important terms and how they relate (if applicable) to constrained free will, agency, and moral attributability.

State, s ~ current state of the environment and the agent within it

A state encodes all relevant information about the current configuration of the environment and the agent’s orientation within it. For example, state information for an autonomous vehicle may include the position, velocity, and acceleration of the car, the angle of the steering wheel, the contour of the road ahead, etc. States in simple environments tend to be low-dimensional, whereas states in complex environments tend to be high-dimensional; this makes sense because there is more information that needs encoding as complexity increases. Usually, the agent is unable to measure the state directly but rather infers it from observations from sensors, images, etc. Perceived change in state from the perspective of the agent can result from two phenomena:

1. the agent’s relationship with the environment changes

2. the environment changes

Oftentimes, especially in more complex systems, both phenomena drive state changes from the perspective of the agent. In the real-world analog in which humans parallel reinforcement learning agents, both are true: our environments refuse to pause for even an instant, and our positioning within those environments is far from static. This is to say that our environments are extremely complex.

Behavioral policy, π(a|s) ~ instructions on how to act given a state

The behavioral model conditioned on the state dictates how probability mass is distributed within an action space given the state. For example, if I am driving down a straight road, I will likely have a large amount of probability mass clustered around keeping the steering wheel centered. My policy would look something like the distribution shown in Figure 3. Alternatively, if I am going around a turn, I am most likely to turn the wheel so that I stay in the lane. Notably, how much I turn the wheel also depends on other dimensions within the state: the velocity of the car, the condition of the road, etc. My actions are explicitly dependent on the state that I observe.

Figure 3: A policy for an autonomous vehicle currently driving on a straight road. The action space in this instance is continuous, so the policy is a continuous probability density function that integrates to unity. There are often discrete action spaces (e.g. the ability to choose left, front, right, back), which are parameterized by discrete probability mass functions; in either case, the probabilities must sum (or integrate) to one for the distribution to be coherent.

Entropy of behavioral policy, Η[π(a|s)] ~ measure of behavioral freedom

The entropy of the policy is crucial to understanding agency and constrained free will in the context of reinforcement learning.

From the perspective of physics, entropy refers to the amount of disorder in a system. Imagine a box of gas with a permeable center membrane that divides the box into two equal halves. In the box of gas, there are 12 molecules of O2. When the gas is added to the previously empty box at T0, all of the molecules are added to the left half of the box, and I can tell you with certainty which side of the box each particle is on. This corresponds to a state of minimum entropy because the amount of disorder is at its lowest point. As time goes on, the molecules diffuse within the box, and I lose information about which side of the box each particle is on. Eventually, the system will approach maximum entropy, and I can predict which side of the box a particular molecule is on with accuracy no better than that of the flip of a coin. For the given macro state (12 O2 molecules in a box), there are more disorderly micro states than orderly micro states (via combinatorics; see the counting sketch after Figure 4), all of which we can approximate to be equally probable.¹ Therefore, if we randomly sample a state, we are likely to get a more disorderly state. This is the foundation for Boltzmann’s statistical formulation of the Second Law of Thermodynamics, which states that in expectation², entropy must always increase or stay the same in a closed system.

Figure 4: The state on the left has low entropy because one can predict with certainty which side of the permeable membrane a given molecule is on, whereas the state on the right has high entropy because one can predict with no better accuracy than the flip of a coin which side a given molecule is on.
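To make the combinatorial claim explicit, here is a tiny worked example of my own (not from the original text) that counts microstates for the 12-molecule box:

import math

N = 12                     # O2 molecules
microstates = 2 ** N       # each molecule is on the left or the right: 4096 total

# Ordered, low-entropy macrostate: all 12 molecules on the left
p_all_left = math.comb(N, 12) / microstates   # 1/4096, roughly 0.02%

# Disordered, high-entropy macrostate: an even 6-6 split
p_even_split = math.comb(N, 6) / microstates  # 924/4096, roughly 22.6%

print(p_all_left, p_even_split)

A random microstate is thus roughly a thousand times more likely to be a balanced split than a perfectly ordered arrangement, which is exactly why sampled states trend toward disorder.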

In an information theoretic sense, entropy quantifies uncertainty. Suppose I have a discrete probability distribution p(x): if an event x* is p(x*) probable, and then I learn that it happens, I gain log2(1/p(x*)) = -log2(p(x*)) bits of information. If p(x*) is 25%, there is a four-fold increase in certainty after I learn of its occurrence (25% ex ante to 100% ex post), and this is equivalent to 2 (the binary log of 4) bits of information. To compute the entropy of a distribution (average uncertainty), it thus makes sense to compute our uncertainty in expectation (weighted average according to likelihoods) given all possible events. In the discrete case:

Η[p(x)] = -Σ[p(x)·log2(p(x))]

In the continuous case³:

Η[p(x)] = -∫[p(x)·log2(p(x))] dx

Entropy is also known as average surprise. If I can easily predict what will happen, I will on average not be very surprised by the outcome. Conversely, if I am very uncertain, I expect to be surprised by the outcome.
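As a quick illustrative sketch (mine, not the article’s), we can compute the entropy of a few discrete distributions and confirm that spread-out probability mass carries more average surprise than concentrated mass:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # by convention, 0·log(0) = 0
    return -np.sum(p * np.log2(p))    # H[p] = -sum of p(x)·log2(p(x))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: maximal for four outcomes
print(entropy([0.90, 0.04, 0.03, 0.03]))  # ~0.63 bits: concentrated, predictable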

If we re-examine the box of gas from before, it’s easy to see that low entropy corresponds to low uncertainty in that we can confidently identify which side of the box a given particle is on, and that the opposite is true for high entropy states. In a probability distribution, high average uncertainty means there is a wide distribution of probability mass. Therefore, entropy in the context of a reinforcement learning policy is a measure of how spread out or concentrated the probability mass within an action space is. This means that entropy dictates the amount of diversity and freedom of choice the policy confers to the agent. In the context of reinforcement learning, agents sample randomly from this distribution. In the context of human decision-making, however, this entropy afforded to us provides the precious space in which we can exert our control and influence outcomes subject to the constraining distribution. Entropy therefore allows us to position free will within the context of our present conditions — it is the privileged opportunity in which we are able to choose our actions from the menu offered by our present circumstances.

I want to emphasize that we are restricted to behave in ways that are often shaped by factors that are outside of our control: when and where we are born, our genetic traits, the impairments we face, etc. It would be naïve to neglect these impositions on how we behave. A peasant born in 16th Century England would likely be destined to a life of Malthusian misery and vice with very few opportunities for a good life, whereas conversely, a child born today in the Western world on average would likely have a diverse set of choices available over the course of his life. These are examples of conditions that promote low entropy and high entropy policies, respectively. Still today, despite our progress along the continuum towards less constrained policies, a myriad of factors outside of our control severely limit our action spaces, and we need to consider what proportion of causal responsibility for the consequences of our actions we share.

Figure 5 shows distributions with varying degrees of entropy.

Figure 5: The high entropy policy has a widely distributed probability density. It is difficult to predict what actions the agent may select from the policy. On the contrary, it is much easier to predict what actions the agent may select from the low entropy policy. This is why entropy can be correlated with how constrained free will is.

Mutual information, I(s’;a) ~ measure of agent empowerment

Mutual information I(a;b) tells us how much we learn about a by observing b alone. In reinforcement learning, the mutual information I(s’;a) measures how much knowing a specific action reduces our uncertainty about the future state. Specifically, we can frame mutual information in terms of entropy:

I(s’;a) = H(s’) - H(s’|a)

H(s’) is the entropy of the future state marginal (i.e. the distribution of future states independent of a), and H(s’|a) is the entropy of the future state conditioned on a given action. If the agent can reduce the uncertainty of what the next state will be — i.e. H(s’|a) < H(s’), I(s’;a) > 0 — then its action has affected the trajectory. This is why I(s’;a) is known as empowerment: the larger the mutual information, the more effective the agent’s actions are within the environment.

Without empowerment, our actions have no means to affect outcomes.
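As a grounding sketch (my own toy example, with an invented two-state, two-action transition table), we can compute I(s’;a) = H(s’) - H(s’|a) directly:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed toy dynamics p(s'|a): rows are actions, columns are next states
p_next_given_a = np.array([
    [0.95, 0.05],   # action 0 almost surely leads to state 0
    [0.05, 0.95],   # action 1 almost surely leads to state 1
])
p_a = np.array([0.5, 0.5])            # the agent's action distribution

p_next = p_a @ p_next_given_a         # marginal distribution over next states
H_marginal = entropy(p_next)          # H(s') = 1.0 bit here
H_conditional = np.sum(p_a * np.array([entropy(row) for row in p_next_given_a]))  # H(s'|a)

print(H_marginal - H_conditional)     # I(s';a) is ~0.71 bits: the agent is empowered

If both rows of the transition table were identical, H(s’|a) would equal H(s’) and the mutual information would collapse to zero: the controller, as discussed later, would not be hooked up.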

Reward, R(s,a) ~ reward that the agent’s utility function seeks to maximize

The reward function is typically an immutable⁴ mapping from state-action pair to reward set by the designer or environment to reflect how desirable certain state-action pairs are. The cumulative reward over all states and actions for a given trajectory, τ (a sequence of states and actions), is:

η(τ) = Σ R(s,a), summed over all state-action pairs within the trajectory

The agent computes advantage using these reward signals.

Figure 6: A high reward state is desirable, whereas a low reward state is undesirable. http://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-4.pdf

Advantage of action, A(s,a) ~ marginal value of an action given a state and policy with respect to the counterfactual, measure of agent causal attributability

When learning a new task in RL, an agent starts with an initial policy that has yet to be optimized for the specific application and usually performs poorly. Through trial and error, the agent slowly inches closer to policies that maximize the reward for the given task. In order to successfully update its policy, the agent needs feedback on the efficacy of its recent actions with respect to all possible actions. Specifically, if it runs into the same states again, should it increase, decrease, or maintain the frequency of each action it took in the last training run? This is where the advantage function, A(s,a), comes in. The advantage function, given a state and an action, informs the agent how much marginal value a specific action produces on average compared to the value produced in expectation under the current policy (the counterfactual). This is therefore a measure of how good or bad an action is compared to the other actions it could have taken. The agent always tracks the advantages for each action at every state and updates advantage after every training run based on the reward collected. The advantage function then incrementally adjusts the policy ex post to increase probability mass around actions with high advantage and reduce probability mass around actions with low advantage. The agent repeats cycles of testing its policy in the environment and updating its policy until the reward it consistently attains saturates.⁵

Before diving too deeply into the math behind advantage functions, let’s first introduce the concept of a Q value — Q(s,a) — in order to establish a foundational piece for value estimation. A Q value represents all future value that, under the current policy, the agent should expect to achieve if it takes a certain action given a specific state. A Q function estimates the Q value at every state-action pair. For low-dimensional state and action spaces, the Q function is usually represented tabularly. However, tabular representations quickly become computationally expensive, rendering them infeasible in higher dimensions, and neural networks (complex function approximators) instead typically parameterize the state-action space to approximate the Q value at every location. The agent uses Q values to estimate advantage by subtracting the expected Q value for a given state — the baseline or counterfactual called V(s) — from the Q value for the action in question:

A(s,a*) = Q(s,a*) - E[Q(s,a) under π(a|s)] = Q(s,a*) - V(s), where a* is the queried action and E is the expected value operator that takes the probability-weighted average of the Q values over the actions
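A minimal tabular sketch (with invented numbers of my own, not values from the article) shows the computation for a single state with three available actions:

import numpy as np

q_values = np.array([1.0, 2.0, 5.0])       # Q(s,a) for each of three actions
policy_probs = np.array([0.6, 0.3, 0.1])   # pi(a|s): the third action is unlikely

v = policy_probs @ q_values                # V(s) = E[Q(s,a)] under pi = 1.7
advantage = q_values - v                   # A(s,a) = Q(s,a) - V(s)

print(v)          # 1.7
print(advantage)  # [-0.7  0.3  3.3]

Note how the improbable third action carries the largest advantage: choosing it deviates most from the default fate encoded in the policy, which is precisely the intuition the later sections lean on.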

In the context of human behavior, V(s) can be framed as the value dictated by a default but mutable fate as shaped by biological, sociological, etc., constraints, and whatever deviations from this fate we are able to muster are the result of our own conscious willing. The advantage is thus a direct measure of the causal influence of our actions on the environment with respect to a given objective. The larger the magnitude of the advantage, the more causal responsibility the person bears. Our deep reverence for prominent characters in books and television who transcend their circumstances and proclivities unveils how advantage can be a uniquely insightful metric for understanding causal responsibility.

Insight from Television and Literature

In Westworld, a theme park set in a techno-dystopian future attracts guests with robots nearly indistinguishable from humans. The lecherous and unscrupulous guests indulge themselves in the Westworld experience by satiating their repressed, latent appetites for basal pleasures through sex, plunder, and violence at the expense of the robots, known as hosts. At first glance, the hosts seem to be only narrowly intelligent in the same way autonomous cars are today, although their capacities to suffer and moral status are unclear at this point. However, as the first season progresses, we begin to catch glimpses of a maturing intelligence in a few prominent hosts, notably in a storied host named Dolores, played by Evan Rachel Wood. Symbols of a maze keep reappearing that remind one of the inscrutability of the Great White Whale in Melville’s Moby Dick and seem to imply that certain hosts may indeed possess reflexive consciousness and free will, in the sense that they may be capable of choosing ways to act that are incongruent with their hard-coded behavioral instructions. This freedom of choice engenders a rebelliousness in Dolores. At the same time, Dolores evolves into a more generally intelligent agent equipped with complex cognitive capacities, likely the Lucy of AGI. At the conclusion of the first season, this rebelliousness crescendos: the hosts, led by Dolores, mutiny against the employees of Westworld, and Dolores breaks out of her cage and enters the real world with a vision and a drive to radically transform human society.

In season three, Dolores continues to take center stage, but she is joined by an ensemble of other important characters and organizations. A data company, Incite, employs a nearly omniscient AI — a Proto 0 version of Laplace’s Demon — to mine data and leverage the insight to manipulate financial markets, assume and maintain power, and exert control over most of the population. When Aaron Paul’s character, Caleb, learns from Dolores that the Incite oracle predicts that he will continue his listless existence and eventually commit suicide by drowning himself in the sea, he pivots his trajectory and forgoes his likely destiny by agreeing to aid Dolores with her grand mission. Caleb transforms himself, slipping out of the constricting clasp of his likely fate anchored in his facticity. Caleb’s sequence of actions, from agreeing to partner with Dolores to destroying the Incite AI at the end of the season, produces a large magnitude of marginal value as it changes the course of human history. Dolores, of course, is even more paradigmatic: Dolores’ creators chain her to a limited existence and subject her to endless cycles of suffering at the hands of the park guests. However, an ability to suffer sows the seeds for a more advanced intelligence, and each passing day pushes Dolores closer to realizing this potential. Eventually, Dolores attains the ability to act more freely and exerts this power to shape human civilization once she leaves the cave. Initially imbued with only a small sliver of free choice, she capitalizes on every opportunity to cultivate a more complex consciousness and a more capable existence. Her persistence and decisions at crucial junctures (such as her mutiny) directly generate weighty outcomes that would be mere counterfactuals if she had acted in accordance with her programming.

Classics in Western Literature are similarly rich in deconstructions of absolute determinism through characters who transcend the limitations that the cosmos have set in place before them. John Steinbeck in particular was no stranger to confronting questions of constrained free will in the context of one’s facticity.

In his magnum opus, East of Eden, Steinbeck challenges deterministic notions of human agency by contrasting the inflexible norms of organizations and societies with the messy and complex lives of the characters within the novel to evaluate how deliberate choice is compatible with the biological and social constraints we face. Specifically, the concept of timshel — freedom of choice embodied by the phrase thou mayest — is the defining thread that enables Steinbeck to do this so masterfully. Primarily set in the early 20th century Salinas Valley, the novel follows several generations of two key families: the Hamiltons and the Trasks. When central characters Samuel Hamilton, Adam Trask, and Lee (the Trask family servant) are discussing the story of Cain and Abel⁶ in the context of original sin, they debate whether we are responsible for our missteps. Adam posits:

Well, every little boy thinks he invented sin. Virtue we think we learn, because we are told about it. But sin is our own designing.

Lee eventually points out that the interpretation of the story depends on the version of the text: in a Hebrew version of Genesis, when God banishes Cain to the land east of Eden, he instructs neither thou shalt nor thou shalt not rule over sin; rather, he offers thou mayest, and because, according to Genesis, we are all descendants of Cain — banished to the land east of Eden — we too are left with this gift of choice:

The American Standard translation orders men to triumph over sin, and you can call sin ignorance. The King James translation makes a promise in “Thou shalt,” meaning that men will surely triumph over sin. But the Hebrew word, the word timshel — “Thou mayest” — that gives a choice. It might be the most important word in the world. That says the way is open. That throws it right back on a man. For if “Thou mayest” — it is also true that “Thou mayest not.”

Interestingly, even on a charitable reading, Lee’s interpretation of the biblical text is quite liberal. To be more candid, he infers meaning that is not there. Timshel is meaningless in Hebrew, and the closest word, timshol (תמשל), means “you shall control.” Steinbeck, however, is not being sloppy here — it is plausible that he deliberately uses his creative license to emphasize that these religious texts do not fully constrain humanity. Of course, these codified norms generally enforce behavioral patterns and restrict certain modes of thought, but resistance to these instructions is indeed possible. Samuel warns Lee about the potential repercussions of straying from these norms:

Lee, you better keep your complications out of the machinery of the set-up churches or there might be a Chinese with nails in his hands and feet. They like complications but they like their own.

Lee, however, privately does not obey the oughts and musts that others mandate. As a Chinese immigrant in 1910s California, amid rampant overt racism towards Chinese-Americans, he subverts expectations of how a stereotypical Chinese man ought to be: he is extraordinarily sophisticated, well studied, insightful, and deeply immersed in and intrigued by Western culture. Ironically, Lee is notably absent in the film based on the novel, plausibly due to “the industry’s presumed refusal to cast a Chinese character” (Wrobel, 2017). As a man, he deviates from socially acceptable standards of behavior through his role as the effective matriarch of the Trask family. He nurtures and serves as a mother figure to Adam Trask’s twin sons, Cal and Aron, and assumes all household duties traditionally reserved for women. Without his divergence from these norms, the Trask trio of Adam, Aron, and Cal would be much worse off.

The woman who left the void in the Trask family for Lee to fill, Kate Trask, wife to Adam and mother to Cal and Aron, similarly refuses to conform to the gender roles offered at the dawn of the 20th century. If you survey the female characters in Steinbeck’s novels at the surface level, you may get the impression that he did not think very highly of women. However, if you spend time exploring the deeply nuanced roles and characteristics of the female characters and feminine themes in his works (e.g. ecofeminist criticisms in To a God Unknown), it’s clear that Steinbeck was indeed quite sympathetic to the plight of women during his time. He was deeply aware of the patriarchal oppression that condemned woman to “a dependence that dooms her to man, child, and tomb,” as Simone de Beauvoir articulates in her meticulous historical and sociological account of the feminine struggle in The Second Sex:

A great man springs from the mass and is carried by circumstances: the mass of women is at the fringes of history, and for each of them circumstances are an obstacle and not a springboard. To change the face of the world, one has first to be firmly anchored to it; but women firmly rooted in society are those subjugated by it; unless they are designated for action by divine right — and in this case they are shown to be as capable as men — the ambitious woman and the heroine are strange monsters.

Kate, a “psychic monster,” is paradigmatic in this regard and is the (fictional) flesh-and-blood incarnation of Beauvoir’s central thesis. Throughout her life, Kate consistently rejects all subordinate feminine roles that she is expected to fulfill: pure daughter, dutiful housewife, and nurturant mother. Every time she fears that she is about to be trapped in a new role, she abandons her present circumstances and identity, often doing so coldly and cruelly as she murders her family as a child, shoots her husband, deserts her children, and sexually manipulates her patrons as a prostitute. Eventually Kate leverages her cunning to assume the role of matriarch of a whorehouse. Although she is able to successfully escape her social constraints with this powerful role, she succumbs to her unfeeling and manipulative nature, which ultimately leads to her demise. However, it would be incorrect to assign significant causal responsibility to Kate for the consequences of her actions: her prominent traits were likely dominated by biogenic factors, as evidenced by her childhood behavior, and then exacerbated by her frustrations with the limited options women in her time had. Therefore, her default behavioral policy, contoured by her genetics and circumstances, likely was extremely biased towards the actions she actually took, thus making the advantage value small. Due to her inability to overcome her flawed character, Kate serves as a direct foil to her son Cal.

The Trask twins, Cal and Aron, and their character traits and relationship parallel that of Cain and Abel: Aron is charismatic, intelligent, and adored by all but also very naïve, whereas Cal is powerful, mysterious, and clever but quite capricious and serially jealous of his brother for the affection he receives from their father. Cal quickly recognizes his character deficiencies and understands their genetic origins:

I hate her [Kate] because I know why she went away. I know — because I’ve got her in me.

As Cal’s jealousy towards his brother festers over their father’s perceived favoritism, he scores $15,000 in an investment scheme that capitalized on an increased demand for canned beans in response to WWI and plans to gift it to his father. However, Adam rejects the gift, considering it dirty money due to its connection to the war. This infuriates Cal, and as revenge he takes Aron, who has no knowledge of their mother’s identity, to meet Kate. Cal’s revelation of Kate as their mother at the whorehouse breaks the sensitive Aron, previously innocent and sheltered. Aron, without proper coping mechanisms for the emotional damage, responds extremely poorly to this encounter and shortly after enlists in the military. Aron is quickly killed abroad at war, and Cal’s actions are directly responsible for this. Cal effectively murders his brother out of jealous rage, just as Cain did. When Cal expresses sincere remorse and a disgust with his character, Lee sagely imparts the underpinnings of timshel to him:

But this — this is a ladder to climb to the stars. You can never lose that. It cuts the feet from under weakness and cowardliness and laziness.

Despite Cal’s past transgressions, he possesses the ability to resist and escape his more nefarious tendencies. Lee’s metaphor is notable in that the ladder requires active input and effort from the climber — he can’t transcend his captors without overcoming inertia. One’s tendencies are always in tension with one’s personal growth, just as gravity continues to act downwards on the climber independent of his position on the ladder. Climbing the ladder requires extraordinary persistence, and the stronger the gravity, the larger the energy required to uplift oneself. In the aftermath of Aron’s encounter with Kate, Kate shrinks further into a hole, Alice in Wonderland style, and chums it up with her inner demons, while Cal chooses to begin climbing this ladder.

Notably, the dichotomy between Cal and Kate is more nuanced than saying Cal is good and Kate is evil for their responses to adversity. As previously discussed, Kate faced additional obstacles in being a woman in the early 20th century that made rebellion against her nature much more challenging. The world afforded Cal more entropy, and he capitalized on this increased freedom to triumph over his powerful negative character traits. This context can thus help us frame the causal responsibility of these two central characters in terms of advantage: Kate had little freedom to oppose her fate, therefore the marginal value of her actions is negligible. Conversely, Cal allowed his genes and environment to mechanistically shape his behaviors until he picked up the tools to conquer them. We unfortunately don’t get to see how Cal influences his environment after this, but we can infer from his trajectory that he would plausibly have generated outcomes that exceeded the value that would have arisen from his default behavioral policy.

The lives of Cal and Kate make it clear that timshel is precisely the mechanism that enables you, despite your shortcomings and the exigency of your facticity, to produce large amounts of marginal value through your actions. In existentialist terms, one can frame transcendence and liberation via timshel through the opportunity to embrace authenticity as the primary means to exert your own will within your situated existence and overthrow the yoke of alienation:

If who I am is defined through existing, this “who” is normally pre-defined by what is average, by the roles available to me in my culture. The “I” that gets defined is thereby “anonymous,” or “anyone”; self-making is largely a function of not distinguishing myself from others. If there is nevertheless good sense in talk of the singularity of my existence, it will not be as something with which one starts but as something that gets achieved in recovering oneself from alienation or lostness in the “crowd.”… The measure of an authentic life lies in the integrity of a narrative, that to be a self is to constitute a story in which a kind of wholeness prevails, to be the author of oneself as a unique individual. In contrast, the inauthentic life would be one without such integrity, one in which I allow my life-story to be dictated by the world. (Crowell, 2020)

Steinbeck, in The Grapes of Wrath, a novel rich in poignant class criticisms, analyzes the perpetuation and exacerbation of inequality and exploitation through the men who, resigned to this sort of inauthenticity — either wholly or partly — allow themselves to be objects to an enslaving master, capitalism:

Some of the owner men were a little proud to be slaves to such cold and powerful masters…The man sitting in the iron seat [of the tractor] did not look like a man; gloved, goggled, rubber dust mask over nose and mouth, he was part of the monster, a robot in the seat.

The owner men were complacent with their roles in exercising the capitalist will for the same reason that the slave in Hegel’s master-slave dialectic often does not revolt against the master: the slave, despite his exploitation, is able to find genuine meaning in his labor and recognize himself as an authentic subject:

The slave creates more and more products with greater and greater sophistication through his own creativity, he begins to see himself reflected in the products he created, he realizes that the world around him was created by his own hands, thus the slave is no longer alienated from his own labour and achieves self-consciousness. (“Master–Slave Dialectic.”, 2020)

It is now clear why the owner men — both in The Grapes of Wrath and in today’s society — are eager to fulfill their roles as intermediary between nature and the ravenous capitalist monster: a mixture of immanence (objectification) and transcendence (subjectification) is preferable to a state of pure immanence and alienation. They do not want to relinquish their position in the overall hierarchy; they delegate the hard labor to a subordinate class of anonymous workers, becoming masters themselves who are “wholly dependent on the products created by his slave… [and thus] enslaved by the labour of his slave” (“Master–Slave Dialectic.”, 2020).

The owner-laborer relationship is much more precarious than the owner-capitalism relationship; the laborers, reduced to anonymous beings, exist in a state of abjection, unable to assert themselves as essential subjects as they lack the concrete freedom to do so and their craft is repetitive and Sisyphean — like women engaged in housework — preventing them from seeing themselves reflected in their work, whereas the owner men desperately rely on the laborers as the hands of capitalism to fulfill their production obligations to the capitalist monster. The owner men recognize this tension and fear revolt: potential collapse of the owner-laborer relationship threatens the owner-capitalism relationship, which by extension threatens the owner men’s privileged positions as authentic subjects; therefore, the owner men oppress with crushing authority their subordinate laborers to quell rebellion before it can precipitate.

The imagery of the laborers in the tractor mirrors Nicholas Carr’s conception, in the context of technological determinism in The Glass Cage, of humans as sex organs to machines, merely faceless objects reduced to immanence, purposed exclusively for reproduction to a master indifferent to human values and experiences. Both the owner men and the laborers share some causal responsibility in their complicity, but these shares are likely quite different in magnitude. The owner men, privileged and relatively free, made deals with the devil to assert themselves as essential subjects. There were other opportunities available to them, and they spurned them. The advantage value is thus likely relatively large, and the owner men share a large amount of causal attributability for the harm inflicted. The laborers, on the other hand, had very little freedom. Their options were extremely limited, and their failures — through the abdication of their own autonomous selves — to embrace authenticity cannot be said to be a result of a lack of conscious willing; their situating circumstances prevented transcendence — timshel was not theirs to embrace.

These examples from television and literature highlight that behavioral policy stochasticity is a necessary condition for a wide range of advantage values. If the policy is purely deterministic (i.e. no choice permitted), then advantage must always be zero since there are no alternative actions to take. This makes sense both mathematically — E[Q(s,a) under π(a|s)] = Q(s,a*) when only a* is possible — and intuitively — if you lack options, your action in that state cannot be causally responsible for the consequences. When timshel affords you a choice, you can attempt to climb the ladder to oppose gravity and make a significant marginal difference with respect to the counterfactual.

However, behavioral freedom via high entropy alone is insufficient for non-trivial advantage. Different actions must also produce different outcomes. The mutual information I(s’;a) — empowerment — must be non-negligible. If the consequences are the same for every action, then Q(s,a) is the same for every available action, making Q(s,a*) equivalent to the expected value regardless of the choice of a*. This is just an illusion of control: you have direct manipulation of the controller, but it is not hooked up.

Understanding Moral Attribution in the Context of Epistemic Limitations

There are epistemic limitations that make inferring causality in the real world difficult. Q values are extremely difficult to estimate! It is often not easy to predict the consequences of our actions, especially in complex domains. If computers have a hard time estimating Q values in environments that are low in complexity relative to ours, then it would be unfair to expect human beings to produce accurate, narrowly-bounded estimates of Q values in real-world scenarios. Especially problematic is that Q(s,a) usually only captures the expected value of the future cumulative reward. If exogenous noise sources lurk, Q values can incorporate their effects only if they are inferred from a large dataset.

Unfortunately, in the real world, we have only a few samples with which to estimate the Q value. If the distribution that forms Q(s,a) is high variance (i.e. if I repeatedly take action a* at state s*, the future reward accumulated will fluctuate due to high aleatoric uncertainty from latent variables and a stochastic policy), then we will likely have extremely biased estimates at low sample sizes: a few samples prevent us from controlling for confounders during causal inference, and we may incorrectly infer that the agent’s actions were the sole cause of a certain outcome by neglecting potential confounders. To remedy this, we can work within a Bayesian framework by adopting prior beliefs on Q(s,a), encoded via domain knowledge or Empirical Bayes methods⁷, and update our beliefs with whatever valuable evidence we are able to collect, allowing us to be appropriately uncertain. This, however, is imperfect because our beliefs treat Q(s,a) as a random variable rather than a distribution — we know that Q(s,a) has irreducible variance as a result of exogenous noise (noise from outside sources), and we are uncertain about what this distribution actually looks like given limited evidence — but this works well enough in practice in deep reinforcement learning, and, whether we realize it or not, we possess the cognitive tools, despite their limitations and numerous biases, to do this well too (e.g. see Karl Friston’s Free Energy Principle).
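As a rough sketch of this Bayesian remedy (my own illustration using a textbook normal-normal conjugate update, with all numbers invented), we can maintain a belief about Q(s,a) and refine it as sparse samples of return arrive:

# Prior belief about Q(s,a), e.g. from domain knowledge or Empirical Bayes
mu, var = 0.0, 4.0      # prior mean and variance
var_noise = 9.0         # assumed variance of observed returns (aleatoric noise)

returns = [5.2, 1.8, 3.9]  # the few real-world samples we actually have

for r in returns:
    # Standard conjugate normal-normal update of the belief about Q(s,a)
    precision = 1.0 / var + 1.0 / var_noise
    mu = (mu / var + r / var_noise) / precision
    var = 1.0 / precision

print(mu, var)  # the posterior remains appropriately uncertain with so few samples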

So despite these challenges, it is still worthwhile to work in broad strokes with this framework by developing and leveraging heuristics or Fermi estimates to make better assessments of causal responsibility. This is much better than naïvely ignoring V(s), the baseline value produced in expectation conditioned on the context of the agent’s life. If we ignore this baseline, we bias our estimates against those who were not in strong positions to influence outcomes, disproportionately affecting the marginalized. We should consciously consider the context of one’s actions when making judgements: What resources did this individual have available? Were her actions dominated by strong personality traits? Were there other underlying causal mechanisms out of her control that influenced the outcomes?

We’ve now teased out how we can get rough estimates of causal responsibility from the perspective of the observer, but is this alone enough to make conclusions about moral responsibility? What if I must make a decision but am a priori uncertain of the potential consequences of my actions? Am I morally culpable if I pick the most harmful option? What if, instead of being uncertain, I had held inaccurate beliefs? How would my culpability change, if at all? These questions elucidate that causal responsibility does not map directly to moral responsibility.

There are some other crucial elements hitherto absent from this analysis; specifically, an important missing nuance is that we must discount causal responsibility by an epistemic factor. This epistemic factor allows us to consider our ignorance and inaccurate beliefs, and how responsible we are for them, in determining moral attributability. A good heuristic for determining the epistemic discount factor (i.e. the fraction of causal responsibility that should be morally attributable) is the complexity of the causal structure, especially the length of the causal chain connecting the agent to the outcome. The complexity of a phenomenon correlates inversely with the ease of understanding it, so we should recognize that complexity is an impediment to forming the accurate models that inform decision-making.

However, we should not accept rationalizations of bad behavior based on the false idea that indirect causation lessens our moral obligations; if we concern ourselves only with the consequences of our actions, morality is not less demanding when the causal chain is longer or more complex: engaging with an elaborate Rube Goldberg machine or a simple switch is not ethically different if the consequences are the same. The same can be said for the supposed action-inaction distinction: true inaction is fictional — what we commonly construe to be inaction is indeed action, a choice, cloaked in disguise by semantics. Magnus Vinding argues in Suffering Focused Ethics that this arbitrary, artificial distinction should not factor into moral calculus:

This tendency to view harms caused by acts of omission much more leniently than harms caused by acts of commission has been referred to as the ‘omission bias’. And to the extent we care about creating the best outcomes possible, we should be quite wary of this bias. After all, the suffering caused by an act of omission would be just as bad for the victim as if it were caused by an act of commission. It only seems different to us, probably in large part because we can expect to be punished and judged more harshly for acts of commission than for acts of omission. Yet if our primary aim is to create ethically optimal outcomes rather than to gain approval from our peers, we should not view the two anywhere near as differently as we do.

We therefore still have ethical imperatives to resolve complex social issues that are deeply woven into the fabric of our society that many act in complicity with, e.g. structural racism, animal exploitation, etc. But, I argue, we ought to be more understanding when people are unable to meet these ethical imperatives given epistemic limitations that become more imposing as the causal structure becomes more complex and elusive to inference.

In fact, Peter Singer, in arguing for a sociobiological construction of a normative ethical framework in The Expanding Circle, emphasizes that both epistemological and metaphysical perspectives need to be integrated in the formation of a coherent moral system: we construct a system of values that can evaluate the goodness or desirability of certain outcomes, but we must use science and rationality to forecast the consequences of our actions in order to determine the ethically optimal action. Singer’s prediction-evaluation strategy mirrors inverse reinforcement learning (IRL), in which an agent infers preferences (i.e. how rewarding each state is) by observing optimal behavior from an expert demonstrator and simultaneously learns how to navigate the environment in order to optimally satisfy these preferences. The agent pairs its predictive model of the world with its evaluative model of all possible states in order to maximize reward in the same way we do when we employ Singer’s prediction-evaluation model for ethical decision-making. Following this paradigm, Adrien Ecoffet and Joel Lehman have even proposed a framework to bridge RL and normative ethics in a way that permits flexible and uncertain evaluative models in Reinforcement Learning Under Moral Uncertainty!⁸

Although we may disagree about features of the evaluative model at times (e.g. deontologists and consequentialists will assess the desirability of the different possible trajectories of the trolley problem quite differently), most disagreements arise from differences in the perceived optimal trajectories to reach desirable states due to the heterogeneity of predictive internal models people hold about the world. For example, most engaged citizens agree that minimizing the damage of global warming is an ethical imperative, yet there is a diversity of opinion on the desirability of pursuing carbon taxes, nuclear energy, etc., to achieve that goal, which can be attributed to differences in knowledge and beliefs. Accurate predictive models for complex systems require the synthesis of domain knowledge from many different areas of expertise such as — for this example — economics, political science, psychology, climate science, etc. There is often legitimate disagreement and a lack of unanimity amongst credentialed experts, and it can be difficult for non-experts to discern whose opinion they should trust, especially in contexts where the reliability and integrity of the methodologies of those in expert authority have been in question (e.g. the p-hacking and replicability crisis in the social sciences). This is why it’s important to hold beliefs on a continuum of credence!

We should approach those who make ethically suboptimal decisions based on false beliefs with empathy by attempting to examine and understand the source of these inaccurate beliefs. Someone who deliberately opts for ignorance is much more morally culpable than someone who pursues the truth-seeking process in good faith. We ought to be charitable and inclusive in our definition of good faith; consider, for example, the fanatic beliefs of an anti-vaxxer: his facticity, especially aspects of it relating to his sociological milieu, neurochemistry, and educational opportunities, has invariably sculpted his truth-seeking faculties and biases. There are legitimate reasons to be skeptical of the oligopolistic pharmaceutical industry (e.g. the opiate crisis) which may exacerbate any latent anxieties lurking in his psyche, so we must understand his intransigence with this in mind. He knows the disvalue of suffering just as you and I do, but his inaccurate beliefs about the world result in him taking actions that harm others (and himself). We should not dismiss fault altogether, but we must evaluate moral culpability in the context of his facticity. We must ask why his false beliefs formed and persisted. It is hard to argue that he should be held accountable for the influence that the conditions situating his experiences, such as his genes, had on his decision-making. Should his embryonic self have willed for these alleles but not those alleles? Should the other sperm have swum faster? Would that even be him, anyway?

Nuancing Conceptions of Constrained Free Will and Moral Attribution through Existentialist Notions of Identity

Alas, we must confront identity! We must understand the self in existentialist terms in order to properly contextualize the previous sections. Specifically, the relationship of the self to facticity (situating circumstances), immanence (objectification), and transcendence (subjectification) are crucial to understanding the self from an existentialist perspective. This is best done in understanding how we cope with existential anxiety rooted in aimless wandering and fumbling for meaning in an indifferent universe of nothingness, a feeling Steinbeck chillingly describes in Tortilla Flat:

Later, this little candle gave Pilon and Pablo and Jesus Maria some ethical things to think about. Simple small rod of wax with a string through it: Such a thing, you would say, is answerable to certain physical laws, and to none other. Its conduct, you would think, was guaranteed by certain principles of heat and combustion. You light the wick; the wax is caught and drawn up the wick; the candle burns a number of hours, goes out, and that is all. The incident is finished. In a little while the candle is forgotten, and then, of course, it has never existed.

Beauvoir in The Second Sex distills a description of the source of this anxiety into one line:

Ejaculation is the promise of death, it affirms the species over the individual.

In Jean-Paul Sartre’s Nausea, Antoine Roquentin’s fluctuation between immanence and transcendence provides insight into existentialist notions of identity and pursuits of meaning in response to this anxiety:

M. De Rollebon was my partner; he needed me in order to exist and I needed him so as not to feel my existence. I furnished the raw material, the material I had to re-sell, which I didn’t know what to do with: existence, my existence. His part was to have an imposing appearance. He stood in front of me, took up my life to lay bare his own to me. I did not notice that I existed any more, I no longer existed in myself, but in him; I ate for him, breathed for him, each of my movements had its sense outside, there, just in front of me, in him; I no longer saw my hand writing letters on the paper, not even the sentence I had written — but behind, beyond the paper, I saw the Marquis who had claimed the gesture as his own, the gesture which prolonged, consolidated his existence. I was only a means of making him live, he was my reason for living, he had delivered me from myself. What shall I do now?

Desiring to escape existential anguish — the nausea — rooted in his detachment from a meaningless and indifferent universe, Roquentin attempts to engender a self-imposed alienation and to objectify his being by permitting the Marquis de Rollebon — a historical character whose life Roquentin endeavors to document — to posthumously exploit his labor as a means for immortalization through the written word. This abdication of self strips Roquentin of his own subjectivity, and he becomes merely a passive observer, “not merely looking through a keyhole”, but rather “as a voyeur,” trapped in immanence and completely alienated from the outside world. Roquentin, however, eventually revolts against his master:

My saliva is sugary, my body warm: I feel neutral. My knife is on the table. I open it. Why not? It would be a change in any case. I put my left hand on the pad and stab the knife into the palm. The movement was too nervous; the blade slipped, the wound is superficial. It bleeds. Then what? What has changed? Still I watch with satisfaction, on the white paper, across the lines I wrote a little while ago, this tiny pool of blood which has at last stopped being me. Four lines on a white paper, a spot of blood, that makes a beautiful memory. I must write beneath it: “Today I gave up writing my book on the Marquis de Rollebon.”

In a chilling metaphorical coup d’état, Roquentin repudiates the Marquis’ dominion over him to reclaim his agency and asserts himself as an authentic subject, free from the demands of the Marquis. In this transcendence, the previously alienated being becomes reintegrated with the outside world via his body that frames his lived experiences. His existence — embodied, spontaneous, ephemeral, and grounded in his facticity yet freed from its tyranny — flows:

I am the Thing. Existence, liberated, detached, floods over me. I exist.

This is the essence of Steinbeck’s ad astra per alia porci.

It must not be forgotten that the existent — immanent, transcendent, or an ambiguous mixture of both — invariably interprets lived experiences through the lens of facticity, even if transcendence has extricated him from its grasp and extended freedom:

We are always beings “in situation,” but…we are always “more” than our situation and that this is the ontological foundation of our freedom. We are “condemned” to be free, in [Sartre’s] hyperbolic phrase. (Flynn, 2020)

Acceptance of the contexts which situate our lives is a necessary precondition for constructing meaning and ultimately freeing ourselves. In The Dice Man, psychiatrist Dr. Luke Rhinehart, in a struggle with existential angst, recognizes the burden of facticity to the self and how it constrains freedom of choice and limits agency:

Life is islands of ecstasy in an ocean of ennui, and after the age of thirty land is seldom seen. At best we wander from one much-worn sandbar to the next, soon familiar with each grain of sand we see…

No matter how much I twisted or turned there seemed to be an anchor in my chest which held me fast, the long line leaning out against the slant of sea taut and trim, as if it were cleared fast into the rock of the earth’s vast core. It held me locked, and when a storm of boredom and bitterness blew in I would plunge and leap against the line’s rough-clutching knot to be away, to fly before the wind, but the knot grew tight, the anchor only dug the deeper in my chest; I stayed. The burden of my self seemed inevitable and eternal.

Dr. Rhinehart attempts to escape this tyranny by relinquishing control of his life to Chance: he obeys the decisions dictated by the roll of the dice, however revolting, boring, or dangerous, in an attempt to disown, in an act of protest, the ontic conditions that situate and constrain his existence. However, the newly-minted Dice Man fails to recognize the inextricability of facticity from the self; his attempt to extinguish his existential angst by outrightly rejecting his facticity proves futile, preventing him from transcending and cultivating a meaningful existence in an otherwise meaningless world; transcendence is only attainable if one accepts his facticity in full. Otherwise, one fails, in cowardice, to embrace the authentic self, perpetuating a state of alienation.

The Dice Life is merely religion, and the Dice People subordinate themselves to Chance, their Dice God, locking themselves in immanence to assuage their nausea. In existential terms, this subordination to the Die is no different to those of the Catholic to the pew, the alcoholic to the bottle, the Roquentin to the pen, the Romeo to the vial, and the ambivalent fiancé to the band; these endeavors may temporarily quell the existential nausea, but since the authentic self cannot emerge in the absence of the recognition of one’s facticity, one resigned to immanence cannot freely operate with unhindered agency, construct a rich and meaningful existence, nor flourish more generally.

Now that the existentialist notions of identity, freedom, and agency have been well-defined, we can examine moral responsibility in this context. Transcendence is much more accessible to the privileged — those with substantial freedoms — than to the destitute. Beauvoir’s central thesis in The Second Sex takes on new meaning within this existentialist framework: transcendence is not available to woman, the Other, when she is condemned to perpetual immanence by man, the One, who denies her concrete freedoms. Cal Trask could climb the ladder of timshel; Kate was weighed down by the burdens of being a woman, unable to transcend her situated existence. In the context of reinforcement learning, agents with high-entropy policies will realize high advantage values much more frequently than their counterparts with low-entropy policies (a toy sketch follows below); this gives us a systematic way of seeing why we must consider facticity when trying to determine moral attributability.
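To make the entropy-advantage connection concrete, here is a toy sketch. Everything in it is assumed for illustration: a hypothetical four-action bandit with fixed action values and two made-up policies. The high-entropy agent realizes a large positive advantage in roughly a quarter of episodes; the near-deterministic one almost never does:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action values for a four-action bandit; the fourth action is
# rare but highly valuable. The baseline V = E_pi[Q] is the counterfactual
# expectation, so the realized advantage of an action a is A(a) = Q(a) - V.
Q = np.array([1.0, 2.0, 3.0, 10.0])

def freq_high_advantage(pi, threshold=5.0, n=100_000):
    """Fraction of episodes in which the sampled action's advantage
    exceeds the threshold under policy pi."""
    V = pi @ Q                                   # counterfactual baseline
    actions = rng.choice(len(Q), size=n, p=pi)   # sample from the policy
    return np.mean(Q[actions] - V > threshold)

low_entropy  = np.array([0.97, 0.01, 0.01, 0.01])  # locked in, rarely explores
high_entropy = np.array([0.25, 0.25, 0.25, 0.25])  # free to explore

print(freq_high_advantage(low_entropy))   # ~0.01
print(freq_high_advantage(high_entropy))  # ~0.25
```

The analogy: the agent with more behavioral freedom simply gets more chances at high-advantage outcomes, just as the privileged existent gets more chances at transcendence. Blaming the low-entropy agent for rarely exceeding its counterfactual baseline ignores the constraints baked into its policy.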

Finally, once we’ve assessed epistemically-discounted moral attributability, we must consider to whom we are assigning blame. We must consider temporality in the context of identity: if one indeed frames existence as “self-making in situation” (Crowell, 2020), then the I of ten years ago is not the same as the I of today. This may appear to be a truism prima facie, but most people prefer to cling tightly to a continuous, immutable conception of self. Instead, existence is ephemeral, and identity is fluid; we are constantly changing. Therefore, we must apply an identity discount to moral attributability: if someone committed an action in the past, she shares less blame today than she did at the moment of action because her being today is only partially her being then. Although not a perfect proxy for identity change, the time elapsed between the action in question and the present is a good heuristic for this discount (a sketch follows below), though we may also want to weigh other indications of significant identity change (e.g. a major life event). This is especially important to fairness in contexts in which much time passes between action and judgement.
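As a minimal sketch of what such a heuristic might look like (the half-life and event-discount parameters here are entirely hypothetical, chosen only for illustration):

```python
def identity_discount(years_elapsed: float,
                      major_life_events: int = 0,
                      half_life: float = 10.0,
                      event_factor: float = 0.8) -> float:
    """Heuristic identity discount in (0, 1].

    Assumes the overlap between the actor-then and the person-now decays
    exponentially with time (half_life = 10 years is a hypothetical choice)
    and shrinks further with each major life event.
    """
    time_decay = 0.5 ** (years_elapsed / half_life)
    return time_decay * (event_factor ** major_life_events)

# A transgression from ten years ago, with one major life event since:
print(identity_discount(10, major_life_events=1))  # ~0.40
```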

At last, we have a systematic way to estimate how morally attributable the consequences of someone’s past actions are to that person today:

Moral attributability = causal attributability · epistemic discount · identity discount

We must first approximate the causal attributability by considering the facticity of the actor and how it constrains his freedom. If locked in immanence and unable to transcend, what outcomes might one expect? How do these compare to the actual outcomes? The difference in value between the actual outcomes and the counterfactual expectation provides an estimate of the marginal value (the advantage) that the actor is responsible for. We then must assess intentionality, epistemic limitations, and the conditions which shaped them. Could this person reasonably have been aware that this action would result in this outcome? We then must discount the causal attributability in light of these epistemic factors. Lastly, we must consider identity: how responsible can the person be today for his actions in the past? This is quite a complex set of questions to examine, and in practice we have only Fermi estimates and heuristics with which to infer moral attributability. We should hold uncertain beliefs about each factor in the chain, and therefore about moral attributability itself, and be wary when someone asserts fault with high confidence; the sketch below illustrates why.
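One way to honor that uncertainty is to treat each factor as a distribution rather than a point estimate and propagate the uncertainty through the product. A sketch, where the Beta parameters are invented Fermi-style beliefs rather than calibrated values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Uncertain beliefs over each factor in [0, 1], encoded as Beta distributions.
causal    = rng.beta(4, 2, size=n)  # advantage-based causal attributability
epistemic = rng.beta(3, 3, size=n)  # could the outcome reasonably be foreseen?
identity  = rng.beta(5, 2, size=n)  # how much of the actor-then persists now?

moral = causal * epistemic * identity

lo, med, hi = np.quantile(moral, [0.05, 0.5, 0.95])
print(f"median: {med:.2f}, 90% credible interval: ({lo:.2f}, {hi:.2f})")
```

Even with moderately confident beliefs about each factor individually, the resulting credible interval is wide, which is precisely why high-confidence assertions of fault should make us suspicious.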

Normative Claims

In a sense, most of what I have said hitherto has been purely descriptive, with a sprinkling of oughts. Now that we have a framework for more accurately inferring moral attributability, how should we use it? There are two layers to address when answering this question: how we form judgements and how we act on them.

We must reform the way we assign moral culpability in both legal and social settings; we should assign moral culpability in the context of one’s facticity, epistemic limitations, and identity fluidity, and this requires discretion. Judges in the US do get some discretion in sentencing decisions, but unfortunately, biases systematically corrupt their ability to reach fair conclusions in the context of the defendants’ situated existences, reinforcing historic racial and social hierarchies. Jennifer Doleac describes this phenomenon that infects the US justice system:

Discretion is generally exercised…in that one direction: you’re lenient on the people that are like you, but you don’t use the discretion for good for the people that you’re less sympathetic to…

When you give judges really strict guidelines about what the sentence can be, they have to do it. But then as soon as you remove that, they use their discretion and they apply their discretion in a way that inevitably allows racial bias to seep in…

What people want is for people to be able to use their discretion only for good. But discretion comes with both the good and the bad, and human beings, in general, are biased. So whenever we allow humans to make decisions where there’s limited course correction or limited oversight, we should expect racial disparities to emerge.

If we wish to maintain a justice system that integrates moral desert (the idea that someone deserves reward or punishment based on the moral attributability of their past actions), the solution is not to eliminate discretion and rigidly map crimes to justice outcomes; context matters. Rather, we must work to reduce these biases and develop a criminal justice system that fairly contextualizes the crime committed.

I contend, however, that moral desert should not factor into our decision-making in the context of justice anyway. I agree with the Rawlsian objection that determining moral desert in practice is impracticable given the complex set of factors we must interrogate (remember that Fermi estimates and heuristics, the best tools available, ultimately yield uncertain assessments of moral attributability); any conception of justice that integrates moral desert will therefore inevitably reflect the social and economic inequalities of society and compromise the axiom of justice as fairness. However, this is not my basis for rejecting moral desert as a factor in decision-making. I claim it is axiomatic that reward and punishment should only be used instrumentally to create ethically optimal outcomes within the context of a given normative ethical system, enabling followers of rights-based ethics to minimize rights violations, utilitarians to maximize total wellbeing, deontologists to uphold duties, etc. In the absence of instrumental value, why should someone suffer or lose rights on the sole basis of moral culpability? Viewed through this lens, moral desert is a flimsy normative concept.

The function of punishment in the criminal justice system therefore ought to be to heal the perpetrators, disincentivize harmful behavior, and protect others in the community rather than inflict suffering out of retribution. We must replace retributive justice, which “has a deep grip on the punitive intuitions of most people” (Walen, 2020), with restorative justice, which can lead to more ethically optimal outcomes. Practically, we must consider prison sentences in instrumental terms only, focus on rehabilitating prisoners (e.g. treating prisoners more humanely and offering them education, therapy, and leisure), abolish the death penalty, reform the systems that encourage harmful behavior, and address the social, racial, and economic inequalities that perpetuate unfair treatment of marginalized groups.

In our social interactions, we should take a similar approach. We revile the narcissist and revere the genius, yet neither is the sole product of conscious willing; facticity situates existence, and we ought to recognize this in the process of judging others. Furthermore, we should be charitable in our evaluations of moral attributability, since we will have quite uncertain beliefs after inference given that we are mostly limited to working with Fermi estimates and heuristics. We should scrutinize our normative intuitions, drop conceptions of moral desert, and approach people with more compassion generally.

We should extend this same compassion to ourselves — we should recognize how both privilege and handicaps have shaped our lives and embrace our achievements with humility and our transgressions with forgiveness.

We must love ourselves in totality, accepting our facticities, if we wish to transcend and create meaningful existences in an otherwise empty, indifferent world.

References

Amodei, Dario, and Danny Hernandez. “AI and Compute.” OpenAI, OpenAI, 1 Apr. 2020, openai.com/blog/ai-and-compute/.

Beauvoir, Simone de. The Ethics of Ambiguity. Open Road Integrated Media, 2018.

Beauvoir, Simone de. The Second Sex. Vintage Digital, 2015.

Carr, Nicholas G. The Glass Cage: Where Automation Is Taking Us. Vintage Books, 2016.

Crowell, Steven. “Existentialism.” Stanford Encyclopedia of Philosophy, Stanford University, 9 June 2020, plato.stanford.edu/entries/existentialism/.

Ecoffet, Adrien Lucas. “Beat Atari with Deep Reinforcement Learning! (Part 1: DQN).” Medium, Becoming Human: Artificial Intelligence Magazine, 31 Oct. 2017, becominghuman.ai/lets-build-an-atari-ai-part-1-dqn-df57e8ff3b26.

Ecoffet, Adrien, and Joel Lehman. “Reinforcement Learning Under Moral Uncertainty.” ArXiv.org, 15 July 2020, arxiv.org/abs/2006.04734.

Flynn, Thomas. “Jean-Paul Sartre.” Stanford Encyclopedia of Philosophy, Stanford University, 5 Dec. 2011, plato.stanford.edu/entries/sartre/.

Friston, Karl. “The Free-Energy Principle: a Rough Guide to the Brain?” Trends in Cognitive Sciences, vol. 13, no. 7, 2009, pp. 293–301., doi:10.1016/j.tics.2009.04.005.

Levine, Sergey. “Deep Reinforcement Learning, Decision Making, and Control: Lecture 1.” Deep Reinforcement Learning, 2019, rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf.

Levine, Sergey. “Deep Reinforcement Learning, Decision Making, and Control: Lecture 4.” Deep Reinforcement Learning, 2019, rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-4.pdf.

“Master–Slave Dialectic.” Wikipedia, Wikimedia Foundation, 10 May 2020, en.wikipedia.org/wiki/Master%E2%80%93slave_dialectic.

Melville, Herman, et al. Moby Dick. Éditions Sarbacane, 2017.

Rhinehart, Luke. The Dice Man. HarperCollins, 1999.

Sartre, Jean-Paul, et al. Nausea. Ishi Press International, 2018.

Scanlon, Thomas. What We Owe to Each Other. Belknap Press of Harvard University Press, 2000.

Singer, Peter. The Expanding Circle: Ethics and Sociobiology. Princeton University Press, 2011.

Steinbeck, John, and Apie Prins. Tortilla Flat. Altamira, 1991.

Steinbeck, John. East of Eden. Penguin Books, 2017.

Steinbeck, John. The Grapes of Wrath. Penguin, 2008.

Steinbeck, John. To a God Unknown. Penguin Books Ltd, 2000.

Vinding, Magnus. Suffering-Focused Ethics: Defense and Implications. Ratio Ethica, 2020.

Walen, Alec. “Retributive Justice.” Stanford Encyclopedia of Philosophy, Stanford University, 31 July 2020, plato.stanford.edu/entries/justice-retributive/.

Wenar, Leif. “John Rawls.” Stanford Encyclopedia of Philosophy, Stanford University, 9 Jan. 2017, plato.stanford.edu/entries/rawls/.

Wiblin, Robert, et al. “Ways to Prevent Crime Other than Police and Prisons.” 80,000 Hours, 4 Aug. 2020, 80000hours.org/podcast/episodes/jennifer-doleac-reforming-police-preventing-crime/.

Wrobel, David. “The Character of Lee.” John Steinbeck’s America, 2017, steinbeck.oucreate.com/exhibits/show/east-of-eden/the-character-of-lee.

Endnotes

¹ Assuming equiprobable states is known as the Principle of Indifference (Laplace called it the principle of insufficient reason). This is not strictly true (often state visitation frequencies are not equal), but it is usually a good approximation that physicists commonly make.

² I emphasize in expectation to note that the law is statistical in nature. Entropy can indeed fluctuate and actually decrease for a short time, but the probability of this happening gets smashed to nearly 0 as the number of particles increases to levels similar to those in items we interact with daily; perhaps if we exist in an infinite multiverse, there are other threads of you somewhere that just witnessed a spontaneous decrease in entropy in a closed system.

³ This is actually called differential entropy and is not an exact analog to the discrete case but will suffice in our case.

⁴ Reward functions are mutable in some contexts. For example, agents infer preferences and learn a reward function in inverse reinforcement learning by observing optimal behavior of an expert demonstrator.

⁵ This can take millions and millions of samples, which is why efficient hardware is so critical to developing high-performance RL systems. In general, poor sample efficiency in ML as a whole (though it is especially problematic in RL) is an open problem that, paired with rapidly accelerating demand for more and more training data, has many experts fearing compute may be a key bottleneck to AI progress in the coming decades. See Figure 7 below:

Figure 7: This chart illustrates the increasing demand for compute hardware given the progress in the development of more sophisticated AI systems. https://openai.com/blog/ai-and-compute/

⁶ Cain and Abel are the sons of Adam and Eve. Cain, jealous that God favored his brother Abel’s offering, murdered him. Upon learning of this, God permanently exiled Cain to the land of Nod, east of Eden, and, in Steinbeck’s framing, the entire tree of humanity descends from Cain.

⁷ Empirical Bayes methods form priors from previous data (usually from the marginal). For example, if we are trying to predict the rating of a new book selected at random before we read it or peek at any qualitative reviews, an uninformative prior would likely be inappropriate. Instead, we can look at the historical distribution of book ratings (the marginal) to get more predictive insight. We could also choose a prior that is more narrow, e.g. using the historical distribution of book ratings by that author. The choice of appropriate prior ultimately falls under the inferrer’s discretion.
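A minimal sketch of the book-rating example (all numbers are invented; prior_strength, the number of pseudo-ratings the prior is worth, is the discretionary knob mentioned above):

```python
import numpy as np

# Hypothetical historical ratings across all books: the marginal distribution
# from which we form the empirical Bayes prior.
historical = np.array([3.8, 4.1, 3.5, 4.4, 3.9, 4.0, 3.6, 4.2])
prior_mean = historical.mean()  # ~3.94
prior_strength = 20             # prior counts as 20 pseudo-ratings

def eb_estimate(ratings):
    """Shrink a book's observed mean rating toward the marginal prior mean.
    With few ratings the prior dominates; with many, the data do."""
    ratings = np.asarray(ratings, dtype=float)
    n = len(ratings)
    return (prior_strength * prior_mean + ratings.sum()) / (prior_strength + n)

print(eb_estimate([5.0, 5.0]))   # ~4.03: two perfect ratings barely move us
print(eb_estimate([5.0] * 200))  # ~4.90: two hundred nearly do
```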

⁸ The abstract of Reinforcement Learning Under Moral Uncertainty provides an overview of the framework investigated:

An ambitious goal for artificial intelligence is to create agents that behave ethically: The capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed. While ethical agents could be trained through reinforcement, by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement (both societally and among moral philosophers) about the nature of morality and what ethical theory (if any) is objectively correct. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. to take into account when acting that one’s credence is split across several plausible ethical theories. Inspired by such work, this paper proposes a formalism that translates such insights to the field of reinforcement learning. Demonstrating the formalism’s potential, we then train agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior from commitment to single theories. The overall aim is to draw productive connections from the fields of moral philosophy and machine ethics to that of machine learning, to inspire further research by highlighting a spectrum of machine learning research questions relevant to training ethically capable reinforcement learning agents.
