Reward Functions & Self-Reference

This article’s meaning is hidden. Find it.

The world as I see it (at 1AM after a long, stressful 2 weeks wrangling final projects for my two AI grad classes at MIT) mainly consists of gradients. Lots of gradients. Gradients everywhere you see, everywhere you think, everywhere you feel. We are trapped in a world of gradients. Or put another way, we are trapped inside our own internal gradient, which is itself trapped inside of a fantastically larger gradient.

To stop the rambling and get to the point, we can ask: what is a gradient? The word comes from calculus: the gradient of a function is the direction in which it increases the fastest, together with how steeply it increases that way. In the simple 1-dimensional case, a gradient is a derivative, or slope. In that case the only direction it carries is its sign: whether the function climbs toward larger numbers or falls toward smaller ones. In more dimensions, such as the 3-dimensional space we live in, the gradient is an arrow that points in a direction with some amount of weight. It points up toward the mountaintop.
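To make the arrow concrete, here is a minimal sketch that estimates a gradient numerically with central differences. The function `hill` and its peak at (1, 2) are made up for illustration, and `numerical_gradient` is a hypothetical helper, not something from a library.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` using central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge coordinate i up
        backward[i] -= h  # nudge coordinate i down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def hill(p):
    # Hypothetical "mountain" whose top sits at (1, 2)
    x, y = p
    return -(x - 1) ** 2 - (y - 2) ** 2


# Standing at the origin, which way is up?
g = numerical_gradient(hill, [0.0, 0.0])
print(g)  # roughly [2.0, 4.0]
```

Standing at the origin, the arrow comes out as roughly (2, 4), which is parallel to the direction from the origin to the peak at (1, 2). Its length says how steep the climb is from where you stand.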

[Figure: 2-dimensional gradients and maxima]
[Figure: 3-dimensional gradients]

In the world of machine learning, we can use gradients to solve optimization problems. We travel along a gradient (or against it) until we reach a maximum (or minimum) point. That point could be the best set of values that enable a machine to learn things about the world, like whether an image contains a human face, or how best to translate a sentence of French into a sentence of English. The goal of all these algorithms is to find the best possible thing, to maximize some internal score. Everything in their eyes has a value, and their raison d’être is to get the most value. Sound familiar?
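As a rough sketch of what "traveling along a gradient" looks like, here is plain gradient ascent on a toy score. The score function, the starting point, the learning rate, and the step count are all assumptions picked for illustration. Real learning systems optimize far messier functions, usually by descending a loss rather than climbing a score.

```python
def gradient_ascent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly take a small step in the direction the gradient points."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p + learning_rate * gi for p, gi in zip(point, g)]
    return point


def score_gradient(p):
    # Analytic gradient of a hypothetical score(x, y) = -(x - 1)^2 - (y - 2)^2,
    # which has its single maximum at (1, 2)
    x, y = p
    return [-2 * (x - 1), -2 * (y - 2)]


best = gradient_ascent(score_gradient, [0.0, 0.0])
print(best)  # very close to [1.0, 2.0]
```

This toy score has a single peak, so following the gradient always ends up at the top. On a bumpier landscape the same procedure can settle on a smaller nearby hill instead, a local maximum.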

We do the same thing as humans every day, every minute, every second. That’s why we designed these algorithms the way we did. We’re just maximizing. All day every day. That’s why we give meaning to things. It’s why we care about living, about becoming the best person we can be, whether that means being the best father, or soccer player, or AI researcher. What do I mean by best person we can be? Well, that’s really such an abstract concept that if I give a one-sentence answer I’ll leave you with a very wrong impression of what I’m getting at here.

Take a closer look at the phrase itself.

“Best person you can be”

In particular look at the first word. Best. Best by what standard? Who decides what’s best? Well ultimately you do, because it’s what you’re maximizing day in and day out. You make decisions every day because ultimately they contribute toward the best possible outcome for the world that you live in. What makes one outcome better than any other outcome? That’s something we have an extremely hard time answering with reason. There’s an internal cost function weighing every decision, every possible outcome to consider. And some come out on top. Typically for American humans living in the 21st century this function goes something like this:

  1. Take care of necessities like air, water, food, shelter, sleep
  2. Make sure I’m producing enough income to support myself and those who rely on me, such as family
  3. Take steps toward advancing and securing my long term career and family goals, and save up for retirement
  4. Get better at various hobbies
  5. Allow small pleasures like eating good food, drinking, relaxing, going on vacations

Thinking more abstractly, all these steps roughly translate to

  1. Get enough nourishment in the short term so that I can keep living and consequently have a longer opportunity in the future to:
     a. Maximize my reward/objective function, and
     b. Refine my estimates of what that reward function is
  2. Take partial advantage of local maxima of my reward function, both because:
     a. it’s hard not to give in to these local maxima
     b. they may contribute to my overall wellbeing, which will in turn help me maximize my reward function down the road

And taking this one step further:


That’s it. That’s all we do, that’s why we do it. Call it happiness, call it reward, call it world peace. Everyone’s got something in their minds they’re trying to optimize. Think hard about why you do anything you do. If you can’t trace it to something ephemeral that you can only describe in terms of good, bad and their equivalents, you’re not thinking hard enough.

Why do I want to get a good job? So I can support my family. Why do I want to support my family? Because they rely on me. Why do I care that they rely on me? Because we’re both happier and the world is better off if we do things for each other, let’s call that love.

See what I did there at the end? The last sentence becomes all about happiness, betterness and love. What are these concepts? Why do some things make you happy and others don’t? Why do you do anything at all?

This all comes down to the central duality that’s been hotly debated, deified, and defined over the course of human conscious thinking.


There are some things that are good. There are others that are bad. Let’s try to make the world have more of the good things and less of the bad things.

These concepts are in every religion and every human school of thought. Sometimes a school will attempt to go deeper and explain why some things are Good and others are Bad. Most include the presence of a God, and many at a cursory level attribute Good to God and Bad to something that’s the opposite of God. We haven’t really done anything there. That’s still self-referential. God is Good, not-God is Bad.

The more thought-out religions and schools of thought go a step further. The only possible meaning we can pull out of the existence of Good and Bad is the fact that we are basically stuck with them. Except that very act of acceptance is itself an act of defiance. Before this act we are stuck trying to maximize Good at the expense of Bad. Acknowledging that that process is always ongoing and undefined is to acknowledge that we can’t win, and that we will never be better off by trying to maximize Good.

Every step we take towards maximizing Good just allows for a new definition of what is Good and what is Bad.

If there is always Good and there is always Bad, then there is always more work to be done towards getting rid of the Bad. And all we really care about at the end of the day are the classes, Good and Bad. All sublevels of Good and Bad still ultimately fall into the two camps. So we can’t actually reach the ultimate Good, and we can’t destroy the ultimate Bad. But we’re forced to continually try. We have to. It’s the only thing we know how to do.

But there’s a way out. Let’s take one giant step back to the title. Self-referentiality. Why am I even writing this article? Is it to maximize my own reward function? On the surface I’m just advancing slightly toward my own personal Good for some mostly hidden reason mired in the intricacies of my current state of being. However, if we venture just a bit deeper, we also see a truer self-referentiality. That awareness of Good and Bad, and promoting that awareness, and still striving towards Good even with an understanding that these concepts are fundamentally unattainable and incomprehensible, is in itself a step towards freedom from the never-ending cycle. I’ll end with a traditional Zen kōan.

Coming empty-handed, going empty-handed — that is human. 
When you are born, where do you come from? 
When you die, where do you go? 
Life is like a floating cloud which appears. 
Death is like a floating cloud which disappears. 
The floating cloud itself originally does not exist. 
Life and death, coming and going, are also like that. 
But there is one thing which always remains clear. 
It is pure and clear, not dependent on life and death. 
What, then, is the one pure and clear thing?