This is Your Brain on Brands

Apr 14, 2023

Some thoughts on branding and machine learning that aren’t about an imminent apocalypse, creative or otherwise.

**“Brain emoji with meta, apple, nike, google brand logos exploding out the top.” 4/10**

TL;DR: While much of the AI conversation revolves around what it can make and how that might elevate/devastate the creative industries, there’s a lot we can learn from how it learns. Not only can we apply the principles of reinforcement learning to shape people’s expectations of how a brand works, we can also tap into the neurophysiological processes that underpin the act of learning, and the mechanisms behind happiness itself.

1. GARBAGE IN, GARBAGE OUT

“If your kid needs a role model and you ain’t it, you’re both fucked.”

― George Carlin

It has been quite the few months in AI. We’ve gone from “OMG, this dufus thinks a computer is alive” to “OMG, a computer is legit trying to destroy this guy’s marriage” quicker than you can look up a stock quote for the Tyrell Corporation.

Among sentient members of the marketing industry, most of the discussion/handwringing has revolved around AI’s potential impacts on how we ideate, create, and continue to pick up a paycheck. In other words, the consequences of what it makes. This is unsurprising — as a creative/knowledge industry, we’re defined by our output. While we put on a brave face and make nice with the robots in case they’re listening, the barely-sub-subtext is the existential threat to our livelihoods and sense of self-worth. And even though the perennially insecure ad industry has a track record of getting worked into a lather about whatever the technological fad du jour is, nobody ever worried that an NFT might make them destitute (although if it did, sympathy is in short supply).

What we tend to talk about less is the part that comes before AI makes anything: how it learns.

This is also not surprising. Machine learning paradigms are kind of technical and a bit abstract. Unless things go seriously off the rails, we tend to leave the question of what goes into the machines to the scientists and engineers. But when those decisions bubble up as a toxic runoff of sexist or racist bias, it becomes painfully clear that the question of how and what we should teach the machines is as much a welfare check on society as anything else.

Ensuring that AI acts in our best interests (the “alignment problem” that gives Brian Christian’s excellent book its title, and which happens to be the inspiration behind these musings) is central to the development of AI systems that are not just highly effective, but safe. “Safe”, as in “won’t cause the end of civilization as we know it”. As Mira Murati, Chief Technology Officer at OpenAI, puts it, “AI systems are becoming a part of everyday life. The key is to ensure that these machines are aligned with human intentions and values.”

At a high level, marketing is also fundamentally an alignment problem: getting consumers to believe that their best interests are aligned with a business’s best interests. The nature of the beast means that that will never truly be the case, but as ethical capitalists, we do our best to close the gap as responsibly as possible by using people’s interests as the anchor and adjusting towards them, rather than the other way around. But this isn’t a critique of the ethics of post-industrial capitalism.

I want to explore how the same techniques we use to address the alignment problem in AI systems can also be used to address the alignment problem for brands. And the reasons why, as we’ll see, could not be more human.

2. THANK YOU SIR, MAY I HAVE ANOTHER

Before I dive in and inevitably annoy the AI/ML community with my tenuous grasp of machine learning principles, a disclaimer: I am not an expert, nor do I make any claim for the empirical veracity of these thoughts. I just think they’re an interesting analogy for the task of building more valuable and sustainable brands. “All models are wrong, etc.”, apologies in advance.

There are multiple paradigms for machine learning, each of which has its own advantages and disadvantages that make it more or less suitable for different AI systems, depending on what they’re designed to do. I’m going to focus on one of them: reinforcement learning.

For those not familiar with the concept, I’ll start with a simple definition. Reinforcement learning is a form of machine learning in which an agent (the thing you need to do a thing) takes actions in a specific environment in order to maximize its rewards over time.

There are four key components to reinforcement learning:

Agent: the AI that you’re training to achieve a specific goal.
Environment: the space in which the agent performs its action.
Action: the thing that the agent does that creates a change in its environment.
Reward: positive or negative feedback on how successful the action was.

The outcome of this is the value function (an expectation of how much reward is likely to follow in a given situation) and the policy (the set of rules that dictate what the agent does the next time it finds itself in a similar situation with a similar goal, based on the value function).

Here’s a simple example to get a sense of how it works in practice:

Agent: a robot learning how to kick a ball and score a goal.
Environment: the goal and the area in front of it.
Action: all the different ways of kicking the ball towards the goal.
Reward: positive when the robot scores. Negative when it misses.

As a result, the robot (or at least the program controlling it) learns a value function that estimates how rewarding each way of kicking the ball is likely to be, and a policy for choosing how it kicks the ball next time.
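
The robot example can be sketched in a few lines of Python. This is a minimal illustration rather than how any real robot is trained: the kick styles, their scoring chances, and the function names are all invented for the example, and the learning rule is a simple "epsilon-greedy" trial-and-error loop.

```python
import random

# Hypothetical kick styles and their true chance of scoring (invented numbers).
SCORING_CHANCE = {"wild_punt": 0.2, "placed_shot": 0.7, "toe_poke": 0.4}


def kick(action, rng):
    """Environment: +1 reward for a goal, -1 for a miss."""
    return 1.0 if rng.random() < SCORING_CHANCE[action] else -1.0


def train(episodes=5000, epsilon=0.1, seed=0):
    """Agent: learn the average reward of each kick by trial and error."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in SCORING_CHANCE}  # value function: expected reward per kick
    counts = {a: 0 for a in SCORING_CHANCE}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(SCORING_CHANCE))  # explore: try something at random
        else:
            action = max(value, key=value.get)  # exploit: use the best-looking kick
        reward = kick(action, rng)
        counts[action] += 1
        # Running average: move the estimate towards the reward just received.
        value[action] += (reward - value[action]) / counts[action]
    return value


def policy(value):
    """Policy: pick the kick with the highest learned value."""
    return max(value, key=value.get)


value = train()
print(value)
print(policy(value))
```

The small amount of random exploration is what lets the agent discover that a calm, placed shot beats its first instinct.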

This is similar to how humans learn, not unlike when I take my six-year-old son to the park for a kickabout. He has learned from trial and error (action and reward) that punting the ball wildly results in a goal much less often than choosing a spot and sweeping it calmly into the back of the net. He has, however, also learned that smacking the ball into the bushes results in dad having to run a lot, which is hilarious (high value function), and so his policy decision rather depends on whether his objective is to laugh at dad or get an ice cream.

So what does this all have to do with how we “learn” brands?

Think of it like this:

A person wants to do something: wash their clothes, book a vacation, look cool. Typically it’s some combination of rational and emotional goals. This is our agent. They are not entirely autonomous, because we want them to achieve that goal in a way that’s as beneficial to us, the brand, as it is to them (if not more so): the “alignment problem” mentioned earlier. Therefore we create a brand environment that’s full of ways in which to interact with our brand (an ad, a website, social media, an ecomm flow, the helpdesk) and is designed to influence their behaviour. Each interaction, or action, is an opportunity for our agent to learn about our brand. Positive interactions teach that our brand is helpful, simple, playful, kind, daring, or whatever value our brand idea is based on. This is our value function. Negative interactions teach the opposite. As a result, the agent forms a policy about how they will interact with the brand. Believe what it says or not. Buy something from it or not. Use its social capital to burnish their own status.

In other words, these actions within the brand environment cumulatively shape people’s expectations, and expectations are key. The value and meaning of a brand is the sum of expectations that people have of it. The artifacts that you put into your brand space (visual, linguistic, experiential) aren’t the brand. They are ways to anchor and influence expectations of the brand. As Marty Neumeier puts it, “A brand is not what you say it is. It’s what they say it is.”

When you create any kind of brand artifact, you are essentially creating a way to teach someone about your brand. You can do it well, or you can do it poorly, but in the end, the perception of your brand arises from the accumulation of these interactions. In designing these interactions, you are, in effect, creating a brand curriculum. While a systematic way of thinking about brands has existed since the middle of the last century, it has typically shown up in the form of brand identity systems. These are largely concerned with how the various visual (and increasingly audio) elements of the brand should be expressed in an array of relevant contexts. More recently, the idea of the ‘brand operating system (OS)’ has taken this idea and expanded it to include the underlying conceptual framework of values, beliefs, behaviours, etc. that give a brand its identity. Brand OS and brand curriculum are neatly (and intentionally) complementary ideas. If the operating system is the brand’s internal handbook for the what and how, the curriculum is where that rubber hits the road and begins to change the way people think and feel: “a shoe is just a shoe until someone steps into it.”

Steve Jobs got this instinctively. From the design of the products, to their packaging and unboxing, to the marketing, to the retail experience, to the customer support experience, every touchpoint and interaction was considered, mutually reinforcing, and (most importantly) better than anyone had a right to expect. Few brands can, or probably ever will, match this kind of sustained and comprehensive leveling up of expectations. Peak Apple wrote the script, which means that even now, although every fibre in my body is straining against it, I can’t help but use it as a point of reference.

Pick a DTC brand from the canon and they’re also probably doing a lot of things right (while their stock prices are taking a beating, it’s probably not their brands that are to blame). You can bet their design system will be elegant and tasteful (although exceeding expectations in this department is tough, as what was once playfully contrarian swiftly becomes the playbook). Customer service will probably be chipper. Comms will have a certain cheeky charm. The products themselves will feel good enough and different enough to win you over — at least initially.

This is unsurprising for a couple of reasons. Many of these companies come from the same messianic culture of disruption that Apple did, and so share much the same philosophy of holistic, design-driven brand experience. But even more importantly, if your goal is to convince people to do something familiar in an unfamiliar way (get a subscription for razors, buy eyeglasses online, vacation in a random person’s house), you have a lot of teaching to do to get people over the hump. Any bumps on that hump, and it’s too easy to just say, “fuck it, I’m doing this the usual way.”

This is why consistency is so important. Without it, you fail to create a multiplying effect from one artifact (or lesson) to another, because you are neither reinforcing familiarity with the brand nor connecting the parts in service of a “big idea” that’s more culturally durable. You’re also projecting cognitive dissonance: the conflict, so distressing to us humans, that arises when your stated beliefs or values don’t line up with your actions. For a disruptor brand whose very essence is predicated on the idea that this is better than what came before, any interactions that are even a bit worse can quickly become intolerable.

In many ways, this probably all sounds fairly intuitive. Create shitty comms, build shitty products, service them shittily, and people will have low expectations of your brand. Low expectations translate into low brand value. Create great experiences that surprise and delight, and they will view your brand as a source of rewards, and they will seek out more of it. So far, so duh.

But there’s a twist.

3. I WANT YOU TO WANT ME

We humans are continually making predictions about how things are going to work out before we know the actual outcome. As we move forward towards an uncertain end state, we continuously take in new information from our environment, and our predictions become increasingly accurate. For example, a prediction about what the weather is going to be like on Saturday is more likely to be accurate on Friday than it is on the preceding Monday. If we’re meeting friends at a bar, the closer our Uber gets, the more accurate our ETA.

With each new piece of information that we take in (evolving weather patterns, traffic conditions) we compare what we thought the outcome was going to be with what we think it’s going to be now. The gap between each successive guess is the temporal difference. Given that the latter of the two guesses is more likely to be correct, the “error” in our previous estimate is an opportunity to learn and get closer to the truth without having to wait until we get to our destination.
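
To make the idea concrete, here is a small sketch using the Uber example. The ETA guesses and the learning rate `alpha` are invented for illustration, and it leaves out the parts of full temporal difference learning that the analogy doesn't need (rewards along the way, discounting of the future).

```python
# Hypothetical guesses at the total trip time (in minutes), made at
# successive points along the ride, followed by how long it really took.
predictions = [30.0, 35.0, 33.0, 40.0]
actual = 41.0


def td_errors(predictions, outcome):
    """Gap between each guess and the one that followed it.

    The final guess is compared with the real outcome.
    """
    targets = predictions[1:] + [outcome]
    return [later - earlier for earlier, later in zip(predictions, targets)]


def td_update(predictions, outcome, alpha=0.5):
    """Nudge each guess a fraction alpha of the way towards the next guess."""
    errors = td_errors(predictions, outcome)
    return [p + alpha * e for p, e in zip(predictions, errors)]


print(td_errors(predictions, actual))  # [5.0, -2.0, 7.0, 1.0]
print(td_update(predictions, actual))  # [32.5, 34.0, 36.5, 40.5]
```

Each early guess gets corrected by the guess that came right after it, so the learning happens during the journey rather than only at the destination.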

What’s fascinating about this is that temporal difference learning not only works as an approach to reinforcement learning in AI systems, it parallels the way in which humans learn through the process of reward prediction at a neurophysiological level. And it’s the human side I want to focus on — and in particular, the role of dopamine.

TD learning and dopamine function in the brain are intimately linked. Research has shown that when presented with a stimulus that triggers an unexpected reward, our brains will initially release dopamine when presented with the reward, in response to the difference between what we were expecting (nothing), and what we got (something pleasant). But what happens over time is that the dopamine cells start to fire when presented with the stimulus, and not when the reward itself shows up — if it ever does. What’s happening is that dopamine is released when we positively change our expectations of what we think is about to happen — we get a hit when we think things are going to turn out better than expected.
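
The shift from reward to cue can be reproduced with a toy simulation. What follows is a heavily simplified sketch in the spirit of the temporal difference account, not the model from the research itself: the number of time steps, the learning rate, and the reward size are all assumptions, and the "prediction error" stands in for the dopamine signal.

```python
def simulate(trials=200, steps=5, alpha=0.3, reward=1.0):
    """Repeat a cue-then-reward trial and record the surprise at each end.

    V[i] is the predicted future reward i time steps after the cue appears.
    Returns a list of (cue_error, reward_error) pairs, one per trial.
    """
    V = [0.0] * steps
    history = []
    for _ in range(trials):
        # The cue arrives out of the blue, so the expectation just before
        # it is zero; the surprise at the cue is whatever the cue predicts.
        cue_error = V[0] - 0.0
        reward_error = 0.0
        for i in range(steps):
            last = i == steps - 1
            r = reward if last else 0.0          # reward only arrives at the end
            next_v = 0.0 if last else V[i + 1]   # nothing is predicted after the reward
            delta = r + next_v - V[i]            # prediction error at this step
            V[i] += alpha * delta                # learn from the error
            if last:
                reward_error = delta
        history.append((cue_error, reward_error))
    return history


history = simulate()
print(history[0])   # first trial: no surprise at the cue, full surprise at the reward
print(history[-1])  # after training: the surprise has moved to the cue
```

On the first trial the only surprise is the reward itself. After enough repetitions the reward is fully predicted and the surprise shows up when the cue appears instead.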

This has some really interesting implications for brands.

Firstly, for brands operating in categories with low expectations (the kind of territory where the DTC brands previously mentioned are often found lurking), exceeding those expectations is not only not hard (conceptually, if not technically), but it’s an effective way to build brand preference at a neurophysiological level. Think of it as a kind of “hedonic capitalism”, in which each transaction that was somehow simpler and more enjoyable than expected — booking a vacation, watching an ad for an insurance company, managing a thermostat — comes with a free dopamine cookie. After a while you come to believe that making things better is simply what this brand does.

Once you’ve been “trained” to know what “better” looks like, it’s not the attainment of the reward that elicits the dopamine response. It’s the expectation of the reward, triggered by a sensory cue. For brands, these cues are analogous (but not exclusively so) to Byron Sharp’s “memory structures”. If brand X has done an effective job of exceeding people’s expectations and making sure that everyone knows about it, then when you see their logo, colour palette, or mascot, hear their jingle or sound, or encounter a combination of distinctive brand elements, your brain is going to assume that your state of affairs is about to improve. Research has shown that whether it actually does improve has no bearing on whether you get that hit or not, at least at first. And while in those experiments the subjects themselves needed to have first-hand knowledge of the reward, that’s not the case in brand building, which is a social process as much as it is a personal one. We’ve reached a point at which the opinions of others are just as valid as our own, if not more so, when it comes to the value that we think a brand or its product might bring us. If you needed to have first-hand experience of using a brand’s products to form an expectation of it, Bernard Arnault probably wouldn’t be the world’s richest person.

This is why the most loveable, or at least most charismatic and culturally penetrating, versions of brands are formed during the hedonic upswing of their heterostatic phases: dopamine-driven hit makers smashing the status quo. This is as true in the world of art as it is in the world of commerce. The notion of the “imperial phase” in music, as coined by Neil Tennant of the Pet Shop Boys, is a neat analogy. It describes the period in which an artist is at the top of their game, creatively and commercially, and can be defined, as Tom Ewing did, by “command, permission, and self-definition”. ‘Command’, in the sense of being in the zone, with complete mastery of the mode and medium. ‘Permission’, in the sense of genuine hunger from the public for what you are doing and a desire for you to push boundaries. ‘Self-definition’, in that everything you do as an artist after this phase will be measured against it. It’s this last factor that creates the catch-22. As Tom Ewing puts it, “An imperial phase sustains a career but also freezes it: Empires decline and the memory of former glories dies hard.” Having an “imperial phase” as a brand is hard enough. Sustaining it is quite another matter.

Henry Ford’s assertion that, “You can’t build a reputation on what you are going to do” may be true initially, but in the long term, your brand becomes a reflection of what people expect you are going to do. If nobody has any expectations of you, can you be said to have a brand at all?

4. YOU’VE LOST THAT LOVING FEELING

It’s your brand and what it promises that makes people feel good; call it an “aura” if you’re so inclined. Whether you deliver on that isn’t irrelevant, but that aura certainly does give strong brands a kind of reputational cushion that allows them to coast, and even to fail, that weaker brands don’t have. You could argue that Apple has been coasting ever since the death of Steve Jobs. Since the release of the iPhone and the Apple Store, they have not released any products that have meaningfully and surprisingly changed our expectations of how things should be. Not that they’re exactly struggling. It’s also entirely possible that it has been Tim Cook’s masterplan to wean us off an unsustainable disruption habit all along.

You could also argue that while Meta has made a serious attempt to change our expectations of how the world should work with its foray into the metaverse, its concerted campaign to fritter away any goodwill it might once have had as a brand means that the only sound echoing around the desolate halls of Horizon Worlds is the merciless derision of people who probably once kind of liked Facebook but now revel in its legless flailing. What Meta wants people to do is falling further out of alignment with what an increasing number of people (particularly younger ones) want to do. The brand space they’ve created is defined by artifacts with a dwindling value function that have left people feeling confused, demeaned, and bored: not the hallmarks of a successful curriculum.

No aura is indestructible, no matter how powerful or ubiquitous your brand. Having put what felt like the sum total of human knowledge at our fingertips, it once felt like every interaction with Google would make us superhumanly smarter. But as its game-changingly spartan interface loaded up with visual detritus that has nothing to do with a better user experience and everything to do with the business model of “enshittification”, our brains slowly rewired. As Cory Doctorow points out, “Today’s Google results are an increasingly useless morass of self-preferencing links to its own products, ads for products that aren’t good enough to float to the top of the list on its own, and parasitic SEO junk piggybacking on the former.”

Google’s iconic primary-coloured logo once elicited a brain tingle of limitless possibility. Access to everything, everywhere, all at once. Now a diminished value function makes it feel like whatever’s about to happen next is going to be a slog. The curriculum is broken. Our dopamine cells are barely twitching, let alone exploding with excitement. How long before our policy changes — especially now that a sexy new paradigm shifter has come along in the shape of ChatGPT?

But even if you are able to continually knock it out of the park of expectations with every campaign, product release, or customer service call, there’s the hedonic treadmill effect, i.e. you can’t keep exceeding people’s expectations forever, because we humans have a stubborn tendency to return to an emotional baseline, no matter what happens to us, positive (winning the lottery) or negative (losing your legs in a car crash). As Brian Christian puts it, “If happiness comes not from things having gone well, not from things being about to go well, but from things going better than expected, then yes, for better or worse, as long as our expectations keep tuning themselves to reality, then a long-term state of being pleasantly surprised should be simply unsustainable.”
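
The treadmill is easy to see with a few lines of arithmetic. In this sketch a brand delivers the same excellent experience every time (a hypothetical 9 out of 10), while the customer’s expectation starts at 5 and moves halfway towards each new experience. The scale, the starting point, and the adjustment rate `alpha` are all made up for illustration.

```python
def surprises(outcomes, expectation=5.0, alpha=0.5):
    """Return how much better than expected each experience felt.

    After every experience, the expectation moves a fraction alpha
    of the way towards what actually happened.
    """
    result = []
    for outcome in outcomes:
        surprise = outcome - expectation
        result.append(surprise)
        expectation += alpha * surprise
    return result


# Five identical 9-out-of-10 experiences in a row.
print(surprises([9.0] * 5))  # [4.0, 2.0, 1.0, 0.5, 0.25]
```

The quality never drops, but the pleasant surprise halves each time as expectations catch up with reality.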

This might sound like an invitation to keep expectations low. The problem with that as a workaround is that there’s a distinct correlation between low expectations and a negative sense of well-being. As a brand, you don’t want to train people to think that you suck, just so that you can occasionally surprise them by sucking less than expected.

5. YOU ARE ALWAYS ON MY MIND

I hope the big takeaway here is not that humans are simply robots with credit cards, nor that we ought to treat them as such. On the contrary, it’s the insights we can derive from machine learning — and reinforcement learning in particular — that can help us think about how to influence people’s expectations from brands. And that is key — a brand is the accumulation of expectations that someone has of how it will behave or the value it will provide.

In the words of Jeremy Bullmore, “People build brands as birds build nests, from the scraps and straws they chance upon.” By thinking of this act of bricolage as a curriculum — a cohesive set of cumulative non-linear, multi-channel interactions — we can shape that process and make it more intentional.

There are lots of ways to build a curriculum, whether you’re doing it for a group of students or doing it for a brand. But whatever your philosophy, there’s one thing that holds true: it’s not enough to get people’s attention. It’s what you do with it that counts. Positive learning experiences are the result of positive interactions. My son doesn’t learn by being poked and barked at by his teachers, nor does he learn anything from them mumbling into space as he’s pinging past. But so much of the consumer experience (whether it’s brand or direct response advertising, identity design, or ecomm flows) does exactly one of those things. And when he needs help, if he’s ignored, condescended to, inconvenienced, confused, or bored, he’s not going back to that person. It’s the same for us with brands, unless the alternatives are even worse.

This isn’t to say that every interaction deserves equal weight. That would be unnecessary, impractical, and exhausting for its makers and consumers. But there is no such thing as a neutral interaction. Even something as anodyne as a humble banner ad that barely registers in our consciousness teaches us that the brand that created it doesn’t believe itself worthy of our attention.

By exceeding people’s expectations, we can tap into the part of the brain that creates joy. Not the synthetic version that perches limply atop corporate brand frameworks. I mean the real organic shit. Pure dopamine that courses through your veins and makes you believe, if only for a beautiful moment, that a brand might somehow, against the odds, actually make the world a better place. And who doesn’t want that?

REFERENCES

Thierry Brunfaut and Tom Greenwood, Blanding — The hottest branding trend of the year is also the worst, Fast Company (Dec 2018)
Brian Christian, The Alignment Problem (2020)
Will Dabney, Zeb Kurth-Nelson, Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI, DeepMind (Jan 2020)
Cory Doctorow, The Enshittification of TikTok, Wired (Jan 2023)
Niall Firth, Language models might be able to self-correct biases — if you ask them, MIT Technology Review (Mar 2023)
Alex Kantrowitz, The direct-to-consumer craze is slamming into reality, CNBC (Mar 2023)
Dan Lee, Reinforcement Learning, Part 1: A Brief Introduction, Medium (Oct 2019)
Paul Murray, Who Is Still Inside the Metaverse? Searching for friends in Mark Zuckerberg’s deserted fantasyland, New York Magazine (Mar 15, 2023)
Marty Neumeier, The Brand Gap (2006)
Fred Nicolaus, DTC’s terrible, horrible, no good, very bad year, Business of Home (Dec 2022)
Kevin Roose, A Conversation With Bing’s Chatbot Left Me Deeply Unsettled, New York Times (Feb 2023)
Robb B. Rutledge, Nikolina Skandali, Peter Dayan, and Raymond J. Dolan, A computational and neural model of momentary subjective well-being (2014)
Ben Schott, Generation Z, You’re Adorkable, Bloomberg (Jan 2021)
Wolfram Schultz, Peter Dayan, and P. R. Montague, A Neural Substrate of Prediction and Reward (1997)
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (2014, 2015)
Nitasha Tiku, The Google engineer who thinks the company’s AI has come to life, Washington Post (Jun 2022)
James Vincent, Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day, The Verge (Mar 2016)

Written by Martin Heaton