How to Choose a Hero

Entropy, Information and the Importance of Pattern

Sean McClure

Published in

NonTrivial

20 min read · Nov 3, 2023

2 Kinds of Heroes

A hero is someone who is admired or idealized for their courage, outstanding achievements or noble deeds. But there is a distinction in the way society portrays heroes, and it comes down to what I consider the 2 core types of hero: a hero is either presented as an extraordinary individual in ordinary circumstances (E-O hero), or an ordinary individual in extraordinary circumstances (O-E hero).

For example, we often see this distinction play out in television series. The main character either has an eccentric personality doing regular things (e.g. a lawyer, a doctor, a hacker who is extra quirky or intelligent living in a familiar world) or is someone average going up against an incredible situation (e.g. a school kid, a teacher, a ‘nobody’ changing the world). In both cases the individual is admired or idealized, but they represent inverse relationships between individual and environment.

Of course there are 4 possibilities in this framing, as shown in Figure 1:

But I argue that E-O heroes and O-E heroes are what we tend to see, because both O-O heroes and E-E heroes are boring. People need to see contrast.

The Problem with Extraordinary People in Ordinary Circumstances

The extra quirky or intelligent hero is someone many people want to identify with. It might be how they wish to be viewed. But in reality, none of us have such exaggerated personalities. The problem with portraying extraordinary people in ordinary circumstances is that nobody can relate to them.

Imagine a socially awkward individual on a sitcom who doesn’t understand sarcasm, but is portrayed as extremely intelligent (E-O hero). Some people might view this awkward hero as someone they wish to be like. While wanting to look nerdy might sound odd, consider that society associates socially awkward behavior with higher intelligence. People might choose to “relate” to this hero because, despite their social ineptitude, the hero possesses some quality they themselves wish to be known for.

Or imagine a television doctor portrayed as a brilliant diagnostician (E-O hero), but who also has a caustic and overbearing personality. He ignores ethical standards and often applies unconventional and morally ambiguous methods. His lack of empathy and disregard for authority make him manipulative and deceptive. Despite his blatant and exaggerated downsides, many people will still admire him, because they believe his poor behavior is a social cue for a trait they wish to be known for.

To be clear, I’m not saying real people can’t be socially awkward or have toxic personalities, but they are never at the level portrayed in television and movies.

Why would anyone bother relating to someone with obvious downsides?

Because the hero’s exaggerated behavior (socially awkward, obnoxious, absent-minded, etc.) sends a strong signal to others that they are a certain type of person. Even if you really are “smart” or “strong” or “determined”, how would anyone know unless there is a signal that says as much, without you spending the time to demonstrate such qualities? Rather than putting in the legwork to genuinely be astute, it’s easier to just be known for smartness by virtue of your odd personality.

Extraordinary heroes don’t always get portrayed with downsides. A superhero with amazing physical strength might inspire some, but again, it’s far beyond any physical power we would possess ourselves. And it’s not all fiction. Both historical and living “legends” are the subject of admiration. History does not speak of inventors or leaders as regular people, but rather as “geniuses” who accomplished extraordinary things. Today’s top CEOs or Hollywood celebrities are viewed (or sold) in the same fashion.

In all these cases, the heroes are portrayed, not as normal people, but as extraordinary individuals operating in environments we are familiar with.

I’m not a psychologist, so my theory on people’s motives isn’t the point. The real point is that people will find ways to admire or idealize individuals who, in reality, they cannot relate to. There’s even an evolutionary argument to be made, since using signals is how we assess people and situations without having to know all the details.

But the reality is we cannot relate to extraordinary people because they aren’t real. On the fictional side, the reason is either obvious (we don’t have superhuman strength, cannot fly, etc.) or based on some projected insecurity (people wanting to be seen as extra-unique, smart, etc.). And on the real-people side, the exaggerated personalities and stories we hear about inventors, scientists, actors, CEOs etc. are little more than survivorship bias. We concentrate on the people who became well known and overlook those who did not. If we looked at those who did not become well-known, we would realize the famous are quite ordinary after all. As I like to say, winners and losers have the same stories; not in their details, but in their capacity to control the outcome.

We cannot relate to extraordinary people in ordinary circumstances because they don’t exist. Nobody acts like that. And nobody has all the ideas or determination to go it alone. The E-O hero is both ubiquitous and wholly unrelatable.

Heroes Should be Ordinary People in Extraordinary Circumstances

Ordinary people are like you and me. Ordinary circumstances are situations we face every day. Extraordinary people are those portrayed as having exaggerated skills or intelligence. Extraordinary circumstances are rare situations or events that deviate significantly from our usual experience.

I argue that only ordinary people in extraordinary circumstances (O-E heroes) represent the best kind of hero, because they sit in the most realistic and informative quadrant:

**Figure 2** A “hero quadrant” showcasing the 4 possible combinations of ordinary and extraordinary (people and environments).

Not only can we relate to ordinary people, we can also relate to extraordinary circumstances since, although rare, these do happen. There are natural disasters, political unrest, market crashes, pandemics, layoffs, illness and termination of long-term relationships.

Heroes should be ordinary people in extraordinary circumstances because they represent people who have traits we ourselves might possess, AND those traits are tested in the best way possible: under extreme circumstances.

Compare Superman to Batman. Superman (E-O hero) has superhuman strength, can fly as “fast as a speeding bullet” and has x-ray vision. He uses these powers to save people from natural disasters, rescue people from accidents, deliver victims from burning buildings, prevent suicides and even perform emergency surgery. Batman (O-E hero) takes quite a different approach. Batman uses gadgets, relies on detective skills, martial arts and strategic planning. Both Superman and Batman are going up against extraordinary (but still familiar) circumstances, but only Batman is someone we can reasonably relate to.

The other quadrants don’t measure up. The ordinary person in an ordinary environment (top left) and the extraordinary person in the extraordinary environment (bottom right) are not being tested rigorously enough for it to be interesting. The same holds for the extraordinary person in ordinary circumstances (top right).

Heroes should be ordinary people in extraordinary circumstances (bottom left) because the most important part we need to relate to is the individual, not the circumstance. We can take on character traits. We can implement strategies. We cannot choose our circumstances.

But what do I mean by “interesting”? Am I only talking about being entertained? No. The real purpose of having a hero is to learn. We need to see what is possible. What matters is the information content we can glean from witnessing and admiring an individual go up against difficult odds.

Let’s add some rigor to this discussion via computer simulation and quantification.

Disorder, Information and Pattern

Ordinary people in extraordinary circumstances make for better heroes because onlookers can perceive better information about how to deal with difficult situations.

To first anchor this idea in intuition, we can create an agent-based simulation. This allows us to consider entities (agents) that are programmed to interact with each other and their environment according to specific rules.

In this simulation, I will specify a blue agent, a red agent and a number of obstacles in the environment. The blue agent exhibits strategic movement: it responds to obstacles by adjusting its direction to avoid colliding with them. The red agent exhibits “teleportation” behavior: it moves in a straight line until it encounters an obstacle, at which point it teleports to a new random location.

The blue ball represents us: a normal person (O-E hero). The red ball represents someone we cannot relate to (E-O hero).
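The two movement rules can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the article’s app: the step size `STEP`, the reaction distance `SENSE`, the random jitter on the blue agent’s turns and the wall bounces are all hypothetical choices; only the world size, obstacle count and obstacle radius come from the constraints listed later in this article. Each time an agent bends, bounces or teleports, its position is recorded as a vertex, so the returned list describes the path drawn in the simulation.

```python
import math
import random

# World size, obstacle count and obstacle radius follow the constraints listed
# later in the article; STEP, SENSE and the turn jitter are hypothetical choices.
WORLD = 500.0        # the world is WORLD x WORLD pixels
N_OBSTACLES = 50
RADIUS = 10.0        # obstacle radius in pixels
STEP = 2.0           # distance an agent moves per tick
SENSE = 25.0         # distance at which an agent reacts to an obstacle


def free_point(obstacles, clearance, rng):
    """Rejection-sample a point at least `clearance` px from every obstacle centre."""
    while True:
        p = (rng.uniform(0, WORLD), rng.uniform(0, WORLD))
        if all(math.dist(p, q) >= clearance for q in obstacles):
            return p


def make_obstacles(rng):
    """Place non-overlapping obstacles (centres at least 2 * RADIUS apart) inside the world."""
    obstacles = []
    while len(obstacles) < N_OBSTACLES:
        p = (rng.uniform(RADIUS, WORLD - RADIUS), rng.uniform(RADIUS, WORLD - RADIUS))
        if all(math.dist(p, q) >= 2 * RADIUS for q in obstacles):
            obstacles.append(p)
    return obstacles


def simulate(kind, obstacles, ticks=2000, seed=0):
    """Return the path vertices of a 'blue' (steering) or 'red' (teleporting) agent."""
    rng = random.Random(seed)
    x, y = free_point(obstacles, SENSE, rng)
    heading = rng.uniform(0, 2 * math.pi)
    vertices = [(x, y)]
    for _ in range(ticks):
        ox, oy = min(obstacles, key=lambda q: math.dist((x, y), q))
        # Only react if the nearest obstacle is close AND roughly ahead of the agent.
        ahead = (ox - x) * math.cos(heading) + (oy - y) * math.sin(heading) > 0
        if math.dist((x, y), (ox, oy)) < SENSE and ahead:
            vertices.append((x, y))
            if kind == "blue":
                # Strategic move: turn away from the obstacle, with a little jitter.
                heading = math.atan2(y - oy, x - ox) + rng.uniform(-0.3, 0.3)
            else:
                # "Omnipotent" move: jump to a random free spot, keeping the heading.
                x, y = free_point(obstacles, SENSE, rng)
                vertices.append((x, y))
            continue
        nx, ny = x + STEP * math.cos(heading), y + STEP * math.sin(heading)
        if not (0 <= nx <= WORLD and 0 <= ny <= WORLD):
            # Bounce off the world's walls; the bounce point is also a bend.
            if not 0 <= nx <= WORLD:
                heading = math.pi - heading
            if not 0 <= ny <= WORLD:
                heading = -heading
            vertices.append((x, y))
            continue
        x, y = nx, ny
    vertices.append((x, y))
    return vertices


if __name__ == "__main__":
    obstacles = make_obstacles(random.Random(42))
    for kind in ("blue", "red"):
        print(kind, len(simulate(kind, obstacles)) - 1, "segments")
```

Both agents share the same obstacles and the same wall behavior; the only difference is what happens when an obstacle is close and ahead, which mirrors the contrast the simulation is meant to draw.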

The following shows the simulation in action:

**Figure 3** Agent-based simulation of an “ordinary” agent and a “superhero” agent.

Notice the blue ball (O-E) navigates around the obstacles. We can think of this as someone who negotiates, uses their powers of reason, or strategy to resolve a difficult situation. The red ball (E-O) uses their omnipotent powers to simply teleport to a different location, away from danger. A real-world analogy might be a rich person with near-unlimited resources to “deal” with situations by passing on risk to someone else.

Importantly, notice the difference in paths that are drawn out by our agents. The blue ball’s path is far more predictable and less disordered, compared to the haphazard path left by the red ball’s teleportation.

This difference in disorder between the paths allows us to quantify the amount of information revealed by each agent, by pairing the simulation with Shannon entropy. The intuition is that a more disordered line will have higher entropy, while a less disordered (more predictable) line will have lower entropy. Shannon entropy measures the average amount of information, uncertainty, or surprise associated with a random variable or probability distribution. Thus, the more surprise, the higher the information content.

We can calculate the entropy of the paths our agents create during their struggle against obstacles by analyzing the distribution of line segment lengths. This will allow us to quantify the information content of the paths. In this context, we are applying the notion of uncertainty and disorder to a type of path analysis.

To calculate the entropy of our agent’s paths, we can use the following approach:

Segmentation: divide each path into segments (straight-line segments between consecutive bends);
Length Calculation: use Euclidean distance to calculate a segment’s length (since we’re using a 2D simulation):

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{1}$$

Distribution Analysis: create a histogram of path segment lengths (count how many segments fall into each bin);
Entropy Calculation: use the histogram to compute the Shannon entropy:

$$H = -\sum_{i} p_i \log_2 p_i \tag{2}$$

Here, pᵢ is the probability of a segment falling into the ith bin (calculated by dividing the count of segments in the ith bin by the total number of segments).

Comparison: compare the entropy values of the 2 paths (one per agent). The path with higher entropy has more uncertainty and randomness (higher information content), while the lower-entropy path is more predictable / less random (less information content).
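The segmentation, length, histogram and entropy steps can be sketched as follows. This is a minimal sketch, assuming a path is given as a list of (x, y) vertices where the agent bent or jumped. The article does not say how its histogram bins were chosen, so the fixed `bin_width` of 10px is a hypothetical parameter; a different binning gives different entropy values.

```python
import math
from collections import Counter


def segment_lengths(vertices):
    """Segmentation + length: Euclidean length (equation 1) of each segment between vertices."""
    return [math.dist(a, b) for a, b in zip(vertices, vertices[1:])]


def length_histogram(lengths, bin_width):
    """Distribution analysis: count how many segment lengths fall into each fixed-width bin."""
    return Counter(int(length // bin_width) for length in lengths)


def shannon_entropy(counts):
    """Entropy (equation 2): H = -sum(p_i * log2 p_i), with p_i = count_i / total.

    Written as sum(p_i * log2(1 / p_i)), which is the same quantity.
    """
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def path_entropy(vertices, bin_width=10.0):
    """Entropy of a path's segment-length histogram; bin_width (px) is a hypothetical choice."""
    return shannon_entropy(length_histogram(segment_lengths(vertices), bin_width))


# Toy paths (hypothetical): equal segment lengths vs. very different ones.
regular = [(0, 0), (10, 0), (10, 10), (20, 10), (20, 20)]  # four segments of length 10
erratic = [(0, 0), (5, 0), (5, 40), (100, 40)]             # lengths 5, 40 and 95
print(path_entropy(regular))  # 0.0: every segment lands in the same bin
print(path_entropy(erratic))  # log2(3) ≈ 1.585: three equally likely bins
```

The comparison step is then just a matter of computing `path_entropy` for each agent’s vertices and comparing the two numbers.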

Here is the simulation running with histograms populated in real time, along with the ongoing calculation of path entropy (check out the app here):

**Figure 4** Regular versus “omnipotent” entropy using path analysis to calculate the Shannon entropies of a strategic agent versus an agent with super powers.

Watch from the beginning to see the build-up of entropy for both paths. Notice that the “regular entropy” is lower than the “omnipotent entropy”, since there is more variation in line segment lengths with the red ball.

We can also notice the entropy visually by perceiving the different levels of disorder in the paths as they are drawn out in the simulation.

But wait a second. If the red ball (superhero) has higher entropy, and entropy means more information content, doesn’t that make the superhero more informative, and thus a better choice for those looking to learn?

This actually touches on a deeper issue at the heart of relating entropy to information, to which we turn now.

More Randomness Cannot Mean More Information

The problem with thinking of information content in terms of surprisal and disorder is that it suggests that a system with more randomness has more information. Taking this argument to the extreme would mean pure randomness has the most information, which obviously cannot be true.

Yes, information is valuable/informative when it’s something we didn’t see coming, and randomness adds to the “noise” that makes it harder to uncover what’s important. But past some point more randomness does not conceal more information because what’s informative is pattern.

I discussed the problem with both Shannon entropy and Kolmogorov (algorithmic) complexity with respect to valuable/useful information here. The take-home message is that both Shannon entropy and algorithmic entropy work in terms of raw transactions; like a machine talking to a machine. If a machine must encode a string of text, it will be easier to do so if there is less uncertainty in the message. If a machine must compress a string of text it will be easier to do so if the message is less disordered. So to a machine, the more disorder the more information (more to encode, more to compress).

But people consider something informative when they find meaning, and meaning comes from pattern. Adding disorder to a string of text doesn’t necessarily mean there’s a meaningful message just waiting to be discovered. Humans think of something as informative when they can detect a pattern. Pattern recognition still involves both surprisal and compression (we are surprised to see the pattern and a pattern is an abstraction in the mind, hence compression), but real-world patterns exist somewhere between total order and complete randomness. A totally ordered system bears no surprise, so contains little to no information. A completely random system also has no surprise because there’s nothing to find. Compressing a totally ordered system is trivial, and a completely random system cannot be compressed.
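The machine’s-eye view of the two extremes is easy to check with Python’s standard `zlib` module (a DEFLATE compressor, used here purely as an illustration). A string of one repeated byte shrinks to almost nothing, while bytes from the operating system’s random source do not shrink at all, and typically grow slightly because of format overhead. The 10,000-byte length is an arbitrary choice.

```python
import os
import zlib

n = 10_000
ordered = b"a" * n            # total order: one byte repeated n times
random_bytes = os.urandom(n)  # (pseudo)random bytes from the operating system

for label, data in (("ordered", ordered), ("random", random_bytes)):
    compressed = zlib.compress(data, 9)  # level 9 = best compression
    print(f"{label}: {len(data)} bytes -> {len(compressed)} bytes")
```

Note that this only demonstrates the two ends of the spectrum; it says nothing about the meaningful patterns that sit between them.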

In other words, surprisal and compression alone don’t cut it when it comes to a working definition of information. Information must be based on pattern.

The most straightforward approach to pattern recognition is visual inspection. Humans are excellent at this task for evolutionary reasons. Looking at Figures 3 and 4 we see the blue ball leaves a more predictable path following a clear pattern, whereas the red ball is unpredictable and erratic.

More advanced methods of detecting a pattern within a path might involve statistical methods such as autocorrelation analysis or Fourier analysis. Looking for periodicities in the path can help reveal regularities in the underlying data. Machine learning could also be used, for example by training a neural network: using features extracted from the paths, a model might learn to differentiate between predictable and erratic patterns. Path analysis and signal processing might help by removing noise to make the pattern more apparent. Time series analysis could identify underlying trends, seasonal patterns, and irregularities. Spectral analysis might identify dominant frequencies in path data; if the predictable path has a dominant frequency, it could mean there’s a pattern.
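As an illustration of autocorrelation analysis, here is a minimal sketch. It assumes, hypothetically, that a path is summarized as a sequence of numbers (for example its segment lengths). A sequence that repeats every `lag` steps scores close to 1 at that lag, while an unstructured sequence scores close to 0.

```python
import random


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (assumes xs is not constant)."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs)
    covariance = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return covariance / variance


# Hypothetical sequences of segment lengths.
periodic = [10.0, 30.0] * 500                             # alternates short/long: period 2
rng = random.Random(0)
erratic = [rng.uniform(0.0, 40.0) for _ in range(1000)]  # no structure

print(round(autocorrelation(periodic, 2), 3))  # close to 1
print(round(autocorrelation(erratic, 2), 3))   # close to 0
```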

Which method to use depends heavily on the nature of the situation.

Using one of the above approaches is beyond the scope of this article, but also unnecessary. Looking at either of Figures 3 or 4 we can easily see that the blue agent leaves a far more predictable path. More to the point, we can clearly see the strategy the blue agent uses. The ordinary person reveals information we can employ ourselves, as our own strategy or general approach to tackling tough situations.

So, rather than looking at raw entropy values between the two paths, we instead look for the best pattern as the most informative. But recall that we set out to be rigorous. We still need something we can measure, and entropy seems like the best way to do this for “information.”

Entropy is indeed a measure of information, but we need to counterbalance its value with the reality that pattern is what matters.

Information Ratio

I set out in this section to add some rigor to my argument that the best hero is an ordinary person in an extraordinary environment (O-E hero). We know that entropy can help us quantify and compare the amount of information between an ordinary and extraordinary person, as done using path analysis and Shannon entropy. But we also know that entropy alone doesn’t satisfy our notion of useful information. Sure, we can visualize the difference in usefulness between paths, but this raises the question: what is it that humans are perceiving? Why do we call the lower entropy path a pattern (and thus more informative)?

I believe the answer is related to how we take into account the environment. If we only saw the blue ball’s path and not the obstacles we might consider its movement as random as the red ball’s. But seeing the environment allows us to make the critical distinction between the 2 paths. We can see what the blue ball is doing AND we can understand that it’s real (does not violate our innate sense of physics).

We can still use the convenience of entropy to produce a well-defined notion (and calculation) of information, as long as we include the entropy of the environment. Only then can we assess how truly informative a hero is.

I propose a “hero index”, which divides the information content of the environment by the information content of the hero. Keep in mind that I am using information content in terms of Shannon entropy, which is a measure of average surprise or uncertainty for a probability distribution. High entropy means high uncertainty/surprise/information on average, and low entropy means low uncertainty/surprise/information on average.

The hero index will be higher for heroes that are more relatable, because relatable people have less surprise (we recognize their traits). But less surprise in the hero is not a hindrance to learning as long as the environment is high in entropy. And that’s exactly where the high uncertainty should be: a genuine battleground that tests characteristics we can relate to.
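Written as a formula, and reading the index as the environment’s entropy divided by the hero’s entropy (the orientation under which a less surprising hero in a more uncertain environment scores higher, as just described), the hero index would be:

$$\text{hero index} = \frac{H_{\text{environment}}}{H_{\text{hero}}}$$

Both entropies are Shannon entropies in bits; the symbols $H_{\text{environment}}$ and $H_{\text{hero}}$ are labels introduced for this sketch rather than notation taken from the original figures.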

The ratio expressed in the hero index captures the bottom-left situation in the hero quadrant:

**Figure 5** Amount of information (qualitative) in the “hero quadrant.”

We already know how to calculate the hero entropy (equations 1 and 2 above). To calculate the environment entropy we need to translate the environment into a measure of disorder.

For our simulation we can arrive at an actual number for its environmental entropy by using the simplified constraints of the system to calculate the number of possible configurations. The constraints of our system are:

there are 50 obstacles;
all obstacles are the same size (radius 10px);
obstacles cannot overlap;
the “world” is exactly 500px by 500px in dimensions.

We can find how many possible configurations our obstacles can have by dividing our “world” into cells, such that each cell can only contain one obstacle at a time. Imagine having 3 eggs (“obstacles”) in a carton (“world”) that holds 12, and seeing how many different ways you can arrange the 3 eggs.

With a world size of 500px by 500px, and an obstacle radius of 10px (diameter 20px), and knowing that each cell must be 20px by 20px to accommodate one obstacle should it land there, the total number of cells in our world is 625:

$$\left(\frac{500}{20}\right)^2 = 25^2 = 625$$

So we have a 2D square grid with 625 cells, and we want to calculate the number of different arrangements of 50 obstacles, with each cell allowed to hold either 1 obstacle or none (a binary choice).

**Figure 6** Dividing the simulation environment into cells in order to calculate the total number of possible configurations of obstacles (“environment entropy”).

This is a combinatorial problem where we must find the total number of binary sequences of length 625 that contain exactly 50 ones (obstacles) and 575 zeros (empty cells).

To calculate this, we can use the binomial coefficient formula, which gives the number of ways to choose k objects from a set of n distinct objects. In this case, we want to choose 50 out of 625:

$$C(n, k) = \frac{n!}{k!\,(n-k)!}$$

Where:
- C(n, k) is the number of combinations.
- n is the total number of cells (625 in our case).
- k is the number of obstacles (50 in our case).
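The formula can be checked against brute force on the egg-carton example from earlier. This sketch enumerates every binary sequence, which only works for tiny grids (2¹² = 4,096 sequences for the carton, versus 2⁶²⁵ for the 625-cell world); that is exactly why the closed-form binomial coefficient is needed.

```python
import math
from itertools import product


def count_arrangements(n_cells, k_items):
    """Brute force: enumerate all 2**n_cells binary sequences, keep those with exactly k ones."""
    return sum(1 for seq in product((0, 1), repeat=n_cells) if sum(seq) == k_items)


def binomial(n, k):
    """C(n, k) = n! / (k! (n - k)!), computed exactly with integers."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Egg carton: 3 eggs ("obstacles") in a 12-slot carton ("world").
print(count_arrangements(12, 3), binomial(12, 3), math.comb(12, 3))  # 220 220 220
```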

Let’s plug in our simulation numbers:

$$C(625, 50) = \frac{625!}{50!\,(625-50)!}$$

Giving,

$$C(625, 50) = \frac{625!}{50!\,575!}$$

This is the total number of valid combinations. It is the number of different arrangements of 50 obstacles in a 2D square grid with 625 cells, where each cell can have either 1 obstacle or no obstacle.

This is an astronomically large number. Let’s use Python to find out what this number is in scientific notation:

```python
import math

# Calculate the binomial coefficient
binomial_coefficient = math.comb(625, 50)

# Express the result in scientific notation
scientific_notation = "{:e}".format(binomial_coefficient)

print(scientific_notation)
```

This gives 2.730878e+74, i.e. roughly 2.7 × 10⁷⁴ (a 75-digit number).

Now, to calculate the Shannon entropy for our environment, we can use the entropy formula from equation (2) above, rewritten here in terms of the configurations xᵢ:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$

P(xᵢ) represents the probability of observing a particular configuration xᵢ out of the total n possible configurations.

Since we have 2.7 × 10⁷⁴ total configurations, and each configuration is equally likely because no specific arrangement is favoured, the probability P(xᵢ) for each configuration xᵢ is:

$$P(x_i) = \frac{1}{n} = \frac{1}{2.7 \times 10^{74}}$$

Note: If each configuration were not equally likely, then the non-uniform probability would yield less uncertainty and thus less entropy; see here.

So, our environment entropy is:

$$H(X) = \sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{1/n}$$

This formula represents the sum of the probabilities of each outcome multiplied by the logarithm of the reciprocal of the probability, all taken with base 2.

To calculate this sum, we can simplify the expression inside the logarithm:

$$\frac{1}{1/n} = n$$

So, the expression for entropy simplifies to:

$$H(X) = \sum_{i=1}^{n} \frac{1}{n} \log_2 n$$

And now we can calculate the sum:

$$H(X) = n \cdot \frac{1}{n} \log_2 n = \log_2 n = \log_2\left(2.7 \times 10^{74}\right)$$

This means the environmental entropy for our world of obstacles is log₂(2.7 × 10⁷⁴) ≈ 247 bits. We can think of this value as the minimum average number of bits needed to encode the arrangement of the obstacles in our world.
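The derivation can be checked numerically. This short sketch recomputes the number of configurations, takes its base-2 logarithm (the last step of the derivation), and confirms on the small egg-carton case that summing (1/n)·log₂ n over all n equally likely configurations gives the same value as log₂ n.

```python
import math

n_configs = math.comb(625, 50)  # number of obstacle arrangements (about 2.7e74)

# Uniform distribution: every configuration has probability 1/n, so
# H = sum over n terms of (1/n) * log2(n) = log2(n).
h_env = math.log2(n_configs)
print(f"{h_env:.2f} bits")  # about 247.27 bits

# Sanity check of the simplification on a small, enumerable case (egg carton: C(12, 3)).
n_small = math.comb(12, 3)
h_small_sum = sum((1 / n_small) * math.log2(n_small) for _ in range(n_small))
print(math.isclose(h_small_sum, math.log2(n_small)))  # True
```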

Finally, we can calculate the “hero index” using the 2 values of entropy. Here is the ordinary hero:

…and for a superhero:

Note: The values for regular and omnipotent entropies depend on the runtime of the simulation, but are not expected to change in terms of relative magnitude (“omnipotent” is always higher than “regular”).

Even though both numbers have the same power of 10, the difference in their coefficients suggests a substantial disparity in their values when discussed in the context of useful information.

For example, in terms of information theory, small differences in coefficients can lead to significant distinctions in the amount of information conveyed by different systems. In terms of thermodynamics, small differences in coefficients can indicate differences in the organization or structure of systems, leading to disparate physical behaviors. In terms of statistics and data analysis, small differences in coefficients can imply distinctions in the shape or spread of probability distributions, influencing decision-making and inference.

Consider comparing the compression of 2 documents, each in a different language, using some compression algorithm. For example, the Huffman coding algorithm assigns shorter codes to more frequent symbols. Since letter frequencies vary from language to language (e.g. “e” is the most common letter in English), comparing the compression of these 2 documents would lead to distinct (and meaningful) differences. Similarly, comparing a document about scientific research to a general news article (about the same topic) would lead to different compression rates (encoding jargon versus encoding everyday language). Small differences in the code lengths for a given character can accumulate, leading to major differences in the overall compressed sizes of the two documents.
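To make the Huffman example concrete, here is a minimal sketch that builds Huffman code lengths with Python’s `heapq` module and counts the total bits needed to encode a text. The two sample strings are hypothetical stand-ins for documents: one dominated by a single frequent symbol, one in which every symbol is equally frequent. Only code lengths are computed, which is enough to measure the encoded size.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate case: a lone symbol still needs a 1-bit code.
        return {symbol: 1 for symbol in counts}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the tie breaker
    # keeps heapq from ever comparing two dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol inside them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_bits(text):
    """Total number of bits needed to Huffman-encode text."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


skewed = "aaaaaaab"   # one very frequent symbol
uniform = "abcdefgh"  # every symbol equally frequent
print(huffman_bits(skewed), huffman_bits(uniform))  # 8 24
```

Both strings have 8 characters, yet the skewed one needs a third of the bits, because frequent symbols get short codes.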

The point is, even subtle differences in coefficients can lead to significant distinctions in the amount of information conveyed by different systems.

Back to our hero index: we can see the ordinary hero in an extraordinary environment (O-E hero) scores the highest compared to all other hero types, as depicted in Figure 5.

Again, the interpretation here is that observing an ordinary person solving problems in an extreme environment is both relatable and highly informative. The hero index takes advantage of Shannon’s rigorous definition of information, but also accounts for pattern by counterbalancing this value with the entropy of the environment.

Choosing Heroes

Of course in the real world we can’t be expected to calculate entropy numbers for our heroes, let alone the environments they operate in. But doing a calculation for our simplified simulation, and its convenient constraints, gave us a nice anchor to understand the critical concepts.

In our own lives we can apply the hero index as a conceptual heuristic. We should be choosing heroes who are both relatable and who face extreme situations. The hero index suggests that it is these heroes who provide us with more than just fleeting inspiration; they provide us the best opportunities to learn things we can use.

When it comes to real people, historical or alive, we need to avoid survivorship bias. We should not look for “geniuses” or “legends” and instead focus on those who were/are quite ordinary, yet operate in extreme (but familiar) situations.

There are numerous accounts of Mahatma Gandhi being a remarkably relatable person. He lived a simple and modest life, had empathy for the poor and downtrodden, he encouraged regular people to meet him and discuss their issues. Gandhi made mistakes and had his share of failures (e.g. underestimating the readiness of the Indian masses for nonviolence in the Non-Cooperation Movement). And yet Gandhi played a pivotal role in India’s independence movement. Undoubtedly an extreme situation.

Oskar Schindler was a German businessman and a member of the Nazi Party. Oskar was an ordinary industrialist seeking to profit from the war by using Jewish labor in his factories. And yet, after becoming horrified by the Nazi atrocities he risked his life and fortune to protect Jewish workers. Oskar saved the lives of more than 1,200 Jews during the Holocaust. An extreme situation to say the least.

Rosa Parks was a housekeeper and seamstress. But in her activism she refused to give up her bus seat to a white passenger, despite the extreme segregationist laws at the time. She had Ku Klux Klan members marching down her street while her grandfather guarded the front door with a shotgun. Extreme.

Gandhi had modesty, empathy and self-effacement. Schindler had compassion, resourcefulness and courage. Parks had determination, resilience and profound dignity. All relatable and achievable traits, steeped within a world that actively worked in opposition to such qualities.

On the fictional side, we already saw how Batman makes for a more informative hero than Superman. And there are many others. You have to choose who you believe is worth relating to, but they must be relatable. Choosing eccentric personalities or “geniuses” means you’re buying into false narratives about how people accomplish things, and for some of you, it may be rooted in your desire to be seen a certain way.

Fiction can be even better than real-world accounts. Beyond being less susceptible to survivorship bias, fiction can exaggerate the environment to better contrast ordinary traits and strategies with the world they operate in. As I stated previously, we can all take on character traits and implement strategies. But we cannot choose our circumstances. Better to see what we can control, highlighted by what fiction does best: creating worlds beyond what we normally experience.

A hero is someone who is admired for their outstanding achievements. It’s good to be inspired, and even better to learn how to deal with real-world situations. But there is a difference in the way society portrays heroes. We can look to those who are portrayed as extraordinary people in ordinary situations, or ordinary people in extraordinary situations.

Only the latter reveals the kind of information that goes beyond entertainment, teaching us the most important lesson: that all of us can achieve truly great things.