Antifragile System Design 1: Optionality

Hannes Rollin
8 min read · Oct 10, 2023


In a strategy that entails optionality, you don’t have to be right that often. Just the mere fact that you have more to gain than to lose in each bet is sufficient.

— Nassim Nicholas Taleb

It’s always good to have more than one way to go. But which one to take? (DALL·E)

This is the first part of a series of posts about antifragile system design. I have recently posted about antifragility in the context of data mesh architecture, but I soon realized that the topic demands broader and deeper coverage.

Anyone who has at least some exposure to the news knows that we’re collectively in deep trouble. The fat years are over, multiple crises are converging on every part of the globe, and all systems now being built must reckon with levels of volatility, stress, and shocks much larger and more frequent than many like to admit. The architects of the future must be humbler, nimbler, and more willing to learn from nature than any of our recent predecessors.

What is Antifragility?

The term antifragility was coined by risk researcher and renegade trader Nassim Nicholas Taleb in his book “Antifragile: Things That Gain from Disorder”. While studying the sensitivity of financial markets to disruptions, Taleb realized that the mere robustness of a system is not enough—a robust system withstands volatility and shocks up to a point, but it doesn’t learn, and it doesn’t improve.

That’s what antifragile systems do. All living beings and viable ecosystems have the property of not only sitting out environmental variations and difficult times but actually enjoying the ups and downs of life and coming out stronger after shocks and disruptions. Up to a point, of course. Everything that’s strong comes from the proverbial school of hard knocks.

He came up with the term, but not the concept: N. N. Taleb (Wikipedia)

So, we all want antifragility, right? But how can we make our built environment, our organizations, and our IT systems antifragile in the way only biological life has managed over the course of billions of years? Frankly, we don’t know; our insight is limited and our knowledge tentative. But one thing we do know: there are certain necessary conditions for antifragility.

Now, necessity isn’t the same as sufficiency. Water is necessary to life, but you can’t live on water alone. In systems design, you can integrate every tweak and twist necessary for antifragility, and your system might still crumble, because necessary conditions don’t add up to sufficient ones. But we can be sure of this: if you fail to satisfy a necessary condition, your system is not and never will be antifragile. That doesn’t mean it breaks easily—it might turn out fairly robust—but it means that certain fluctuations will degrade your system instead of improving it. In that sense, mastering the necessary conditions is the essential first step; it’s the boot camp of antifragile system design.

How to Prove Necessity

In the language of formal logic, necessity means that the thing necessary is a logical consequence of the thing for which it is necessary. For instance, the assertion that optionality is necessary for antifragility means that whenever you have antifragility, you’ll find optionality:

Antifragility ⇒ Optionality (1)

This simple logical statement is equivalent to the following one, a maneuver known as contraposition (a close relative of the inference rule modus tollens), which can easily be proven using truth tables.

¬Optionality ⇒ ¬Antifragility (2)

That is, to stay with our current example: wherever optionality is absent or only partially realized, antifragility is impaired. And that’s my plan: to show statement (1), I will prove statement (2) instead. Since they’re logically equivalent, all is well. Logic can be your friend.
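
If you’d rather see it than prove it, here’s a minimal sketch that checks the equivalence by brute force, enumerating all four truth assignments in plain Python:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: a => b is false only when a is true and b is false."""
    return (not a) or b

# Statements (1) and (2) must agree on every truth assignment.
for antifragility, optionality in product([False, True], repeat=2):
    statement_1 = implies(antifragility, optionality)          # (1)
    statement_2 = implies(not optionality, not antifragility)  # (2)
    assert statement_1 == statement_2

print("Contraposition holds: (1) and (2) are logically equivalent.")
```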

What is Optionality?

Generally, optionality means the property of a system to react to an event in more than one way. Since, in the context of antifragility, we’re mostly concerned with volatility and disruptions, I define optionality semi-formally as the property of a system to react to potentially disruptive events in more than one way. Harmless events we can safely ignore. Events that could diminish, downgrade, disable, or disrupt your system are an entirely different matter: they are dangerous, potentially disastrous, and must be approachable in more than one way in case the default response fails. This is the essence of optionality.

Optionality provides the antifragile system with choices and opportunities. The more options a system has, the better it can adapt to unforeseen circumstances. It’s like having multiple paths to success: if one path becomes blocked or unfavorable, the system can pivot to another.

For instance, in the context of investing, having optionality might mean possessing a diversified portfolio. If one investment fails, others might thrive, or at the very least, the loss from one bad investment won’t wipe out the entire portfolio.

In technology, a system designed with optionality has backup processes, redundancies, and alternative pathways to complete a task. If one part of the system fails or encounters an unexpected problem, these alternatives can kick in, ensuring the system continues to operate effectively. Think of the internet.
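
As a toy illustration of that pattern, optionality in code often looks like an ordered chain of alternatives that is walked until one succeeds. The handler names and the TransferError type below are invented, not any particular framework’s API:

```python
# A minimal sketch of optionality as a fallback chain.
class TransferError(Exception):
    pass

def send_primary(data):    # e.g., the default API endpoint
    raise TransferError("primary endpoint unreachable")

def send_secondary(data):  # e.g., a mirror in another region
    return f"sent {len(data)} bytes via secondary"

def queue_for_retry(data): # e.g., persist locally and retry later
    return f"queued {len(data)} bytes for retry"

def send_with_optionality(data, options):
    """Try each option in order; fail only if every option fails."""
    errors = []
    for option in options:
        try:
            return option(data)
        except TransferError as exc:
            errors.append(exc)
    raise TransferError(f"all {len(errors)} options failed: {errors}")

print(send_with_optionality(b"payload", [send_primary, send_secondary, queue_for_retry]))
```

The system only fails when every option has failed, and the collected errors tell you which options need work.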

Taleb was a trader. He saw life as an investment strategy with certain risks and rewards, and he defined optionality through the lens of asymmetric risk distribution:

Optionality is the property of asymmetric upside (preferably unlimited) with correspondingly limited downside (preferably tiny).

What does that mean? The “downside” here is nothing other than the “potentially disruptive event” mentioned above, whose impact must be limited by your system design. The “asymmetric upside”, analogously, corresponds to a coping strategy that not only ensures the survival of the system but, crucially, enables adaptation and improvement. An antifragile system not only has options; it also has the means to choose the best option, or at least one with asymmetric upside, so that the net gain is positive.
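
Taleb’s point is easiest to see with numbers. A toy calculation with invented figures: risk one unit per bet, win twenty units a mere 10% of the time, and the expected value per bet is still positive:

```python
# Asymmetric payoff with invented numbers: small limited downside, large upside.
p_win, gain, loss = 0.10, 20.0, 1.0
expected_value = p_win * gain - (1 - p_win) * loss
# 0.1 * 20 - 0.9 * 1 = 1.1 > 0: a net gain, although we're wrong 90% of the time.
print(expected_value)
```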

Binary Optionality

Binary optionality refers to the ability of a system to react to a potentially disruptive event in one way—or not. One or zero, hence binary. Many badly written computer programs lack even this basic property, and so they go belly up once an event comes along where their pre-programmed behavior is maladaptive.

Here’s an application that submits complex data without checking whether the connection is secure. There’s a building whose windows lock on the nightly alarm schedule, ignoring the fact that a fire might spell trouble. And many of us have met that person who regularly gets whacked because whatever crosses their mind is immediately spoken aloud.
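
The first of these examples is easy to repair. Here’s a minimal sketch, assuming a hypothetical connection object with an is_secure flag and a send() method; binary optionality simply means the program can also choose not to act:

```python
def submit(data, connection):
    # Binary optionality: act, or refuse to act. Even the zero-option
    # ("do nothing, report why") beats a hardcoded single response.
    if not connection.is_secure:
        return {"sent": False, "reason": "refusing to submit over an insecure connection"}
    connection.send(data)
    return {"sent": True, "reason": None}
```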

Categorical Optionality

If your system has a number of qualitatively different options that fall into different categories, it has categorical optionality, the ability to choose from a list of alternatives. This is often better than simple binary optionality and can be found in various biological instances, say, in the fight-flight-freeze optionality of the survival instinct.
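
In software, categorical optionality typically shows up as a dispatch over a fixed menu of qualitatively different strategies. A sketch with invented event categories, loosely mirroring fight, flight, and freeze:

```python
# Categorical optionality: a fixed menu of qualitatively different responses.
def fight(event):  return f"counteracting {event}"          # e.g., scale up, rate-limit
def flight(event): return f"failing over away from {event}" # e.g., reroute traffic
def freeze(event): return f"pausing non-essential work during {event}"

STRATEGIES = {"overload": fight, "regional_outage": flight, "unknown_anomaly": freeze}

def respond(event_category, event):
    strategy = STRATEGIES.get(event_category, freeze)  # default to the safest option
    return strategy(event)

print(respond("regional_outage", "eu-west-1 down"))
```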

Deep Optionality

But deep optionality is best. Like a master martial artist, a system endowed with deep optionality reacts to potentially disruptive events with a nuanced response precisely tailored to the problem at hand. While this property is surely desirable for every aspect of system design, it’s only practical to implement deep optionality for the few everyday challenges your system faces most often. Most of us can walk on almost any surface found on this planet, compensating for different slopes, objects, levels of friction, viscosities, swirling winds, and disoriented poodles, but imagine just how many steps this undeniably masterful art literally took us to learn. And how often we stumbled and fell.

Incidentally, though, we have touched upon the adaptive nature of deep optionality: It’s learned from many attempts and failures. You can hardcode categorical optionality but not deep optionality. The evolutionary nature of antifragile systems is an aspect I’ll cover in a later post.
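
To make “learned, not hardcoded” a little more concrete, here’s a toy epsilon-greedy selector: it keeps per-option statistics and gradually prefers whatever has worked, while still stumbling (exploring) on purpose now and then. Every name in it is invented; it merely stands in for whatever learning mechanism a real system would use:

```python
import random

class AdaptiveSelector:
    """Toy epsilon-greedy chooser: mostly exploit the best-known option,
    occasionally explore, and update estimates from observed outcomes."""

    def __init__(self, options, epsilon=0.1):
        self.options = list(options)
        self.epsilon = epsilon
        self.counts = {o: 0 for o in self.options}
        self.rewards = {o: 0.0 for o in self.options}

    def choose(self):
        untried = [o for o in self.options if self.counts[o] == 0]
        if untried:
            return random.choice(untried)       # try everything at least once
        if random.random() < self.epsilon:
            return random.choice(self.options)  # keep stumbling on purpose
        # Exploit: pick the highest average observed reward so far.
        return max(self.options, key=lambda o: self.rewards[o] / self.counts[o])

    def record(self, option, reward):
        self.counts[option] += 1
        self.rewards[option] += reward

# Usage: options are opaque labels here; rewards would come from real outcomes.
selector = AdaptiveSelector(["reroute", "throttle", "degrade_gracefully"])
for _ in range(100):
    option = selector.choose()
    reward = random.random()  # stand-in for a measured outcome
    selector.record(option, reward)
```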

Antifragility Breaks Without Optionality

Now, recall the contraposition argument outlined above. Assume that a system has no optionality for a series of events designed such that its single hardwired response is self-defeating or maladaptive; it can only react in a way that harms itself. Such a system degrades under volatility instead of gaining from it, which violates the defining property of antifragility and proves our point: Optionality is indeed necessary for antifragility.

This “proof” has several difficulties. To make it work, you have to design a hypothetical event that trips any particular system lacking optionality. No single such universal event exists, of course. So we assume that, for any conceivable system with an un-optional response R, we can find an event tailored to R. Since the nature of this event is limited only by our imagination, we’re all set, but the probability of this “breaking event” could be arbitrarily small.

This is another profound aspect of antifragile systems: Antifragility is a relative property. No system in the known universe can be absolutely antifragile and gain from any kind of disorder and disruption, however nice (or dystopic) that would be. In a statistician’s dream world, where the probability of any event is precisely known, you could design systems to be antifragile to all events of probability greater than some given p; call such a system p-antifragile. Sadly, we don’t live in such a world. As Taleb told the world in his remarkable book “The Black Swan”, the probability of events that occur reasonably often can be estimated quite well, but rare big events have the uncanny habit of clobbering us unawares.

What can we do? In a pragmatic engineering spirit, system designers and architects cluster events into tentative groups of, say, high, medium, low, and negligible probability. Then we provide deep optionality for highly probable events, categorical optionality for moderately probable events, binary optionality for low-probability events, and ignore the rest. If your resources don’t stretch that far, get more resources or redefine your probability clusters. It’s conceivable to design a system that not only gets better over time but also becomes more antifragile, though this is much harder than it looks.
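
Here’s a configuration sketch of that clustering heuristic; the probability thresholds are invented placeholders, not recommendations:

```python
# A sketch of the clustering heuristic above. Thresholds are illustrative only.
OPTIONALITY_BY_CLUSTER = {
    "high":       "deep",         # nuanced, learned responses
    "medium":     "categorical",  # a fixed menu of alternatives
    "low":        "binary",       # at least the option to not act
    "negligible": "none",         # consciously ignored
}

def cluster(event_probability: float) -> str:
    if event_probability >= 0.1:   return "high"
    if event_probability >= 0.01:  return "medium"
    if event_probability >= 0.001: return "low"
    return "negligible"

def required_optionality(event_probability: float) -> str:
    return OPTIONALITY_BY_CLUSTER[cluster(event_probability)]

print(required_optionality(0.05))  # -> 'categorical'
```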

Optionality Is Not Sufficient

Clearly, optionality alone is not sufficient, even if we count the system’s ability to always choose a good option as part of optionality. Just imagine a system that has virtually no spare capacity, another necessary condition of antifragility. If variability taxes the system beyond its designed capacity, no amount of optionality by itself can compensate.

Sure, you could argue that extending the given capacity in the event of overload is itself an option that should be integrated, making “spare capacity” a subclass of “optionality”. This way, we have discovered another property of optionality: taken broadly enough, it can compensate for any shortcoming in a system’s design. But I don’t like that. Unlimited optionality turns the concept into a never-ending wishlist that collides with the practical constraints of systems design. Therefore, let’s slightly refine the definition: Optionality is the ability of a system to react to potentially disruptive events in more than one way within the limits of its design. Yes, we want evolving systems, but we also want to study evolution as a separate process. Analyzing and synthesizing, like alchemists, we end up with excellent antifragile systems.

Next Up: Redundancy

In the next post of this series, I’ll have a go at the concept of redundancy, which is related to but not the same as spare capacity. And since redundancy is part of the system’s design, it’s not a subclass of optionality… Our postmodern striving for efficiency, incidentally, makes many systems more fragile than they need to be. Take the famously broken German railway. In stark conflict with mainstream fluff talk, antifragile systems are not efficient; they invariably carry multiple layers of redundancy.


Hannes Rollin

Trained mathematician, renegade coder, eclectic philosopher, recreational social critic, and rugged enterprise architect.