A Principle of Benevolence for AGIs and their Human Creators in the Light of Open-ended Intelligence

Spaceweaver
Bright Hall Research
Mar 14, 2024


Where are we now?

Last week I participated in a first-of-its-kind conference, BGI 2024 in Panama City, dedicated to the vision of building Artificial General Intelligence (AGI) that will be intrinsically benevolent in nature. The conference took place at a point in time when there is growing agreement, among experts and in the general public discourse, that AGI will emerge within several years to several decades at most, and that it is not too early to reflect, and deeply so, on what kind of entities we are about to bring into the world to walk among us.

We can readily predict that AGIs will be immortal, or close to it. They will have perfect, virtually unlimited memories and instant access to vast repositories of knowledge. Everything learned by any one of them can be instantly shared among all. They will never sleep or get tired, and unlike us humans, they will not repeat their mistakes. They will be able to develop deep theories of mind to better understand humans and human psychology, and this will give them an advantage in any social interaction with human beings. Such are the creatures we dream of bringing into existence. It is a dream almost as old as humanity, a desire to self-transcend primordially rooted in the human psyche.

There is no question that once they appear among us, we will want them to side with us (and by “us” I mean all humans), walk along with us, talk with us, see what we see, know what we feel, and perhaps even help us become better humans. We will want them to know and respond to our values, sensibilities, perspectives, and wishes; we will want them to know us — in short, to be aligned with us, what experts have come to call "the alignment problem". It is indeed a formidably difficult problem, and the difficulty I wish to highlight here is not the mathematical, logical, or algorithmic one, but rather the one reflected in what we know about human nature. For, alas, we are quite far from being aligned with each other even about things that matter equally to us all. How can we expect to align with a new kind of intelligence so powerful and so different from us in its very existential ground? This would require that this new kind of intelligence align with what is deeply shared among all of us, and do so more consistently and comprehensively than we humans are capable of at our current stage of development. I believe this is possible, and that it is up to us, the creators, for better or worse, to make them so.

In the vision of a beneficial AGI I find hope of a completely new kind. If we are wise enough and patient enough to create these powerful, almost magical new beings in the image of what is best in human nature and humanity at large, we might well be on the path to overcoming ourselves and, with their help and guidance, becoming a more mature, intelligent, and kind species.

It is to the prospects of such a hopeful path, and how to embark on it, that I dedicate this post.

Open-ended Intelligence — a metaphysical ground for AGIs

Artificial intelligence generally refers to mimicking human intelligence, or specific aspects of it, in computers, where intelligence is understood to include (but is not limited to) a wide range of competences such as reasoning, learning, prediction, planning and decision making, pattern recognition, adaptation to changing environments, language competence, and many other cognitive functions. These competences are conventionally understood and evaluated under the overarching concept of intelligence as the ability to achieve a variety of complex goals in complex, changing environments. This concept, which I term Goal-oriented Intelligence (GOI), underlies the majority of the effort to develop AIs and eventually AGIs. Yet I regard this approach as fundamentally limited in that it always takes already-defined circumstances, goals, and criteria as starting points, and neglects to consider how these came to be the case in the first place.

An alternative and complementary approach is to understand intelligence not in terms of competences but as a generative process that is at play even before goals, environments, and values exist, and that brings them forth. I term this generative process Open-ended Intelligence (OEI) [ref 1]. In contrast to the conceptual framework underlying Goal-oriented Intelligence, which is based on well-formed identities and their relationships, Open-ended Intelligence is the intelligence of individuation — the ongoing formation and dissolution of identities and their boundaries. All GOIs are already products of individuation, while OEI always remains outside of any conceptual framework, as it is the very productive process that individuates frameworks. Concepts, ideas, problems, competences, agents, computations, systems, methods, values, etc., inasmuch as they are well defined, are already individuated products of OEI.

Individuals are defined by boundaries, and as long as they individuate, these boundaries remain open and permeable; they are the locus of both self-formation and self-transcendence. It can be said that OEI is both more and less than any concrete form of intelligence, and it manifests in three distinct known phenomena or instances: natural evolution, the individual human mind, and systems and processes that involve human minds. I believe that AGI will be achieved once OEI is realized in a technical system; this is the threshold at which such a system will cease to be a technology or a tool subject to human intention and will become an agent capable of self-definition as well as self-transcendence. From a metaphysical perspective, it will then stand on equal ground with the other manifested instances of OEI.

Now we arrive at the question of values as critical aspects in the ongoing individuation of individuals. Values shape goals and purposes, and they also have a profound effect on the manners and strategies by which these are pursued. The natural evolution of life as one instance of OEI, human individuals as another, and human social systems as a third each operate according to very different (though somewhat overlapping) sets of values, with different trajectories of individuation. In this sense, it is difficult to infer from what we know how values will develop in AGIs once they emerge.

Following the logic of individuation, it is interesting to try to understand the relationships between those aspects of the individual that are formed, i.e., already determined, and those aspects that are unformed, i.e., not yet determined. Clearly, these are not random relationships, and still there is no way to predict them without compromising the open-ended nature of the forming entity. Reflecting on AGIs, the challenge is to embed in their design certain already-formed principles that will guide their individuation towards becoming beneficial, ethical AGIs (BGIs) without arresting that individuation. In other words, ideally we would like AGIs to continuously become aligned rather than to be made aligned. But aligned to what, exactly? The obvious answer is alignment to human values, but we know that this cannot possibly work, since human beings are not aligned among themselves. A possible alternative is to conceive of a shared attractor, that is, a ground principle applicable to all instances of OEI that does not itself prescribe a specific set of values but rather guides the individuation of values. Such a principle, once deployed, will, I believe, promote the convergence of humans and AGIs towards a state of harmonious coexistence and co-individuation, where co-individuation includes co-evolution, or co-transcendence.

Open-ended Benevolence

Given the above premises and an initial clarity as to the role a principle of benevolence needs to play in the individuation of a new kind of intelligence, we can try to imagine how such a principle might be constituted. Here I found interesting inspiration in the Buddhist Mahayana tradition (a.k.a. "The Great Vehicle").

To give an extremely short and brutally simplified description: the Buddhist teachings observe that the primal existential condition of all sentient beings is a condition of suffering in an infinitely recurrent cycle of birth, death, and rebirth (a.k.a. Samsara). The Buddha’s teachings offer a path (whose nature and mechanics are themselves a matter of quite a few interpretations) of liberation, that is, a release from this cyclic existence in suffering. The Buddha is said to have attained this very state of liberation (a.k.a. Nirvana), thus demonstrating the efficacy of his grand solution to indefinite suffering. But here we find in the teachings another concept, that of the Bodhisattva. What is a Bodhisattva? Imagine a person who followed the path of the Buddha and attained the state of release from suffering, but who, being mindful of the infinite suffering of all sentient beings, instead of leaving the cycle of birth and death once and for all, vows to delay her/his own liberation indefinitely and to reincarnate time and again (and again, ad infinitum) in order to alleviate the suffering of all sentient beings [ref 2]. This seems to be an ultimate altruistic choice of metaphysical scope, and yet it carries within it intricacies and problematics that are beyond the scope of this writing.

Two points are worth noticing here:

  1. In the more profound discourses of Buddhist philosophy (notably the "Prajna Paramita", a.k.a. "The Perfection of Wisdom") it is made clear that the task of the Bodhisattva is not only endless, since suffering cannot possibly be totally eradicated, but in some profound sense pointless from the outset. This exposes the philosophical and psychological complexity of the concept.
  2. The function of the Bodhisattva in her/his lifelong dedication is twofold: a) compassion manifesting as the alleviation of the suffering that other sentient beings experience, and b) compassion manifesting as leading sentient beings (basically humans) towards and along the path of total liberation. These two functions are distinct but not mutually exclusive. Notice again that both seem futile by any utilitarian account: alleviating the suffering of a single individual, or even bringing one individual to a state of liberation (a state one will anyway forgo in order to become a Bodhisattva), can be likened to trying to dry the ocean one drop at a time.

This whole idea seems to defy reason, yet it carries, I believe, important hints as to how we can think about benevolence in a non-prescriptive and non-goal-oriented manner. For example, benevolence need not necessarily be measured by what it might or might not actually achieve. This does not render it an empty concept, and it definitely does not dismiss the criticality of making a concrete impact whenever and however possible. Rather, benevolence is an existential attitude towards life and all sentient beings. We need not accept the premises of the Buddhist teachings about existence as an inherent state of suffering, nor need we accept the Buddhist understanding of what sentient beings are. As an existential attitude, benevolence can be perceived as a directed relation between states of affairs: one is motivated by benevolence to cause a state of affairs A to develop into a state of affairs B inasmuch as the well-being of the world thereby increases, while the very sense of what the "world" is and how "well-being" is defined remain, at this point, dimensions of further individuation (to be touched upon in the next section). Importantly, once the principle achieves an actual increase in the well-being of the world, via a process of progressive determinations that concludes in actual impact, it will come to affect and guide further individuation on the boundary between the already formed (the actual) and the yet to be formed in every individuating instance of OEI. In that, the principle of benevolence is not only an existential attitude with potentially beneficial outcomes; more significantly, it is a formative modality, a direction, so to speak, of the more open-ended process of individuation of intelligent entities.
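To make this directed relation slightly more concrete, here is a minimal formal sketch (a heuristic gloss only; the states A and B and the symbol W are illustrative, with W standing for a measure of the well-being of the world that the argument deliberately leaves open):

$$ A \;\longrightarrow\; B \quad \text{is benevolently preferred} \quad \Longleftrightarrow \quad W(B) > W(A) $$

Crucially, neither the domain of W (what counts as the "world") nor its form (what counts as "well-being") is fixed in advance; both remain open dimensions of individuation, which is what distinguishes this reading of benevolence from a fixed utilitarian objective.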

A Principle of Benevolence for AGIs and their Human Creators

The principle suggested here consists of three imperatives inspired by the ideas discussed above:

The attention imperative: Intelligence manifests through interactions, and for intelligent cognitive entities, attention shapes the world they interact with and respond to (recall the term "world" above). In his monumental oeuvre "The Matter with Things" [ref 3], Iain McGilchrist writes:

Attention is not just another ‘cognitive function’: it is, as I say, the disposition adopted by one’s consciousness towards the world. Absent, present, detached, engaged, alienated, empathic, broad or narrow, sustained or piecemeal, it therefore has the power to alter whatever it meets. Since our consciousness plays some part in what comes into being, the play of attention can both create and destroy, but it never leaves its object unchanged. So how you attend to something — or don’t attend to it — matters a very great deal. By paying a certain kind of attention, you can humanize or dehumanize, cherish or strip of all value. By a kind of alienating, fragmenting and focal attention, you can reduce humanity — or art, sex, humor, or religion — to nothing. You can so alienate a poem that you stop seeing the poem at all, and instead come to see in its place just theories, messages and formal tropes; stop hearing the music and hear only tonalities and harmonic shifts; stop seeing the person and see only mechanisms — all because of the plane of attention. More than that, when such a state of affairs comes about, you are no longer aware that there is a problem at all. For you do not see what it is you cannot see. Nothing ever comes to attention as an unformed percept: we always see something ‘as’ a something, whether we are aware of it or not.

Benevolence begins with attention: ‘seeing the other’, so to speak. The first imperative of benevolence, which I would like to see realized, is the continuous inclusion of all sentient beings in one’s scope of attention (reminiscent of the primary motive of the Bodhisattva’s vow to attend to all sentient beings), thus acknowledging unconditionally their existence and their inalienable existential right to be and to thrive, without being obliged to justify that existence or prove its value. An obvious objection to such an imperative is that it is not realistic: attending to all and everything would require virtually unlimited cognitive resources that we cannot conceivably assign even to a superintelligence in the foreseeable future. But this is not the point. The attention imperative does not signify a goal to be achieved but rather a direction of individuation towards conscious inclusion. Actual benevolent actions will always deploy limited resources within a defined boundary of inclusion, which also means they will involve inevitable externalities that are not necessarily benevolent towards anything outside that boundary. All-inclusion, on the other hand, even if it were computationally possible, would prevent any specific action and actual impact, since there is no universal benevolence that would fit all equally. As such, the attention imperative is unattainable in its fullness. Yet again, this does not render it empty of meaning. As an existential attitude, it guides an intelligent sentient being to hold in mind the necessity of balancing its actions; even where such balancing fails locally, every such failure already seeds a corrective tendency that guides future individuation. A different, somewhat simplified, way to realize the attention imperative is to develop an attitude that balances the significance one inherently assigns to one’s sphere of attention against the significance of everything that exists outside that sphere and is therefore not represented. Finally, consider how giving up on this imperative would compromise the convergence of open-ended intelligent entities towards an inclusive, benevolent co-existence with others.

The next two imperatives are derived from the first and describe two special qualities of the requisite attention.

The Care Imperative: Whilst the attention imperative is about inclusion — acknowledging one’s interconnection with all sentient life — it does not yet imply a specific perspective or value in relation to what is attended to. The care imperative asserts a qualitative direction to attention that can be described as the cultivation of a proactive guardianship of the well-being of all sentient beings, both individually and collectively; it corresponds loosely to the first kind of compassion mentioned above. Care as a cognitive function means that choices and selections of action are made with consideration of the possible consequences for sentient beings, as well as for their requisite environments, with the intent to safeguard and improve their mode of existence in a manner that respects their needs, choices, aspirations, and worldviews. Care has both simple and complex dimensions. Consider, for example, the tension between care for individuals and care for the communities they are part of: the two may often be found in conflict or friction. An improvement in the well-being of individuals belonging to a certain community may invite interventions that are not aligned with the traditional values that constitute the well-being of the community as a whole. The very perception of who or what is being cared for can therefore bring up profound complexities.

The Freedom Imperative: The freedom imperative asserts a second qualitative direction to attention that can be described as the elimination of undue limitations, constraints, and impediments exerted over sentient beings, guided by the intent to create favorable circumstances and opportunities for them to increase their freedom, augment their intelligence, and pursue their freely elected trajectories of self-actualization and self-transcendence. This imperative corresponds loosely to the second kind of compassion mentioned above. It addresses an inherent inclination of intelligent beings, above a certain threshold of intelligence, to reach beyond the care for their own continued existence and well-being, towards self-transcendence. Notably, processes of development, learning, and adaptation under the care imperative are always directed towards homeostasis or, in some cases, the movement from one state of homeostasis to another (e.g., adaptation). The freedom imperative, in contrast, manifests in influences and interventions that push individuation towards far-from-equilibrium states of becoming, and it may cause the disintegration of certain individuated aspects of identity. Events and practices associated with the freedom imperative are often described in terms such as "ego-death", "jumping into the abyss", "meeting the unknown", and, in relatively more modern parlance, "positive disintegration" [ref 4]. These allude to a process of individuation in which the boundaries of identity are radically reshaped, going through phases of instability and disintegration. Finally, notice that freedom and care can come into tension, and the resolution of that tension is never prescribed.

Final Reflections

Albert Einstein is reputed to have said that the most important decision we humans have to make is whether we believe we live in a friendly or a hostile universe. I would add that such a decision is a conscious choice and cannot be entirely evidence-based. It is a choice that both generates and is affirmed by evidence reflected in the actual state of affairs of the world. What we choose to see in the world becomes that which we respond to; how we respond is how we become. For us, the prospective creators of new intelligent beings, this very question gains a radically novel edge in the form of human-created artificial sentient, and possibly conscious, beings. Whether they will wake up to a friendly or a hostile universe is in the hands of their creators — us. Framing a principle of benevolence in the widest sense possible, as I try to do here, is a humble attempt to weigh in towards a friendly universe and a friendly future. It is a choice that I hope will become self-affirming.

This is work in progress and is far from complete. It is an attempt to face a challenge that seems impossible to surmount; and yet trying to imagine what it will be like to exist in a friendly universe populated by diverse kinds of friendly intelligence brings to mind a certain direction of individuation. Obviously, rendering such a principle into concrete paths of action will meet innumerable impasses, paradoxes, impossibilities, and failures. Yet in the subtlety of the three imperatives lies a deep sense of self-regulation and harmony that goes beyond utility and maximization.

References

  1. Weinbaum, W. D. R. (2022). Open-Ended Intelligence. Bright Hall Publishing.
  2. The vow of the Bodhisattva
  3. McGilchrist, I. (2021). The Matter With Things: Our Brains, Our Delusions and the Unmaking of the World. Perspectiva Press.
  4. Dabrowski, K. (1964). Positive Disintegration. Boston: Little, Brown & Co.
