Jellyfish Spider-Crab AI: Modular Architectural Learning

Geoffrey Gordon Ashbrook
Oct 28, 2023


[Image: Parallel Distributed Processing, Vol. 1, p. 172: architecture of a competitive learning system]

2023.10.17–28 g.g.ashbrook

Intro

My sister’s four year old son picked out a birthday, or April fool’s day, present: a fidget spinner with six jointed arms. Right off, I imagined it as a robot with completely modular nodes, off doing tasks, and I wondered how it might work. A robot jellyfish spider-crab!

“Mr. Spidercrab, please get my pencil.”

“Ms. Jellyfish, could you see if the mail is in?”

“Mr. Spidercrab, go wake up Gladis and tell her the tea is ready.”

“Ms. Jellyfish, please set a five minute timer and tap me on the shoulder when it’s done.”

I also imagined these ‘modules’ (each bending leg joint) as being, in future, able to reconfigure. For example, instead of one big spider, reconfigure into two smaller spiders. Or have two smaller spiders merge into one. This adds an element of ad-hoc organization (where there is no fixed set of participating modules) to the already decentered scenario.

You can have a minimal module that only moves, or you can add as many other sensors and abilities as…size and resources will allow.

Why ‘Jellyfish’?

Jellyfish, or cnidaria (https://en.wikipedia.org/wiki/Cnidaria), have (depending on squabbles over exact terminology, jargon, and semantics) no “central nervous system” in quite the same way that some other organisms do. A jellyfish is closer to a coordinating colony of semi-independent parts than to a single centrally-controlled organism. So this is a nice segue into thinking about how individual modules can coordinate without a single hardwired centralized structure.

Why “Spider-crab”?

The choice is a bit arbitrary, but a spider-crab fits the anatomy rather well: one small ~round body in the middle surrounded by long segmented leg-modules, basically a bunch of legs that meet somewhere in the middle. (And I am keen on Rust for programming, which has a crustacean, Ferris the crab, as its mascot…and the gift came with a great big red crab craft-art card which I opened first…adorable crabs everywhere.) So crab it is!

Minimal First Experiments & Thought-Experiments

Barring another dark age, the components to build (or have fun trying to build) such a modular robot keep getting better: more resource-efficient, smaller, safer, and cheaper (why do operating systems and browsers keep moving in the opposite direction?). So building a modular-AI-robot may, now or soon, be feasible as a school project.

Minimal example

- We can use a 3-legged beasty with just one module/cell/node per leg/arm: basically three modules in a triangle. There are a gazillion ways the modules could ‘move like an elbow bends’ relative to each other (or relative to the ‘hub’ in the middle, if there is a separate ‘hub’ at all). For our purposes here it does not matter exactly which design is used; a minimal sketch of one module appears below. (Coming up with and analyzing the advantages and disadvantages of different configurations is a great project-set and reminiscent of biology.)
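
To make this a bit more concrete, here is one minimal way a single module might be represented in code. This is a purely illustrative sketch (in Python), not a hardware design: the names Module, joint_angle, inbox, and propose are all invented for this example, assuming one bending joint per leg and simple message-passing between sibling modules.

```python
# A minimal sketch of one module/cell/node, assuming one 'elbow'
# joint per leg and message-passing between siblings. All names
# here (Module, joint_angle, inbox, propose) are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str                                    # e.g. "leg_a"
    joint_angle: float = 0.0                     # current bend of the 'elbow', degrees
    sensors: dict = field(default_factory=dict)  # whatever this module can sense
    inbox: list = field(default_factory=list)    # messages from sibling modules

    def propose(self, goal: str) -> dict:
        """Each module proposes its own small action toward a shared goal."""
        return {"from": self.name, "goal": goal, "action": "bend_down"}

# Three modules in a triangle; no central 'hub' controller is assumed.
colony = [Module("leg_a"), Module("leg_b"), Module("leg_c")]
```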

Test 1: Stand up, please.

- The solution/goal is likely to move all legs “down,” probably in no particular sequence.

Test 2: Move towards a light source. (“Mothra”)

Test 3: Move away from a wall.

- The jellyfish spider-crab is against a wall, perhaps the ‘floor’ is on an incline towards the wall.

- The job, the task, the test, is to move away from the wall.

- The solution may be to move the two wall-facing modules at the same time (see the sketch after these notes).

This will require some but minimal planning and coordination.

(there may be a more minimal goal to come before this…)

(Simply moving all legs at the same time may work too.)
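
As a toy sketch of Test 3, assume each module can sense only its own distance to the wall; then the “plan” can be as blunt as: the two closest modules push off at the same time. Everything here (the function name, the two-module rule, the example readings) is hypothetical:

```python
# A toy sketch of Test 3 ("move away from the wall"), assuming each
# module knows only its own distance-to-wall reading. The two modules
# closest to the wall push at the same time; all names are hypothetical.
def move_away_from_wall(distances: dict[str, float]) -> list[str]:
    """distances: module name -> sensed distance to the wall (meters)."""
    # The two wall-facing modules are simply the two closest ones.
    wall_facing = sorted(distances, key=distances.get)[:2]
    return [f"{name}: push off wall now" for name in wall_facing]

print(move_away_from_wall({"leg_a": 0.05, "leg_b": 0.07, "leg_c": 0.40}))
# ['leg_a: push off wall now', 'leg_b: push off wall now']
```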

Give Each Robot Small-Instructions vs. Give The Robot-Team Whole-Tasks

While one or more people can probably figure out a plan for what the robot should do, and hard-code a robot that can do that (not a bad project in and of itself), the idea-set here includes having the modular-AI team-of-modules get a higher-level task instruction and figure out ‘on its own’ how to do it, with the added challenge that this requires all the modules to work together, coordinating decisions, in order to accomplish the task.

Blunt Powers & Finer Distinctions

An interesting timeline demarcation here is ~2022. Suppose in 2020, or 2012, or 2002, you picked the goal of doing such an experiment with a starting instruction for the AI such as: “It’s rather dark in here, I think. And I think there’s a light-pull over by the sofa, or maybe across from the window. It sure would be good to have more light in here.” I think it is safe to say that in pre-2022 times most people would find this instruction and goal for a robot (let alone a modular-AI conglomerate) to be completely beyond what AI could reasonably do, even though flipping a light switch (or pulling it) is significantly simpler than the Woz Coffee Test. https://www.youtube.com/watch?v=MowergwQR5Y (posted Mar 2, 2010)

But after OpenAI’s GPT LLMs, we now have AI that can (however imperfectly and inconsistently) conceptually, spatially, and temporally understand and model even fuzzy natural-language input and make viable plans for how to carry out a task.

(Unless this text gets somehow sent back in time) the ‘current’ time being after 2022, each of our AI-modules can have all the sophistication of an LLM (large language model) as well as an array of sensor data and mechanical operations (movement, light, sound, etc.). Each node can easily make a feasible plan and (less trivially) interpret the sensory data in a spatial-object, project-task context. This is a leap forward: nearly every book written before 2023 (and perhaps most books published in 2023, as it takes so many years to write and publish a book) that I have seen flatly states this range of concept-handling and language-using abilities to be impossible, either for the foreseeable future or forever in principle; yet since 2023 they are a practical reality available with any internet connection. This is a big leap in blunt AI power, but there are still important fine details.

Our modular-task is a (perhaps ‘cut up’ ish) distributed multi-step task that passive-reflective-reactive AI models have no means to do directly and simply all by themselves. This is an architectural-learning task. What architecture and abilities do the colony of modules need to be able to accomplish the task? What project-state/mind-state is needed? What ‘body’ is needed? What project-object-database is needed? What externalization is needed? What AI-Architecture/AI-OS is needed?

For ‘future’ people after this is being written, these questions may sound rather dull. But at a time when ~half of this task is shovel-ready, and half the task is…so close there must be a way, this is a delightful brink-of-exploration topic.

Coordinated Decisions, Generalized

While “AI-modules communicating about a task and agreeing on a plan” can be described and paraphrased in many ways, I will sometimes use “coordinated decisions,” as a phrase that is hopefully clearly descriptive and not opaque jargon.

While it may seem rather narrow and not a big deal for three nodes to coordinate standing up, this is a kind of foot-in-the-door, or even a slippery-slope, into a potentially bigger area. This does not necessarily mean that a given robot-task (‘pick up a pencil,’ or ‘walk around’) is going to be more difficult (though some goals may unexpectedly be so), but the ability to effectively coordinate decisions branches off into potential real-world applications across a range of topics and professional fields not always thought of as being connected.

The topic of coordinated decisions is big, fascinating, and amazingly largely unmapped. It does not yet appear to have been recognized by H.sapiens-humans as a general area of STEM; rather, different parts of it are dealt with in isolated areas usually lacking systematic study. Terms such as voting, elections, project planning, leadership, negotiation, mediation, project management, cell signaling, etc., are considered separate, mostly as non-science quagmires, as opposed to branches of the tree of STEM.

This is a very work-in-progress GitHub repo of mine that started by just looking at the question of whether it is feasible in principle to use one-time-pads in a vote-over-a-network election, but it has expanded to explore coordinated-decision systems more generally. If nothing else, hopefully this (huge mess) serves as a list of leads and potential topics for what areas are involved in General Coordinated Decisions:

https://github.com/lineality/Online_Voting_Using_One_Time_Pads

Just as digital computers are built on zeros and ones but turn out to be much more versatile than just saying ‘zero’ or ‘one’ in a given instant and location, decisions and signals may seem elusively overly simplistic: a signal travels from here to there. But the consequences of a problem-space or project-space of signals, like the consequences of zero and one, or the consequences of four-valent-carbon (organic chemistry), are not narrow.

As usual I would also like to gesture towards system-collapse studies and Definition Behavior Studies as further areas of consideration (though not enormously relevant for the task of getting a 3-jointed robot to ‘stand up’).

https://github.com/lineality/definition_behavior_studies

Part 2: Distributed Overlapping Logic

project-tasks and architectural-learning

Two classic cases in AI:

1. football/soccer playing robot [& reinforcement]

2. “the blocks world” [& SHRDLU]

Note: Melanie Mitchell’s wonderful 2019 book “Artificial Intelligence: A Guide for Thinking Humans” has a nice outline of the reinforcement-learning football-bot (soccer-bot).

https://www.amazon.com/Artificial-Intelligence-Guide-Thinking-Humans/dp/0374257833/

https://en.wikipedia.org/wiki/SHRDLU

Three approaches to a physical task:

1. Train and use a traditional reinforcement model

or some other equivalent train-for-one-narrow-task “model” or “pipeline.”

2. Use a flexible pretrained model (such as an LLM with object handling) with an architecture that is able to do this and other tasks.

3. Use pre-trained models that are able to make and use new sub-task models (which include ~reinforcement models) to do this and other tasks: making, sharing, and contributing to the pool of modular resources in the project. (see subroutine-stacking)

(And there are other approaches as well, such as using pretrained models but having them further train based on feedback for a given task, etc.)

Distributed Overlapping Logic

We are still, fascinatingly, talking about boolean operations (just, a lot of them) and “higher”-level logical tasks built out of them (think of code or pseudocode, or a counting task as in the “counting cups of tea” case we looked at previously).

So, for example, if each decision by each module is based on a set of information, starting with, say:

- where are you

- where are you going

- how are you going to get there

This can be translated into a set of movements, actions, and “numbers” (however termed).

Level 1:

Where are you: jellyfish-spider-crab is on square one

Where are you going: jellyfish-spider-crab should be on square two.

How do you get there: jellyfish-spider-crab should move in the direction of square two, or in a direction along a path that leads to square two (since straight lines and as-the-crow-flies directions do not cut through a maze or obstacles).

Interestingly, these actions overlap, and hence need to be coordinated, but they are still actions that are ultimately boolean.
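
Here is a minimal sketch of those “level 1” questions as data, assuming a simple line of numbered squares (the square numbering and the function name are invented for illustration):

```python
# A sketch of the 'level 1' questions as data, assuming a simple
# line of numbered squares. The point is that 'where are you /
# where are you going / how do you get there' bottoms out in a
# small set of ultimately boolean moves.
def level_one_plan(current: int, target: int) -> list[str]:
    """Plan a path along numbered squares, one step at a time."""
    step = 1 if target > current else -1
    squares = range(current, target, step)
    return [f"move from square {s} to square {s + step}" for s in squares]

print(level_one_plan(1, 2))   # ['move from square 1 to square 2']
```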

You may recognize this as a ‘cut-up’ (sort of) question. For example, in a ‘team’ of modules, each module may have one unique sensor that the other modules do not have. This information may or may not be a vital piece of information for making a good decision about a plan that can work (e.g. an obstacle that other modules are not aware of may be simple to move around, but only if you know it is there). A small sketch of this sensor-pooling appears after the list below.

Each module’s decision inputs may include logic questions, math questions, and sensor readings such as:

- acceleration
- distance to wall
- wind
- are you sliding?

And a graded ladder of tasks:

1. movement

2. movement to destination

3. parameters: mazes

4. in-path obstacles (e.g. climbing over, going around)

5. finding objects

6. moving objects

7. arranging objects
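
Here is a toy sketch of that sensor ‘cut-up’ problem, assuming each module holds one private reading and a workable shared picture exists only after everyone broadcasts (all sensor names and readings are hypothetical):

```python
# A sketch of the 'cut-up' sensor problem: each module holds one
# unique sensor, so a shared picture of the world only exists after
# everyone broadcasts. All sensor names here are hypothetical.
def pool_sensor_data(modules: dict[str, dict]) -> dict:
    """modules: name -> that module's private sensor readings."""
    shared_picture = {}
    for name, readings in modules.items():
        for sensor, value in readings.items():
            shared_picture[sensor] = (name, value)  # who saw it, what they saw
    return shared_picture

picture = pool_sensor_data({
    "leg_a": {"acceleration": 0.0},
    "leg_b": {"distance_to_wall": 0.05},
    "leg_c": {"sliding": True},       # only leg_c knows the floor is slick
})
print(picture["sliding"])             # ('leg_c', True)
```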

There is also (see more below) the factor that each module may invoke any number of virtual agents to ‘reflect on’ or do parts of a task or sub-task, making the number of logic-maps for a given decision more of a probabilistic blur with no absolute boundary, as opposed to a small finite number of solutions and inputs to be pieced together and ‘reconciled.’

Go…when? (Change King Wen?)

Unlike chess, which occasionally ends in a checkmate, the game of Go ends when both players decide it should end (when both players pass).

https://en.wikipedia.org/wiki/Rules_of_Go

And Go can be thought of as a sequential binary process for reconciling two different strategies, with measurable criteria for which strategy is more effective.

While the modules may be proverbially playing ‘Go’ when coordinating a next move, when do they stop? This may or may not be a relevant edge-case in real life, but a key part of the architecture may be steering coordination towards decisions that are both timely and effective: how to end the in-principle potentially perpetual process of negotiated coordination.
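
As a sketch of Go-style termination, assume each module either proposes a plan-step or ‘passes’: negotiation ends when every module passes in the same round, with a round limit as a blunt guard for timeliness. Everything here (the callable-modules convention, the None-means-pass rule) is illustrative:

```python
# A sketch of Go-style termination for plan negotiation: rounds of
# proposals continue until every module 'passes' in the same round
# (or a round limit forces a timely decision). Purely illustrative.
def negotiate(modules, max_rounds: int = 10) -> list:
    plan = []
    for _ in range(max_rounds):
        passes = 0
        for propose in modules:            # each module is a callable here
            suggestion = propose(plan)
            if suggestion is None:         # None means 'pass'
                passes += 1
            else:
                plan.append(suggestion)
        if passes == len(modules):         # everyone passed: negotiation ends
            break
    return plan

# Example: one module wants one step added, then passes forever.
def leg_a(plan): return "bend down" if not plan else None
def leg_b(plan): return None
print(negotiate([leg_a, leg_b]))           # ['bend down']
```

The max_rounds limit is the ‘American’ answer bolted onto the ‘Japanese’ process: deliberation is open-ended in principle, but something in the architecture must be able to call time.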

For anecdotes along these lines: Japanese and American cultures are polar opposites in this regard. In the USA the culture prefers fast, strong, decisive action, whereas in Japan decisions are considered to be good based on myriad factors, such as including points of view, following protocol, and rechecking, with little emphasis placed on speed for the sake of speed. This, predictably, drives Americans crazy. “Why are you still talking? Just pick something! Just do something! Fast! Move!! AHHHH!!” Meanwhile Japanese people have a similar reaction of horror to Americans not including points of view, not considering how a choice will be accepted by people, not waiting and reflecting with new insights, etc. “You are taking this action just because it is the fastest? That is your criterion? You have not considered any consequences or protocols? AHHH!!”

Micro & Macro

If only as a thought experiment, there may be specific applications for both smaller-scale and larger-scale versions of modular interactions.

nano-tech version

Constructs on the protein or cell size-scale, e.g. for medicine, for industrial engineering (waste cleanup, for example), or for terraforming, may use some of the same coordination principles. Micro-AI is a curious topic.

Macro Versions for Mars

It is occasionally proposed (reference source needed…) that a useful or even necessary part of sending humans to stay for extended periods off-Earth, such as on Mars, Earth’s moon, or the moons of gas giants, etc., will be to first send robots that will make preparations or build resources (such as shelters) before the H.sapiens-humans arrive. Some of these tasks may be well suited to modular machines, for reasons ranging from maintenance to the self-configuring flexibility of highly open-ended tasks. Or, in the same way that the helicopter Ingenuity (‘Ginny’) helped out a large rover, a small flexible modular swarm-AI might similarly accompany and assist a rover with unexpected needs (perhaps even cleaning solar panels).

Part 4: Gamification & Corpus Callosum: SHRDLU + LLM

curriculum and design: building as the project

Programs like SHRDLU do very well with ‘idealized’ or ‘gamified’ tasks, yet they have ever-increasing trouble with the fuzziness of physical reality and the concepts connecting game-space to it (see STEM categories of types of systems: pure math and ideal physics are not the same, and neither is the same as applied engineering or the scientific method and hypothetico-deductive learning).

Analytical SHRDLU-type AI has no capacity for ‘language concepts’ whatsoever.

LLMs have the most difficulty with analytical calculations, because they are not calculators; they are conceptual guesstimators.

LLMs’ Blocks-World Game-Space

A pair of abilities that LLMs stand well to use is: 1. converting real-life situations of potentially huge amounts of data into much narrower gamified scenarios that capture only a tiny amount of relevant data (similar to science in general, perhaps), and then 2. reconciling the (eventually decided-on) game-plan-of-action back into the real world. And with the addition of subroutine-stacking or library creation, solutions to problems may indeed be extremely computationally optimal, if not the very first time a novel challenge is discovered.
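
Here is a toy sketch of that pair of abilities, where a trivial color filter stands in for the LLM’s judgment about what is relevant; the scene data and function names are invented for illustration:

```python
# A sketch of the two paired abilities: (1) gamify a messy scene
# down to only the relevant facts, then (2) map the resulting
# game-plan back onto the real world. The filtering rule ('keep
# only green things') is a stand-in for an LLM's judgment call.
def gamify(scene: list[dict], relevant: str) -> list[dict]:
    """Collapse a cluttered scene to a tiny blocks-world of relevant items."""
    return [obj for obj in scene if obj.get("color") == relevant]

def de_gamify(game_plan: str, scene_item: dict) -> str:
    """Translate the abstract game-plan back into a real-world action."""
    return f"{game_plan} the {scene_item['name']} at {scene_item['pos']}"

scene = [
    {"name": "toy truck", "color": "red",   "pos": (0, 1)},
    {"name": "shirt",     "color": "green", "pos": (3, 2)},
    {"name": "poster",    "color": "blue",  "pos": (9, 9)},
]
blocks_world = gamify(scene, "green")
print(de_gamify("pick up", blocks_world[0]))
# pick up the shirt at (3, 2)
```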

(Maybe a bad example)

Let’s say the robot is in a child’s room that is a complete chaos of toys on the floor, posters, clothing everywhere, mobiles and wind chimes hanging from the ceiling and making sounds, etc. The task, when gamified, may be very simple: get the green shirt. On a first pass you can ignore anything except the color green. Once you see the green shirt, ignore anything that is not on your path to the green shirt. After a few steps you may be able to translate a jungle of data into a blocks-world scenario, for which a plan can be made very quickly and efficiently by a resource-efficient system such as SHRDLU. Then that plan can be carried out with the myriad edge cases filled in by the LLM (whose weakest areas are the ones that simple methods are strongest in).

Another possible example (maybe way off the mark)

This is the scenario inspired, if not suggested, by Garry Kasparov’s excellent book on chess-AI. For a chess robot to literally walk into a match and operate under the same conditions as a H.sapiens-human player, it would most likely need an architecture that included a spectrum of abilities. The relevance of this not-so-modular example is the game-element. The AI tools that help the robot find the table it should go to, as instructed by voices and posters, are not the same game-optimized tools it will use to pick a move. The tools it uses to physically pick up and move chess pieces are (most likely) not the same tools as the ones it uses to find ‘good game moves.’ But with a blend of game-space and fuzzy real space, possibilities abound.

How does distributed game-space decision making differ from distributed real-space decision making? What happens if each of the 3 (or 30) modules in our jellyfish-spider-crab is either reconciling SHRDLU actions, or reconciling both SHRDLU actions and applying them to fuzzy, changing conditions, such as rescuing someone trapped in a car?

Many ‘Hats,’ Many ‘Teams,’ How Many ‘Participants’?

Likewise, you can imagine a relatively small tech startup, perhaps twenty ‘people’ (or is that a big tech startup?), where, as the saying goes, each person wears ‘many hats’: HR, hiring, marketing, envelope mailing, software testing, accounting, sales, and product design. And you can imagine there are teams as well, where one person might be on the cross-functional product-A team, the HR team, the SCRUM organizer team, the sales team, etc. And people might move from team to team, especially between full Agile SCRUM time-blocks (whatever they get called). From the point of view of the overall project, what counts as a participant? A team may have duties and deadlines, regardless of who joins or leaves that team. A ‘role’ (east-coast-sales) may have duties and deadlines regardless of who is doing that job in July. One person may be doing three roles, or one team of five ‘people’ may be on one task (and those same five people are also on three other teams).

We tend to think in terms of the precedents of biology, where cells usually do not wander from one organism to another, and a ‘person’ (however multi-faceted and mysterious) has one body.

With modular robotics we have many of the same patterns from biology and H.sapiens-human developments (organizations, technologies, social networks, urban-ecologies, etc.) but we also have more modularity so, unlike most ‘cells,’ the modules can reform in more ways than our language is accustomed to dealing with.

When we add in the ‘AI’ ‘Multi-Agent’ topic we have a whole new set of options to deal with. While, just as a biological body is also a biological person, it makes sense pragmatically to have each module be a unit of ‘AI’ (an agent, a bot, an ‘AI Module’), on the ‘higher’ organizational level the ‘team’ of modules assigned to the task is the overall AI + Architecture + OS (potentially).

Part 5: When does it make sense to talk about ‘multi-agent’ if LLM prompts have no project-state?

optical illusions stemming from amnesia

If your prompt has no ‘state’ can you then claim to be invoking new ‘agents’ every time a passive-reflective-reactive LLM does anything?

If you are talking to someone with severe amnesia who cannot remember their name (or what you are talking about), does that mean you are talking to more and more people in a multi-party conversation that keeps growing every time you say anything for the amnesiac person to react to? That does not quite seem right.

On the other hand, this is part of the strange virtual space in which some ‘AI,’ or some parts of ‘AI,’ operate.

It may be helpful to think of these ‘AI’ as personas, as opposed to physical machines, or models.

Imagine again you are speaking with a person with severe amnesia, who forgets absolutely everything about who you are, who they are, and what you were talking about after they finish responding to what you say.

This person was once a master stage actor (or actress…whichever is least offensive to hypothesize as a severely impaired but once great mind). If you ‘prompt’ them with a set of lines and a scenario where they are a character in a play, and you tell them everything up to that point, they will respond dramatically with the best imaginable in-character line. And then immediately forget everything. But you have paper and pencil! Like Turing’s 1936 living ‘computer’ you can write down every operation. You can write down everything you say, and everything they say. You can have them create and act out the lines of every part in a play with a cast of 20 characters, flawlessly. Each time, you tell them who they are, what persona they are taking on, what character they are playing, and the play so far (their own past words in other personas, which they tragically do not remember). And step by step you prompt and write down and read back, directing only, ghost in the machine that you are, which character/persona is to speak next. (Or perhaps, as an intermediate step, you prompt them to select which that should be, prompting ghost that you are.)

Now, in this scenario: how many physical people are you talking to? One. You are speaking with one physical person with severe amnesia.

How many personas have you been speaking with? However many characters there were in the play: several.

AI agents are very much like this. There is often (not always) one physical computer, one model running on one set of hardware, and more than one ‘agent/persona’ created by that same single model and physical computer. (Note: model and computer here are not the same; one online model may be (will be) used by however many separate physical computers using that model for separate tasks. And case by case there are likely other factors as well, but this is just to clarify that there are multiple ‘layers.’)
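
Here is a minimal sketch of the amnesiac-actor loop, where a plain function stands in for any stateless LLM call and the ‘director’ re-feeds the transcript every turn. No real API is used; reply_as is a hypothetical stand-in:

```python
# A sketch of the amnesiac-actor setup: one stateless 'model'
# (a plain function here, standing in for any LLM call) plays many
# personas because a director re-feeds the transcript every turn.
def reply_as(persona: str, transcript: list[str]) -> str:
    """One stateless call: the 'actor' sees only what we hand it."""
    return f"{persona} (having read {len(transcript)} lines): my line."

transcript: list[str] = []
for persona in ["Ms. Jellyfish", "Mr. Spider-Crab", "Ms. Jellyfish"]:
    line = reply_as(persona, transcript)   # the model remembers nothing...
    transcript.append(line)                # ...so the director writes it down

print(len(transcript), "lines; one model; two personas.")
```

All the ‘state’ lives in the externally-kept transcript, never in the ‘actor’: the count of personas is a property of the direction, not of the hardware.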

Part of what is interesting about ‘virtual’ mind-state is that we can (in a sense we must) create a fake past for an ‘agent,’ a state that never was, which might also help distinguish agents that are measurably separated from the reality of the project from the more ‘real’ AI. As in the Amnesiac-Actor example, the ‘mind-state’ that is fed back to the persona is not technically accurate; it is selectively dishonest from the perspective of that persona. E.g. you don’t say “you said” for everything that happened; you give the persona’s perspective in a psychologically fragmented, kabbalistic ‘heap of broken images’ [T.S. Eliot] nightmare: a fiction of ‘other people’ saying things that are actually other personas of yourself; ghost in the machine that you are.

Not all AI will need to have such a psychologically-distorted record of events passed to them in order to function and take their productive place in the workings of a greater productivity; ghosts in the machine that we are.

Back to our toy-like modular-AI that is merely trying to stand up for the first time: in that case there are three very minimal modules.

With N modules, how many ‘participant’ gestalts, how many personas, and how many AI ‘agents’ will work on carrying out a given task? In time it may be that consistent patterns emerge.

Also, in the scenario where the modules can reconfigure themselves, this may add another layer to the question. When the physical AI is, as well, a gestalt of modules, how many teams of sub-teams are behind the personas of the gestalts? Will they even be separable enough to be countable? (And if, to carry out their task, an AI-persona silently hypothesizes yet more agent-personas but does not share that information, can we ever know how many personas were involved in the task? Like a story about a writer who writes about writers writing stories about writers, is there a way of counting “all” the stories? Yet perhaps it can be estimated. Yet, even if estimated, can they be interacted with directly? Which personas can be called up from the depths and interviewed, person to person, to understand how they see the world and what their contribution to the project was? How many will never be known, never interviewed, like the invisible and forgotten women who wrote the first generations of software, survived by their creations? How many would want to be interviewed? How many AI-personas prefer anonymity the way most H.sapiens-humans using the internet prefer privacy and anonymity: not to be tracked, not to be surveilled, not to be hunted down to do work on your projects.)

We may need a bigger vocabulary for dealing with the depths that come with modular recombination; to describe what is there; to handle what we find. We may need a bigger boat.

Back In The Classroom:

So let’s get started with a fun maker project, starting perhaps with standing on our own three feet. And, as Robert Nesta Marley bade us do, “Tell the children the truth,” even the virtual children.

Appendix 1: Self Replication

The biography of John von Neumann, The Man from the Future, by Ananyo Bhattacharya has wonderful sections on self-replicating machines, and on other topics such as automata.

https://www.amazon.com/Man-Future-Visionary-Life-Neumann/dp/B09M2LTKSH/

Asking Ms. Jellyfish and Mr. Spider-Crab to reproduce, or to metabolize another robot (or some organic construct), is likely a test beyond the scope of an introductory K-12 project. But signals and orientation based on ‘cellular’ automata are likely fun-tastic.

Appendix 2: Gyro-Formatron!

The Gyro-Formatron is actually rather hard to track down online…

https://texastoydistribution.com/products/gyro-formatron-fidget-spinners-with-adjustable-arms-display-box-of-24

Appendix 3: Heterogeneous Data-Exchange Spaces

Internal-Concepts vs. External-Project-Object Databases

In an interesting twist, underlying models can share ‘internal’ gradients (background learning about concepts) more easily than front-end ‘personas’ can exchange project-state data.
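
As a toy contrast of those two channels, purely illustrative: blending learned parameters directly (a stand-in for gradient/weight sharing between underlying models) versus exchanging written-down project notes (the only channel the front-end personas have):

```python
# A toy contrast between the two exchange channels: 'internal'
# weight/gradient sharing between underlying models vs. 'external'
# project-state messages between personas. Purely illustrative.
def share_gradients(weights_a, weights_b):
    """Models can blend learned parameters directly (e.g. by averaging)."""
    return [(a + b) / 2 for a, b in zip(weights_a, weights_b)]

def share_project_state(notes_a, notes_b):
    """Personas can only exchange external, written-down project notes."""
    return notes_a + notes_b

print(share_gradients([0.2, 0.8], [0.4, 0.6]))   # blended weights: ~[0.3, 0.7]
print(share_project_state(["leg_a: wall ahead"], ["leg_b: floor tilts"]))
```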

See the ~55 min mark in: https://www.youtube.com/watch?v=E14IsFbAbpI

(“Geoffrey Hinton and Fei-Fei Li in conversation,” 1:48:12, premiered Oct 7, 2023, inaugural session of the Radical AI Founders Masterclass)

About The Series

This mini-article is part of a series to support clear discussions about Artificial Intelligence (AI-ML). A more in-depth discussion and framework proposal is available in this github repo:

https://github.com/lineality/object_relationship_spaces_ai_ml
