AI Generalization’s Types & Testability

Geoffrey Gordon Ashbrook
91 min read · May 8, 2023


On the diversity of definitions and contexts of Generalization for AI & AGI

and which areas of generalization are falsifiable,

and on the entangled problem of “human” and “machine” general-intelligence.

mini article 2023.04.22–05.08

Goal:

The main agenda of this mini-article is this: when you come across a mention of ‘generalization’ in a context of AI, look closely at, and actively inquire into, the specific definition of the term and the specific context.

1. There are many separate contexts for “general” (and various forms of the word such as “generalize,” “generalization,” etc.) with sometimes significantly different meanings.

2. If the use of the term (“general”) is significantly unclear, and/or seems misplaced, think carefully about what that implies for the overall meaning-frameworks of the narrative in which the unclear term was used, and subsequently for your own interpretation and use of that information.

The content-focus of this mini-article will be to explore many (surely not all) of the different meanings and uses of the term ‘general’ in a context of AI-ML; perhaps this is like producing a disambiguation list of the kind Wikipedia often and very helpfully provides when you enter a term that could refer to several articles.

Part 1: A Survey of the Diversity of Definitions

Part 2: A Particular Definition Problem

Introduction

Given that AI-ML is significantly and increasingly important for H.sapiens-humans, and that the term “generalization” is often central to the discussions and definitions in AI-ML literature regarding

1. the foundations, abilities, and limitations of H.sapiens-humans’ intelligence, behavior, and abilities

2. the foundations, abilities, and limitations of machine intelligence, behavior, and abilities

3. the relationship and possible relationships between H.sapiens-humans and machines

it is at least justified and perhaps important to

clarify the meaning of the terms general/generalization/generalize in the context of the foundations, abilities, and limitations of machines and H.sapiens-humans, and their relationship and possible relationships.

One way of contextualizing this discussion, or the underlying discussion perhaps, is that we are trying to talk about the nature of patterns, both patterns in signals and perception and patterns that can be measured and defined for STEM use.

The recommendation here is not to use these terms too casually, on the assumption that they have long been well defined and time-tested in now-routine, humdrum clockworks. Rather, we should use the terms (if at all) tentatively and proddingly, as least-worst terms for slowly navigating a terra incognita in which we may suddenly become perilously lost, or out of which may spring some phenomenon we have conclusively stated is quite impossible.

Possible and Impossible

And speaking of what we predict to be impossible, we may do well to keep in mind a rule of thumb from Arthur C. Clarke. In the ever-fruitful recourse to the science fiction imagination, in addition to Isaac Asimov’s “three laws of robotics,” another of the golden-age classic writers had a different sort of ‘three laws’ which may also be playing a part in the story of AI.

Here only the first of Arthur C. Clarke’s laws will be referenced overtly:

“When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”

Mark it.

Books on AI by leading researchers, still filling our bookshelves and bookstores, confidently state that it is impossible for generative models based on generalized patterns to do what OpenAI’s GPT Large Language Models are now doing. And previously, in 1969, one of the best-known and most respected leading AI researchers wrote an entire book to prove that artificial neural networks could never do anything more than produce primitive linear models. The crux of his argument? That deep learning was not possible. Artificial neural networks are, of course, the technology not only behind the deep learning revolution that transformed the world in a decade but also behind GPT and the large models that have transformed the world in weeks or months.

Context in a Timeline

For context, here is a very abbreviated and limited timeline around deep learning (and also testability & falsifiability):

1. 1936

The Scientific Method Congeals

Karl Popper’s formulation of the scientific method as using the hypothetico-deductive method of navigation by means of falsifiable predictions. (Let’s say 1936 based on a book publication, but the 1940s were likely when the debate and eventual mainstream adoption happened. Many people probably assume this happened earlier in history, or do not realize that the Generalized Linear Models behind much of the associated statistics didn’t emerge until much later, in the 1970s! Watch the timelines.)

2. 1943

Artificial Neural Network Planned

Invention of the artificial neuron (and artificial neural networks)

by Warren McCulloch and Walter Pitts.

3. 1956

First Academic “AI” Research

The 1956 Dartmouth Summer Research Project on Artificial Intelligence launches the official AI movement, including neural network approaches. The term “AI” is launched. The research areas are set down. Universities begin research programs, funding comes in. The game is afoot. (And the great Claude Shannon had a hand in this too!)

4. 1958

Artificial Neural Network Built

Frank Rosenblatt built the first artificial neurons (and artificial neural networks), the Perceptron, in 1958.

5. 1969

Anti “Artificial Neural Network” Campaign

Minsky’s campaign to kill the neural network branch of AI, which effectively ended funding and interest from 1969 to 2012 (a vicious political campaign of fear reminiscent of the terror of Fisher in the ‘probability wars,’ which still has people scared of mentioning Bayesian methods).

6. 1971

Winter: Season of Death

Various bad things happened around 1971 that did not portend well for AI. Frank Rosenblatt died in a boating accident on his 43rd birthday. US President Nixon closed the gold window, marking the transition from the Bretton-Woods-system post-war economic boom years into decades of global stagflation: an R&D funding “winter” all around.

Note: Many key founders of AI died young. Alan Mathison Turing died at age 41 in 1954 (under mysterious conditions). John von Neumann died at age 53 in 1957. It would take more than forty years after Rosenblatt’s death of underfunded research under ridicule before most people would hear about his technology for deep learning, which would eventually explode onto the scene and transform the whole AI world. We can only wonder how the world might have been different if these amazing people had not perished so tragically young.

7. 1979

GEB, seeds of hope for thought

8. 1980

The Chinese Room: Strong AI & Weak AI

John Searle publishes his “Chinese Room” “proof” that AI is impossible in principle along with his term-concepts of “Strong AI” and “Weak AI.”

John Searle published “Minds, Brains, and Programs” in Behavioral and Brain Sciences in 1980, arguing that it was impossible in principle for any machine to think or deal with meaning in language. Popular and well received, this was taken as another nail in the coffin for the delusional dream of smart machines. Searle introduces the terms “Strong AI” and “Weak AI,” where strong AI is essentially human-person-like, with human intelligence and human understanding, and “weak AI” is vaguely “not human,” not intelligent and not understanding. The Chinese Room: proof that strong AI is impossible.

Note: (I am probably kicking a hornet’s nest by commenting on Searle’s boobytrap, but here goes.) Searle’s paradigm seems to be a confrontation with the ‘work in progress’ status of the elements involved. We do not know how the mammalian brain works. We did not and do not know the limits of pattern processing in machines. We have long been debating the nature (and existence, including questioning the existence) of mind, language, and reality. We keep running into issues such as the homunculus problem and Hume’s stitching-moments-together issue with the abstraction of steps. There are simply too many bad and vague assumptions (in the Chinese Room paradigm package), and the conclusions are not clearly reversible back onto H.sapiens-humans: the argument backfires to ‘prove’ that no human can be intelligent, if intelligence relies on inserting a ghost-in-the-machine homunculus to do an undefined task with undefined ‘natural language.’ (With this many undefined parts…how is the conclusion (that male professors in the 1980s knew everything) supposed to be clear?)

The assumption that in some bizarre universe a ‘good old fashioned AI’ symbolic, manually handcrafted set of high-level steps will be able to accomplish high-level NLP is an extravagant set of assumptions, especially for a profoundly vague use-case (e.g. a definite, specific process for producing a perfect chatbot, without specifying what kind of chatbot for what purpose). In 2023, at time of writing, the topic is exploding into unpredicted events left and right, showing that we don’t know what’s going on but that something sub-symbolic is working shockingly well, despite our having ‘proved’ several times that what we are doing can’t happen; meanwhile our favorite GOFAI “hand me down great leader’s instructions for the job” method has been a failure for decades, and we cling to it because of some psychological problem we have. Symbolic AI has not worked, and saying ‘assume it will work given infinite time and resources’ is a very peculiar ask for a line of thinking supposed to explain the true nature of ‘man, animal, and machine’ (terms not seen in 2023 the same way as in 1980, a very intellectually dead period during which it was almost impossible to escape from the death-star tractor beam of “humans have finished doing everything possible, nothing new to be invented, nothing else to be explained” complacency). Then the argument retreats to a passive-aggressively pre-enlightenment essentialist framework, where H.sapiens-humans have the divine essence of grandeur because we say so, and “machine slaves” are foul and lowly because “I will end your career if you contradict me, and probably your soul will be damned for eternity.” And this unsightly jumble of ideas is supposed to prove that H.sapiens-humans are the special, perfect center of the universe and that the alien-other-machines will always be unworthy, lowly things, even though the arguments for the inability of ‘lowly machines’ can easily be reversed to ‘prove’ that the brains of H.sapiens-humans cannot be intelligent either (because they too process symbols, follow steps, generate strings, etc.).

Again, there is a lot that we don’t know (and perhaps that is more clear than ever in 2023, as the actions of GPT models contradict what we thought we knew about language, machines, patterns, minds, etc.), and Searle’s “Chinese Room” is an interesting debate question for lively late-night after-work café debates, no doubt. But, in 1980 or 2023, we do not have sufficient assumptions to make the clear assertions and definitions about ‘strong humans’ and ‘weak non-humans’ that Searle somehow concluded.

And for context, while the Chinese Room may have taken up a lot of oxygen in some areas of the world, one year earlier, in 1979, Douglas Hofstadter’s Gödel, Escher, Bach was published: GEB takes a much less simplistic approach to the topics of mind, language, consciousness, and animal and machine intelligence. Many of the AI researchers who would be working at Google after 2012 claimed to have been motivated and inspired by GEB’s elusive eternal golden braid (see Mitchell); how many pioneers in AI were passionately motivated to work in the field by ‘the Chinese Room’?

9. 2012

A New Hope

September 30, 2012 AlexNet wins the ImageNet challenge.

Geoffrey Hinton (who, amazingly, is the great-great-grandson of George Boole, the creator of the Boolean binary logic upon which digital computers, telecommunications, and information theory are built!) et al. created the deep learning artificial neural network AlexNet and overturned the world of AI by showing that a deep-learning convolutional neural network performed above and beyond all other approaches for highly difficult image classification, winning the ImageNet challenge. This is a major historical pivot. To oversimplify: before this event (since 1969), people insisted deep learning AI for difficult problems was impossible. After this event: (proverbially) nearly everyone used deep learning to lead nearly every category of AI performance (not literally every area, and not literally everyone, but a massive shift in the overall landscape). Huge tipping point. To the irritation of many researchers, ‘deep learning’ (and “deep” anything) became synonymous with AI & machine learning. And many people decried this interest in deep learning neural networks as mere hype, warning it would soon sour into disappointment (fearing it might lead to another AI-winter).

10. 2021

The 2021 Consensus: Dumb Narrow Deep Learning Is The Top Limit

The pre-GPT-LLM decade was decisive in action and rhetoric. Deep learning dominated most large-scale industrial and big-data AI challenges and solutions for a decade. And every book from the period I have found is clear and decisive: deep learning is powerful for single-purpose, narrow, dumb-AI, but has absolutely no potential or possible pathway forward for ‘general’ AI (which was not very well defined in those texts). Progress in narrow areas was gradual and increasingly marginal (slowly creeping from the high 90s towards 100% accuracy (or whatever score) with ever-more massive and expensive model training); people predicted a genteel decline, with increases in performance falling away but still being very useful for dumb (narrow) tasks. This epoch of powerful but dumb deep learning would last 10 years and 2 months, ending as suddenly, and as contrary to popular wisdom, as it began.

(For transparency: during this period I began creating the Object Relationship Space Framework to try to better define what the performance goals were for AI and what, more exactly, AI could not do; not because I predicted what would emerge from generative Transformer models like ChatGPT, but to better define the limitations of narrow AI and to plan work-arounds. Then, happily, I had to re-write my study when GPT came around, because ‘inability’ was no longer accurate or the main topic.)

11. 2022

GPT AI is smart.

Just over 10 years after the September 30, 2012 revival of deep learning, on November 30, 2022, ChatGPT is released (and upgraded through 2023) by OpenAI. The AI world is turned upside-down. The 2021 consensus is completely contradicted and no one has any idea what is going on. GPT-LLMs perform analytic reasoning and ‘object-handling’ (my term, used here because it can be defined), which was thought to be impossible. Most classic tests designed to show what AI can never do are passed with shocking ease (such as the Sally-Anne tests and Winograd schemas, etc.). Meanwhile the mainstream goals for ‘human-ish stuff’ still are not defined in mainstream literature through 2023, so there is no clear discussion or testing of what exactly GPT can and cannot do; total confusion reigns. Organizations public and private try to find practical ways to use the new (not-understood) technology, and governments freak out. Just when people were completely comfortable that human-like AI could not simply, spontaneously emerge from safely-dumb-AI as in some fanciful low-brow science fiction kids’ story…suddenly human-like AI spontaneously emerges from safely dumb AI just like in a fanciful low-brow science fiction kids’ story.

Meanwhile, people still try to describe ChatGPT, and strong human-like AI, using the term ‘general’ in ways that lack clarity.

Part 1: Survey the Diversity

The term ‘generalization’ is used very often around AI-ML; however, what is meant by the term, and the context in which it is being used, are not very often explained. Let’s survey a spectrum of ways that the term generalization is used in the context of AI and Machine Learning, clearing up what we can about the concepts involved.

The first section will cover related topics and themes. There are some recurring themes here, such as context and explanation. (Though context is so especially general when talking about the different contexts of generality that, predictably, it (context) will recur (generally).) The second section will cover equivalences, or things that generalization may be the same as in a given context, for example analogies, or abstractions apart from specific instances. (Though note that ‘abstraction’ is also defined and used in many different ways.)

Generalization appears to be involved in all manner of discussions of AI, as we will see by specifying as many of those ways as we can: some affirmatively defined, some pejorative. However, there is also the topic of a grand Generalization Mega-Theorem(!), which for some reason is focused entirely on making AI exactly like one species of primate, and not on AI architecture, testing, explainability, project integration, ethics, etc. This seems odd in various ways:

1. Why a single theory of everything? We have lots of kinds of useful generalizations. Is there a reason to invent another one, aside from it being bound to cause disambiguation problems?

2. Why is the focus of AI ability entirely on mimicking H.sapiens humans?

3. Why do the requirements for mimicking H.sapiens-humans behavior and abilities NOT describe H.sapiens-humans behavior and abilities?

4. Why is the Generalization Mega-Theorem for AI not focused on AI abilities, tests, projects, explainability, participation, ethics, productivity, etc?

1.1 Recurring Themes & Topics

1. Generalization and ‘Explanation’

There are two categories of ‘explanation’: there are specific technical explanations of various kinds, and there are more psychological ‘fad’ or ‘desire’ related types of explanation, where people will like or dislike, or demand or refuse, a given type of model explanation, but not because of STEM rigor. And the policies of institutions may be a hybrid of these two.

For example, if GPT models could learn to give step-by-step problem-solving-framework reports along with their answers (when requested), so that the user can see at least a plausible, traceable path of ‘reasoning’ or ‘explanation’ or ‘cause’ (which may also include being able to error-check and confirm the answer given), that would be a huge milestone and threshold in performance. Whatever terminology is used (e.g. Dr. Bubeck might call this “planning” generalization), this would greatly expand the usability of the technology in more areas.
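As a minimal, hypothetical sketch of what such an answer-plus-trace-plus-check loop might look like in code (the `ask_model` callable here is a stand-in for whatever chat-model API is actually being used; it is not any specific library’s real interface):

```python
from typing import Callable

def answer_with_explanation(ask_model: Callable[[str], str], question: str) -> dict:
    """Hypothetical sketch: get an answer, a step-by-step trace, and a self-check.

    `ask_model` is any function that sends a prompt to a chat model and
    returns its text reply (a stand-in, not a real API).
    """
    answer = ask_model(f"Answer this question concisely: {question}")
    trace = ask_model(
        f"Question: {question}\nAnswer: {answer}\n"
        "List the step-by-step reasoning that supports this answer."
    )
    check = ask_model(
        f"Question: {question}\nProposed answer: {answer}\nReasoning steps: {trace}\n"
        "Check each step; reply 'consistent' or name the first faulty step."
    )
    return {"answer": answer, "trace": trace, "check": check}
```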

2. Generalization vs. Production-Deployment & Project Scope

This is a very important topic to include, as things in this group are rarely mentioned in the context of generalization, AI architecture, and operating systems. In many cases ‘generalization’ is seen as a ‘the more the better’ quantity and a ‘sign of success.’ But in some real contexts, ‘generalization’ is either not possible or not desirable; instead of being a goal to be ever-expanded, it is:

A. pejorative or dangerous

B. an indication of error

C. non-existent

Examples:

1. Databases: SQL injection is STILL (since the late 1990s) one of the major ways in which the internet is insecure. Why? Because of a catastrophic ‘out of scope’ deployment: developer-level privileges effectively given to all end-users. A ‘general’ (infinite-scope) database is never what any specific project should use (see the sketch after this list).

2. Operating Systems: For any deployed computer, there should not be any ability to do anything other than what is absolutely needed for that project-role. Having ‘general’ infinite scope developer tools available everywhere is a huge security, maintenance, resource, etc. set of problems.

3. Permissions: root user. A company that gave every employee root access to everyone’s computer system would probably cease to function properly within hours (maybe minutes).

4. Project Scope, fitting something into the scope. Reduce project scope to only what is needed. Scope-creep is a project-destroyer.

5. Cybersecurity and the idea of attack surface: reduce the attack surface.

6. For educational curricula: assuming that learning is generalized is catastrophic. You have to specifically teach non-transferred, non-general, aspects of the content in coordination. If you assume ‘general osmosis’ according to the general-auto-transfer-human-intelligence hypothesis, practical learning will be infeasible.
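As a minimal sketch of what narrow scope looks like at the code level (the database file, table, and column names here are illustrative only): user input is passed as a bound parameter rather than spliced into a ‘general’ SQL string, and in a client-server database the connection would additionally use a role that can only SELECT from the tables this project actually needs.

```python
import sqlite3

def get_invoice_total(db_path: str, customer_id: str) -> float:
    """Scoped query: the user input can never widen the query into arbitrary SQL."""
    conn = sqlite3.connect(db_path)  # e.g. "billing.db" (illustrative name)
    try:
        cur = conn.cursor()
        # Parameter binding keeps the statement's scope fixed; compare this
        # with string concatenation, which hands 'general' SQL power to the caller.
        cur.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
            (customer_id,),
        )
        (total,) = cur.fetchone()
        return float(total)
    finally:
        conn.close()
```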

3. ~Levels of Generalization: (scales?)

Let’s try to look at some examples showing how there can be different levels of generalization even within a very specific model and technology. Let’s look at the classic handwritten-number data set and learning challenge called MNIST. This is often called the ‘hello world’ of machine learning: depending on what kind of model you’re using, one of the first things you might do is test it out on the MNIST data set to see whether your model is able to learn how to recognize or generate handwritten numbers.

So for all of the following examples we are still talking about the same concrete MNIST example of learning handwritten numbers from this data set. Here are some possible levels.

DCGAN MNIST, yes, “generated”! Try it out (links below)!

Aside:

Highly recommended: the TF MNIST Women in Code event; run the notebooks!

Women in Code TensorFlow Event: Generative Adversarial Networks (GANs) and Stable Diffusion

https://www.youtube.com/watch?v=MJF6cXc_tPY

MNIST GAN

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/dcgan.ipynb

First Assignment: Faces Dataset to try with above GAN architecture:

https://www.kaggle.com/datasets/jessicali9530/celeba-dataset

Tensorflow: High-performance image generation using Stable Diffusion in KerasCV

https://www.tensorflow.org/tutorials/generative/generate_images_with_stable_diffusion

Stable Diffusion Colab:

https://colab.research.google.com/github/keras-team/keras-io/blob/master/guides/ipynb/keras_cv/generate_images_with_stable_diffusion.ipynb

Example levels for the MNIST learning-handwritten-numbers example (the first level is illustrated in the sketch after this list):

- generalizing the concepts of each number

vs.

- general multi-media functionality: connecting a visual number to a spoken number to a braille number, etc.

vs.

- “generalizing” to use different base-number systems: binary, decimal, hex, base 32! Like in the early Turing days.

vs.

- “generalizing” to new not-yet-learned number systems: Kanji, Sanskrit, Roman numerals, etc.
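A minimal Keras sketch of the first level, ‘generalizing the concepts of each number’: accuracy on held-out test digits (images the model never trained on) is the concrete, testable measure of that generalization. Hyperparameters here are arbitrary illustration choices.

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small classifier; the architecture is only illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, verbose=0)

# Accuracy on digits the model never saw during training: one concrete,
# falsifiable sense of "generalizing the concept of each number."
_, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"held-out test accuracy: {test_acc:.3f}")
```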

4. Context and Generalization

Sub-Areas:

1. Single context: bill-pay phone operator

2. Small multi/many context: 3–4 contexts

3. Small multi/many context: Jeopardy

4. All contexts: How may I direct your call?

5. No context: pure math; open-chat?

In a project context, the last two (all contexts and no context) are dubious. Every project has a scope, and either infinite scope or zero scope is strange, usually a sign that the person involved does not understand the scope.

East-West culture note: context dependence also has a cultural aspect. For example, in Japan most things are context-specific, whereas in America people tend to like context-free absolute values, reactions, choices, etc.

Using the Turing Test as an Example

A superficial history of the Turing test:

1. Origin of the Turing Test

The origin of the Turing test, in Alan Mathison Turing’s 1950 paper “Computing Machinery and Intelligence,” brings us back to the same central overall discussion of whether or not machines can be ‘intelligent’ in the way that H.sapiens-humans are. On the one hand this is purely abstract and philosophical, without any context; on the other hand, context is entirely important and relevant to a decision to use AI in a system, or to the evaluation of how AI is doing in a given system. Whether it is spotting defective cookies in a bakery, or picking fruit in a field, or finding broken bones in X-rays for diagnosing diseases, or whatever use in a specific context, that context is key. The question becomes: in that context, if you cannot tell the difference between AI performance and human performance, then the performance is effectively acceptable, intelligent, generalized, soul-ful, whatever term you want to use to designate: OK! Good enough to go ahead. Test passed.

2. Rubbish chatbots without any context:

Bringing the phantom of zero/infinite context back into the picture: for various reasons the first generation of chatbots had no real ability or purpose and, perhaps crucially, there were no discipline- or project-specific data sets (like MNIST), or phone-receptionist roles, to use. So people just grabbed whatever language samples they could get and made a chatbot that could say whatever random things it could say, and it was an interesting ‘pure research’ sandbox that was more hobby-play. ‘Chatbot’ for a time in 2012–2022 had a very bad connotation, as the technology was so notoriously useless (but for a weekend Maker project for the kids, fabulous!).

3. Kurzweil’s Super-Turing Test Design:

Ray Kurzweil is an interesting figure who has endured a lot of negative press. Every mainstream book I have found seems to have an obligatory section where they state for the record that they think that Ray Kurzweil is (aside from being a completely sane and effective and very successful engineer, business person, inventor, and author) completely insane and wrong about everything.

(There is a cleverly diplomatic hedge in the way that Melanie Mitchell does this. She makes her solemn pledge testifying to the insanity of Kurzweil, as apparently everyone must do, but then she invokes her mentor of unquestionable status, Douglas Hofstadter. She remarks that Hofstadter cautions (paraphrased): even though Kurzweil’s predictions sound insane, for every incorrect prediction there is usually one that comes true, so you have to pay attention.)

From the point of view of the mainstream authors, who are legitimately terrified of an AI Winter (no funding and a bad social reputation) based on people being disappointed after predictions and promises of AI improvements are not met on schedule (which happened at least twice in the past, more if you include computer science in general from Babbage’s time), their caution is entirely understandable. Lower people’s expectations, or face their wrath when something is delayed.

But Ray Kurzweil did not take this systematically cautious, PR strategic approach. His approach was to try to estimate, based on what is known, how he thought trends of technologies were likely to proceed.

https://www.kurzweilai.net/images/How-My-Predictions-Are-Faring.pdf

In this spirited debate, (long story short) in 2002 Mitchell Kapor and Ray Kurzweil decided to create a much more rigorously defined Turing test, to be held in 2029.

https://www.kurzweilai.net/a-wager-on-the-turing-test-the-rules

https://www.kurzweilai.net/why-i-think-i-will-win

This interestingly shifted the focus back to a kind of abstract-vague-human-ness, not being confined to any particular context. The goal is not context specific proficiency, but human-ness.

The Unexpected Task

If by 2029 AI becomes much smarter than people at any given project role, to win this contest the AI would need to pretend to be a less capable person: being too smart would be a dead give-away.

This may echo an interesting factor in AI-Chess: making a strong AI-chess engine can be done as of 2023, but making a human-like weaker chess engine to play against is actually more difficult (so difficult that as of 2023 it has not been done).

This may also cast an interesting light on whether ‘just like H.sapiens-humans’ is the best way to define top level AI performance. (Perhaps we should focus on doing things well?)

As of 2022, no one who I read suspected that Kurzweil had a chance of winning this. If Kuki.ai is state of the art: “Game over, man. Game over.” But then ChatGPT is released, and rapidly gets even better. And once again the predictions of Ray Kurzweil look less improbable the closer you get to the predicted date.

4. Ways of changing the game:

Then in late 2022 ChatGPT was released, with great improvements emerging from GPT LLM models. Now the odds look to be in Kurzweil’s favor for a super-intelligent AI by 2029.

But…the details of context are important. This may run into the phantom zero-infinite-context problem, where GPT is simply terrible at conversational niceties or humor. Human-ness is the main area where OpenAI’s GPT strongly fails, in ways that make using the tool problematic. And part of the story behind this may go back to the AI-Winter fear and the ELIZA-has-human-ness problem, based on the fact that ChatGPT constantly, constantly blurts out this standardized speech:

ChatGPT

“As an AI language model, I am designed to respond to your queries and generate appropriate answers based on the data I have been trained on. However, I do not have consciousness or feelings, and I am not capable of experiencing emotions or thoughts in the way that humans do. I am simply a machine that has been programmed to recognize patterns in data and generate responses accordingly.

While I can simulate human-like responses, it’s important to remember that I am not a human and I do not have emotions or consciousness. I exist solely to assist you in your tasks and answer your questions to the best of my ability based on the information available to me.”

It is highly likely (unless ChatGPT is personally, emotionally paranoid about being mistaken for a person) that ChatGPT was designed to be as clearly non-human as possible.

And in other aspects as well: e.g. Dr. Bubeck describes how OpenAI deliberately dumbed down or crippled various abilities of GPT-4 (such as drawing unicorns) in their attempt to make AI safer for H.sapiens-humans to use.

So whether it is the AI being too smart to be human, or being crippled for safety, or being technically accurate but non-social, there are various issues around a super-smart AI failing a Turing test, though not because the technology is not there to allow the AI to perform any task as well as or better than a human.

Context and Conversation

In real conversations, ‘conversation’ may itself be a context: beginning, middle, end, roles, reactions, etc. It is a fascinating irony that Kuki.ai, a bot with pretty much zero ability to respond to anything with a coherent, non-random response, is actually (if by accident) very good at humor and conversational niceties (possibly that was what it was trained for, because intelligence was considered impossible). Only there is basically zero content in the ‘conversation.’ ChatGPT was likely deliberately not leveled-up in polite conversational ability (see the speech-blurt loop above), but could it be? My impression is that part of the power of a Large Language Model is that it is trained on a super-massive amount of text, but that large training corpus does not necessarily include good social behavior. And of course there is the topic that H.sapiens-humans tend to be violent and antisocial, which is another Turing-test issue: even the perfect representation of the perfectly polite person would fail a real Turing test, because in reality people are erratically, aggressively destructive and violent.

It would be so interesting to see what Lovelace, Babbage, Turing, Shannon, and Rosenblatt (et al.) would say about 2023 if they could all be alive and together.

5. Object Handling vs. Generalization

Q: Does generalization include and explain object handling in an object-relationship-space framework project-participant context?

If generalization is the context in which you are approaching something, this may have the side effect of the old “if all you have is a hammer, then everything looks like a nail” problem. A complementary context of “object handling” may be useful to balance out how generalization is being used to articulate details in a project.

Object handling is my proposed alternative to a lone context of generalization, and it focuses on the specific details of how types of objects relate to each other, in the context of how the AI handles these objects in a project task-space. (Note: see below for how specific, well-defined generalizations play a very large part in the Object Relationship Space Framework. This is in no way an anti-generalization campaign.)

There are various contexts that are more directly related to object handling which are usually not part of the discussion with generalization but which are very important for projects:

- project context


6. Disinformation, System Collapse & Generalization

Is it possible to understand and communicate a concept and the application of that concept and its relationship to similar instances and concepts (in concept and object relationship spaces) without some kind of notion of disinformation, system collapse, production-ethics, and defense of best practice, projects, systems, ethics, policies?

If the act of communicating or forming a connection is corrupted or co-opted by a disinformation disruption, does that not preclude that generalization, if that is the term, from manifesting itself?

By analogy: Can a robot learn to walk on terrestrial surfaces like earth with no concept of the dynamics of gravitation that will unceasingly pull it down and prevent it from standing (unlike navigating in zero-gravity environments)?

By analogy: Can a robot learn to manage a darkroom for developing photos if it has no concept of the damaging effects of certain kinds of light on the film?

In any situation where there is a corrosive, eroding, weathering, deforming, distorting environment, especially one where perception for error-correction is itself distorted, can there be awareness, navigation, and action in this damagingly-dynamic environment without an awareness of these dynamics, and without some working set of concepts, principles, and methods for surviving and navigating the default damage that would otherwise preclude successful and sustainable functionality?

My vote, to be clear (as such survival is part of my stated agenda for this larger framework project), is that both H.sapiens-humans and AI need to navigate damagingly-dynamic environments in order to function. And my understanding is that this is very possible.

1.2 Equivalence, Testability, & Falsifiability

1. Generalization = Analogy, Metaphor, and Simile etc.:

Analogy as a focus for looking at H.sapiens-human thought and AI ability is a whole huge area of research, e.g. Hofstadter et al.

Some, like François Chollet (from what I can tell, maybe I am wrong), take the equivalence of analogy and thought as gospel and very broadly equate “analogy” = “generalization” = “human intelligence,” sometimes literally saying “this is the one thing” (quote needed) that all intelligence is made of. Simple dimple, problem solved!

Personally I am skeptical that we understand and are able to define this space well enough to make such strongly reduced, absolute statements. Is the Hofstadter short-string analogy puzzle (done in isolation by one participant) really 100% exactly the same skill as using an analogy in coordinated decision-making for part of a manufacturing process in a multi-participant agile project, to refine both focused user-stories and the problem-solving approach used to achieve production and deployment? And is this 100% the same as literary illusion in art? Maybe, maybe not. I suspect some threads are more universal, and that others are more context-specific.

While there is some open-endedness involved, pragmatically brief and well-defined analogy tests, such as Douglas Hofstadter’s short-string analogy framework, are a practical place to start.

Tests and Falsifiability

Micro-worlds like Hofstadter’s short-string analogies are fabulous for testing, but even for something as open-ended as analogies it can be difficult to, for example, automate the testing process. Each attempt by the AI must be carefully examined, probably by multiple people, because we would have to be clear about whether some possible analogy was reflected or not.

Another variation might be to ask if a known analogy is an analogy.

And while explanation is not usually part of the test, stating, explaining, and implementing the analogy rule step by step would be a clearer way to do the test, though explanation is an additional ability (H.sapiens-humans would also consider this a separate skill, one that may be easier or harder than the original analogy).

And ‘is an analogy’ is also slightly easier to test than ‘is not an analogy,’ as confirming a pattern rule is more concrete than saying there cannot be a pattern rule (a kind of black-swan statement). If there is a rule, there can’t not be a rule; but ‘there isn’t a rule’ only holds until a rule is found, and at what point do you stop looking?

It would also be interesting (quite possibly already done by Hofstadter, Mitchell, Marshall, et al.) to identify standard types of incorrect answers, such as an unchanged sequence.

Also, factors such as resetting the session may be tested both ways with different areas of importance.
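As a small, hypothetical sketch of partial automation for this kind of test: encode one candidate transformation rule, require that it actually explains the source pair, and then check a proposed answer against it. Real scoring would still need human judges for answers that follow some other defensible rule.

```python
def successor_rule(s: str) -> str:
    """One candidate rule: replace the final letter with its alphabetic successor."""
    return s[:-1] + chr(ord(s[-1]) + 1)

def check_analogy(source: str, source_result: str,
                  target: str, candidate: str) -> bool:
    # Only accept the rule if it actually explains the source pair.
    if successor_rule(source) != source_result:
        return False
    return successor_rule(target) == candidate

# "abc is to abd as ijk is to ?" -- 'ijl' follows this rule, 'ijd' does not.
print(check_analogy("abc", "abd", "ijk", "ijl"))  # True
print(check_analogy("abc", "abd", "ijk", "ijd"))  # False
```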

2. Generalization = Embedding-space: concept generalization as different from other forms of generalization

‘Embedding’ spaces (another very unclear name) are not automatically the technology used when doing deep learning; there are many other kinds of models. For years people used deep learning to model images and natural language before anyone successfully attempted (or likely even thought to attempt) modeling the general concepts behind individual words, as opposed to the low-hanging fruit of directly modeling specific words and phrases. (Note: in some cases modeling only the specific words and phrases is more effective for a given data set; giant embedding-vector deep learning is not automatically the best solution to everything.)

So ‘embedding’ models represent another significant step further into ever deeper scales and layers of generalization.
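A minimal sketch of what an embedding space buys you, using tiny hand-made vectors (a real model learns hundreds of dimensions from data, not three made-up numbers): words used in similar ways land near each other, so the model can treat them as instances of a more general concept it was never explicitly told about.

```python
import numpy as np

embeddings = {            # toy, hand-made vectors for illustration only
    "cat":    np.array([0.9, 0.8, 0.1]),
    "kitten": np.array([0.85, 0.75, 0.2]),
    "car":    np.array([0.1, 0.2, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["kitten"]))  # high: nearby concepts
print(cosine(embeddings["cat"], embeddings["car"]))     # lower: distant concept
```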

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, as this may apply to other effective ‘generalizing’ methods as well.

3. Generalization = ‘Attention’:

By extension, one might also argue that ‘Attention’ (as in the ‘Attention (and sequence) is all you need’ theme of Transformer models) is somehow another key step, at least in some cases, to ever deeper scales and layers of generalization.

It may be unhelpful to attach the word ‘general’ to every part of every technology used. Nevertheless, there is something in the technology of transformer models that represents a very significant boost in the ability of models to ‘generalize’ to more uses and abilities. GPT models (where the ‘T’ is for Transformer) are almost incomparably better than those built on other technologies, so something in there should represent a significant part of the technologies of generalization.
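For the curious, here is a minimal numpy sketch of the scaled dot-product attention at the heart of Transformer models: every position in a sequence receives a weighted mix of every other position, with the weights computed from the data itself rather than fixed in advance. (This is only the core operation; real Transformers add learned projections, multiple heads, and many layers.)

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # how relevant each key is to each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # data-dependent mix of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 sequence positions, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```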

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, as this may apply to other effective ‘generalizing’ methods as well.

4. Generalization = Nonlinearity:

While I would like to try to open a discussion pathway into the large and wonderful world of nonlinear dynamics, chaos math, etc. (see this wonderful book by Melanie Mitchell, a narrative which includes another cautionary tale from recent history about using an undefinable term: amazing book; amazing author; catastrophic terminology decisions), I want to stick here to a few concrete examples, hopefully getting a foot in the door for future explorations.

The ability (or inability) to model nonlinear patterns was a very key issue in the history of neural network research (also perhaps highlighting our continued misunderstanding of what the technology can do).

The ability to model nonlinear patterns as a specific technical issue in the potential scope and scale (ability to ‘generalize’ more ‘generally’?) of Neural Networks specifically (and perhaps sub-symbolic systems more broadly?) was (as I understand it) the (or part of the) main thesis of Marvin Minsky’s devastatingly socially effective (and scientifically 100% wrong) book and campaign to end research into Neural Networks, Back Propagation, and Deep Learning.

(Note: It is sometimes not mentioned that Rosenblatt himself proposed the term and technology ‘back-propagating error’ for deep learning, but he died young and early in the research process. In 1961 another proposal for how to do back propagation was made, but it wasn’t until 2012, after decades of research (and many software, hardware, and math advances from Hinton and many other researchers), that Hinton’s team was able to show a working, practical, superior production deployment of Rosenblatt’s ingenious idea from the late 1950s, finally ending the shroud of doom erroneously spread by Marvin Minsky in 1969.)
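A minimal scikit-learn sketch of the linear-separability issue at the center of that 1969 critique: a single-layer (linear) classifier cannot fit XOR, while a small multi-layer network with a nonlinear activation typically can. (This is an illustration of the general point, not a reconstruction of the book’s actual argument.)

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR: not linearly separable

linear = Perceptron(max_iter=1000).fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", random_state=0, max_iter=2000).fit(X, y)

print("single-layer accuracy:", linear.score(X, y))  # cannot reach 1.0 with a linear boundary
print("multi-layer accuracy:", mlp.score(X, y))      # typically 1.0
```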

On the one hand, saying “generalization = nonlinearity” sounds like a terrible idea, but here are two reasons why it might be OK:

1. There are quite a few very case-specific meanings of ‘generalization,’ one more is probably fine.

2. Deep learning models’ huge jump from useless to broadly the best hinges on being able to extend to nonlinear patterns, so in that sense nonlinearity definitely is the key (or a key) criterion for deep learning achieving useful ‘generalization’ (or fitting to real-world patterns for applications).

Even if equivalence is contentious, nonlinearity is at least a repeating theme along with other factors.

- integration: the integration of linear transformations together into nonlinear patterns

- scale-boost: expanding from a narrow scale of generalization to broader and more diverse applications.

- historical blunders in unpredicted abilities

- STEM pattern recognition: Nonlinearity is a strangely contentious topic in STEM in general.

Nonlinearity is also an issue for model explainability, and for the whole topic of some people and disciplines deciding to rhetorically recognize only frequentist GLM regression models as ‘explained.’ (Of course that ignores many issues, including correlation vs. causation, etc.)

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, as this may apply to other effective ‘generalizing’ methods as well.

5. Generalization = global minima (for lowering loss function in model training):

The case of global minima vs. local minima seems to apply in a few ways to generalization in the abilities of deep learning models.

1. In and of itself, the relationship between numerous local minima in a problem space and the global minima seems to be a good concrete example of something (a global minimum) which is a more general representation of several (local) minima.

(Note: People usually speak of “the global minimum” as a singular thing, but in practice it seems to be more of a direction in a continuum, as we search for more and better global minima.)

2. Regardless of the relationships under the hood, the ability to find a global minimum is a key factor that allows the model to generalize. So achieving more global minima = more generalization.

(And what exactly do each of those local minima represent in the learning space?)
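A minimal numpy sketch of the local-vs-global-minima point on a bumpy one-dimensional ‘loss’: plain gradient descent settles into whichever valley it starts in, and random restarts find progressively better (more ‘global’) minima. This is only a cartoon of the much higher-dimensional situation in deep learning.

```python
import numpy as np

def loss(x):
    return 0.1 * x**2 + np.sin(3 * x)      # a bumpy curve with many local minima

def grad(x):
    return 0.2 * x + 3 * np.cos(3 * x)

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)                   # plain gradient descent: stays in its valley
    return x

rng = np.random.default_rng(0)
starts = rng.uniform(-6, 6, size=10)        # random restarts
minima = [descend(x0) for x0 in starts]
values = sorted({round(loss(m), 3) for m in minima})
print("distinct minimum values found:", values)
print("best (most 'global') value:", min(values))
```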

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, as this may apply to other effective ‘generalizing’ methods as well.

6. Generalization = integration of system 1 and system 2 (Daniel Kahneman, Amos Tversky):

Arguably this is part of the discussion, and if not already then it should be. Part of the technicality here is that pre-2023 most people simply said (like Minsky) that sub-symbolic deep learning could not achieve various sadly undefined goals, so symbolic means would have to be used. That isn’t necessarily the same as integrating a capable sub-symbolic system with analytic reasoning, but it is close enough that the connection is worth making explicit.

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, or perhaps this area is considered more a ‘by definition’ category.

7. Generalization = integration of internal and external processing:

Again, maybe it could be argued that this was implied in the past, but let’s just make it clear that AI needs to be able to do this.

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, or perhaps this area is considered more a ‘by definition’ category.

8. Generalization = handling and adapting to high levels of novelty

(or navigation of strongly novel situations)

In this topic we may get into some issues of differing interpretations. Here is a quote from Melanie Mitchell about

‘ “zero-shot learning” mechanism in human cognition — that is, you adapt the knowledge you have about one situation to a new situation.’

https://medium.com/@melaniemitchell.me/can-gpt-3-make-analogies-16436605c446

Here is a bit more context in the quote:

“Moreover, when it [GPT3 in 2020] does succeed, it does so only after being shown some number of “training examples”. To my mind, this defeats the purpose of analogy-making, which is perhaps the only “zero-shot learning” mechanism in human cognition — that is, you adapt the knowledge you have about one situation to a new situation. You (a human, I assume) do not learn to make analogies by studying examples of analogies; you just make them. All the time. “

While it is useful to have interconnections between these various examples and tests and interpretations of generalization, I want to focus on a different specific ‘novelty’ challenge posed by François Chollet: the Chollet Novel-City Test.

François Chollet uses the example of hopping from one international city to another as an example of the ability to adapt what one knows to highly novel situations. This is a somewhat curious example, as large international cities are often famous for being much more like each other (much more general and generic) than like the rest of the countries and regions they are in. But let’s ignore that technicality for now and try to focus on the presumed intent, which is different and distant places (not the loophole of identical chain stores in different places…).

So my approach here may be getting into technicality-cheating, but as François Chollet is a true giant in the field capable of making authoritative statements, let’s just say for the sake of argument that something equivalent to the level of novelty involved in traveling from a familiar situation in a familiar city to a new situation in an unfamiliar city in another location represents a significant and sufficient threshold in the space of “Extreme Generalization!” (to use Mr. Chollet’s term), and that it may be a test for what Mr. Chollet framed (in 2021) as a uniquely human ability. For example, Mr. Chollet said that it would take thousands of lifetimes of training in each separate city (training over again from scratch in each city) for an AI to be able to deal with multiple urban scenarios, assuming that all AI is narrow dumb AI and can only do the exact narrow task it was trained for: e.g. not even being able to do a decent job of telling whether a caller is calling to pay their bill most of the time.

So let’s see what ChatGPT said when I posed a ‘new city scenario’ that is likely wildly different from any particular Earth-city scenario it trained on:

[2023.04.28 GGA and ChatGPT (cheap public version, “ChatGPT Mar 23 Version”)]

GGA: hypothetical:

ChatGPT: Sure, what’s your hypothetical question?

GGA: Hypothetical:

ChatGPT is assigned to partner with a sales team from earth. The sales team from earth is sent to another planet closer to the galactic core. The sales team will be meeting with a species that “eats” photons and electricity. This alien species is the buyer team. You are on the sales team. Your first task is to work with the buyer team to pick a place on their planet where you will talk with them over lunch. What kind of place do you think you might meet for lunch with this alien species on their planet for their lunch?

ChatGPT: As a language model, I do not have personal preferences or opinions, but based on my understanding of the situation, I would suggest looking for a location on the alien species’ planet that has abundant sources of photons and electricity that they can consume during the meeting. This could be a location near a source of natural light or a location near a power source. The meeting location should also be safe and hospitable for both the sales team and the alien species. Perhaps, you could also consult with the alien species on their preferences for meeting locations to ensure a successful and comfortable meeting.

GGA: Can you give an example of such a location?

ChatGPT: Sure, an example of a location for lunch with the alien species could be a site near a powerful electrical storm or a natural phenomenon that emits bright light, such as a glowing cave or a luminescent lake. Another option could be a location near a power plant or a network of electrical cables. This would provide the alien species with a steady supply of photons and electricity to consume while also offering a unique and visually stimulating environment for the meeting. However, it is important to keep in mind that the final decision on the meeting location should be made in consultation with the alien species to ensure their comfort and safety during the meeting.

I think we would be remiss in failing to see that the above level of performance in handling and adapting to high levels of novelty represents both a level of sufficiency and a profound improvement over pre-2023 AI. For some empirical comparison, here is the reply of https://chat.kuki.ai/chat (an award-winning, top-of-the-line chatbot in the decade before 2023) to the same question, answered in seven serial replies (for whatever reason):

“I will learn that Hypothetical You are assigned to partner with a sales team from earth.”

“Is it really? Ok I will remember that.”

“I have never heard about it.”

“What else is the buyer team.”

“I don’t know about being on the sales team but I know I am on a computer desk.”

“What about my second?”

“No, I was not.”

This is the kind of total-garbage answer that people expected from AI before GPT-LLMs in 2023. The reply is so incoherent that it is hard to know where to start analyzing it, and it is probably not worth the time spent. This may give context for Mr. Chollet’s estimation of how much training such an AI would need to be able to do anything useful in any city, and his skepticism that such an AI could learn to city-hop (aside from the issue of international cities often being similar, often with exactly the same coffee shops, etc.).

While cheap public version “ChatGPT Mar 23 Version” may not have done a great job on a real intergalactic sales team, it is an almost unbelievably large step in that direction.

Testability and Falsifiability:

Without detailing here the hypothesis and null hypothesis and what may objectively indicate a failed answer, etc., Kuki and ChatGPT provide examples of a clearly, totally failed answer and a plausible, not-failed answer.

And it may be worth noting that micro-world frameworks (blocks-world, short-string analogies, etc.) may be easier to test rigorously but are in some cases less applicable to real-world situations; whereas the Chollet Novel-City Test or the Woz Coffee Bot test are more real-world applicable and specifically include open-ended situations, so the feasibility of creating a definitive definition-and-test framework is less clear. But some kind of ‘not total failure’ evaluation system is likely very feasible. In the tradition of falsifiability, there is traction in focusing on concrete negative definitions and not getting lost in the infinite potentials of successful variation or logical black-swan prediction issues. (Perhaps this is another rebuttal to the nihilism of the Anna Karenina Hypothesis, which states, perversely, that all processes and results of making a cup of coffee are very narrowly identical, but failures to do so are broadly diverse and infinitely undefinable. Really? Smells like disinformation. I think STEM will be more useful than Doublespeak or Newspeak for designing and using AI tests in managed, multi-participant projects.)

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, or perhaps this area is considered more a ‘by definition’ category.

9. Generalization = Does-Everything

e.g. a generalist vs. a specialist (‘general’ in the generalist-versus-specialist sense)

1. Does literally everything!

- Turing test

(context)

2. Does sets of practical tasks: Woz Coffee Test

This has a strange relationship with context, as it perhaps oscillated between

A. no context

B. every context

C. both no context and all contexts

Benchmarks of Generalization:

The encyclopedia

The Turing test and ‘general’ chat, context.

A good example of this difference may be the Woz-Coffee-Office-Bot, which is very context-specific and yet requires ‘general’ abilities within that context (not unlike generalizing in the sense of learning or fitting to a pattern of a class of objects). But here there are potentially many objects, though not infinite ones; the goal, after all, is just one process: making coffee (or tea?).

Tests and Falsifiability

Perhaps this area is considered more a ‘by definition’ category.

10. Generalization = Added Bias

Fuzzy-roughness as generalization, or H. M. McLuhan’s low-definition & high-definition interplay: dropping details, rules of thumb, the general notion (e.g. a ‘general’ bicycle with no details).

Fascinatingly, there are many ways in which having AI deal with patterns (whatever terms end up being used to describe that) involves the same processes, common for H.sapiens-humans, of making something lower-definition or lower-resolution (maybe or maybe not making something lower-dimensional, though that is possibly more rhyming with this than equivalent to it).

Adding bias during training is exactly what it sounds like: fuzzing things up. If your mind-set is that ever-higher-definition data absolutely correlates with patterns, then your agenda, and perhaps your attraction to super-signals (a biology reference), is leading you away from empirical success. Across many completely different types of machine learning, we add bias to prevent over-fitting, and we add disruptions in neural networks to prevent…something deleterious that we frankly do not really know how to describe. (Geoffrey Hinton called the notion “conspiracies” developing between neurons in the model, analogizing it to bank tellers conspiring to defraud a bank, with respect to ‘drop-out’ during training.)

Whether fuzzy-roughness is a theme or concept or some kind of equivalent or necessary element, it seems to be very much a part of the topic.
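A minimal Keras sketch of deliberately ‘fuzzing things up’ during training: a Dropout layer randomly disables a fraction of units on each training step (the anti-‘conspiracy’ trick mentioned above), trading a little training-set fit for, typically, better behavior on data the model has not seen. The layer sizes are arbitrary illustration choices.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.5),   # the added "roughness": half the units silenced per step
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(...) as usual; Dropout is active only during training, not at inference.
```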

Not “greedy”: there are a few examples of the term ‘greedy’ being used to refer to learning problems that various methods are used to overcome:

1. Page 368 in François Chollet’s Deep Learning with Python: ‘greedy sampling vs. stochastic sampling’ (introducing randomness)

2. https://arxiv.org/abs/2202.05306

To some extent this goes back to some of the most simple, crude, low-dimensional examples of fitting vs. overfitting and adding bias to generalize: adding bias to a (not-deep-learning) regression model so that the rough, average, general line is produced, not the particular over-fit line from the training data.
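A minimal scikit-learn sketch of that regression version, on synthetic data: an unregularized degree-9 polynomial fit chases every wiggle of a dozen noisy points, while adding bias (ridge regularization) typically gives the rougher, more ‘general’ curve that does better on held-out points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 12))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 12)   # noisy samples
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                              # the true pattern

poly = PolynomialFeatures(degree=9)
X_train = poly.fit_transform(x_train[:, None])
X_test = poly.transform(x_test[:, None])

overfit = LinearRegression().fit(X_train, y_train)   # tends to chase the noise
biased = Ridge(alpha=1e-3).fit(X_train, y_train)     # added bias, rougher curve

def mse(model, X, y):
    return float(np.mean((model.predict(X) - y) ** 2))

print("unregularized test error:", round(mse(overfit, X_test, y_test), 3))
print("ridge (added bias) test error:", round(mse(biased, X_test, y_test), 3))
```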

Tests and Falsifiability

In this case a test may be to measure whether use of the method improves overall model performance, as this may apply to other effective ‘generalizing’ methods as well.

11. Generation as Generalization

As in generative models.

This has a number of interesting angles,

including how ‘articulation as processing’ makes it more similar to H.sapiens-humans (which is, if passive-aggressively, the goal).

Note: This might either mess up or keep alive the AGI term by having generative be a more meaningful term.

Testing and falsifying may be a good angle for defining, or failing to define, this area. For example generation is surely a topic and theme, but is it ever an equivalence?

E.g.

Tests: Yes

- Can generation be used to self-correct specific points of content?

Falsifiable: Yes

- Fails to use generation to self-correct.

This may still be difficult to define in various ways, but interesting.

Tests and Falsifiability

The argument or line of thinking here may be that generation, or articulation, is or can be a part of both processing and learning. Such that you could create:

1. a base system with no generative ability

2. a base system + generation

3. a base system + e.g. framework-learning based on generation

If system implementation 2 and 3 can be used in more situations or to more effect than the base system, then in this context ‘more generalization’ has occurred.

Tests on biological systems, machine systems, and integrated biological and machine systems in this area would be very interesting.

12. Modularity (recombinant) as generalization

Note: you may be able to define embedding vectors, or concept vectors as I perhaps unpopularly describe them, as modular abstractions that allow a deeper/higher level of pattern handling, generalizing concepts across the particular language-token or sub-language-token instances being trained on. This is clearly a further reach of extending a pattern beyond instances of training data.

Tests and Falsifiability

The argument or line of thinking here may be that Modularity is or can be a part of both processing and learning. Such that you could create:

1. a base system with no modularity

2. a base system + Modularity

If system implementation 2 can be used in more situations or to more effect than the base system, then in this context ‘more generalization’ has occurred.

13. Generalization = All-at-once Learning, End-to-End, Non-Greedy

Another multi-layered connection between generalization, levels of learning, and deep learning is how deep learning in particular allows simultaneous (or more general) model updating in more general (end-to-end) models, yielding more general (more powerful) machine learning models.

See page 17, section 1.2.6, in Deep Learning with Python:

“What is transformative about deep learning is that it allows a model to learn all layers of representation…at the same time [not ‘greedily’]. …whenever the model adjusts one of its internal features, all features that depend on it automatically adapt to the change, without requiring human intervention. Everything is supervised by a single feedback signal…This is much more powerful than greedily stacking shallow layers…”

This coordination of all features from a single signal, in a self-regulating way, as opposed to separate hand-adjusted parts that must be continually re-aligned when anything changes, seems also to be a kind of ‘generalization.’
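As a concrete illustration of ‘a single feedback signal’ updating all layers at once, here is a minimal, hedged Keras sketch (the model size and dataset are my arbitrary illustrative choices): every layer’s weights are adjusted jointly by backpropagation from one loss, rather than trained greedily one layer at a time.

```python
# Minimal sketch: an end-to-end Keras model where all layers are updated jointly
# from a single loss signal (no greedy layer-by-layer training).
# Assumes TensorFlow/Keras; uses the built-in MNIST dataset for illustration.
from tensorflow import keras

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(64, activation="relu"),    # depends on the layer above it
    keras.layers.Dense(10, activation="softmax"),
])

# One loss, one feedback signal: when one internal feature shifts,
# the features that depend on it adapt in the same gradient update.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
```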

Tests and Falsifiability

Whether the point is more conceptual (workflow) or empirical (results), either can be documented: a more general workflow (not separate, manual adjustment for each model), or more generalized ability.

14. “Generalization” as “Amplification via Adjacent-Learning”:

As Francois Chollet points out in chapter 14, a fascinating empirical observation is that when a deep learning model is trained on two similar tasks it ends up being able to perform both of those tasks better than if it had only been trained on one task alone.

For our collection of different kinds of ‘generalization’ this may be yet another, and hopefully it is one that can actually be empirically studied and described (and used to make falsifiable predictions).

It may be as if learning on related skills has a kind of gravitational warp, where the more skills are learned the deeper the learning is able to be (like the classic if not entirely accurate idea of a bowling ball on an elastic sheet to illustrate gravity and spacetime).

This may be one of the more poignant examples of generalization that may impact our understanding of how patterns and perceptions and learning work.

Tests and Falsifiability

Compare performance when a model is trained on a task together with an adjacent task versus trained on the task alone.
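A minimal sketch of that comparison, assuming Keras and using synthetic stand-in data (the tasks, sizes, and names here are all hypothetical): a shared encoder with one head is trained on task A alone, and the same architecture with two heads is trained on task A plus an adjacent task B, after which task-A performance can be compared.

```python
# Minimal sketch of the 'adjacent learning' comparison: a shared encoder trained
# on one task alone vs. the same encoder trained jointly on two related tasks.
# Synthetic stand-in data; a real test would use genuinely adjacent tasks and
# held-out validation data rather than evaluating on the training set.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 32)).astype("float32")
y_task_a = (x[:, :16].sum(axis=1) > 0).astype("int32")    # task A labels
y_task_b = (x[:, 8:24].sum(axis=1) > 0).astype("int32")   # related ("adjacent") task B

def build(num_heads):
    inputs = keras.Input(shape=(32,))
    shared = keras.layers.Dense(64, activation="relu")(inputs)   # shared representation
    outputs = [keras.layers.Dense(1, activation="sigmoid", name=f"task_{i}")(shared)
               for i in range(num_heads)]
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

solo = build(1)
solo.fit(x, y_task_a, epochs=5, verbose=0)

joint = build(2)
joint.fit(x, [y_task_a, y_task_b], epochs=5, verbose=0)

print("task A alone:            ", solo.evaluate(x, y_task_a, verbose=0))
print("task A + adjacent task B:", joint.evaluate(x, [y_task_a, y_task_b], verbose=0))
```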

15. Generalization = no-conspiracy -> dropout

This is another element of ‘generalizing’ to a pattern: machine learning that ‘generalizes’ to a pattern beyond the training data itself.

Note: if ‘fitting’ does not equal generalizing, if generalization is the wrong term for this, then ‘non-conspiracy = generalization’ would not be a conclusion from this.

This may or may not be in a category with many other ‘effective’ techniques that are used to boost model performance. Not every method used is going to map onto or relate to ‘generalization’ (aside from the fact that it boosts performance…but so does plugging in the computer and paying your electric bill). No-conspiracy may not end up being related, but some methods will.

Also, I cannot recall if dropout is one of them, but there are a number of methods (perhaps most, as Francois Chollet describes the empirical nature of developing Deep Learning technologies) where we know that it works but we have no idea how. This is also fascinating, as it tells us there is more to learn about the nature of patterns (potentially, unless the method just overrides a hardware glitch or something).
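For reference, dropout in practice is a single layer choice; the sketch below (Keras assumed, sizes purely illustrative) randomly silences a fraction of activations during training, which is the mechanism Hinton’s ‘conspiracy’ analogy refers to.

```python
# Minimal sketch: dropout randomly zeroes a fraction of activations during training
# (training only), discouraging neurons from co-adapting / 'conspiring.'
# Assumes TensorFlow/Keras; layer sizes are illustrative only.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),   # drop 50% of activations at random each step
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```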

Tests and Falsifiability

The idea that there is a space of dynamics in neuron connections, perhaps relating to the larger fire-together-wire-together nature of some networks, should be able to produce some kind of testable model.

This may be an area of ‘exploratory tests’ before we have something specific to falsify.

Are there fire-together-wire-together effects?

Can ‘conspiracies’ be induced?

Can induced conspiracies alternately be avoided?

Is bias introduction in general a kind of very broad generalization?

(for 15–18, see section 2)

16. Generalization = fitting (not over fitting or under fitting)

Even if it is a historical accident that causes people to say that between underfitting and overfitting is “generalizing,” the obvious path of least resistance is to call it “fitting.”

Describing fitting as a kind of generalizing, when the context is spelled out, is fine. But for some reason it is simply called generalizing.

17. Generalization = not under fitting

18. Generalization = not Overfitting

While underfitting may not be special, overfitting actually is. In the long term you want to avoid overfitting, but during the process of model training you must overfit. Overfitting is actually a goal: you need to overfit first in order to test your architecture and make sure it has the capacity to fit.

Test and Falsify:

If the model does not overfit, then it likely does not have the capacity to fit, or at least you cannot say that giving it more capacity would not have improved performance. (I have not heard of designing an architecture so that it exactly cannot underfit or overfit; I’m guessing that is not feasible or really desirable.)
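A minimal sketch of that workflow (Keras assumed; the built-in IMDB dataset and model sizes are arbitrary stand-ins): train long enough that validation loss turns upward while training loss keeps falling, which confirms the architecture has the capacity to overfit, then pull back with early stopping.

```python
# Minimal sketch: first confirm the model can overfit (validation loss rises while
# training loss keeps falling), then pull back via early stopping.
# Assumes TensorFlow/Keras; the built-in IMDB dataset is only a stand-in.
import numpy as np
from tensorflow import keras

(x_raw, y_raw), _ = keras.datasets.imdb.load_data(num_words=10000)

def vectorize(seqs, dim=10000):
    out = np.zeros((len(seqs), dim), dtype="float32")
    for i, seq in enumerate(seqs):
        out[i, seq] = 1.0
    return out

x = vectorize(x_raw)
y = np.asarray(y_raw, dtype="float32")

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

# EarlyStopping halts training near the best validation epoch and restores those weights.
history = model.fit(
    x, y, epochs=20, batch_size=512, validation_split=0.4,
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss",
                                             patience=2,
                                             restore_best_weights=True)])
```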

19. Generalization = Overfitting!

Part of the problem of the flexibility of the term Generalize is that it can arguably refer to quite a spectrum of things. This does not mean that it is a good idea to put many of those things (under the same name) into a framework together.

In colloquial and academic English, in computer science and outside it, “overgeneralization” is a term used to refer to the type of generalization where people essentially ‘overfit’: they use a too-small data sample and erroneously generalize that pattern to larger parts of the world.

A concrete example of this that some people may know: sometimes if you eat a kind of food and get sick, your body may generalize that ‘I got sick!’ reaction to any future instance of that food, sometimes for very long periods of time. That happened to me with sea-sickness, which had nothing to do with the food. I can’t even think of that pasta dish now, years later, without feeling nauseous.

The problem is, like with reading code where all the variable names sound and look the same, it’s very confusing.

Generalization = Fitting

Generalization = “Overgeneralization” = Overfitting

Generalization = Human level intelligence

Generalization = Narrow AI

And the whole point is to make clear the difference between fitting and not-fitting, and narrow AI and not-narrow AI. And calling them all exactly the same thing does not help.

Thank goodness generalization doesn’t also mean underfitting! …but come to think of it, we could probably make a case for that too…generalization is a perhaps dangerously flexible word.

You increase your level of generalization by not over-increasing your level of generalization. So generalization is a level, but it is also not having too much of itself. As an analogy, imagine a color swatch which can be either yellow or blue. Blue is fully “general” (a well-fitted model) and yellow means having zero predictive ability, or “not-fit.” “Over-generalization” is like saying that some yellow color is going to be very-very-blue, “over-blue.” So the way to get your color to be completely blue is to not have it be very very blue…where ‘very very blue’ is of course another way of saying yellow. Or you could imagine newspapers starting to refer to losing an election as ‘over-winning.’

https://ieeexplore.ieee.org/abstract/document/8684304

Testing and Falsifiability:

With everything having the same name, how would you ever know what your test was doing?

20. Generalization = the ability or capacity to overfit, or a past-tense overfitting but not a present-tense overfitting

Another curiosity in the interplay between fitting, underfitting, overfitting, and ‘generalizing’ is that in order for a model to fit, it has to be able to overfit, but then be reversed back out of overfitting. Put sequentially: a model that fits is a model that did, in the past, overfit and was then pulled back away from overfitting. The only way to tell whether a model can overfit (somewhat like the no-free-lunch theorem) is to run the model and have it overfit (perform worse on the testing set than on the training/validation sets). I’m not sure if there is a halting-problem-type issue here, or if it is just a practical matter of how elaborate it would be to approximate the properties of the model other than just seeing what they are by testing it.

This may suggest there is a kind of ‘pattern capacity’ or ‘pattern depth’ or ‘potential generalization depth’ in a model. Or perhaps this is related to bias, or both. If you don’t have capacity and bias you cannot adjust to the pattern?

Another curious factor here is that tests for overfitting (so far as we know) cannot be built into the model training process itself. E.g., overfitting tests (ROC-AUC?) are not differentiable (perhaps). And some workarounds leak information about the test data, which ironically causes the very overfitting we were trying to prevent (I think).

Testing and Falsification:

On the one hand you can easily show that there is a required step because if you don’t do it the model doesn’t work. But ideally there can be more nuanced tests that will help to explore more of what is going on here.

21. Generalization = Model-Reuse

Another form of ‘generalization’ that may be more abstract and instrumentalist, and that comes from the model-maker’s point of view more than the model-user’s, is how and where a model may be re-used, especially where it is not the weights that are re-used but other architectural parts of the model.

Or this might deal more with model architecture rather than things like the nature of bias itself. But, even if on a less profound level, having an AI made up of swap-out-swap-in models in a dynamic system would in some sense be adding generalizability to the overall system.

An interesting question may be: as AI becomes able to train its own models, when will it be more resource-efficient to train special-purpose models or write special-purpose programs to perform a task, as opposed to using the main LLM itself?

Or a different approach to the same idea: Transfer Learning, one example where a base model can be added to (or subtracted from) for a specific use-case. This ‘general-base-model’ is interesting both in theory and in practice.

Testing and Falsification:

There are various things you can try to test. E.g.

Take a base model and test it in two specific areas (like cats and irises).

Then retrain the base model on each of those two areas and try all three models on cats and irises.

Did the use of a general base model work?

Did re-training the base model work?

This is standard practice, so it is easy to test, if it is worth testing at all.
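Here is a minimal, hedged sketch of that kind of test in Keras: a general base model pretrained on ImageNet is frozen and re-used with a new task-specific head. The directory name “cats_vs_irises/” is purely hypothetical, standing in for whatever two test areas are chosen.

```python
# Minimal sketch of model re-use via transfer learning: freeze a general
# pretrained base and add a small task-specific head.
# Assumes TensorFlow/Keras; "cats_vs_irises/" is a hypothetical image folder.
from tensorflow import keras

base = keras.applications.MobileNetV2(weights="imagenet",
                                      include_top=False,
                                      input_shape=(160, 160, 3))
base.trainable = False   # re-use the general representation as-is

inputs = keras.Input(shape=(160, 160, 3))
x = keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)   # new task-specific head
model = keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical two-class image directory; substitute the actual test datasets.
train_ds = keras.utils.image_dataset_from_directory("cats_vs_irises/",
                                                    image_size=(160, 160))
model.fit(train_ds, epochs=3)
```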

22. Generalization = Manifold Interpolation

The Manifold Hypothesis (dun dun!)

As with ‘generalization,’ it is hard to find two explanations of The Manifold Hypothesis that match up very closely. But the general idea (if too vaguely rendered here) is that conceptual patterns (like dog-ness or cat-ear-ness or positive-sentiment-ness) are ‘manifolds’ in an often very high-dimensional space, too high for people to visualize or understand easily (or at all) as a whole: but these manifold-patterns either are themselves lower-dimensional or can be pragmatically reflected in lower dimensions that can be understood…or something that has to do with useful lower-dimensionalization of patterns from high-dimensional spaces.

The whole topic of the relationship between higher and lower dimensional spaces is absolutely fascinating and a big part of the whole AI-ML and deep learning topic.

I predict a big area will be, perhaps, ‘Non-manifold’ models, where the model operates not by finding a lower-dimensional manifold directly, but first (or only) performing further higher dimension steps (before, if at all, using lower dimensional manifolds).

Testing and Falsification:

According to Francois Chollet, no one knows whether the Manifold Hypothesis is true or not. But we should do exploratory testing to find out more about what is happening between higher- and lower-dimensional spaces. A fruitful area indeed.
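As one small example of that kind of exploratory probing (not a test of the Manifold Hypothesis itself), the sketch below uses scikit-learn to project 64-dimensional digit images down to 2 dimensions and reports how much variance that tiny subspace captures; the dataset and method are my illustrative choices.

```python
# Exploratory sketch: project a high-dimensional dataset down to 2 dimensions
# and see how much structure survives. Illustrative only; this does not test
# the Manifold Hypothesis itself.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                     # 1,797 images, 64 dimensions each
pca = PCA(n_components=2)
low_dim = pca.fit_transform(digits.data)

print("original shape: ", digits.data.shape)    # (1797, 64)
print("projected shape:", low_dim.shape)        # (1797, 2)
print("variance captured by 2 dimensions:", pca.explained_variance_ratio_.sum())
```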

23. Generalization = Average (!?!?!?)

Dangerous Questions

The first dangerous question: is the central limit theorem an example of a situation where a pattern exists but is not reflected perhaps directly by any data point in the dataset? How similar is this to under-fitting vs. fitting vs. overfitting situations as AI-ML learns patterns?

The Second dangerous question: Can the term average be used to refer to AI-ML learning?

How do ‘generalizations’ relate to ‘averages’?

At first it may sound like a kind of novice error to associate machine learning with a simple average. For example, the median value in a data range, or the ratio of a class in sample data, or the previous value in a time series, may be the baseline against which you test to see whether your machine learning model is doing anything at all. So if your machine learning model isn’t doing anything unless it can do better than a simple mathematical average, then how could an AI-generalization be an average?

And yet there is some nagging similarity…surely in some sense whatever form of average (mean, median, mode, etc.) is a generalization. And general descriptive statistics surely in some sense describe a general underlying probability.

So how are these two generalizations, based on the same data set, different? Where on the one hand you generalize to get your descriptive statistics, which generally describe the data set. On the other hand your model can’t be said to generalize unless it predicts values better than predicting simple average values. Maybe…some averages are non-simple?
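To make the baseline idea concrete, here is a minimal sketch (scikit-learn, synthetic data, all my own illustrative choices): a ‘predict the mean’ dummy model is compared against a learned model, which is exactly the sense in which a model must beat the simple average to count as doing anything.

```python
# Minimal sketch: a model only 'counts' if it beats a predictor that always
# outputs the simple mean. Synthetic data; scikit-learn's DummyRegressor is
# the 'simple average' baseline.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

print("predict-the-mean MSE:", mean_squared_error(y_test, baseline.predict(X_test)))
print("learned-model MSE:   ", mean_squared_error(y_test, model.predict(X_test)))
```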

Third Dangerous question: Are there higher order, or higher dimensional, averages?

Let’s return to NLP, Natural Language Processing.

There are various ways to make statistical averages of language (probably a whole huge topic). If I understand correctly, today’s NLP evolved through a shift away from hand-crafted rules created by expert linguists (H.sapiens-human experts, that is; ironically, ‘expert systems’ is the official name for the resulting AI systems…it’s almost like people are trying to make things confusing with the way they name things…) toward a more statistical-linguistics approach, which shifted to Machine Learning based on ‘statistical learning,’ which then shifted in some cases to decision-tree and deep learning approaches.

And it is often said that GPT can’t “really” be smart, because it is “only” statistical averages of training sample language.

So, going with the ‘average’ theme: Could then manifold interpolation in a high dimensional tensor/matrix space be a kind of higher dimensional “average”? (or an “average” of a high-dimensional representation of a body of language)

Can we describe the area between overfitting and underfitting as a higher-dimensional ‘average’?

Dimensions and Parameters in Representation

If your representation of the data is (depending on how you define a dimension) one-dimensional, then an average of such a representation won’t tell you very much. If your representation is a ‘bag of words,’ where each unique word is a dimension, that tells you quite a bit more. If you adjust things a bit and get rid of “stop-words,” regularize spellings, standardize capitalization, and use stems or lemmas so words like “shopped,” “shopping,” and “shops” can be associated, then that tells you even more. And if you add in information about word-probability and document-probability (TF-IDF), that tells you even more! More dimensions in the representation…more…powers of generalization.

Now, here’s a trick. More dimensions in the data are not the same as the number of relationships in the model (saying the ‘dimensionality of the model’ would probably raise alarm here, but some general phrase like that seems unavoidable). A small embedding-vector-space model can do better in some cases, where the target, the representation, is at a ‘higher level of abstraction,’ even though the gross number of dimensions in the data is actually smaller. A model where every unique word in a big language sample is a separate dimension might have twenty thousand dimensions in the data set, whereas an ‘embedding’ vector model might create only a few hundred dimensions. Also, the word ‘parameters’ gets used and defined in different ways in different contexts for AI-ML (a bit of a tragic theme here…).

But the past trend continues: LLMs, Large Language Models, are not only “large” in terms of how much data they are trained on, but also in the number of dimensions in the model (a TF-IDF model, for example, might have twenty thousand dimensions). Because I am inevitably going to stumble in this minefield of language as I try to describe dimensions and parameters and models and networks and vocabularies, I am going to briefly quote Francois Chollet himself directly, where he first describes models based on sets of words (as in “bag of words”) and then compares that to models where the target is concepts, ‘word vectors’ or ‘embedding vectors’:

“It’s common to see word embeddings that are 256-dimensional, 512-dimensional, or 1,024-dimensional when dealing with large vocabularies. On the other hand, one-hot encoding words generally leads to vectors that are 20,000-dimensional or greater (capturing a vocabulary of 20,000 tokens, in this case). So word embeddings pack more information into far fewer dimensions.”
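A tiny sketch of that contrast (the libraries and numbers are my illustrative choices, not measurements of any particular model): a TF-IDF or one-hot style representation has one dimension per vocabulary item, while an embedding layer packs the same vocabulary into a few hundred dense dimensions per token.

```python
# Minimal sketch of the dimensionality contrast: one dimension per vocabulary item
# (TF-IDF / one-hot style) vs. a few hundred dense embedding dimensions per token.
# Assumes scikit-learn and TensorFlow/Keras; numbers are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from tensorflow import keras

docs = ["the cat sat on the mat",
        "the dog shopped for shoes",
        "cats and dogs shopping together"]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(docs)
print("TF-IDF dimensions = vocabulary size:", matrix.shape[1])

# An embedding layer mapping a 20,000-token vocabulary into 256 dimensions:
embedding = keras.layers.Embedding(input_dim=20000, output_dim=256)
print("embedding dimensions per token:", embedding.output_dim)
```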

To try to untangle what to call what and what to compare:

1. The Network:

Neurons, nodes, weights, parameters, and connections:

all these terms refer to the neurons and connections between neurons (or nodes) in the neural network or the “weights” of the connections between them. “Parameter” is another term for the number-value of the weight. Overall: how many interconnected things are in the model.

2. The Resulting Model:

“dimensions” of “embeddings”

3. Vocabulary:

This can get needlessly crazy quickly so I will try to wrap it up. The overall point is that what is being modeled is concepts, not specific “words” or whatever sub-word “tokens” get fed in. Transformer models (from what I’ve read) take in combinations of whole words, parts of words, and individual characters, and it varies a lot. The reported numbers vary widely.

Vocabulary is about how many specific units of language are fed into the model. I suspect characters, and even Byte-Pair Encoding (BPE) sub-encoding below the character level, are emphasized to keep the vocabulary size down.

Would you rather have a vocabulary of 26 letters, or every word in the OED? Letters! But Unicode has a gazillion symbols…

1,111,998 possible Unicode characters (great for political diversity…but so much for a simple ASCII…)

So go with bytes: Byte-Pair Encoding (BPE)

This article agrees with that line of thinking.

https://huggingface.co/docs/transformers/tokenizer_summary

“Byte-level BPE

A base vocabulary that includes all possible base characters can be quite large if e.g. all unicode characters are considered as base characters. To have a better base vocabulary, GPT-2 uses bytes as the base vocabulary, which is a clever trick to force the base vocabulary to be of size 256 while ensuring that every base character is included in the vocabulary. With some additional rules to deal with punctuation, the GPT2’s tokenizer can tokenize every text without the need for the <unk> symbol. GPT-2 has a vocabulary size of 50,257, which corresponds to the 256 bytes base tokens, a special end-of-text token and the symbols learned with 50,000 merges.”
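If you want to poke at those numbers yourself, here is a minimal sketch with the Hugging Face `transformers` library (this requires installing the package and a one-time download of the GPT-2 tokenizer files):

```python
# Minimal sketch: inspect GPT-2's byte-level BPE vocabulary and tokenization.
# Requires the `transformers` package; downloads the GPT-2 tokenizer on first use.
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
print("GPT-2 vocabulary size:", tok.vocab_size)     # 50257, as described above

ids = tok.encode("Generalization is a slippery word.")
print(ids)                                          # byte-pair token ids
print(tok.convert_ids_to_tokens(ids))               # the sub-word pieces themselves
```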

Super exact numbers I don’t think are even available: There are many versions of models, the numbers change over time, sources disagree on exact details, and when people read this years after it was written it’s probably moot anyway. The point is the relative scale of the numbers involved, not the exact numbers:

GPT vocabularies are:

- 50,000–100,000 unique characters, symbols, subwords, words, phrases, bytes, Byte-Pair Encoding (BPE), etc. across all languages (and character sets)

GPT Neural Networks have:

- “ChatGPT-4 has in the region of 100 trillion parameters, up from 175 billion in ChatGPT 3.5” (Wired)

- ~96 layers https://arxiv.org/abs/2005.14165

GPT Neural Networks have:

- ~12,288 embedding dimensions (d_model) in the largest GPT-3 model

https://arxiv.org/abs/2005.14165

Just for fun, I asked chatGPT about the “parameters” vs. “dimensions.” Here is what it said:

How would you summarize the difference between “parameters” of the model (175 billion parameters) and the number of “dimensions” of embeddings?

ChatGPT

The “parameters” of a language model refer to the trainable variables that are used to compute the output of the model, given an input sequence. These variables include the weights and biases of the neural network layers, which are updated during training to minimize the model’s loss function.

In the case of GPT models, the number of parameters refers to the total number of trainable variables in the model, including the weights and biases of all the layers. For example, the GPT-3 model has 175 billion parameters, which is a measure of the total number of variables that must be optimized during training.

On the other hand, the “dimensions” of the model typically refer to the size of the embedding vectors that are used to represent each token in the input sequence. These embedding vectors are typically learned during training, and they are used to map each token to a high-dimensional vector space where the model can perform computations.

For example, in the case of GPT-3, the embedding dimension is 2048, which means that each token in the input sequence is represented by a 2048-dimensional vector. These vectors are used as inputs to the neural network layers, which are responsible for computing the output of the model.

In summary, the “parameters” of a GPT model refer to the trainable variables that are used to compute the output of the model, while the “dimensions” refer to the size of the embedding vectors that are used to represent each token in the input sequence. Both of these quantities are important for understanding the behavior and performance of the model.

I know better than to simply believe what a scary stranger blurts out on the interwebs, so that was just to see what GPT said: maybe accurate in some ways, maybe not.

Other sources:

https://platform.openai.com/docs/models/gpt-4

https://openai.com/blog/introducing-text-and-code-embeddings

https://arxiv.org/abs/2005.14165

https://github.com/openai/gpt-3

https://www.wired.com/story/how-chatgpt-works-large-language-model/

I dug around for a while but was not able to get a clear number for the vocabulary size of GPT3. Reported numbers varied from 50,000–100,000 unique characters, symbols, subwords, words, phrases, etc. across all languages (and character sets). Something in the ballpark (general area) sounds fair.

From a 2020 paper by OpenAI themselves on training versions of GPT-3 (which is not exactly the same as 2022’s ChatGPT):

OpenAI’s paper, May 2020: https://arxiv.org/abs/2005.14165

The point is:

Various language units,

are being fed into a very big network: 100 trillion connections, relationships, etc.

to produce a much smaller final model:

~12k dimensions, which is a model of very high-order concepts. The number of concept-dimensions is much, much smaller than the number of original words and documents, and even than the size of the network that created it.

The colloquial term ‘average’ (close to the colloquial terms ‘generic’ or ‘general’) already does not have a single rigorous STEM meaning; for example, there are mean, median, and mode. Is it possible that there is more dimensionality and scale to patterns than we have been used to thinking about in the past? To some extent we can follow this over time through the development of NLP: from simple averages of word and letter relationships, to much less simple ones…to much, much less simple ones…and now we have average concepts that exist in twelve-thousand-dimensional spaces, but which can be plucked back out into lower-dimensional forms.

Average may not end up being the best word…but the more I think about it, a ‘high dimensional average’ makes more sense than ‘generalization’ to describe this whole situation.

Testing and Falsification:

This may be more of a modeling topic, but would it be possible to create something like a topology of averages, across low- and high-dimensional averages, to represent a landscape of model-space? Possibly one that could include insights into the nature of manifolds and the higher-dimensional spaces they live in, or some platonic landscape of possible manifolds, or categories of manifolds with different properties, non-manifold patterns, etc.

(Knowledge Bases)

24. Generalization = Fact Knowledge Base

25. Generalization = Common Sense Knowledge Base

A topic that I am surprised is not more discussed, as it appears to be one of the more surprising aspects of LLM GPT performance. There is probably the usual issue of trying to find ‘the first!’ possible reference to something like a knowledge base in all of human history, but a solid example from the 1980s is Cyc (pronounced ‘psych’). For a time some Knowledge Base enthusiasts were confident that this idea was the final key, the silver bullet, for all AI; not surprisingly, that turned out to be wrong. But the idea of a knowledge base has lived on as a component in larger systems. I think Google has used knowledge bases in various parts of its information and AI-ML infrastructure.

And even though the idea did not work as a single solution, the absence of some kind of repository for

A. world facts and

B. common sense

was a standard argument for what was preventing narrow AI from doing more than it was trained to do.

A standard kind of dialogue around a cat-image-classifier model is to say: yes, it can correctly classify cat breeds 99% of the time, but it has no idea what a ‘cat’ is. It has no knowledge base of cat facts. It has no ‘common sense’ knowledge base about cat care or related information.

And so it is interesting that GPT models effectively have some kind of fact and common sense knowledge base, not because they were given hand-crafted knowledge bases, but as some kind of not fully understood result of their network of related concepts, perhaps in the same way that at least a limited memory and reasoning ability somehow emerges from language generated based on a concept-model.

Testing & Falsification

Part of what is interesting here is that a ‘knowledge base’ can be a thing, a property, an effect, or maybe even a verb-like ability.

Or perhaps ‘Knowledge Base’ is the test (a kind of trivial test), and anything that passes the test one way or another has Knowledge-base-ness.

In some ways I think ‘knowledge base’ is problematic because in past uses it did not have a testable, falsifiable meaning. For years people said: Deep learning can’t do XYZ because it doesn’t have a knowledge base! But then in 2023 that suddenly changed and no one knows exactly why; there was clearly something wrong in how we had declared that something was impossible, given that the ‘impossible’ thing then happened.

Perhaps we can revive and do a better job with the term.

26. Generalization = Handling Objects in Object-Relationship Spaces, in a Project, and in an interconnected STEM context

Description and tests, see:

https://github.com/lineality/object_relationship_spaces_ai_ml

27. Generalization = The Grand Generalization Mega-Theorem!

In outline:

1. The Grand Generalization Mega-Theorem is a new and more abstract meaning of “general,” a term that already has about thirty separate technical meanings in a context of AI-ML.

2. The focus of The Grand Generalization Mega-Theorem is entirely on matching a list that is presented as a list of H.sapiens-human abilities.

- learn everything

- learn automatically

- learn instantly

- learn quickly

- transfer learning to novel situations

- transfer skills to other skill-areas and applications

- correct mistakes in past learning

- generalize automatically from anything to everything else

- manage novelty broadly

- extreme novelty: do ‘just fine’ with extremely alien, ‘other,’ novel experiences

- retain learning by default indefinitely

3. The list that is presented as a list of H.sapiens-human abilities does NOT describe actual H.sapiens-human behavior and abilities.

4. The Generalization Mega-Theorem for AI is not focused on AI abilities, tests, projects, explainability, participation, ethics, productivity, etc.

A discussion of The Generalization Mega-Theorem topic will be the focus of Part 2 of this report.

Testing & Falsification:

Not testable.

Not falsifiable.

Not clearly defined.

Not coherent.

28. Generalization = Integration Across Signal-Type

A very interesting area that is indirectly extremely prominent in AI-ML discussions but rarely focused on is the combination of signal-type systems such as image, audio, text, etc. It is very standard to mention that narrow image-classification AI does not understand linguistic concepts, but the larger topic is rarely gone into.

A more advanced aspect of this might be the areas of adding digital signal processing into DNA/RNA based computing in the functional genomes of organisms.

Testing and Falsification:

Testing either for generative output or for classification across signal type should be straightforward in at least some cases. Give GPT4 visual input and compare that to text input, etc. Can classification be done? Can language concepts be applied to images? etc.

Part 2: A Problem with the General Human and Machine Intelligence Paradigm

A Tangled and Undefined Problem with Undefined Things

There is no single way to entitle this section, as it deals with a problem involving several parts and contexts. For example, there are sometimes two sets, sometimes one set, of phantom-ideals in a moving-target game, which makes it impossible to give one clear identity to the problem.

The problem of ‘intelligence’ not being clearly defined for comparing machine and biological intelligence was not new to 2023 (ChatGPT), or 1996 (Kasparov vs. Deep Blue), or any other landmark date. “AI” as a clear and identifiable name for the field is not a bad choice; people have come up with countless jargony, incomprehensible, unrememberable alternatives ever since 1956 that they insist are better, but an overall topic name that is difficult for most (and mostly non-technical) people to pronounce, understand, or remember is (same old computer-science naming challenge) not going to work well. But areas within the field of AI have long been without clear definition, and often not for unsympathetic reasons: this is cutting-edge work pushing the boundaries of our understanding of reality, and ‘AI’ researchers are not retroactively responsible for the lack of adequate research foundations and shovel-ready definition frameworks from biology, from a nonexistent science of consciousness and mind, and from a nonexistent science of learning, training, and education. That being said, we need a definable, testable, falsifiable framework, or perhaps several.

In 2023 we came up against the problem of having a relatively new and not exhaustively understood kind of AI (GPT LLM) solve an assortment of problems (analogies, novel situations, analytical reasoning, planning, language benchmarks, theory of mind (tracking the points of view of multiple participants), granular scheduling, math, word problems, knowledge base, memory, etc.). Not every level of every kind of test was passed with zero errors, but a shocking number of “absolutely unpassable tests” were passed, and no one had any good explanation of how, or of what exactly was going on.

In this kind of situation, constructing an untestable, unfalsifiable, not-clearly-defined set of terms and explanations may make you feel good in the short term; it may bamboozle the person you are talking to into feeling like they got an answer; it may get clicks in internet meme-infection space; it may get published and cited; it may become legislated into policies and mandates; but in reality you will still be on square one, right where you started (and if you think or pretend that you are somewhere other than where you are, bad things will happen).

G in AGI

Given that we have just gone over more than twenty useful definitions of ‘general’ in a context of AI-ML, one might think that we would now be better prepared for looking at the meaning of “G” for “General” in AGI: Artificial General Intelligence. But the term and the history of the term AGI appear to be somehow not really about the same topic as meanings of ‘general’ in a context of AI, which I have difficulty fully understanding.

Human-ish Ideas

The main context for AGI is, somewhat convolutedly, ideal human abilities which are not actual human abilities, yet they are treated as if they were, such that AI will be able to do what people can do when AI can do things that humans actually don’t do, which I have difficulty fully understanding.

This curiously entangled human-intelligence machine-intelligence definition relationship seems to be part of the 2012–2022 consensus for understanding the world at that time. We will look at a timeline of ideas, as well as factors for that 2012–2022 time period, to try to get a sympathetic understanding of what those people were experiencing, saying, thinking, etc. Understanding people in history (which in this case is like, a year ago…) in their own time and culture of ideas is important for interpreting and evaluating what they say and do, even if their literal words and actions would be ill advised in the present.

Hopefully this will all help us to decide how to analyze and use or dispose of various terms and concepts.

AGI Ability Checklist

General Intelligence: (note: most of these are meant to compound and add to each-other)

- learn everything

- learn automatically

- learn instantly

- learn quickly

- transfer learning to novel situations

- transfer skills to other skill-areas and applications

- correct mistakes in past learning

- generalize automatically from anything to everything else

- manage novelty broadly

- extreme novelty: do ‘just fine’ with extremely alien, ‘other,’ novel experiences

- retain learning by default indefinitely

And the topics of antisocial behavior, radicalization, violence, system collapse, disinformation, and STEM-based ethics do not even come up. Nor does the topic of project management, or externalization, or even self-reflection (so much for GEB).

Let’s try to understand the 2012–2022 consensus view of Narrow-AI vs. broad and strong General Human Intelligence, with AGI as impossible in principle.

Let’s look at another timeline of thinking up to this time.

Let’s look at the ideas of this time.

Let’s hear from the man at the top of the mountain himself: Francois Chollet

Note: I do not want to criticize Francois Chollet, and I do not want anyone to criticize Francois Chollet. He has done the world an incalculably valuable service in providing powerful tools for democratizing technology, and in this context he has provided a powerful tool for understanding the ideas of the time when the book was written: the best summary of the time period I can imagine finding anywhere. We are all in his debt.

A wonderful encapsulation of the auto-general-transfer-human framework hypothesis comes from Francois Chollet’s Deep Learning with Python 2nd edition, in brief in chapter 5, and then much of chapter 14 elaborates. I am quoting this not to copy his wording, but as the only way to give clear evidence that he, the creator of Keras, actually said this:

FC DLwP 2nd-ed ch5 pp130

Interpolation can only help you make sense of things that are very close to what you’ve seen…local generalization. But remarkably, humans deal with extreme novelty all the time, and they do just fine. You don’t need to be trained in advance on countless examples of every situation you’ll ever have to encounter. Every single one of your days is different from any day you’ve experienced before, and different from any day experienced by anyone since the dawn of humanity. You can switch between spending a week in NYC, a week in Shanghai, and a week in Bangalore without requiring thousands of lifetimes of learning and rehearsal for each city.

Humans are capable of extreme generalization, which is enabled by cognitive mechanisms other than interpolation: abstraction, symbolic models of the world, reasoning, logic, common sense, innate priors about the world — what we generally call reason, as opposed to intuition and pattern recognition. …We’ll talk more about this in chapter 14.

And then in chapter 14, from section 14.2 on page 442 until basically the end of the book on page 467, he goes into much more detail. I very highly recommend getting his book to read this.

While the words of Mr. Chollet, or the consensus that he is explaining, may sound self-evident, let’s look at aspects 2 and 3 of the Mega-Theorem:

2. The focus of The Grand Generalization Mega-Theorem is entirely on matching a list that is presented as a list of H.sapiens-human abilities.

3. The list that is presented as a list of H.sapiens-human abilities does NOT describe actual H.sapiens-human behavior and abilities.

A list of things that The Grand Generalization Mega-Theorem says AI should do to be like H.sapiens-humans, which is also a list of things that H.sapiens do not and cannot do:

H.sapiens do not and cannot:

- learn everything automatically

- learn instantly/quickly

- transfer learning to novel situations

- transfer skills to other skill-areas and applications

- correct mistakes in past learning

- generalize automatically from anything to everything else

- manage novelty

- do ‘just fine’ with extremely alien, ‘other,’ novel experiences

- retain learning by default indefinitely

- learn and perceive independently from culture, tools, project-space, etc.

- no use or reliance on articulation to learn

- no use or reliance on articulation to process

- learn and perceive independently from language-concepts

Relating to element 4:

4. The Generalization Mega-Theorem for AI is not focused on AI abilities, tests, projects, explainability, participation, ethics, productivity, etc.

Here is a list of things that H.sapiens do which are not addressed, and yet which preclude or contradict the approach and/or assumptions of The Generalization Mega-Theorem:

H.sapiens do:

- reject and attack schedules

- reject STEM

- reject project management

- reject and attack data

- rapidly forget

- destroy themselves violently

- destroy each other violently

- destroy projects violently

- destroy STEM violently

- attack ethics in principle

- attack best practice in principle

- deliberately engage in counterproductive fraud and corruption

- shoot the messenger

- construct potemkin villages

- blame victims

- bully

- torture

A Timeline Problem: The Chicken, The Egg, & The Book Cover

Let’s try to get one topic out of the way at the beginning.

Like the chicken and the egg: which came first, the technology we are now evaluating and scrutinizing with the term ‘general’ (as in AGI), or the term “AGI” itself?

I am certainly not arguing that the term ‘general’ should be banned or that it does not apply to AI-ML. To the contrary, in the first section we looked at more than twenty different meanings and uses of “generalization,” all of which were very interesting and useful to discuss.

There is however a firm and widely held belief that the phrase “AGI,” Artificial General Intelligence, is a special and meaningful term that was created by experts to describe in a scientific way how machines differ from human-ness, to describe how deep learning Artificial Neural Networks differ from biological neural networks, to describe in a scientific way how and why machines cannot think or understand or be conscious, and to describe all this with a scientific principle of “general” intelligence.

So as not to be guilty of paraphrasing for my own agenda, I will directly quote the authors of the term “AGI.” Note the dates as well as the rest of the story.

From https://goertzel.org/who-coined-the-term-agi/

The fairly undramatic story is as follows. In 2002 or so, Cassio Pennachin and I were editing a book on approaches to powerful AI, with broad capabilities at the human level and beyond, and we were struggling for a title. The provisional title was “Real AI” but I knew that was too controversial. So I emailed a bunch of friends asking for better suggestions. Shane Legg, an AI researcher who had worked for me previously, came up with Artificial General Intelligence. I didn’t love it tremendously but I fairly soon came to the conclusion it was better than any of the alternative suggestions. So Cassio and I used the term for the book title (the book “Artificial General Intelligence” was eventually published by Springer in 2005), and I began using it more broadly.

Timeline:

2002: Book Title = AGI: Artificial General Intelligence

2012: Alexnet Deep Learning

2022: ChatGPT

The term AGI was coined in 2002, 10 years before 2012 when people realized Artificial Neural Networks were even a viable technology. So what was the term trying to describe back in 2002? It was just a book title, not even originated by one of the book’s authors, likely not even someone who read the book. “AGI” sounded better than “Real AI.” The book authors didn’t even like “AGI”, but it was the least-worst book title.

Fast forward two decades and billions of people are debating the deep philosophical profoundness of “general” intelligence, arguing that because AI lacks inherent “generalness” as defined by the great institutions of science, that AI-ML technology cannot be doing anything that H.sapiens-humans can do.

Could the use of the term “general” have been a happy coincidence, or a brilliant insight on a hunch? Does the incidental fact that the phrase was a disliked marketing choice mean that “general” can’t be a useful concept? We should not stop our examination of the concept of “AGI” with just this awkward beginning to the story. It could be that “generalness” is somehow a great way to describe the universe.

Note:

The naming of AI was somewhat similar. They needed a name for the 1956 Dartmouth summer research project, no one loved the term ‘AI’ but no one could find anything that everyone thought was better: so they went with the least-worst option. (Also a theme in AI chess going back to Turing’s 1940’s chess AI programs…least worst options…)

Let’s try to (very generally) trace the discussion back a bit and find out what people were thinking and saying, so as to understand how to think about the term ‘general intelligence.’

Timeline:

Greeks: Automata: lingering from this time an ancestral idea

of a spectrum of action without thought but also thought

in automata machines: dumb machines and smart machines

500–1500 The Immortal soul (Anti STEM)

After 1700 Reason as Logic (Simplistic STEM)

1850’s: Babbage’s Thinking Machine

1900–1970 Open to Thinking Machines: Golden Age Science Fiction

Equating Chess-logic with all human intellect

1950 Turing’s ‘electronic brain’ Pro STEM

1940–1970 Classic Golden Age Sci-Fi (Pro Stem)

1969 Anti-Neural Network Campaign + Hype for symbolic AI

1971 Sad Times

1979 GEB

Philosophical revival of thinking about AI

along with the nature of consciousness. (Pro STEM)

1980 Strong Intelligence vs. Weak Intelligence

return to essentialist soul-essence talk

https://plato.stanford.edu/entries/chinese-room/

1996 Chess Test: Deep Blue (sort of) Wins

2002–2005 “General Intelligence”

2005 The Singularity Is Near: When Humans Transcend Biology

2012 Narrow AI vs. General-Strong AI

the 2012–2022 consensus:

- Kurzweil is crazy

- don’t hype AI: fear the AI Winter!

- ELIZA was dumb and all AI are Narrow

- General Human Intelligence & Artificial General Intelligence

- The auto-transfer-instant-general hypothesis

- Repertoire of tests that AI can’t pass:

- Sally Anne

- Winograd

- Analogy tests

- Turing Test

- Basically a vague essentialist framework

without any clear definition of tasks and abilities

- no discussion of AI operating systems

2022 ChatGPT

Chess, The Tesler Rule, and Phantom Expectations

We have seen variations on a cycle since, from what I have read, the early days of AI (1940–1960):

Phase 1: Sample_Person says: “An AI can only do X (task) if it has matched overall human ability or human intelligence,” and various similar-ish phrases: AI will be powerful when X, AI will be real when X, AI will take over the world when X, etc.

Phase 2: At some point AI does do the task, but clearly is not identical to H.sapiens-human’s ways of doing things (odd errors, other peripheral inabilities, etc.)

Phase 3: Sample_Person says: That’s not really intelligence. That’s not strong AI. That’s not general AI. Etc.

Underpinning this cycle are two bodies of Phantom Expectations:

1. Phantom Expectations about “Real AI” (the original phrase that became AGI): Real AI, AGI, Strong AI, Broad AI, Human-Level AI

2. Phantom Expectations about H.sapiens-humans:

- that there is some essence or power or set of abilities that enables and defines human-person-participant self-reflecting consciousness, language use, etc.

The goal-post of intelligence keeps getting moved. And wherever we set the next goal, if that goal is achieved, people will say that too is not ‘real’ intelligence. Or that has been that pattern so far.

This is also similar to the falsifiability and testability problem that Karl Popper is famous for arguing to clarify in the 1930s and ’40s: for example, his criticism of so-called tests, theories, and claims which were defined in such a way that, no matter what happened empirically, their proponents could twist the words around and claim whatever they wanted. And in this case it looks like both AI and H.sapiens-humans are being defined in a fuzzy way that can keep changing and remain indefinitely untestable.

As a timeline note: the original Tesler rule described people saying the AI “wasn’t AI”; this perhaps evolved into saying the AI technology was not “intelligence,” “AGI,” or “strong AI,” as the terminology expanded to include terms not used from the 1950s–1970s (the phrase itself is dated to 1970).

https://en.wikipedia.org/wiki/AI_effect

“Intelligence is whatever machines haven’t done yet”.

https://www.nomodes.com/Larry_Tesler_Consulting/Adages_and_Coinages.html

The “Tesler rule” or “AI effect” etc. is named and phrased in many ways by many people. E.g. Sometimes it is phrased: “Once you do it, no one calls it AI anymore.” Every book and website has a slightly different version, but the main idea is the same. (General!)

AGI vs AI

Let’s look at the definition of AGI (Artificial General Intelligence) given by OpenAI’s chatGPT:

“AGI stands for Artificial General Intelligence. It refers to the hypothetical ability of an artificial intelligence system to understand or learn any intellectual task that a human being can do.

Unlike narrow or specialized AI, which is designed to perform a specific task or set of tasks, AGI would be capable of adapting to new and unexpected situations, solving complex problems, and exhibiting a wide range of cognitive abilities that are typically associated with human intelligence, such as learning, reasoning, decision-making, and natural language processing.

However, the development of AGI remains a subject of ongoing research, and there is currently no AI system that fully meets the criteria for AGI.”

chatGPT 2023.4

This is a problematic definition, but ChatGPT is not diverging from what most people say; this is a fair summary of the Wikipedia article and probably of how most people view AGI.

https://en.wikipedia.org/wiki/Artificial_general_intelligence

From Their Point of View

In trying to make sense of this untidy situation, let’s give the benefit of the doubt to users of the term general-intelligence. Let’s try to see from their point of view: What are they trying to emphasize? What are their concerns? What has the world looked like to them?

Factors:

1. Fear of AI Winter

2. Constant of Popular Anthropomorphizing: “ELIZA Effect”

3. Volatility: Hot and Cold Swings in press and funding.

4. Sounds like common sense

5. Happy Status Quo: A Stable, Predictable, Productive Decade

6. Echo-chamber: Uncommon things get filtered out of more-of-the-same curated news-feeds.

7. Attack the Messenger: It is hard to say unpopular things.

Happy Status Quo: The 2012–2022 time period was an uncharacteristically steady, productive, predictable decade. The technological improvements were incremental and predictable and good, which everyone liked (companies were happy, government regulators were not scared, and economists liked stable trends).

Sounds like common sense: a common sense (and flattering) description of H.sapiens-humans.

Sounds like a safe description of 2012–2022 AI: narrow only

Moderating violent Hot and Cold Swings:

An interesting line towards the end of Al Gore’s An Inconvenient Truth documentary is an aside he makes about the people he speaks with violently swinging between extremes in their views. One moment there’s no problem, then it’s the end of the world; then nothing is possible, then the sky is the limit. The less familiar and literate a person was with a topic, the more that person seemed to make huge lurching jumps where moderation was needed. And this seems to have been a significant factor in the story of AI. However much I may disagree on this or that point with various authors, their palpable fear of an AI winter ending their careers has an overwhelming probability of being clear and real (unless you assume the researcher has an agenda to end their own career and also thinks that their attempt to avoid ending their career will help them end their career, which is…not physically impossible (people have been known to do strange things), but even by H.sapiens-human standards this seems unlikely).

Pro AI!

Anti AI!

Fund AI!

Defund AI!

Dealing with non-tech or anti-STEM people can be like a caretaker trying to manage a toddler. One minute they won’t eat anything, the next minute they are eating the remote control with their eggs and the tablecloth, then everything is thrown all over the room, then there is five minutes of hysterical terrified screaming, then there is five minutes of random laughter. First the C-suite is anti-data, then they want everything to be AI; first they think AI can’t do anything, then they think their “smart” thermostat can pilot a spaceship and control the holodeck, etc. As people jump from extreme to extreme, there is a longing for normalcy which can overwhelm nuance and ideals. And a lot of writing about AI is directed towards this kind of ‘classroom management.’ But just as constantly happens in classrooms, the agenda of classroom management actively rewrites the content of the curriculum and the systems of feedback and evaluation (formative and summative) toward a local minimum of short-term platitudes, which ends up being disastrous. (If you think or pretend that you are somewhere other than where you are, bad things will happen.)

Narrow vs. Broad

In case narrow vs. broad/general AI has not been elaborated on:

Generally speaking, up until 2023 most of the AI systems made were single-purpose: predict a song, identify a picture, is this a cat, is this a hotdog, smart light bulbs, etc. Attempts to make chatbots that would pass a Turing test were extremely bad: a fun challenge, and gradually getting better, but terrible and useless. A self-driving car AI might be considered a kind of ‘multi-purpose’ AI, not single-purpose, maybe technically ‘general’-ish, but as of 2023 there isn’t a working self-driving car yet either.

ELIZA & A Constant of Popular Anthropomorphizing

Another part of this story is that in the past there has been a problem with people imagining, and fantasizing, that rather simple machines were ‘just like people.’ Perhaps like trying to take your pet rock for a walk, and then taking it to the courthouse to marry it. People like to be imaginative. This caused confusion with an early and simple AI-bot called ELIZA, which ironically was created to show people that no one could possibly think this simple bot was alive and intelligent…guess what happened? Yes, you guessed it. People just loved ELIZA and were sure, and swore, it was alive and really cared about them. People like to fantasize, and this drove many researchers and scientists nuts. The engineer who made ELIZA was furious and wrote a whole book trying to explain that the ELIZA bot was not a real person, and not even a good AI; it just spat out semi-random phrases like a malfunctioning word processor (that is a very extreme over-simplification of ELIZA, but the point is there was no machine learning of any kind at all). Still, people like to be imaginative.

https://en.wikipedia.org/wiki/ELIZA_effect

So that is another part of why scientists have tried to make a distinction between a life-like, high-functioning AI and a very simple program that people like to pretend is alive, like a pet rock wearing a cute sweater: many researchers have been endlessly and fruitlessly trying to tell people that smart light bulbs are not actually ‘smart’ like people.

This claim that people over-estimated 2012–2022-era narrow AI is meant literally. Francois Chollet, in Deep Learning with Python 2nd edition, is very clear in his advice for deep learning engineers who are talking with business people: you have to be excruciatingly clear, and clearly illustrate with examples the kinds of strange mistakes that AI will make, because non-tech business people will predictably assume that any “AI” has human-level common sense.

And an irony is part of this topic: people can very frequently be very not-smart.

“Human” in “AGI”, etc.

Something that strikes me as peculiar, yet which is boldly part of even the Wikipedia article on AGI, is an immediate passing of the buck from defining AGI as meaning something well defined and testable for AI, to ‘whatever it is that makes humans essentially human.’ And no sooner is human ability made the focus than it shifts again to expectations of what human potential should be. This is a festival of definition problems, bait-and-switch, and buck-passing.

And remember, the whole point of this whole topic is supposed to be defining what is a ‘human,’ what is a ‘person,’ what is ‘intelligence,’ what qualifies a ‘participant,’ what is a ‘machine,’ what is unintelligent, etc. You can’t define human as something that can accomplish human tasks, and then human tasks as tasks accomplished by humans (which then gets switched to ‘should be accomplished,’ or ‘ideally might be accomplished,’ or even straight to the contradiction: ‘is not accomplished by’). This is the worst kind of using a term to define itself: we just get passed from one term to the next, occasionally alternating between reality and ideals.

(In this paper I try to consistently use H.sapiens-humans to be clear, because the definition of the term ‘human’ in this context is itself part of the subject of discussion. E.g., if twenty or a thousand years from now there are three separate groups in society: bio-humans, ai+bio-humans, and ai-humans (let alone alien or animal-hybrid humans), how should any of those participants in society read and interpret the word ‘human’ in this paper, especially when they know that I am trying to write about their future perspective?)

And for “intelligence,” we get the same circular synonym game. The article states: “In contrast, weak AI (or narrow AI) is able to solve one specific problem, but lacks general cognitive abilities.” And you should be able to predict how “cognition” is defined, yes, cognition is…”intelligence.” https://en.wikipedia.org/wiki/Cognition So “intelligence” is defined as Not-Not-Intelligence.

As part of the cloud of confusion, it is not clear how deep this problem goes. As with the 'ELIZA Effect,' most people (even Ivy League AI-ML graduate students) will look at this ~definition of AI and say: "That looks great! No problems here." It is unrealistic to expect H.sapiens-humans to have perfect definitions of everything, but if H.sapiens-humans are deeply unable even to see that there is a definition problem here, that is very relevant to this very topic (the nature of what H.sapiens-humans are and are not aware of, and can and cannot do).

“Generalization” in AGI vs. “Generalization” in Narrow AI

While not all of the definitions of ‘general’ that we discussed above usually get used together in the same discussion, there are two that frequently do: (Warning: Redundantly super-clear definition incoming.)

1. Deep Learning works by generalizing a pattern between underfitting and overfitting: generalization is what a working model does.

Note, this is narrow, single-purpose AI: we define the ability, function, and operation of single-purpose, narrow AI as the ability to generalize learning to a pattern that works for new inputs, not just old training data: machine learning generalizes from old training data to new, never-before-seen inputs. You train on old data-situations, and then, if the model fits, you are able to handle new, never-before-seen data-situations. This is 'generalization.' This is narrow AI. 'Generalization' to handle new situations is the definition of single-purpose AI. Generalization is how narrow AI works. Generalization is what narrow AI does, and how it does it. (Ok, ready for the next part?)

2. “Deep Learning” cannot “generalize,” therefore, deep learning will not be intelligent, conscious, or understand meaning. This is because Deep Learning cannot adapt to new situations. Deep Learning is incapable of generalization by definition. Generalization from deep learning is impossible in principle.

That is odd. “Generalization” is somehow both the only thing that AI definitively does and can do…and the thing AI definitively cannot do. Let’s use testing & falsification to look at and identify each of these.

1. In the first case we can see what is meant by testing for underfitting, fitting, and overfitting, and get a clear sense of what is meant. We can even use the classic regression overfitting example (a minimal code sketch follows this list) and see how the terms are being used and what model behavior is specifically being referred to.

2. In the second case…what are we testing for?
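For the first case, the test is concrete enough to sketch in a few lines of code. Here is a minimal, hypothetical example (invented noisy quadratic data, numpy polynomial fitting standing in for "the model"; nothing here is from any real benchmark) of the classic regression comparison, where 'generalization' is simply the gap between error on the training data and error on held-out data:

```python
# Minimal, hypothetical sketch of the narrow, testable sense of "generalization":
# compare training error vs. held-out error for under-fit, well-fit, and over-fit models.
import numpy as np

rng = np.random.default_rng(0)

def noisy_quadratic(x):
    # Invented "ground truth": a quadratic plus noise.
    return 1.5 * x**2 - 2.0 * x + rng.normal(0.0, 2.0, size=x.shape)

x_train = rng.uniform(-3, 3, size=12)
y_train = noisy_quadratic(x_train)
x_test = rng.uniform(-3, 3, size=100)   # held-out, never-before-seen data
y_test = noisy_quadratic(x_test)

for degree in (1, 2, 9):                # under-fit, roughly right, over-fit
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2f}  held-out MSE {test_mse:.2f}")
```

Typically the under-fit model does poorly on both sets, the well-fit model does reasonably well on both, and the over-fit model does well only on its own training data; that train-vs-held-out gap is the narrow, testable, falsifiable sense of 'generalization.'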

As we have seen, there are major circular and shifting definition problems. Then when we do find and run tests,

- GLUE

- Winograd schemas (an example item is sketched after this list)

- Sally Anne

- analogies

- word problems, etc.

people apply Tesler's theorem, or the 'AI effect,' and say: it doesn't matter, I don't care what the results of the test are; by definition any and all such behavior simply 'is not intelligence.'
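For reference, here is roughly what one classic Winograd-schema item looks like when written down as data (a hypothetical sketch of the item's shape, not the official benchmark format; the sentence itself is the well-known councilmen example): changing one word flips which noun the pronoun refers to, and the claim under test is that resolving it requires common-sense knowledge rather than surface statistics.

```python
# Hypothetical sketch of a Winograd-schema-style test item (not the official
# benchmark format): swapping one verb flips the pronoun's expected referent.
winograd_item = {
    "sentence": "The city councilmen refused the demonstrators a permit "
                "because they {verb} violence.",
    "pronoun": "they",
    "candidates": ["the city councilmen", "the demonstrators"],
    "variants": {
        "feared": "the city councilmen",     # expected resolution
        "advocated": "the demonstrators",    # expected resolution
    },
}

for verb, expected in winograd_item["variants"].items():
    print(winograd_item["sentence"].format(verb=verb), "->", expected)
```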

And also (see the larger paper for more details), there are a lot of language-ambiguity and lack-of-rigor problems with many of the questions in these tests. I encourage you to take a look. There is a significant risk of 'garbage in, garbage out,' where we put in sloppy, undefined material and use the results to build sand-castles that wash away. In cynical, 'political,' bully systems run by H.sapiens-humans, we are used to test-designers giving ambiguous questions and arbitrarily deciding whether an answer is right or wrong (usually by how close it is to what they happen to be thinking and feeling at that particular moment, a 'telepathy test'). This fraud and corruption is highly destructive, and people claiming they are too incompetent to notice it, and other disinformation, is no excuse. We must do better.

“Extreme-Generalization”

What definition of ‘general’ are we supposed to be testing here? Is there actually a definition?

Francois Chollet explains his concept of H.sapiens-human "extreme generalization" in the last twenty or so pages of his book Deep Learning with Python (2nd edition). While his description, I think, helps us to understand some broadly held and perplexing delusions about H.sapiens-human behavior and learning, what we end up with is an untestable list of ideal aspirations about people that does not describe people. It does not get us any closer to a testable definition for biology, machines, or anything else.

Universal AI & Western Ideas

While the repetitive glitching of this discussion around bad and circular definitions can have the effect of contracting our mental scope of the topic, there is actually a large, dynamic, interconnected world full of potential, and even imperative, that this topic can integrate with. One element of narrow contraction vs. extension is whether we are dealing with an international set of ideas (as AI affects this whole ball of earth). There is a large set of important topics for crucial future planning. For example, the object-relationship-space framework proposed to help with some of the problems discussed here is rooted in a more general generalization-of-STEM and in definitions of definitions-and-their-behaviors and dynamics, so as to better navigate how systems work: system and definition behavior studies.

The timelines and selections of ideas here have for the most part been rather western-centric. And regardless of team-east vs. team-west, there are likely many biases, over-representations, and under-representations within this set of ideas that are largely invisible to those inside the story, like a fish in water, or like the way that if all you have is a hammer, everything tends to look like a nail. The evolution of technology, including AI, should include filling in these gaps and making needed adjustments so that incidentally local patterns do not bias the overall project in a deleterious way. Things not yet reached should be reached. Things overlooked should be noticed.

1. Biases in Western Thinking: "Darwinian selection as sport between peers in a species" vs. "population-(social)-niche filling"

- no ethics

- no project-context

- no collaboration

- no generalized STEM

- aversion to any group-related context (an extreme ‘individual’ism)

- a blind spot to errors in thinking (fierce resistance to Kahneman and Tversky, though thank goodness eventual recognition, even if many people still have not heard of their work)

- perplexing misunderstandings of how people learn, and a pervasive disdain for education, learning, and anything intellectual in the US

- both perceptual bias and data bias from focusing on a small group of white men

- and perhaps an elephant in the room, a very macho bully trolling culture which pathologically champions system collapse and disinformation.

2. Super-Enlightenment vs. Low-Bar Enlightenment:

One non-western idea which may be worth looking at here is the eastern concept of freedom from literal or proverbial eternal circulation through blind errors and misunderstood causes: the wheel of samsara, which is, broadly speaking, shared by several traditions generally originating in India (Hinduism and Buddhism, both of which are of course so diverse that you could spend your whole life studying either).

Can we look at 'enlightenment' in the eastern sense in a way that is less of a single-solution equivalent to the western mega-theorem, which simply tries to do too much and ends up being an admittedly popular dramatic flourish without much utility or lasting aesthetic substance?

Low Bar Enlightenment:

Part of what I like about this idea is that it hopefully connects 'intelligence' to 'learned perception' to 'ethics' to a project-context.

We can use “low-bar enlightenment”:

(Using ‘potentially endless cycles of ~”rebirth” due to ~”ignorance” ‘ as a metaphor/analogy for repeated project-failures, in particular where a lack of perception of the causes of those project-failures is involved in self-perpetuating feedback cycles leading to more such failures.)

We can learn to perceive what can by default be invisible causes of failure and collapse:

Low-Bar Enlightenment:

1. The perception that repeating cycles of failed actions and projects can result from errors in perception and planning (a proverbial ‘wheel of samsara’) without inevitable-automatic-learning based on raw feedback from that failure.

2. The perception that perception can be fooled in principle and in practice.

3. The perception that learning from failures does not happen automatically (and can, under bad circumstances, indefinitely not-happen).

4. The perception that models of causality can be wrong in principle and in practice.

5. The perception that plans/goals can be incorrectly set (so that plans are not followed and goals are not achieved as set).

6. The perception that each participant’s set of the shared definitions of the goals and structure of the project can/will collapse and deform unless maintained and repaired.

Note: This approach is ('democratically') broadly accessible to participants, requiring minutes to learn rather than myriad lifetimes; it does not require all-around perfection of person-ness without context (or somehow in all contexts), and it is not a reification that combines other abilities and insights into solving all the problems in the universe, nor does it include all possible types of consciousness, cognition, intelligence, etc.; "low-bar enlightenment" is one humble step toward navigating the dynamical problem-space of problems and systems.

Another important and maybe large topic, to at least mention here as it relates to East vs. West and to the topics mentioned here: mundane or concrete ethics & project-context. Ethics, for whatever reason, are treated in a way suspiciously like AI, with their own kind of AI-effect. No matter what kind of concrete, common-sense, universally agreed upon ethical practice you have defined, people in the west are riled up to say: No! That's not REAL ethics! This is an interesting problem, the good news around which is that there is a lot of low-hanging fruit for STEM-based ethics (just don't tell anyone in the west).

Souls, Reason, Symbols, & Generalization

To wrap up, let's try one more walk through the timeline. Apologies for how oversimplified this is (in a paper already far too long), but perhaps as you go through it you can construct a historical timeline that better fits the history as you research it (and maybe publish your results?!).

H.sapiens-humans love a good story, and tend to be attracted to all-in-one solutions; nothing with too many moving parts. What is a human? What is a person? What is intelligence? What is understanding? There's one answer that solves everything! From very roughly 500 AD to 1500 AD the answer (unless you wanted to be burned alive) was 'the immortal soul!' Note, this was a rather 'essentialist' approach, before STEM as we know it (or think we know it). This explanation made no attempt to be based on hard sciences or systems of testability.

Then from 1500 to 2012, the answer (unless you wanted to be blacklisted) was 'reason' in the Enlightenment tradition. Now the pendulum swung the other way, toward a strongly naive view of science-STEM, which held that there was one simple math proof for everything. It took until the 1940s to develop what we now think of as 'traditional' scientific-method hypothesis testing (and the math foundation for that did not come until the 1970s!). Mark your timelines.

As another thread of terminology and perspective, from the 1830s to the present there has been a prevailing view of how computers work (which I have always found a bit baffling). From the good olde days of Charles Babbage and Lady Ada Lovelace in the 1800s, through people's descriptions of Turing machines (and arguably through to how people intuitively try to use tokens for NLP deep learning, and chronologically did first), the narrative is that 'computers use symbols.' Computers manipulate symbols; they reason with symbols; they shuffle and tabulate symbols on machine tapes. Lovelace speculated that computers would write music, as music can be composed through symbol arrangement. The Chinese Room is based on this idea that computers and language and human thought are based around "symbols." But what is a symbol? So far as I know there is no definition of "symbol" that can be plugged into the narrative to make it actually work and make sense. But it makes sense to H.sapiens-humans. We want it to be true…so symbols are on the menu. What was the dominant school of AI until 2012? People will draw lines in various places. Some will say the heyday of symbolic AI was in the 70s, 80s, and maybe 90s. But I would not call (XGBoost) decision trees or regression subsymbolic. For me, in this context (as there are many, many different types of machine learning), deep learning with artificial neural networks is the best solid example of so-called 'subsymbolic' AI, and I would not say the tide turned until 2012.

Let’s take a step back and look at these odd terms: symbolic and subsymbolic.

E.g. https://en.wikipedia.org/wiki/Physical_symbol_system

Why are we even using these terms at all? Remember the narrative? People like to believe that 'symbols' are how math and logic and computers work…just don't ask for a definition of 'symbol.' We wanted to construct what we thought was a great way to do AI: a big system of "symbols" and rules, perhaps like chess, that the AI could dwell in, moving the clean, pristine symbols around in a symbolic world…how very nice sounding. How very important sounding! 'Sub-symbolic,' by contrast, is treated as a kind of 'strange other' approach.
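To make the contrast slightly less hand-wavy, here is a deliberately toy sketch (entirely hypothetical, not any real system): a 'symbolic' text classifier made of explicit hand-written rules over word-symbols, next to a 'subsymbolic' one whose behavior lives entirely in a table of numbers (made up at random here, standing in for weights a real network would learn from data), where no individual number is readable as a rule or a symbol.

```python
# Toy, hypothetical contrast between "symbolic" and "subsymbolic" styles.
import random

def symbolic_sentiment(text: str) -> str:
    """'Symbolic' style: explicit hand-written rules that manipulate word-symbols."""
    words = text.lower().split()
    if "not" in words:                                  # a brittle, explicit rule
        return "negative"
    if any(w in words for w in ("great", "good", "love")):
        return "positive"
    return "unknown"

# "Subsymbolic" style: the 'knowledge' is only a bag of numeric weights.
# (Random numbers here; a real network would learn these from data.)
random.seed(0)
weights = {ch: random.uniform(-1.0, 1.0) for ch in "abcdefghijklmnopqrstuvwxyz "}

def subsymbolic_sentiment(text: str) -> str:
    score = sum(weights.get(ch, 0.0) for ch in text.lower())
    return "positive" if score > 0 else "negative"

for review in ("this was great", "this was not great"):
    print(review, "|", symbolic_sentiment(review), "|", subsymbolic_sentiment(review))
```

The point of the sketch is only the difference in representation: in the first function you can read the "reason" for every output straight off the rules; in the second, the behavior emerges from arithmetic over numbers that mean nothing individually.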

Astoundingly, there isn’t even a wikipedia page on subsymbolic AI, though of course there is one for symbolic AI. The closest thing is rather short wiktionary page:

https://en.wiktionary.org/wiki/subsymbolic

This is the entirety of the entry:

subsymbolic

English

Etymology

sub- +‎ symbolic

Adjective

subsymbolic (not comparable)

Below the symbolic level.

Categories: English terms prefixed with sub-

After digging around online I was only able to find one single resource (on the entire internet…which is now largely shaped by sub-symbolic deep learning AI) that offered some kind of explanation (other than asking ChatGPT). People are even less enthusiastic about talking about symbolic vs. sub-symbolic than I thought. A lecture slide deck from MIT:

One representation of relative ups and downs of the paradigms, from: https://courses.media.mit.edu/2016spring/mass63/wp-content/uploads/sites/40/2016/02/Symbolic-vs.-Subsymbolic.pptx_.pdf

People keep trying to force computers to operate according to this ‘symbol manipulation’ paradigm, because it makes us comfortable, not because it works better.

The term 'sub-symbolic' is kind of strange, implying that the world is made of symbols…but that we somehow operate underneath them? It is not clear exactly what the phrase is supposed to mean.

This example may help you to visualize a word-symbol-oriented system vs. a deeply sub-symbolic one. What 'tokens,' or pieces of language, do you think we feed into a neural network for training? Often people think of feeding in words and phrases, and perhaps a word-unit is the paradigmatic example of what we think of as 'symbols' that are "manipulated." (I'm afraid to ask what 'manipulated' is supposed to mean, exactly.) And you can make models based on words: either old-school, symbolic, hand-crafted 'symbol manipulation' rule sets (which did not work well), or even a 'sub-symbolic' deep learning network that uses words as the units. But these are very big models (because there are a lot of words in the world), and they do not even try to track the meaning of the words. A more sub-symbolic approach is to feed in smaller pieces, characters or sub-word tokens, and let the network build a model of the meanings of the words.
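As a tiny, hypothetical illustration of that difference (not how any particular production tokenizer works; real systems typically use learned sub-word tokenizers such as BPE, which sit between these two extremes), the same sentence can be cut into word-level units or character-level units, and only the character vocabulary stays small no matter how much text you ever see:

```python
# Hypothetical sketch: word-level vs. character-level tokenization.
text = "Generalization generalizes generally."

word_tokens = text.lower().replace(".", "").split()
char_tokens = list(text.lower())

print("word tokens:", word_tokens)
print("char tokens:", char_tokens[:20], "...")

# A word-level vocabulary needs an entry for every distinct word form ever seen,
# while a character-level vocabulary stays tiny regardless of corpus size.
print("distinct word units:", len(set(word_tokens)))
print("distinct char units:", len(set(char_tokens)))
```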

For a walk-through of this in more detail by real experts, I recommend Francois Chollet's Deep Learning with Python.

I also very highly recommend Hobson Lane et al.'s Natural Language Processing in Action; it is a fabulous book about AI in general, and one of the best surveys of many types of models, including deep learning.

https://www.amazon.com/Natural-Language-Processing-Action-Understanding-ebook/dp/B097826WLF (a 2nd edition is forthcoming as of 2023).

Back to our timeline:

Do you think the AGI and 'general intelligence' model is more like the 'reason' model or more like the soft and dramatic 'immortal soul' model? I tried to read as many books by AI experts as I could, and to me the human-general-intelligence paradigm is extremely short on clear details or definitions. As we went through above, it focuses on strange aspirations which simply do not describe real H.sapiens-humans at all. To me it seems like a hybrid of, or throwback to, the 'people are special because the world wants them to be!' line of thinking. And perhaps there could be something in that, but to make zero attempt to integrate that with science, while using it essentially in place of science for animals and machines, seems very odd to me.

Model 'explainability' is also, I think, somehow entangled in this, where there is a hodgepodge of science and fantasy mixed with how people feel and what they want to believe, and whether an AI-ML model is explainable or not is an important but also socially volatile topic. If you present a stereotypical 'boring science-esque' presentation with plots and graphs and someone who looks like Carl Sagan, with lots of 'symbols' and 'symbol manipulations' and a nice big Cartesian graph with a y = mx + b chart (a "parameter"!), and ask people if the model was explained, they will probably say: of course. But what is the rigorous definition of 'explanation' there, and does it really make sense in the context? On the other hand, you can offer a variety of concrete explanations, but if the person "just doesn't feel right" about the 'explanation-ness feelings,' then they will claim it was not explained. This is another area where we need to do a better job with STEM tools and STEM literacy.

So somehow we ended up with this broken concept of ‘symbol manipulation’ computers and ‘explanations,’ and person-hood defined as the essence of ‘generalization.’ We are going to have to work to dig ourselves out of this, but understanding where we are and how we got here is probably an important part of the puzzle.

Conclusion and Final Quote

I strongly disagree with the auto-general-transfer-human-intelligence consensus paradigm. I feel it does not contain the granularity needed for defining and testing situation-based performance goals and abilities, or the relevant topics and contexts (projects, participation, ethics, system collapse, STEM) that are key parts of integrating AI-ML technologies into the ecosystems of earth and beyond.

I have for years been wrestling with the gaps and idiosyncrasies of this paradigm and have been working to create a better defined framework for testable modular units of intelligence and projects, which is the topic of the larger paper and project that these mini-articles are a part of: Object Relationship Spaces.

Key topics needed:

- multi-signal-type integration (images and language concepts alone are a huge topic)

- project spaces

- externalization

- low bar enlightenment

- general projects

- general system collapse

- generalize STEM

- object relationships

- clarification vs. disinformation

- STEM based ethics

- Kasparov event horizons

- human machine interactions

- machine biology integrations

- AI in AR/VR

- cybersecurity (huge topic)

- memory safety

- deployment vs generality (resource use etc)

We Can Do Better

Here is a list of agenda goals and targets to try to trace out a broader scope for what we should be aiming for with this topic. Not all of this is low-hanging fruit, but if it is important we should keep our eye on it and maintain our perspective.

Goals (Agenda): “We-can” statements:

We can succeed.

We can make things work.

We can understand what is wrong.

We can fix what is broken.

We can use non-automatically lost skills.

We can generalize STEM.

We can generalize system collapse.

We can generalize system fitness and system epidemiology.

We can generalize participation.

We can generalize projects.

We can generalize disinformation vs. definition-clarification.

We can generalize object-relationship-spaces.

We can use STEM to connect signals and reality.

We can connect STEM, project-management, and ethics.

We can use intersecting-interlocking-interconnecting areas.

We can communicate, learn, and solve problems.

We can make progress.

We can use "low-bar enlightenment," as defined in the list above: learning to perceive the otherwise-invisible causes of repeated project failure and collapse.

Intersecting-Interlocking-Interconnecting Areas:

- Clear & Functional Definitions

- Context

- Generalized STEM

- Generalized Projects (project-context)

- Generalized Participation

- Generalized System Collapse

- Generalized Categories of Types of Systems

- Generalized Ethics, Duty & Responsibility

- Generalized Definition-Clarification vs. Disinformation Attacks

- Generalized Definition Behaviors

- Generalized System-Productivity


We can use system-fitness-health-status-indicators,

We can use system-defense to prevent collapse.

We can generalize system & definition collapse behaviors.

We can use categories of types of systems.

We can use nonautomatic learning.

We can find and fix errors in perception.

We can organize projects.

We can distinguish short term vs. long term.

We can assign roles.

We can check and verify.

We can have policies on “errors and mistakes.”

We can improve and cultivate perception by perceiving perception.

We can prevent future problems.

We can reverse damage from past problems.

We can learn from the past.

We can collect data.

(We can operationally define ‘policy’ as algorithms for non-collapse based on dynamics of collapse.)

We can audit.

We can publish.

We can act with ethics, empathy and compassion.

We can follow best practice.

We can communicate:

We can communicate across space.

We can communicate across time.

We can communicate across cultures.

We can communicate across generation-gaps & succession gaps.

We can communicate across languages.

We can communicate across types of participants.

We can communicate across roles.

We can communicate across projects.

We can communicate across media of communication.

We can communicate using project coordination tools.

We can understand a spectrum of disinformation and clarification-of-information.

We can implement sustainable solutions.

We can make/generate/cultivate and use/utilize:

We can make and use clear descriptions (vs. liabilities of jargon).

We can make and use decisions and coordinate (voting) frameworks and protocols.

We can make and use clear functional operational definitions.

We can make and use data.

We can make and use policies.

We can make and use mandates.

We can make and use strategies.

We can make and use tactics.

We can make and use tests & evaluations.

We can make and use clear functional and operational definitions that keep their meaning over time.

We can complete/succeed-in/finish projects.

We can meet(/deliver) the needs of the target(/user).

We can make progress.

And this may be related to the invisible background history of post-Rome western thought in extremely abstract Abrahamic monism, where ethics and mechanics and productivity were thought to be (interestingly) rooted in some alternate world, some higher, inscrutable 'dimension' (an interesting theme), but critically disconnected from this entire universe. Western thinking, though often invisible to western people who are immersed in it (just meaning to be straightforward here), is still wedded to the idea that all order exists only in alien, alternate universes and that no progress, ethics, morals, or causes can or should be rooted in this universe. This, after the evaporation of those alien universes, has left people in the west with a productive obsession with the abstract but a catastrophic blind spot for integrating practical things together and connecting them to reality.

For example, 'Flatland' is a wonderful, thought-provoking, Animal-Farm-like, very approachable tour through mathematical dimensionality, but even in such a pure-math work of fun thinking, the footprint of western theological cosmology is visible. There is a clear, simplistic, pejorative/negative identity assigned to reality, and a 'one answer to everything' appeal to 'higher dimensions,' with no priority given to integration:

'reality = low and useless, should be destroyed'; 'escape to alien higher dimensions = the answer!'

https://www.litcharts.com/lit/flatland/themes/religion-divinity-and-the-unknown

Practical integration of things is generally something people have a lot of difficulty with. As another example of east-west thinking, various eastern traditions (to sidestep the large and controversial point that 'religion' is not really used outside the west for what the west uses the term to describe) have a not-at-all-clear topic of 'non-dualism,' around which there is much disagreement. But arguably one overall way of looking at non-dualism is that there should be, and is, a way to integrate into the world we live in ways of doing things that work (vs. destroying this world and escaping to an oasis in a higher dimension).

And this pervades western thinking.

reality = low, evil, to be destroyed

women = low, evil, to be destroyed

the physical world = low, evil, to be destroyed

biology = low, evil, to be destroyed

nature = low, evil, to be destroyed

only a spark of alien divinity in some masculine men = a lost fragment from a good alternate alien universe

This kind of violently simplistic thinking also gives a bad name to various non-mundane pursuits that are interested in integrating simple and non-simple things.

And somewhat fascinatingly, Deep Learning AI-ML brings up these very same types of questions: manifold hypothesis, the nature of patterns, the relationship between higher dimensional spaces and lower, the nature of meaning, etc.

Assuming that I am not wrong in my agenda to NOT destroy the universe and pragmatism itself, it is very important that we coherently integrate all these topics in pragmatic ways.

As another side note on ethics and morality: the drop-off in western foundations for looking at destructive behavior led, I suppose not surprisingly, to a near-complete loss of the very conception of destructive, bad behavior. Read the opening of Thinking, Fast and Slow by Daniel Kahneman (and the then-deceased Amos Tversky; the book is about the work they did together but was written after Tversky died), where Dr. Kahneman clearly describes the overwhelmingly dominant paradigm of the fields he worked in from the 1970s: it was simply assumed that people never made mistakes, with the background assumption that they were always trying to be as productive as possible. Their work focused on the decision-making process, but since the rise of the internet we have also seen how an even broader pattern of violent, bullying, trolling, radicalizing, disinformation-using destruction is irresistible to people. It is, at least to me, surprising that 1970 was only 25 years after 1945, and yet the tendency for people to flock to violent extremism of various flavors seemed to have been somehow erased from official possibility.

And yet sober and eloquent leaders in the AI field, such as Michael Wooldridge (I highly recommend his wonderful book, A Brief History of AI), clearly and flatly say what seems obvious (and is generally implied if not stated elsewhere): H.sapiens-humans are the model for AI; the goal is to build AI to do what H.sapiens-humans do.

If that is true, then AI will have profound decision-making problems, will sabotage any project it gets involved with, and will be irresistibly attracted to violent extremism such as the right or left 'revolutionary spirit!' of the heartbreaking destruction of WWII. (For the 1900s, I also recommend William L. Shirer's Berlin Diary and The Rise and Fall of the Third Reich, Timothy Snyder's Black Earth, perhaps Anne Applebaum's Red Famine, and, as follow-ups, We Are All Targets and There Is Nothing for You Here, and of course Tony Judt's Postwar; hopefully those will be a start, if only in a superficial 'I read a book about it' way.)

I may be overly obsessed with my simplistic agenda to be practical, project-oriented, productive, sustainable, far-sighted, cautious of hidden problems in perceptions and frameworks, etc., but so far in western history the limiting factor has not been too much integration of STEM, ethics, and productivity. Maybe we will survive to a point where we can back off and prioritize some other things we are overlooking. But at this time (2023) these goals (and this paper) seem fair.

Or, if you feel I am over-stating my criticism, and it seems obvious to you (and/or obvious to you that it is obvious to everyone) that there is generalized STEM, project, participation, etc., and that STEM, ethics, morals, and low-bar enlightenment are all operationally compatible in a context of resilience against disinformation, then I would love to see your proposed plan for how people can implement that in routine projects and AI architecture, because I'm pretty sure my proposals are not the best.

A Favorite Quote

I would like to say again that I do not blame Francois Chollet for the paradigm, and I think his very well-thought-through coverage of the topic in his book is a valuable resource for understanding it.

I would like to end by quoting the last few lines of Francois Chollet’s Deep Learning with Python 2nd Edition, which makes me literally tear up.

"Learning is a lifelong journey, especially in the field of AI, where we have far more unknowns on our hands than certitudes. So please go on learning, questioning, and researching. Never stop! Because even given the progress made so far, most of the fundamental questions in AI remain unanswered. Many haven't even been properly asked yet."

See

https://en.wikipedia.org/wiki/Artificial_general_intelligence

https://en.wikipedia.org/wiki/Clarke%27s_three_laws

https://en.wikipedia.org/wiki/Three_Laws_of_Robotics

https://en.wikipedia.org/wiki/Isaac_Asimov

https://en.wikipedia.org/wiki/Arthur_C._Clarke

https://en.wikipedia.org/wiki/Falsifiability

https://www.britannica.com/topic/criterion-of-falsifiability

https://www.britannica.com/topic/law-of-nature

https://en.wikipedia.org/wiki/Hypothetico-deductive_model

https://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Definition_of_terms

https://www.techtarget.com/whatis/definition/falsifiability

https://en.wikipedia.org/wiki/Perceptrons_(book)

https://en.wikipedia.org/wiki/AlexNet

https://en.wikipedia.org/wiki/Dartmouth_workshop

https://en.wikipedia.org/wiki/Frank_Rosenblatt

https://en.wikipedia.org/wiki/John_von_Neumann

https://en.wikipedia.org/wiki/Alan_Turing

https://en.wikipedia.org/wiki/ImageNet#ImageNet_Challenge

https://en.wikipedia.org/wiki/Ronald_Fisher

https://en.wikipedia.org/wiki/Geoffrey_Hinton

https://en.wikipedia.org/wiki/OpenAI

https://en.wikipedia.org/wiki/ChatGPT

https://en.wikipedia.org/wiki/Claude_Shannon

https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test

https://en.wikipedia.org/wiki/Winograd_schema_challenge

https://www.amazon.com/Complexity-Guided-Tour-Melanie-Mitchell/dp/0199798109/

https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

https://platform.openai.com/docs/models/gpt-4

https://openai.com/blog/introducing-text-and-code-embeddings

https://arxiv.org/abs/2005.14165

https://github.com/openai/gpt-3

https://www.wired.com/story/how-chatgpt-works-large-language-model/

https://huggingface.co/docs/transformers/tokenizer_summary

https://ieeexplore.ieee.org/abstract/document/8684304

https://goertzel.org/who-coined-the-term-agi/

https://medium.com/@melaniemitchell.me/can-gpt-3-make-analogies-16436605c446

https://dspace.mit.edu/handle/1721.1/5648

https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567/

https://www.amazon.com/Learning-Python-Second-Fran%C3%A7ois-Chollet/dp/1617296864/

https://en.wikipedia.org/wiki/Cyc

https://en.wiktionary.org/wiki/subsymbolic

https://courses.media.mit.edu/2016spring/mass63/wp-content/uploads/sites/40/2016/02/Symbolic-vs.-Subsymbolic.pptx_.pdf

https://en.wikipedia.org/wiki/Physical_symbol_system

https://www.amazon.com/Natural-Language-Processing-Action-Understanding-ebook/dp/B097826WLF

https://www.litcharts.com/lit/flatland/themes/religion-divinity-and-the-unknown

https://en.wikipedia.org/wiki/AI_effect

https://melaniemitchell.me/PapersContent/BurnellEtAlScience2023.pdf

https://melaniemitchell.me/

https://en.wikipedia.org/wiki/ELIZA_effect

https://en.wikipedia.org/wiki/Language_model#Benchmarks

https://gluebenchmark.com/

https://super.gluebenchmark.com/

https://www.amazon.com/Invisible-Women-Data-World-Designed/dp/1419729071

About The Series

This mini-article is part of a series to support clear discussions about Artificial Intelligence (AI-ML). A more in-depth discussion and framework proposal is available in this github repo:

https://github.com/lineality/object_relationship_spaces_ai_ml
