Part 4 — A Mathematical Framework for Fluid Intelligence
In the previous parts, I discussed why large language models alone fall short of achieving AGI and why I believe a paradigm shift is necessary, moving toward mathematical reasoning as the core foundation for AI. I argued that centering on language-based reasoning constrains models to probabilistic associations, limiting their capacity for rigorous, logical inference. Instead, I suggested that language should emerge as a secondary function from a mathematically grounded foundation, making it more precise, powerful, and effective for handling complex reasoning tasks. This shift from language-first to math-first AI isn’t just theoretical; it has profound implications for building models that can genuinely interpret, learn, and express complex concepts.
In this blog, I dive into the specifics of what this foundational shift may look like through a mathematical framework or a scaffolding for such a model. I explore the concept of a governing function that would allow such models to learn new, unseen actions with minimal data input. This function, inspired by multi-modal state inputs and rooted in mathematical principles, could enable AGI to develop emergent language grounded in logic, resulting in deeper reasoning, fewer hallucinations, and greater coherence.
Note: This article is not a claim that I know what AGI is, nor that the approaches I work on are necessarily the right path. Treat this with the same skepticism and scrutiny required when adopting any framework. Most importantly, engage with me so we can develop a shared understanding.
This is a multipart series that is evolving:
- Part 1 — Why LLMs Will Never Lead to AGI
- Part 2 — Beyond Language: Why Scaling LLMs Won’t Lead to AGI
- Part 3 — Rethinking Cognition and AGI from a Mathematics First Principle
- Part 4 — A Mathematical Framework for Fluid Intelligence
A High-Level Framework for AGI
I was thinking of a mathematical way to formalize what AGI is, as there is no single definition that captures its essence. First, what is my definition of AGI?
In simple terms, for me, AGI is the ability of a model to learn a new, unseen action with a minimal set of observations, integrating this novel action seamlessly into its internal world knowledge and doing so in the fastest, most energy-efficient way possible.
So this entails:
- Learning a new action
- In an unseen setting
- With a minimal set of observations of new data
- In real time, at the lowest possible energy cost
This process probably implies learning a new governing function for an unseen action, possibly within an entirely new spatial domain. Such a perspective demands an AGI capable of abstracting knowledge from previous actions, leveraging Bayesian priors, and efficiently extrapolating these insights to unknown scenarios.
Now, why am I framing AGI in this form? For me there is a distinction between crystallized intelligence and fluid intelligence. Fluid intelligence and crystallized intelligence represent two distinct forms of cognitive ability.
Fluid intelligence is the capacity to think logically and solve novel problems independently of acquired knowledge; it involves abstract reasoning, pattern recognition, and adaptability in unfamiliar situations.
In contrast, crystallized intelligence relies on accumulated knowledge and experience, drawing from learned skills, facts, and cultural knowledge. Crystallized intelligence can neither perform a learned function on out-of-domain (OOD) data nor learn new actions. (LLMs are crystallized intelligence.)
In humans (bio organisms), fluid intelligence tends to peak in early adulthood and gradually declines with age, while crystallized intelligence often improves or remains stable throughout life, as it builds over time. These differences highlight how fluid intelligence is more dynamic and flexible, whereas crystallized intelligence is rooted in memory and learned expertise.
Note that both are important.
Hence, AGI models should demonstrate fluid intelligence.
To begin, consider each action A_k, for k = 1, 2, …, n, which is governed by a unique function

f_{A_k}(x, y, z, t, s)

where (x, y, z) represent spatial coordinates in a bounded region R ⊂ ℝ³, t denotes time, and s is an additional n-dimensional state vector representing a high-dimensional, multi-modal input (think of this as an n-dimensional tensor). This state vector s could encompass sensory information from actuators, contextual embeddings, and/or even latent variables that define the current state of the AGI model’s environment and its internal states. Each action A_k may thus involve a completely different governing function due to the variability in both the world-map coordinates (x, y, z) and the high-dimensional input s.
Formally, the spatial domain is the bounded region

R = { (x, y, z) ∈ ℝ³ : (x, y, z) lies within the model’s world map }
Each function

f_{A_k} : R × [0, T] × S → ℝ

maps the combination of spatial-temporal coordinates (x, y, z, t) and the state vector s ∈ S to an output that defines the dynamics of action A_k within the spatial region R (I am not yet convinced whether we need to move from the real to the complex domain). The challenge in constructing AGI lies in its ability to learn a new governing function

f_{A_{n+1}}(x, y, z, t, s)

in a way that is conditioned on the previously learned functions

f_{A_1}, f_{A_2}, …, f_{A_n}

without exhaustive new data.
Our objective, therefore, is to learn f_{A_{n+1}}(x, y, z, t, s) instantaneously, leveraging Bayesian priors from the previous n actions (if possible). To achieve this, we can possibly frame the learning process as an adaptive Bayesian inference problem, where each f_{A_k} acts as a probabilistic prior, enabling the system to infer f_{A_{n+1}} with minimal new observations.
The posterior distribution for the governing function of the new action A_{n+1} can be expressed as

P(f_{A_{n+1}} | D_{n+1}, f_{A_1}, …, f_{A_n}) ∝ P(D_{n+1} | f_{A_{n+1}}) · P(f_{A_1}, f_{A_2}, …, f_{A_n})

where D_{n+1} is the minimal new data specific to A_{n+1}, P(D_{n+1} | f_{A_{n+1}}) denotes the likelihood derived from that data, and P(f_{A_1}, f_{A_2}, …, f_{A_n}) captures the cumulative prior knowledge. The goal of the model is to maximize this posterior distribution, effectively “bootstrapping” the learning of f_{A_{n+1}} based on the prior distributions.
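To make the update concrete, here is a deliberately minimal sketch in Python. It assumes each governing function collapses to a single scalar parameter with Gaussian priors and Gaussian observation noise, which is a drastic simplification of f_{A_{n+1}}; the function name and noise variance are hypothetical choices, not part of the framework:

```python
import numpy as np

def posterior_from_priors(prior_means, prior_vars, new_obs, noise_var=0.25):
    """Conjugate Gaussian update: pool the priors learned from previous
    actions into a cumulative prior, then condition on minimal new data."""
    # Cumulative prior: precision-weighted pooling of the n learned priors.
    precisions = 1.0 / np.asarray(prior_vars, dtype=float)
    prior_prec = precisions.sum()
    prior_mean = (precisions * np.asarray(prior_means, dtype=float)).sum() / prior_prec

    # Likelihood from the minimal new observations for action n+1.
    obs = np.asarray(new_obs, dtype=float)
    post_prec = prior_prec + len(obs) / noise_var
    post_mean = (prior_prec * prior_mean + obs.sum() / noise_var) / post_prec
    return post_mean, 1.0 / post_prec  # posterior mean and variance

# Priors from three previous actions; only two observations of the new one.
mean, var = posterior_from_priors([1.0, 1.2, 0.8], [0.5, 0.5, 0.5], [2.0, 2.2])
print(mean, var)
```

The precision-weighted pooling stands in for the cumulative prior; note how two observations suffice to pull the posterior toward the new action’s data while shrinking its variance.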
In this framework, each governing function f_Ak may vary significantly depending on the action A_k, the spatial coordinates (x,y,z), and the n-dimensional state vector s. This adds a level of complexity to the AGI model’s learning process, as it must recognize and adapt to fundamentally different governing dynamics across actions. The mathematical formulation captures this variability by defining a multi-integral across both spatial-temporal domains and the state vector space, allowing for a comprehensive integration of information.
To encode the model’s adaptation across these diverse governing functions, consider the integral over both the spatial-temporal region and the state space:

∫_S ∫_0^T ∫_R f_{A_{n+1}}(x, y, z, t, s) dV dt ds

where ds represents the differential element for the state vector space S, and dV = dx dy dz is the differential volume element for spatial coordinates. This multi-dimensional integral captures the entirety of the governing function f_{A_{n+1}} over the complex domain defined by (x, y, z, t, s), leveraging prior knowledge while adapting to novel observations.
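Since this multi-integral rarely has a closed form, one plausible way to evaluate it is plain Monte Carlo sampling over (x, y, z, t, s). The sketch below uses a toy stand-in for f_{A_{n+1}} and hypothetical bounds for the region, time window, and state space:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_next(x, y, z, t, s):
    """Toy stand-in for f_{A_{n+1}}: any callable over (x, y, z, t, s)."""
    return np.exp(-(x**2 + y**2 + z**2)) * np.cos(t) + 0.1 * s.sum(axis=-1)

# Hypothetical bounds: R = [-1, 1]^3, T = [0, 1], S = [0, 1]^4.
N, s_dim = 100_000, 4
xyz = rng.uniform(-1.0, 1.0, size=(N, 3))
t = rng.uniform(0.0, 1.0, size=N)
s = rng.uniform(0.0, 1.0, size=(N, s_dim))

volume = 2.0**3 * 1.0 * 1.0**s_dim  # |R| * |T| * |S|
estimate = volume * f_next(xyz[:, 0], xyz[:, 1], xyz[:, 2], t, s).mean()
print(estimate)
```

Monte Carlo is attractive here because its error decays independently of dimension, which matters once s is high-dimensional.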
Hypothetical Governing Function for Cognitive Dynamics
As a hypothesis, a governing function for the AGI model should probably closely emulate human cognition; herein lies the rub, and the devil is in the details. So it would likely require a multi-scale, hybrid approach that integrates principles of dynamical systems, network theory, stochastic processes, and Bayesian inference. This could be conceptualized by a hybrid governing function F(w, t, s) that operates over multiple levels of neural and cognitive processing:

∂w_i/∂t = F(w_i, w_j, s, t) + Σ_j G_ij (w_j − w_i) + ξ_i

This equation integrates:
- w_i represents the weight of the connection involving neuron i, such as a synaptic strength or connection weight, and w_j similarly represents the weight for neuron j.
- ∂w_i/∂t = F(w_i, w_j, s, t) + ξ_i models the dynamics of weight changes over time, where the update to w_i depends on both w_j (interactions with other neurons) and other factors encoded in s and t.
- Stochastic noise ξ_i models the inherent variability in neural responses.
- Network interactions across model regions enter through coupling terms G_ij (w_j − w_i), which capture synchrony, communication, and cooperative processing among distributed neural populations of the model.
- Σ_j G_ij (w_j − w_i) represents the coupling or interaction term between weights, where G_ij models the influence of the difference in weights w_j − w_i on weight updates, capturing synaptic cooperation or competition dynamics.
The governing function F(w,t,s) is, therefore, a complex, multi-dimensional, multi-modal construct that captures the model’s ability to adapt continuously to new inputs, integrate information across sensory modalities, and learn through plasticity and connectivity dynamics.
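A small numerical sketch may help ground this. Under the assumption that the intrinsic drift F reduces to a simple linear decay, the coupled, noisy weight dynamics can be integrated with the Euler–Maruyama scheme; the drift, the coupling distribution, and all constants below are illustrative choices, not part of the framework:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_weights(n=5, steps=2000, dt=0.01, noise=0.05):
    """Euler-Maruyama integration of
        dw_i/dt = F(w_i) + sum_j G_ij (w_j - w_i) + xi_i
    with a toy intrinsic drift F(w) = -0.5 w and random positive coupling."""
    w = rng.normal(0.0, 1.0, size=n)               # initial neuron states
    G = np.abs(rng.normal(0.3, 0.1, size=(n, n)))  # coupling strengths G_ij
    np.fill_diagonal(G, 0.0)                       # no self-coupling
    for _ in range(steps):
        drift = -0.5 * w                                      # toy F
        coupling = (G * (w[None, :] - w[:, None])).sum(axis=1)  # sum_j G_ij (w_j - w_i)
        xi = noise * rng.normal(size=n) * np.sqrt(dt)           # stochastic term
        w = w + (drift + coupling) * dt + xi
    return w

w_final = simulate_weights()
print(w_final, w_final.std())
```

Because the coupling term Σ_j G_ij (w_j − w_i) pulls states toward one another, the weights synchronize over time despite the injected noise, which is one of the cooperative behaviors described above.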
Why do we have a connection term Gij and also wi and wj?
You’re right to wonder about the need for multiple weights. This is a common point of confusion in designing cognitive models. The connection or synaptic weight between neurons does often capture the relationship between two neurons. However, in more advanced and nuanced models of neural dynamics, using both neuron-specific states (wi and wj) and connection weights Gij together allows us to model a wider range of behaviors and interactions. Here’s why both are useful:
- Neuron-Specific Dynamics: Each neuron or cognitive unit has its own dynamic state, represented here by wi. This could reflect individual properties like firing threshold, activation level, or other neuron-intrinsic characteristics that change over time, independently of other neurons. Neurons don’t just respond to incoming connections, they also have intrinsic properties that can evolve (e.g., due to adaptation or fatigue). So, wi can capture these neuron-specific dynamics, while Gij models the strength of their interaction.
- Dynamic Interactions vs. Static Connections: In traditional neural networks, a static connection weight Gij between two neurons can represent the strength of influence, but it’s typically fixed (or only updated during learning). In this dynamic model, both the intrinsic state wi and the interaction term Gij(wj−wi) can change over time, allowing for a richer, context-dependent interaction. This flexibility is important for representing real-time neural interactions where connectivity can be modulated based on current neural states or external factors.
- Emergent Properties through State-Dependent Interactions: Cognitive processes often involve emergent properties that arise from complex interactions between neuron states, not just their connections. By having both wi and Gij, we can model interactions that depend on the states of the neurons themselves. For instance, a neuron might only influence another if its activation wi surpasses a certain level, or it may exhibit a different influence pattern depending on the overall network state. (this is different from activation functions alone).
- Modeling Hebbian Learning and Plasticity: If you want to simulate concepts like Hebbian learning or synaptic plasticity (where connections strengthen or weaken based on activity), using both neuron states and connection weights is useful. The model could adjust Gij based on the difference (wj−wi) over time, reflecting learning through repeated interactions. This enables the model to go beyond a static weight and incorporate adaptive behavior in the connections themselves.
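As a toy version of that last point, the sketch below adapts G_ij from the state difference (w_j − w_i): connections between neurons with similar states strengthen faster than they decay, while dissimilar pairs lag behind. The specific rule (an exponential of the state difference) is one arbitrary illustrative choice among many:

```python
import numpy as np

def hebbian_step(G, w, lr=0.01, decay=0.001):
    """One toy plasticity step: connections between similar-state neurons
    strengthen, while a uniform decay weakens all connections over time."""
    diff = np.abs(w[None, :] - w[:, None])  # |w_j - w_i| for every pair
    G = G + lr * np.exp(-diff) - decay * G  # strengthen similar, decay all
    np.fill_diagonal(G, 0.0)                # no self-connections
    return np.clip(G, 0.0, 1.0)             # keep weights bounded

w = np.array([0.1, 0.12, 0.9])  # two similar neurons, one outlier
G = np.full((3, 3), 0.5)
np.fill_diagonal(G, 0.0)
for _ in range(100):
    G = hebbian_step(G, w)
print(G)
```

After repeated interactions the connection between the two similar neurons ends up stronger than their connections to the outlier, illustrating learning in the connections themselves rather than in the states alone.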
In the AGI framework, this function provides a theoretical blueprint for designing a system that can generalize, adapt, and refine its own governing functions across varied domains and inputs — much like the human brain. The complex mathematical structure of F(w,t,s) reflects the probable architecture required for the AGI to navigate and learn from highly diverse environments with minimal data, continually updating and optimizing its probabilistic representations.
Optimization for Minimal Data Dependence in Learning
The optimization problem is then to minimize the dependency on new observations for the (n+1)-th action, while ensuring the model can accurately approximate f_{A_{n+1}} within a confidence interval. Formally, we can frame this optimization as

min_{D_{n+1}} |D_{n+1}|   subject to   I( f_{A_{n+1}}(x, y, z, t, s) > ϵ ) = 1 at every sampled point

where I denotes the indicator function that verifies whether f_{A_{n+1}} exceeds a defined threshold ϵ, ensuring that only regions of high information gain are incorporated. This condition ensures that the model focuses its learning effort on regions of high uncertainty, minimizing redundant data acquisition.
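Operationally, this indicator condition resembles threshold-based active learning: query new data only where the model’s predictive uncertainty exceeds ϵ. A minimal sketch, with a fabricated uncertainty profile standing in for the model’s actual posterior variance:

```python
import numpy as np

def select_queries(candidates, predictive_std, eps=0.25):
    """Indicator-based selection: keep only candidate points whose
    predictive uncertainty exceeds the threshold eps, so new data is
    acquired only in regions of high information gain."""
    mask = predictive_std > eps  # I(sigma(x) > eps)
    return candidates[mask]

candidates = np.linspace(0.0, 1.0, 11)
predictive_std = np.abs(candidates - 0.5)  # toy: most uncertain at the ends
queries = select_queries(candidates, predictive_std)
print(queries)
```

Only the high-uncertainty endpoints survive the filter; the well-understood middle of the domain is never re-queried, which is exactly the redundancy-avoidance the condition encodes.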
To generalize this approach across all previous actions, the model applies a recursive Bayesian update through the prior distributions, integrating each action’s governing function f_{A_k} iteratively:

P(f_{A_{n+1}}) = ∫ P(f_{A_{n+1}} | f_{A_1}, …, f_{A_n}) P(f_{A_1}, …, f_{A_n}) d{A}

where d{A} represents the measure over the sequence of prior actions, enabling the model to encode information from each f_{A_k} as a probabilistic scaffold. This multi-integral structure allows the AGI system to assimilate data from vastly different governing functions, all defined within the intricate, multi-dimensional space of (x, y, z, t, s), probably making it highly adaptable.
Each new observation updates the posterior distribution for f_{A_{n+1}}, conditioning it on both the spatial-temporal domain and the high-dimensional state vector. This results in an advanced Bayesian updating mechanism where the posterior distribution for each new action is recursively built upon the learned structure from prior actions:

P(f_{A_{n+1}} | D_{n+1}) ∝ P(D_{n+1} | f_{A_{n+1}}) ∫ P(f_{A_{n+1}} | f_{A_1}, …, f_{A_n}) P(f_{A_1}, …, f_{A_n} | D_1, …, D_n) d{A}
This formulation synthesizes prior actions and new observations into a coherent probabilistic framework, enabling the model to infer and learn f_{A_{n+1}} rapidly and accurately.
By iteratively integrating prior knowledge across varied actions, the AGI model achieves a remarkable balance between generalization and specificity. Each governing function f_{A_k} can be fundamentally different due to dependencies on unique spatial, temporal, and state vector conditions. Yet, through the multi-integral recursive Bayesian inference, the model continuously refines its posterior beliefs, generalizing seamlessly to new, unseen actions.
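The recursive refinement can be sketched in a scalar toy setting where each governing function is reduced to one Gaussian parameter: each action contributes only a handful of observations, and the posterior from one update becomes the prior for the next. All names and constants below are illustrative:

```python
import numpy as np

def recursive_update(prior_mean, prior_var, observations, noise_var=0.25):
    """One Bayesian step: the posterior from previous actions becomes
    the prior for the next action's governing parameter (toy scalar case)."""
    obs = np.asarray(observations, dtype=float)
    post_prec = 1.0 / prior_var + len(obs) / noise_var
    post_mean = (prior_mean / prior_var + obs.sum() / noise_var) / post_prec
    return post_mean, 1.0 / post_prec

# Each action contributes only a handful of observations.
mean, var = 0.0, 10.0  # broad initial prior
for action_data in ([1.1, 0.9], [1.2], [0.8, 1.0, 1.1]):
    mean, var = recursive_update(mean, var, action_data)
print(mean, var)
```

With each action the posterior mean settles near the shared underlying value while the variance shrinks monotonically, which is the probabilistic scaffolding behavior the framework calls for.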
This approach equips AGI models with a fluid ability to map complex dynamics across high-dimensional, multi-modal spaces instantaneously and with minimal data by capturing the essence of human-like adaptability in an algorithmic structure.
Stay tuned.
Disclaimer
Freedom Preetham is an AI researcher with a background in math and quantum physics, working on genomics in particular. This article is the original work of Freedom. You are free to use and expand on this research idea as applicable to other domains. Attribution to Freedom Preetham with a link to this article on Medium is welcome if you find it useful.