Intelligence and Life: The Minimal Transformer Model

katoshi · Published in Neo-Cybernetics · Apr 14, 2024

It has been some time since conversational AI and generative AI became hot topics. The race to develop generative AI is intensifying among many companies, but as of this writing, OpenAI's ChatGPT, powered by GPT-4, appears to hold the leading position.

The final "T" in ChatGPT stands for "Transformer," a key technology behind today's conversational AI.

Transformers are built on neural networks, a technology long known in the field of artificial intelligence. The number of neurons in these networks has grown enormously, enabling conversation that approaches human-like interaction, and it is known that increasing this number enhances intellectual capability, at least up to a point.

What is crucial about the transformer, however, is the structure of the processing built on top of these neural networks, which includes the components known as encoders and decoders.

In this article, rather than focusing on the neural networks and their size, we will examine the properties of the transformer's processing structure through a very simple minimal transformer model. In essence, it can be reduced to a transformation process performed by a single neuron.

Despite its simplicity, this minimal transformer model shows potential for surprisingly complex processing, which can be seen as the essence of the transformer. By imagining this single neuron replaced with a neural network containing many neurons, we can understand how actual conversational AI achieves such robust processing.

Furthermore, focusing on the structure of the transformer reveals that it is shared not only by intelligence but also by single-celled organisms. While the underlying mechanisms of neurons and of biological cellular responses differ greatly, the two share a common processing structure, indicating a commonality between life and intelligence.

The Wave of a Single Neuron

We consider a feedback structure where the output signal of a single neuron with a continuous activation function is fed back into its input.

The neuron transforms an input signal into an output signal: a constant weight is applied to the input value, a bias is added, and the result is passed through a function known as an activation function, producing the value of the output signal.

Due to the feedback loop structure, the output signal is again provided as an input signal, where the weight is applied and the bias added, and it is transformed again by a continuous activation function.

Depending on the initial values, weights, biases, and activation functions, this mechanism may converge to a certain value, repeat with a certain periodicity, or produce complex waves without convergence or simple periodicity.

At the output stage, if the fed-back value is quantized, whether to a binary value of 0 or 1 or, say, to an 8-bit digital value ranging from 0 to 255, the result is monotonous repetition or convergence to a constant value. If, on the other hand, the feedback carries continuous real numbers while the extracted output wave is converted into binary or digital values, the neuron can produce a complex sequence of digital values.
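To make this concrete, here is a minimal sketch in Python of a single neuron with a feedback loop. The sine activation and all parameter values are my own illustrative assumptions; the article does not specify an activation function, and a non-monotonic one is what allows complex waves in a one-dimensional loop.

```python
import math

def neuron_wave(x0, w, b, steps=40):
    """Single neuron with a feedback loop: repeatedly apply the weight w,
    add the bias b, and pass the result through a continuous activation,
    feeding the output back in as the next input.

    The activation f(u) = sin(pi * u) is an illustrative choice.
    """
    x, wave = x0, []
    for _ in range(steps):
        x = math.sin(math.pi * (w * x + b))  # feed back the full-precision value
        wave.append(x)
    return wave

# The same structure yields all three regimes (weights found by experiment):
settles = neuron_wave(0.1, w=0.5, b=0.0)  # converges to a constant value
cycles  = neuron_wave(0.1, w=0.8, b=0.0)  # falls into a period-2 oscillation
wanders = neuron_wave(0.1, w=1.0, b=0.0)  # complex wave, no simple period

# Quantize only at the output stage, mapping the activation's range [-1, 1]
# onto 8-bit values 0..255, while the feedback itself stays continuous.
tokens = [round(255 * (x + 1) / 2) for x in wanders]
print(tokens[:16])
```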

State Determined by a Single Parameter

In the case where the neuron produces complex waves, the initial value of the input signal determines the waves produced thereafter.

Moreover, fixing the initial feedback value at 0 and adding a specific value to the bias can change the wave pattern of subsequent outputs. When a specific value is added to the bias, it not only affects the bias in the first round but continues to influence the bias in the second round and beyond when the output signal is fed back. Depending on the value, the wave may converge, become monotonously periodic, or form complex waves.

Treating the value added to the bias as the initial state means that the wave output of a single neuron can be varied in many ways by a single numerical parameter.
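Continuing the sketch above (same assumed activation and weights), fixing the initial feedback value at 0 makes the entire wave a function of one number added to the bias:

```python
def wave_from_parameter(p, steps=40):
    """Fix the initial feedback value at 0; the single parameter p is an
    offset added to the bias on every round of the feedback loop."""
    return neuron_wave(x0=0.0, w=1.0, b=p, steps=steps)

# Different parameter values select visibly different waves.
for p in (0.05, 0.20, 0.45):
    wave = wave_from_parameter(p)
    print(p, [round(255 * (x + 1) / 2) for x in wave[:8]])
```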

Learning of a Single Neuron

Now consider providing this simple neuron with many training pairs, each consisting of a single parameter as the initial value and an expected sequence of digital values as the correct data.

Since it is only a single neuron, whether it can learn to reproduce the correct data depends greatly on the characteristics of that data. However, if not only the weight and bias but also the activation function is parameterized and fitted, the likelihood of matching the correct data increases.

Furthermore, the magnitude of the single parameter carries little meaning in itself; it only needs to be distinguishable from other values. A preprocessing step that remaps the parameter to some other value can therefore make it easier to fit the training data.

Such adjustments can match the characteristics of the correct data within the range that a single neuron can represent, allowing learning to converge successfully.

If the correct data fits well and learning progresses smoothly, the sequence of digital values produced by this single neuron when a single parameter is provided will be close to the correct data.
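As a toy illustration of such learning, the sketch below searches the weight and bias by random trial to reproduce (parameter, correct sequence) pairs. The random search, the error measure, and the synthetic targets are all my own stand-ins for an actual training procedure:

```python
import math
import random

def wave_tokens(p, w, b, steps=16):
    """8-bit token sequence emitted by the feedback neuron for parameter p,
    with p added to the bias on every round (as described above)."""
    x, out = 0.0, []
    for _ in range(steps):
        x = math.sin(math.pi * (w * x + b + p))
        out.append(round(255 * (x + 1) / 2))
    return out

def fit(pairs, trials=20000, seed=0):
    """Randomly search a weight and bias that best reproduce every
    (parameter, correct token sequence) pair."""
    rng = random.Random(seed)
    best, best_err = (0.0, 0.0), float("inf")
    for _ in range(trials):
        w, b = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        err = sum(abs(t - y)
                  for p, target in pairs
                  for t, y in zip(target, wave_tokens(p, w, b)))
        if err < best_err:
            best, best_err = (w, b), err
    return best, best_err

# Synthetic "correct data": sequences produced by a hidden (w, b), so a
# good fit is known to exist.
pairs = [(p, wave_tokens(p, w=1.1, b=0.15)) for p in (0.0, 0.2, 0.4)]
(w, b), err = fit(pairs)
print(w, b, err)
```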

Transformation of a Digital Value Sequence into a Single Parameter

Earlier, we discussed remapping the single parameter to help the single neuron learn. If we regard this remapping as part of the method for converting a digital value sequence into a single parameter, learning can proceed just as before.

Another single neuron can learn this conversion from a digital value sequence to a single parameter. It receives each value of the input sequence as a separate input signal, applies a different weight to each, adds a bias, and normalizes the result into a fixed range of values.

This two-neuron mechanism is very simple, so whether it can learn successfully still depends on the characteristics of the correct data. But if the correct data fits reasonably well, then after learning, the two neurons can produce a sequence of digital values close to the correct data whenever a digital value sequence is given.
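A sketch of this second neuron under the same assumptions, with tanh as the normalizing function (my choice; the article only requires some function that squashes the sum into a fixed range):

```python
def encode(tokens, in_weights, in_bias):
    """Second neuron: each value of the input digital sequence is a
    separate input signal with its own weight; tanh normalizes the
    weighted sum into (-1, 1), yielding the single parameter."""
    u = sum(w * t / 255 for w, t in zip(in_weights, tokens)) + in_bias
    return math.tanh(u)

# Input token sequence -> single parameter -> output token sequence.
p = encode([10, 200, 37, 90], in_weights=[0.4, -0.2, 0.7, 0.1], in_bias=0.05)
print(wave_tokens(p, w=1.1, b=0.15))
```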

Significance of the Digital Value Sequence

This mechanism implies that a simple model composed of two neurons with different roles functions as a system that takes a sequence of digital values as input and outputs another sequence of digital values.

Here, consider the individual numbers in the digital value sequence as tokens, and suppose that each token is linked to a word, character, or symbol. This simple model then becomes a mechanism that, given an input string, generates an output string.
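For instance, with a hypothetical table linking each 8-bit token to a printable character, the two-neuron model already maps strings to strings (the vocabulary and weights below are arbitrary placeholders):

```python
# Hypothetical vocabulary: each 8-bit token indexes a printable character.
vocab = [chr(32 + (i % 95)) for i in range(256)]
token_of = {c: i for i, c in enumerate(vocab[:95])}

def generate(text):
    """Input string -> tokens -> single parameter -> output string."""
    tokens = [token_of[c] for c in text]
    p = encode(tokens, in_weights=[0.1] * len(tokens), in_bias=0.0)
    return "".join(vocab[t] for t in wave_tokens(p, w=1.1, b=0.15))

print(generate("abc"))
```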

The Minimal Transformer Model

Thus, the model described so far, though very simple and limited, serves as a language model; we might call it a micro language model.

Viewed as a micro language model, the single feedback-loop neuron described first acts as the decoder, the single parameter that determines the decoder's behavior acts as the encoding, and the second neuron introduced for the conversion corresponds to the encoder, together forming a very simple transformer. And viewed simply as a mechanism for converting numeric sequences, the model is not limited to language processing but can be applied to general purposes.
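Assembled into one object, the whole sketch reads as a miniature encoder-decoder. To be clear, this is the correspondence drawn by the argument above, not an implementation of an actual transformer:

```python
class MinimalTransformer:
    """Encoder: one neuron squashes the input token sequence into a single
    parameter (the encoding). Decoder: one feedback-loop neuron unrolls
    that parameter into an output token sequence."""

    def __init__(self, in_weights, in_bias, fb_w, fb_b):
        self.in_weights, self.in_bias = in_weights, in_bias
        self.fb_w, self.fb_b = fb_w, fb_b

    def __call__(self, tokens, steps=16):
        p = encode(tokens, self.in_weights, self.in_bias)    # encoder neuron
        return wave_tokens(p, self.fb_w, self.fb_b, steps)   # decoder neuron

model = MinimalTransformer([0.4, -0.2, 0.7], 0.05, fb_w=1.1, fb_b=0.15)
print(model([10, 200, 37]))
```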

Simplification to a Single Neuron

It is also worth noting that these two neurons can be simplified further into a single neuron. That is because the inputs and parameters of the second neuron, the one generating the single parameter, can directly serve as inputs and parameters of the first, feedback-loop neuron, allowing the same calculations to be performed.

While it is easier to understand by considering the two neurons as described earlier for a structure such as a transformer with encoders and decoders, the equivalent function can be realized by a single neuron.

This means that the actual minimal structure of a language model or a general-purpose transformer model could be a single neuron with a feedback loop.
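In code, the folding looks like this: the external tokens and the neuron's own previous output are simply two kinds of inputs to one unit, and it reproduces the two-neuron version exactly (same assumed activations as before):

```python
def single_neuron_transformer(tokens, in_weights, in_bias, fb_w, fb_b, steps=16):
    """One neuron, two kinds of inputs: the external token sequence,
    applied once through the input weights, and the neuron's own previous
    output, applied through the feedback weight fb_w."""
    # The encoder's normalized weighted sum becomes a constant drive term
    # added to the bias on every round.
    drive = math.tanh(sum(w * t / 255 for w, t in zip(in_weights, tokens))
                      + in_bias) + fb_b
    x, out = 0.0, []
    for _ in range(steps):
        x = math.sin(math.pi * (fb_w * x + drive))
        out.append(round(255 * (x + 1) / 2))
    return out

# Identical output to the two-neuron MinimalTransformer above.
print(single_neuron_transformer([10, 200, 37], [0.4, -0.2, 0.7], 0.05,
                                fb_w=1.1, fb_b=0.15))
```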

The Cellular Model as a Transformer

Organisms respond to inputs in complex ways, and this is true even of single-celled organisms.

The various patterns of stimuli a cell receives as inputs from its external environment, and the adaptive responses it produces, might be modeled in the form of a transformer.

Single-celled organisms are simpler than multicellular organisms but are capable of complex processes necessary for sustaining life on their own. Simple responses to simple stimuli would not be sufficient for this purpose.

The fact that simple single-celled organisms can achieve such complex processing is quite mysterious, but this mystery becomes slightly clearer when viewed through the transformer model.

As mentioned earlier, this minimal transformer model, despite being a single neuron, has shown the potential to realize language generation within a very limited range of conditions.

This suggests that a single neuron, a simple entity that we intuitively think can only perform limited functions, has the potential to achieve far more.

If this perspective is applied to the responses of single cells, it might explain how even simple organisms like single cells are capable of precise responses necessary for sustaining life under diverse environmental conditions.

Of course, a single cell likely has a more complex structure than a minimal transformer, owing to its many organelles and biomolecules. These could allow it to learn and adapt to more complex situations and responses than the minimal transformer can.

Internal Feedback Loop

When considering unicellular organisms, it might be hard to imagine a structure where the output is fed back as input. This is especially true if one imagines the cell reacting externally.

However, if it is considered that the output occurs within the cell and this output serves as input for the next reaction, it becomes clear that a feedback loop structure can be seamlessly realized even in unicellular organisms.

This might involve protein synthesis, mediated by the transcription of DNA into RNA, or chemical reactions driven by proteins already present within the cell.

Such protein synthesis and chemical reactions induce changes within the cell, and these changes in turn trigger the next reaction.

This internal feedback loop, linking one reaction to the next, forms a structure capable of realizing highly complex processing in response to external stimuli.

Cellular Learning

Thus, if the initial input is taken to be stimuli from outside the cell, and the output a time series of internal reactions that is re-inputted through an internal feedback loop, it becomes clear that the cell possesses a transformer-like structure.

The appropriate processing must have been shaped through a long accumulation of evolution, which is in essence learning over time. The record of this learning is likely inscribed in the genetic information of the unicellular organism's DNA.

The detailed processing model of a cell surely differs from a network of neurons modeled with weights, biases, and activation functions. Yet even a model as simple as the minimal transformer can perform more complex tasks than intuition suggests, and this stems not only from the complexity of its internal formulas but from its structure.

The structure that generates time-series output signals through a feedback loop, in response to multiple simultaneous input signals, is an essential aspect of the transformer. A system possessing this structure can realize far more complex patterns of input-output combinations than its visible internal calculations would suggest.

And if the combination of simultaneous inputs and time-series outputs can be learned and established, complex functions like life and intelligence can be realized.

Conclusion

In this article, we have analyzed the transformer technology used in systems like ChatGPT, focusing not on the complexity of the internal neural network but on its processing structure. Focusing solely on this structure makes it understandable how artificial intelligence, human intelligence, and cellular organisms can possess the same structure, one that enables learning and adaptive processing.

This structure comprises a transformation that converts simultaneous external inputs into certain internal parameters, and a feedback loop that takes these internal parameters and the system's own output as input signals, sequentially generating output signals. By using this whole structure to adapt to the environment and the learning data, learning progresses; and by storing what has been learned, intelligent processing that can adapt to diverse situations, including unknown ones, becomes possible.

katoshi
Neo-Cybernetics

Software Engineer and System Architect with a Ph.D. I write articles exploring the common nature between life and intelligence from a system perspective.