Simulation-Based Inference: Generative AI Technology from the Perspective of a Software Engineer
The emergence of conversational AI and generative AI technologies has garnered significant attention because they enable capabilities not possible with traditional software and IT systems. These technologies have evolved from a different research perspective than conventional software technology, which can make them challenging for engineers like you, who are experienced in traditional software development, to grasp fully.
One common issue is that technical articles often skip the overview that software engineers seek and delve directly into the intricacies of AI technology, making it harder to understand. Additionally, while many explanations focus on the learning processes of AI, software engineers like you might be more interested in understanding the architecture of the platforms on which these trained models operate.
Let’s clarify the operational principles of the transformer model from a software engineer’s perspective. The transformer model, which is also used in OpenAI’s ChatGPT and corresponds to the ‘T’ in GPT, has distinctive features that set it apart from traditional AI. One of these is the ability to perform simulation-based inferences internally, in addition to pattern recognition capabilities. This aligns AI technology more closely with human abilities, as humans also engage in pattern recognition and simulation-based inference.
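In its original form, the transformer model consists of an encoder part and a decoder part. The input is processed by the encoder, and the decoder generates the output. GPT-style models use only the decoder part, but the two-part view is easier to reason about and is the one used throughout this article. Diagrams illustrating this structure are readily available online, so they are not included here.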
Let’s delve into the details to gain a deeper understanding.
Vector Converters and Vector Processors
A vector is an array of numerical values with a defined length.
The trained encoder of a transformer model transforms an input vector and generates an intermediate vector called an encoding. This generated intermediate vector is used as input to the trained decoder.
The trained decoder uses the intermediate vector, its internal vectors, and the output produced so far to produce the next numerical value and to update the internal vectors. Each produced value is appended to the end of the output vector.
Then, the trained decoder repeats this process until a termination condition is met. The termination condition is either producing a value that indicates the end or reaching the maximum size of the output vector.
Viewed in this way, the trained encoder of the transformer model can be considered a vector converter, and the trained decoder can be called a vector processor.
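The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names generate, reverse_encode, and copy_step, the END_VALUE marker, and the MAX_OUTPUT_LENGTH limit are all hypothetical, and the toy converter and processor are hand-written rather than trained.

```python
from typing import Callable, List, Tuple

END_VALUE = -1          # hypothetical value that signals "end of output"
MAX_OUTPUT_LENGTH = 8   # hypothetical maximum size of the output vector

def generate(
    convert: Callable[[List[int]], List[int]],
    step: Callable[[List[int], List[int], List[int]], Tuple[int, List[int]]],
    initial_internal: List[int],
    input_vector: List[int],
) -> List[int]:
    """Run the vector converter once, then the vector processor repeatedly."""
    intermediate = convert(input_vector)      # vector converter (encoder role)
    internal = list(initial_internal)
    output: List[int] = []
    while len(output) < MAX_OUTPUT_LENGTH:    # termination: maximum size reached
        value, internal = step(intermediate, internal, output)
        if value == END_VALUE:                # termination: end value produced
            break
        output.append(value)                  # append to the end of the output
    return output

# Toy stand-ins for a trained converter and processor (assumptions).
def reverse_encode(input_vector: List[int]) -> List[int]:
    return list(reversed(input_vector))

def copy_step(intermediate, internal, output):
    position = internal[0]                    # internal vector used as a counter
    if position >= len(intermediate):
        return END_VALUE, internal
    return intermediate[position], [position + 1]
```

With these toy parts, generate(reverse_encode, copy_step, [0], [1, 2, 3]) walks through the intermediate vector one value per step and stops when the end value is produced.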
Implementation of Vector Converters and Vector Processors
Generally, software is implemented within the scope of what can be expressed in a programming language. In other words, when we say that some functionality is implemented in software, we do not mean that the hardware or the programming language itself was built or modified.
In this context, the hardware and the programming language are referred to as the platform. Software is implemented on this platform without changing it.
Similarly, implementing vector converters and vector processors here means implementing them on a certain platform, within the range of what that platform can express and without making changes to it.
The platform for implementing vector converters and vector processors is the basic architecture of the transformer, which includes neural networks with a distinctive structure.
The combination of numerical values that the neural network’s parameter vectors can take represents the range of what can be expressed by the transformer’s basic architecture. Determining these parameter vectors corresponds to the implementation of the vector converters and vector processors.
Normally, the parameter vectors are determined through machine learning. In its basic form, this means feeding in training data as input and adjusting the parameters whenever the expected output is not produced, repeating until the outputs match well enough.
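To make "adjusting the parameters if the expected output is not produced" concrete, here is a deliberately tiny sketch. It is not how a transformer is trained: it uses a single unit with two parameters and the classic perceptron update rule, swapped in purely for illustration, and the names TRAINING_DATA, predict, and train are hypothetical.

```python
# Training data: the unit should learn to flip a single bit (0 -> 1, 1 -> 0).
TRAINING_DATA = [(0, 1), (1, 0)]

def predict(params, x):
    """The 'platform': a fixed structure whose behavior depends only on params."""
    weight, bias = params
    return 1 if weight * x + bias > 0 else 0

def train(data, epochs=10):
    """Determine the parameters by correcting them whenever the output is wrong."""
    weight, bias = 0, 0
    for _ in range(epochs):
        for x, expected in data:
            error = expected - predict((weight, bias), x)  # 0 when output is right
            weight += error * x   # adjust only when the output was not as expected
            bias += error
    return weight, bias
```

The structure of predict never changes; only the two numbers returned by train do. That is the sense in which determining the parameters corresponds to implementing on the platform.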
A Simple Example
Let’s consider a simple example to understand vector converters and vector processors.
First, to make it extremely simple, let’s consider the input and output vectors as arrays of two numbers. Additionally, let’s also consider the intermediate vector as an array of two numbers.
The numbers in the input and output vectors will be simply 0 and 1.
Therefore, there are only four patterns of input. The input and corresponding expected outputs are as follows:
Input: 00, Expected Output: 11
Input: 01, Expected Output: 10
Input: 10, Expected Output: 01
Input: 11, Expected Output: 00
After training, there can be multiple combinations of how the vector converter transforms the input vector into the intermediate vector and how the vector processor derives the output from the intermediate vector.
For example, one case could be where the vector converter transforms the input into the same intermediate vector as the expected output. In this case, the vector processor simply needs to output the intermediate vector in order.
Specifically, the vector converter would be implemented to perform the following transformations:
Input: 00, Intermediate Vector: 11
Input: 01, Intermediate Vector: 10
Input: 10, Intermediate Vector: 01
Input: 11, Intermediate Vector: 00
In this case, let’s say the vector processor holds two numbers as its internal vector, with an initial value of 10.
The vector processor multiplies the first digit of the intermediate vector with the first digit of the internal vector, and the second digits are also multiplied. Then, it outputs their sum.
The internal vector is also updated after each output: from the initial value of 10 it becomes 01, shifting the 1 one digit to the right.
For example, if the input vector 01 is given, the intermediate vector becomes 10.
At the first operation of the vector processor, the internal vector is 10. Multiplying the first digits of the intermediate vector 10 and internal vector 10 results in 1, and multiplying the second digits results in 0, leading to an output of 1. At this time, the internal vector becomes 01, following the earlier rule.
At the second operation of the vector processor, the internal vector is 01. Both the first and second digits of the intermediate vector 10 and the internal vector 01 result in 0 when multiplied, so the output is 0.
Thus, this vector converter and vector processor can output 10 for the input 01. Similarly, it can output as expected for the inputs mentioned above.
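The walkthrough above can be checked with a short Python sketch. The lookup table and the function names (CONVERTER, counter_processor, infer_with_counter) are assumptions made for illustration; a trained model would hold this behavior in its parameters rather than in hand-written code.

```python
# Vector converter: maps each input to the intermediate vector listed above.
CONVERTER = {"00": "11", "01": "10", "10": "01", "11": "00"}

def counter_processor(intermediate):
    """Vector processor that uses the internal vector as a position counter."""
    internal = [1, 0]                 # initial internal vector: 10
    output = []
    for _ in range(2):
        # Multiply digit by digit and output the sum.
        value = intermediate[0] * internal[0] + intermediate[1] * internal[1]
        output.append(value)
        # Shift the 1 one digit to the right: 10 -> 01.
        internal = [0, internal[0]]
    return output

def infer_with_counter(bits):
    intermediate = [int(c) for c in CONVERTER[bits]]
    return "".join(str(v) for v in counter_processor(intermediate))
```

Calling infer_with_counter("01") reproduces the two steps traced above and yields "10".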
Another Pattern
Another straightforward implementation pattern for the same inputs and expected outputs is one where the vector converter passes the input vector through unchanged, producing an identical intermediate vector.
Specifically, the vector converter would be implemented to perform the following transformations:
Input: 00, Intermediate Vector: 00
Input: 01, Intermediate Vector: 01
Input: 10, Intermediate Vector: 10
Input: 11, Intermediate Vector: 11
In this case, there are several possible implementation methods for the vector processor.
For example, we can set the initial value of the outputted vector to all -1. Then, the output is determined according to the following rules:
If the first digit of the outputted vector is -1 and the first digit of the intermediate vector is 0, output 1.
If the first digit of the outputted vector is -1 and the first digit of the intermediate vector is 1, output 0.
If the first digit of the outputted vector is not -1, the second digit is -1, and the second digit of the intermediate vector is 0, output 1.
If the first digit of the outputted vector is not -1, the second digit is -1, and the second digit of the intermediate vector is 1, output 0.
This vector converter and vector processor also output values as expected for the input.
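The four rules above translate almost directly into code. As before, this is a hand-written sketch, and the names UNSET, next_output, and infer_by_rules are hypothetical.

```python
UNSET = -1  # initial value of every digit in the outputted vector

def next_output(intermediate, outputted):
    """Apply the rules: pick the digit to emit from the patterns seen so far."""
    if outputted[0] == UNSET:
        # First digit not yet produced: invert the first intermediate digit.
        return 1 if intermediate[0] == 0 else 0
    if outputted[1] == UNSET:
        # First digit produced, second not: invert the second intermediate digit.
        return 1 if intermediate[1] == 0 else 0
    return None  # both digits already produced

def infer_by_rules(bits):
    intermediate = [int(c) for c in bits]   # converter passes the input through
    outputted = [UNSET, UNSET]
    for position in range(2):
        outputted[position] = next_output(intermediate, outputted)
    return "".join(str(v) for v in outputted)
```

Note that this version keeps no counter: it decides what to emit only by looking at the pattern of the intermediate vector and of what has already been output.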
A Slightly More Complex Example
The previous example was simple because the vector was an array of two numbers, and the only possible numbers were 0 and 1. Furthermore, it was very simple because the expected output for an input was deterministically and uniquely determined.
In addition to these deterministic pairs of input and expected output, this mechanism can also handle non-deterministic combinations. An example of this is as follows:
Input: 00, Expected Output: 11 (70%), 10 (20%), 01 (10%)
Input: 01, Expected Output: 10 (50%), 01 (50%)
Input: 11, Expected Output: 00 (100%)
This differs from the previous example in two ways, although the size of the input and expected output vectors and the contents of the numbers remain unchanged.
The first difference is that there are probabilistically multiple expected outputs for the same input. In the sample above, when the input is 00, it is expected that 11 will be output 70% of the time, 10 20% of the time, and 01 10% of the time. This can be understood by thinking about responses in a conversation. For example, the response to “Hello” in a conversation is not always “Hello.”
The second difference is that not all input patterns are covered. This is easy to understand when you think about training a conversation. For example, a word like “aaaa” is not typical, so it would not be included as training data.
Also, even if you train a conversation with “Hello” or “Good evening,” when you provide the trained AI to users, they might use word patterns not included in the training data, like “Hiya” or “Good evening, sir.” It is impossible to pre-train the AI with all the conversation patterns humans use.
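The probabilistic expectations listed above can be sketched as weighted sampling. This is only a lookup-table illustration with hypothetical names (EXPECTED_OUTPUTS, sample_output). Unlike a trained model, it has nothing to say about an input such as 10 that is missing from the table, which is exactly the gap the second difference points at.

```python
import random

# Expected outputs and their probabilities, copied from the list above.
EXPECTED_OUTPUTS = {
    "00": {"11": 0.7, "10": 0.2, "01": 0.1},
    "01": {"10": 0.5, "01": 0.5},
    "11": {"00": 1.0},
}

def sample_output(bits, rng):
    """Draw one output for the given input according to its probabilities."""
    distribution = EXPECTED_OUTPUTS[bits]   # raises KeyError for unseen inputs
    outputs = list(distribution)
    weights = list(distribution.values())
    return rng.choices(outputs, weights=weights, k=1)[0]
```

Passing a seeded generator such as random.Random(0) makes a run repeatable; over many draws for the input 00, roughly 70% of the results should be 11.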
Characteristics of Vector Converters and Vector Processors
Considering the simple examples above, several characteristics of this mechanism become apparent.
Firstly, an easy and understandable characteristic is its ability to produce probabilistic outputs in response to inputs, and its capacity to produce some output for input patterns not included in the training data.
More interestingly, it has the ability to accommodate both pattern recognition and simulation-like outputs. In the first simple example, where the intermediate vector is transformed to match the expected output, the vector processor was using an internal counter, adjusting the output based on the value of this counter. This is akin to the operations of a typical computer program. The ability to update an internal state like a counter and proceed with iterative processing implies the capability for simulation.
On the other hand, in the second simple example, where the intermediate vector was transformed to match the input vector, the output was determined based on the patterns of the intermediate and outputted vectors. This indicates that the vector processor can perform pattern recognition operations.
In these simple examples, we were able to implement vector converters and vector processors that could appropriately produce the expected output, whether through simulation-like or pattern recognition processes.
However, with more complex scenarios, there might be situations where simulation-like processing is more appropriate, where pattern recognition is more suitable, or where a hybrid approach is required.
The platform underlying the vector converters and vector processors has an architecture capable of handling such complex processes. The challenge then becomes whether we can successfully implement vector converters and vector processors.
Advanced Reasoning Capabilities
I believe the neural network structure of transformers, together with the attention mechanism, succeeds at this: it makes it possible to implement such vector converters and vector processors through learning.
By appropriately utilizing and combining simulation-like and pattern recognition processes, conversational AI agents powered by transformers are capable of not only engaging in pattern recognition conversations that simply respond with known knowledge but also performing multi-step reasoning and organizing responses in a sequential manner.
Such responses are impossible with pattern recognition alone and require internal simulation-like processes. Simulation requires understanding a conceptual state model and the rules applied to it, as well as the ability to repeatedly apply rules to the state model to effect state changes. As we saw earlier, the iterative processing in the vector processor part possesses these capabilities.
The next step involves correctly identifying from the input sentence the parts that require such simulation, organizing the necessary concepts and rules into the intermediate vector, and having a vector processor that has already learned general conceptual models and rule sets. By combining these with the intermediate vector, it should be feasible to execute the simulation.
Simulation and Attention
It is believed that the attention mechanism plays a significant role in these simulation-like processes.
When the vector processor performs simulation-like processes, it must determine which values in the intermediate vector, outputted vector, and internal vector should be more significantly reflected in the output or the internal vector, and this emphasis can change over time.
This can be well understood by considering something like the internal vector counter mentioned in the simple example. In the simple example, the first digit of the intermediate vector was directly reflected in the output in the first processing step, and the second digit was directly reflected in the second processing step. This indicates that the focus within the vector changed between the first and second processing steps.
Thus, the attention mechanism is thought to be responsible for identifying and transitioning the focus to relevant areas according to the situation. This allows for the execution of simulation processes that evolve the state of the conceptual model as the processing progresses.
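The shifting focus can be illustrated with a toy version of attention weights. This is a sketch under strong assumptions: the one-hot keys, the SHARPNESS constant, and the function names are invented for illustration, and real transformer attention uses learned projections rather than hand-wired values like these.

```python
import math

SHARPNESS = 10.0                   # hypothetical: larger means a harder focus
KEYS = [[1.0, 0.0], [0.0, 1.0]]    # one key per position of the intermediate vector

def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weight each value by how well its key matches the current query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values)), weights

def focus_outputs(intermediate):
    """Emit one digit per step; the query plays the role of the internal counter."""
    outputs = []
    for query in ([SHARPNESS, 0.0], [0.0, SHARPNESS]):
        value, _weights = attend(query, KEYS, intermediate)
        outputs.append(round(value))
    return outputs
```

In the first step nearly all of the weight falls on the first digit of the intermediate vector, and in the second step on the second digit, so focus_outputs([1.0, 0.0]) gives [1, 0], matching the counter-based example.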
Inference Ability
We engage in our daily activities by making inferences about the future or unknown based on information we have seen or heard.
For example, if we wake up in the morning and see a family member looking unwell, we might suspect they are ill without being told. When pouring milk into a cup, we estimate at what point to stop tilting the bottle to pour the desired amount. While walking on the street, we infer the next movements of nearby people or cars to decide whether to proceed or stop.
This involves creating a conceptual model in our mind according to the situation and inferring unknown aspects or future developments from the state of this conceptual model.
We infer that someone with a pale complexion is likely unwell, which is an inference based on pattern recognition. When pouring milk, we hold a conceptual model in our mind, consisting of the amount of milk in the cup and the tilt of the bottle, and apply the rule that the more the bottle is tilted, the faster milk flows into the cup. Running this simulation in real time helps us estimate when to stop tilting the bottle to reach the desired milk level.
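The milk example has the shape of a small simulation: a state (the level in the cup), a rule (more tilt means faster flow), and repeated application of the rule until a condition is met. The sketch below makes that shape explicit. The function name, the units, and the linear flow rule are all assumptions chosen for illustration, not a physical model.

```python
def steps_until_full(target_ml, tilt, ml_per_tilt_unit=10, max_steps=1000):
    """Apply the flow rule step by step and report when the target is reached."""
    level_ml = 0                              # state of the conceptual model
    for step in range(1, max_steps + 1):
        level_ml += tilt * ml_per_tilt_unit   # rule: more tilt, faster flow
        if level_ml >= target_ml:
            return step                       # the moment to stop tilting
    return None                               # target never reached
```

With a target of 200 ml, a tilt of 2 reaches the target in 10 steps and a tilt of 4 in 5 steps, which is the kind of estimate we make when deciding how long to keep pouring.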
When walking, we model the positions of people and cars around us and calculate their speed in the direction of travel. Here, we perform pattern recognition based on the behavior of pedestrians and vehicles. For instance, a person looking around while walking or a car with its turn signal on is likely to change direction.
We conduct simultaneous and broad simulations, considering that other pedestrians and vehicles may also change their speed or direction. If we infer that continuing at our current pace might lead to a collision, we stop or change direction.
In this way, we make inferences about the unknown or the future by combining the abilities of pattern recognition and simulation. Vector converters and vector processors capable of performing pattern recognition and simulation similarly function as devices for making inferences about the unknown or the future.
Inference Techniques Using Pattern Recognition and Simulation
When the transformer model is viewed as a vector converter and vector processor, conversational AI created using this model also possesses the capability to make inferences through pattern recognition and simulation.
The reasoning ability of conversational AI can be considered a manifestation of this ability to infer through pattern recognition and simulation. Logical reasoning in language can be seen as a simulation where logical rules are applied to a logical model. Moreover, it is not only about simple and strict logical reasoning but also about simulations that apply real-world rules to conceptual models of the real world expressed in words.
It is precisely because of the ability to create these conceptual models and conduct simulations that AI can accurately analyze the plot and characters’ emotions in stories like novels. And the ability to generate coherent continuations of partial stories is due to the capacity to create and simulate models of the narrative world.
Additionally, such conversational AI can perform various types of intellectual tasks. Beyond conversations, AI based on the transformer model is also being developed to generate images and music.
Viewing the features of the transformer model as a combination of vector converters and vector processors, the diverse abilities of generative AI can be seen as applications of the inference ability resulting from the combination of pattern recognition and simulation.
In other words, this is not limited to the transformer model or to vector converters and processors. Any model or mechanism that achieves advanced inferential capabilities through a combination of pattern recognition and simulation could, when applied, replicate the various abilities of current generative AI.
If this understanding is correct, the technological development of generative AI involves developing architectures that can realize advanced combinations of pattern recognition and simulation and implementing advanced inferential capabilities on these architectures.
From this perspective, the transformer model can be seen as an architecture that enables advanced combinations of pattern recognition and simulation, and a technology that has enabled the automatic implementation of advanced inferential capabilities through machine learning.
In Conclusion: Natural Language as a Tool for Inference
Technical explanations of generative AI, transformer models, and attention mechanisms often describe these technologies as predicting the next word in a sentence or determining which words in a sentence to focus on.
This explains to some extent how natural language processing capabilities improve. However, it is not easy to understand from such explanations why conversational AI has been able to acquire advanced reasoning abilities.
I think this is because the approach is reversed. As analyzed in this article, the processing architecture of the transformer model enables advanced inference through a combination of pattern recognition and simulation. The transformer model, with its attention mechanism and decoder’s iterative processing, enhances the traditional pattern recognition capabilities of AI with the added strength of simulation-based inference.
In essence, the transformer model can be regarded as a technology that enhances the reasoning abilities of AI.
From this perspective, the understanding that improved conversational skills lead to enhanced reasoning abilities is reversed. It’s the enhanced reasoning abilities that have resulted in improved conversational skills.
This means that natural language processing capabilities are based on reasoning abilities. Moreover, natural language has a role not only as a tool for communication or information storage but also as a tool for utilizing and demonstrating inferential capabilities.
When conversational AI learns from the vast amount of text created by humans, it is not only learning words, grammar, and conversational patterns but also the way humans infer. It is because natural language has the capability to express the process of inference that conversational AI can mimic human inference from texts created by humans.