Comparison of Generative AI and Computer Processing Architectures
Systems engineering and AI technology are both grounded in computer science and software technology. However, I feel there is a subtle difference between the two as technical fields.
As a system engineer, I perceive conversational AI and generative AI as a kind of information processing system. I therefore want a macroscopic understanding of the overall processing, the kinds of information stored inside, and how that information is used and what it means.
In this article, from the perspective of a system engineer, I would like to deepen that understanding by grasping the processing of conversational AI chatbots and generative AI agents macroscopically.
First, viewed as information processing systems and compared with ordinary computer systems, today's conversational and generative AI appear to generate a program automatically and then execute it in the course of a conversation. The processor, the programming-language processing system, and the program generator used there all appear to be created automatically inside the AI during its learning phase.
However, the processors and programs designed there are different from the computers we usually use. Analyzing these differences reveals a flexible and powerful processing architecture.
Now, let’s look at this in more detail.
Conversational AI Processing
In current conversational AI, when the transformer's processing flow is viewed macroscopically, the encoder encodes the input text once into an encoded representation, after which the decoder repeats a step that outputs one word at a time.
In one processing step of the decoder, the next word is output based on the encoding and internal state. During this, the internal state is also updated.
I see this flow as automatic program generation combined with a von Neumann-type computer that executes the generated program.
In my view, the encoding the encoder generates from the input text corresponds to the program that runs in the decoder. The encoder can therefore be regarded as an automatic program generator.
The decoder repeatedly executes this program, updating the output and the internal state at each step. The internal state is thus internal memory, and the decoder corresponds to the computational core of a von Neumann-type computer: a processor such as a CPU or GPU together with its internal memory.
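The macroscopic flow described above can be sketched in toy Python. Everything here is a hypothetical stand-in for illustration, not the actual transformer computation: `encode` plays the role of the automatic program generator, and `decoder_step` plays the role of the processor executing that "program" against an internal state.

```python
# Toy sketch of the macroscopic flow: encoder -> "program",
# decoder step -> one output token plus an internal-state update.
# All functions are invented stand-ins, not real transformer math.

def encode(text):
    # Stand-in: the real encoder produces contextual vectors.
    return [ord(c) % 7 for c in text]  # toy "program"

def decoder_step(program, state):
    # Stand-in: one step reads the whole "program" and the internal
    # state, emits one token, and updates the state.
    token = (sum(program) + state) % 10
    new_state = state + token
    return token, new_state

def generate(text, steps=5):
    program = encode(text)      # encoder: text -> "program"
    state, output = 0, []
    for _ in range(steps):      # decoder: repeat one-token steps
        token, state = decoder_step(program, state)
        output.append(token)
    return output
```

The point of the sketch is only the shape of the loop: one encoding pass, then repeated decode steps that each consume the program and the evolving state.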
Learning Mechanism of Conversational AI
Conversational AI becomes capable of conversing with humans through a learning process. It reads a large amount of text and learns through the mechanism of the transformer, which determines the parameters of the internal neural network.
Viewed from my perspective, this learning process automatically creates the automatic program generator and the processor, which are then retained as the parameters of the internal neural network.
In other words, the learning mechanism of conversational AI can be seen as an automatic design mechanism for the program generation device and processor.
Prompt Engineering
As a technique to masterfully use conversational AI, there is a technology called prompt engineering. It’s a technique for giving effective instructions to conversational AI and advising it on ways of thinking so that it can generate more desirable responses.
When I think about this from my perspective, it resembles the task of providing design information to an automatic program generation device. In actual software development, when asking a programmer to create a program, you might instruct them on the processing flow or provide examples and samples as design references.
In prompt engineering, it’s important to understand the characteristics and quirks of the conversational AI being used. This is similar to needing to understand the features of the programming language, libraries, and system configuration used when providing design information to a programmer. Essentially, it’s knowledge of the target platform’s architecture.
Thus, prompt engineering can be seen as the skill of understanding the architecture of the conversational AI as a target platform and issuing appropriate design instructions.
However, the work of a prompt engineer is not simply that of someone instructing a programmer. In conversational AI, the program is not the final product but an intermediate product used to achieve the desired processing result. Prompt engineers work to get the AI's internal program generator to produce good programs so that the final product is achieved.
Processing Architecture of the Decoder as a von Neumann-Type Computer
When the decoder of the transformer used in conversational AI is viewed as a von Neumann-type computer, its processing architecture turns out to be significantly different from that of ordinary computers.
Ordinary computers have an architecture that processes a program sequentially from top to bottom. To do this, they keep a value in internal memory called the program counter, which indicates the part of the program currently being executed.
Normally, this program counter is incremented by one at each processing step: the instruction at the location the counter points to is executed, and the counter is incremented again. Repeating this executes the instructions in the program sequentially from top to bottom.
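The fetch-execute cycle just described can be shown minimally. The three-instruction toy instruction set below is invented for illustration; the essential part is the program counter that is fetched from, incremented, and (for jumps) overwritten.

```python
# Minimal von Neumann fetch-execute loop: one instruction per step,
# with the program counter (pc) incremented after each fetch.
# The toy instruction set (ADD / MUL / JNZ) is invented for illustration.

def run(program):
    acc, pc = 0, 0
    while pc < len(program):
        op, arg = program[pc]   # fetch the instruction at pc
        pc += 1                 # increment the program counter
        if op == "ADD":
            acc += arg
        elif op == "MUL":
            acc *= arg
        elif op == "JNZ" and acc != 0:
            pc = arg            # jump: overwrite the program counter
    return acc
```

For example, `run([("ADD", 2), ("ADD", 3), ("MUL", 4)])` walks the program strictly top to bottom and computes (2 + 3) * 4.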
The decoder part of the transformer used in conversational AI does not process programs from top to bottom in this way.
In one processing step, the decoder computes all nodes of the neural network while referencing the entire program, and updates the internal state. (The transformer's decoder also references the text output so far, but here I treat that as part of the internal state as well.)
In a normal program, only one processing instruction of the program is referenced in one processing step. In contrast, the transformer’s decoder references the entire program in each processing step. This is a significant difference.
However, the entire program does not influence each step equally at all times. The internal state changes with every step, and depending on that state, the parts of the program that influence the processing strongly or weakly also change. As a result, the decoder does not perform the same operation every time; a different operation is carried out at each step.
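This "whole program, weighted by state" reading can be sketched with a softmax over program positions, in the spirit of attention. The scoring rule and state update below are toy inventions; only the structure matters: every position contributes to every step, with weights that depend on the current state.

```python
import math

def softmax(xs):
    # Standard numerically-stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def decoder_step(program, state):
    # Every position of the "program" is consulted in every step;
    # the state decides which positions weigh strongly or weakly.
    # The relevance score below is a toy stand-in for attention.
    scores = [-abs(p - state) for p in program]
    weights = softmax(scores)
    readout = sum(w * p for w, p in zip(weights, program))
    new_state = 0.5 * state + 0.5 * readout  # toy state update
    return readout, new_state
```

As the state drifts from step to step, the weight distribution shifts with it, so the same program yields a different effective operation each time.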
Simulating Normal Computer Processing
Within this basic processing architecture, which parts of the program influence the processing result (the output and the internal-state update), and how strongly, are fixed when the processor is designed. In other words, they are determined automatically during the learning phase of conversational AI.
Let’s assume, for instance, that the processor is designed such that if the program follows a certain pattern, it includes a part like a counter in its internal state. In each processing step, only the program part indicated by this counter influences the processing result. Additionally, the internal state’s counter increases by one at the end of each processing step.
This counter would then act as a program counter. Therefore, if such a processor design is implemented, the decoder part of conversational AI would have a mechanism for processing programs sequentially from top to bottom, just like the computers we are familiar with.
Moreover, although I omitted it above, by incorporating operations that rewrite this program counter mid-processing, conditional branching and loops can be implemented just as in ordinary programs. If that is achieved, any process realizable in an ordinary program can also be realized inside the decoder of conversational AI.
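The degenerate case described in the last few paragraphs can be made concrete: if the learned attention pattern is one-hot at a counter position held in the internal state, the whole-program read collapses to sequential execution, and rewriting the counter gives branching. Everything below is a toy illustration of that reduction, not learned behavior.

```python
# If learning produced a one-hot attention pattern driven by a counter
# in the internal state, the decoder would reduce to sequential
# execution. The instruction set (ADD / JZ) is invented for illustration.

def counter_attention(n, pc):
    # Degenerate "attention": all weight on position pc.
    return [1.0 if i == pc else 0.0 for i in range(n)]

def run(program):
    state = {"pc": 0, "acc": 0}
    while state["pc"] < len(program):
        w = counter_attention(len(program), state["pc"])
        op, arg = program[w.index(1.0)]  # weighted read == program[pc]
        state["pc"] += 1                 # the counter advances each step
        if op == "ADD":
            state["acc"] += arg
        elif op == "JZ" and state["acc"] == 0:
            state["pc"] = arg            # branch: rewrite the counter
    return state["acc"]
```

With the one-hot weights, each step touches exactly one instruction, which is precisely the ordinary computer's behavior embedded as a special case.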
Thus, the processing architecture of the decoder part of conversational AI essentially encompasses the processing architecture of a normal computer. Depending on the design of the decoder part, it is even possible to completely simulate the processing of a normal computer.
Flexibility as a Processing Architecture
It is therefore inappropriate to understand the decoder's processing architecture as merely different from that of an ordinary computer. It is more accurate to see the ordinary computer's architecture as a special, restricted pattern within the transformer decoder's architecture. Conversely, the decoder possesses a more flexible processing architecture than an ordinary computer.
Using this flexibility, for example, it becomes possible to realize parallel processing by having multiple program counters, each pointing to different parts of the program simultaneously.
It is also possible to vary the strength of influence in parallel processing: the main process can exert a strong influence while subsidiary processes exert a weaker one. Main and subsidiary processes can be swapped mid-way, and the degree of influence can change smoothly or abruptly.
Processes such as advancing counters at different rates in parallel processing, or changing a counter's rate of progression mid-process, are also feasible. This resembles being able to tune the processing architecture while it is running. Current software system technologies, such as software-defined networking and system orchestration, allow system infrastructure configurations to be changed through software. The transformer's decoder can be said to realize a similar concept: a software-defined processing architecture that allows the processing architecture itself to be modified flexibly.
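The graded, multi-counter parallelism described above can be sketched as two "program counters" advancing through the program at different rates, with their reads blended by adjustable weights. The blending rule and step sizes are invented for illustration.

```python
# Toy sketch: two "program counters" advance through the program, and
# their reads are blended with adjustable weights -- the graded,
# tunable parallelism described in the text. All details are invented.

def run_parallel(program, steps, w_main=0.8):
    pc_main, pc_sub = 0, len(program) // 2
    state = 0.0
    for _ in range(steps):
        main = program[pc_main % len(program)]
        sub = program[pc_sub % len(program)]
        # Graded influence: the main strand dominates, the subsidiary
        # strand contributes weakly; w_main could change mid-run.
        state += w_main * main + (1.0 - w_main) * sub
        pc_main += 1
        pc_sub += 2   # counters may advance at different rates
    return state
```

Setting `w_main` to 1.0 or 0.0 silences one strand entirely; intermediate values mix them, and varying `w_main` over time would correspond to smoothly swapping the main and subsidiary roles.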
Of course, this is within the range of capabilities the decoder part’s processing architecture can possess. Whether such capabilities are actually utilized depends on the processor’s design. Therefore, if conversational AI is trained to perform such processing, it should be theoretically possible to create processors that utilize the flexibility of this decoder part’s processing architecture for parallel processing and in-process adjustments of the processing architecture.
Harnessing Cognitive Functions for Strength
Additionally, since the decoder part is implemented using neural networks, it also encompasses the ability to classify, recognize, and probabilistically evaluate complex information. If various classifiers and recognizers are formed internally during the learning process, it becomes possible to use appropriate classifiers or recognizers at any desired moment during processing in the decoder, and utilize their results. This implies that the architecture can automatically realize what is typically done when developing conventional classification-type AI systems: calling and using AI created as classifiers within human-made software systems.
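The idea of internally formed recognizers being consulted on demand can be sketched as a dispatch pattern. The two hand-written "recognizers" below are stand-ins for classifiers a network might form during learning; all names and logic are invented.

```python
# Toy sketch: small "recognizers" formed inside the model (hand-written
# stand-ins here) are consulted on demand, with the current state
# selecting which one drives the step -- mimicking conventional
# software that calls a classifier-type AI. All details are invented.

def recognize_number(token):      # stand-in internal classifier A
    return token.isdigit()

def recognize_greeting(token):    # stand-in internal classifier B
    return token.lower() in {"hi", "hello"}

def decoder_step(token, state):
    # The internal state selects which recognizer influences this step.
    if state == "expect_number":
        return "NUM" if recognize_number(token) else "OTHER"
    return "GREET" if recognize_greeting(token) else "OTHER"
```

In the real architecture this selection would itself be soft and learned, but the structural point is the same: classification capabilities live inside the processing loop and can be invoked at any step.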
Therefore, the architecture is not just about flexible parallel processing computing; it’s also a powerful architecture capable of forming complex cognitive systems by combining cognitive functions realized through neural networks.
In Conclusion
When considering the mechanism of the decoder part of the transformer used in conversational AI, it becomes clear that it possesses a much more flexible and powerful processing architecture than traditional computers.
As a system engineer and programmer, I would be at a loss if asked to write programs for this flexible and powerful architecture. Precisely because of its flexibility and power, it is entirely unclear to human engineers how to exploit it to design systems capable of processing as sophisticated as what conversational AI achieves.
This can be likened to imagining a robot with a hundred arms. Even for someone blessed with great physical ability and robot-operating skills, it’s hard to imagine being able to operate a robot with a hundred arms to its full potential. The same applies here. Just as there are robots suitable for human operation, the processing architecture of current standard computers is user-friendly for us human engineers.
Therefore, machine learning methods such as those used for AI are effective for exploiting this architecture. With sufficient training, an AI could skillfully use even a hundred arms. In the same way, conversational AI has automatically designed a system capable of human-level conversation and reasonably sophisticated chained reasoning on an architecture too flexible and powerful for human engineers to handle.
It’s unclear how much of the potential of the flexible and powerful architecture of the transformer’s decoder part is currently being utilized by conversational AI. Perhaps only about ten out of a hundred arms are being used, and with further training, the rest could be effectively utilized. Just as humans are said to use only a fraction of their brain’s capabilities, a similar analysis might apply here.
In the future, as conversational AI advances through reinforcement learning by exchanging texts generated by AI, rather than just texts created by humans, deeper development of abilities utilizing this architecture might be realized. Moreover, there is room for the evolution of the architecture itself. Along with the evolution of computer hardware, the evolution of architectures and learning methods will likely lead to an exponential increase in AI capabilities.