Let us talk about ChatGPT
My point of view as a mathematician and data scientist
Thanks to rapid progress in artificial intelligence, we have entered a new era of technology, and ChatGPT sits at the center of this new world. In this blog post, I am going to talk about how I understand ChatGPT as a mathematician and data scientist. Without going deeply into the details of models or algorithms, I will focus on the following three questions:
- Why is ChatGPT more powerful than most existing AI chatbots?
- Can ChatGPT replace traditional search engines?
- Is ChatGPT as intelligent as human beings?
ChatGPT vs other chatbots
Large language models
ChatGPT, like other chatbots, is at its core a conversational agent or AI assistant with a large language model (LLM) embedded inside, that is, a generative mathematical model of the statistical distribution of tokens in the vast public corpus of human-generated text, where the tokens in question include words, parts of words, and individual characters such as punctuation marks. Put simply, what an LLM does is output the most likely next token given a sequence of tokens. A short remark here: it has become commonplace to use the term LLM both for the machine learning model itself and for the interface in which the model is embedded, such as ChatGPT and its predecessors.
Let us consider an example. Imagine that we ask an LLM: Who won the Nobel Prize in Literature in 2022? The answer would be Annie Ernaux. But for the model, what we are really asking is: given the sequence “Who won the Nobel Prize in Literature in 2022?”, what are the most likely following tokens? A good reply to this question is “Annie Ernaux”, drawn from the distribution that the model learned during training.
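To make the idea concrete, here is a toy sketch of next-token prediction in Python. The vocabulary, the probabilities, and the `next_token_distribution` function are all made up for illustration; a real LLM computes this distribution over tens of thousands of subword tokens with a neural network.

```python
import numpy as np

# Hypothetical five-token vocabulary; real models use tens of thousands of subwords.
vocab = ["Annie", "Ernaux", "Bob", "Dylan", "<eos>"]

def next_token_distribution(prompt):
    """Stand-in for the learned model: returns P(next token | prompt)."""
    if prompt.endswith("Literature in 2022?"):
        return np.array([0.90, 0.02, 0.05, 0.02, 0.01])  # invented numbers
    return np.ones(len(vocab)) / len(vocab)              # uniform fallback

prompt = "Who won the Nobel Prize in Literature in 2022?"
probs = next_token_distribution(prompt)
print(vocab[int(np.argmax(probs))])  # -> "Annie"
```

Generation is just this step in a loop: append the chosen token to the prompt and ask for the next distribution again, which here would then yield “Ernaux”.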
Transformer
People ask: among all the chatbots capable of natural language processing and next-token prediction, why does ChatGPT stand out? I would argue that OpenAI's success is mostly thanks to the Transformer, a deep learning architecture first introduced in 2017 by Vaswani et al. of Google Brain.
As with most language models, the Transformer consists of an encoder and a decoder: the former takes the input sequence and maps it into a higher-dimensional space, and the latter maps the encoded vectors into an output sequence, which can be another language, symbols, etc. What makes the difference is the so-called attention mechanism that each encoder and decoder layer makes use of. For each position of the input sequence, the attention mechanism decides which other parts of the sequence are important to it. To learn more about the Transformer, I recommend this Blog.
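To make the mechanism concrete, here is a minimal sketch of scaled dot-product attention in NumPy, the core operation inside each Transformer layer. This is a simplification: real implementations add learned projection matrices, multiple heads, and masking.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted mix of the rows of V, where the weights
    measure how relevant every other position is to the current one."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise relevance between positions
    weights = softmax(scores, axis=-1)   # one probability distribution per query
    return weights @ V

# Toy input: 4 token vectors of dimension 8 (random stand-ins for embeddings).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(attention(X, X, X).shape)          # (4, 8): one context-mixed vector per token
```

Because the attention weights for all positions come out of a single matrix product, the whole sequence can be processed at once, which is precisely the speed advantage discussed next.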
The birth of the Transformer brought LLMs into a new era because it allows parallel processing and is much faster than earlier models of comparable performance. Besides, I would like to highlight its in-context learning ability: it can learn a new task from just a few examples, without the need for any new training data.
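Here is what in-context learning looks like in practice: a hypothetical few-shot prompt whose “training examples” live entirely inside the prompt itself. The task and the examples are made up for illustration; no model weights are updated.

```python
# A few-shot prompt: the model must infer the task from the examples alone.
few_shot_prompt = """Translate French to English.

French: bonjour  -> English: hello
French: merci    -> English: thank you
French: chat     -> English: cat
French: fromage  -> English:"""

# Fed to an LLM, the most likely continuation is "cheese": the model has
# picked up the translation task purely from the three examples in context.
```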
ChatGPT vs search engines
Lossy compression
Let us now consider another question: why does ChatGPT seem more intelligent than traditional search engines, and even make users feel that they are talking to a real human being?
I will start this chapter by humbly citing a story told by Ted Chiang in his essay “ChatGPT Is a Blurry JPEG of the Web”, which I appreciate so much and have endlessly recommended to everyone around me who wants to understand ChatGPT.
It is a story about Xerox photocopiers. In 2013, a German construction company used one to copy a house floor plan. There were three rooms in the original plan, with sizes marked as 14.13, 21.11, and 17.42 square meters. However, all three rooms were marked as 14.13 square meters in the copy.
What was the problem? It came from the step where the photocopier compresses the file. Xerox photocopiers compress images with a lossy algorithm: during compression, some information is discarded, and we can never get it back after decompressing the image.
Lossy compression is very common for photos, videos, and music, because most of the time we cannot tell whether an image or a piece of music has been perfectly reproduced. When compressing the plan, the photocopier noticed that the room-size labels looked very similar, so it stored the label 14.13 for only one of the rooms and reused that same label for all three rooms when printing the floor plan.
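Here is a toy analogy in Python for what the photocopier did. In the real incident the machine substituted visually similar patches of pixels (the digit glyphs); this sketch uses numeric closeness as a crude stand-in for visual similarity, so treat it as an illustration of the failure mode, not of the actual algorithm.

```python
def lossy_compress(values, tolerance):
    """Store one exemplar per group of 'similar enough' values and reuse it."""
    exemplars, out = [], []
    for v in values:
        match = next((e for e in exemplars if abs(e - v) <= tolerance), None)
        if match is None:
            exemplars.append(v)   # store a new exemplar...
            match = v
        out.append(match)         # ...or silently reuse an old one
    return out

rooms = [14.13, 21.11, 17.42]
print(lossy_compress(rooms, tolerance=0.5))   # [14.13, 21.11, 17.42]: faithful
print(lossy_compress(rooms, tolerance=10.0))  # [14.13, 14.13, 14.13]: the bug
```

The compressed file is smaller, and for photographs nobody notices; for a floor plan full of near-identical labels, the “recoverable” information turns out not to be recoverable at all.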
Although a lossy compression algorithm loses information, that is not always a bad thing, at least not for ChatGPT. On the contrary, the loss of some information makes it look more human. Remember that an LLM is essentially a statistical model: given a sequence of tokens, it predicts the most likely continuation. Its encoding process, like compressing pictures, also throws away information that it expects to recover from the statistics it has learned from its training data. Unlike a search engine, what ChatGPT gives is never the original text, but text that has been encoded and then decoded by the model.
Think of a literature class where the teacher asks: what is Notre Dame de Paris by Victor Hugo about? Student A repeats a paragraph from the textbook verbatim. Student B says: the book tells the unfortunate story of Quasimodo in 15th-century Paris. Although B did not recite the full text and gave a lossy version instead, we would find him smarter; he seems to understand the book better than A does.
The main reason we feel that ChatGPT has a brain of its own is that it never copies text from the web verbatim: its lossy encoding and decoding process rephrases the words and sentences of the original.
Be careful
ChatGPT, as a chat tool, returns a single answer instead of the full list of results that a traditional search engine gives. Although this spares us the step of manually filtering everything a search engine returns, a big problem remains: how do we ensure the correctness of its output? Users of ChatGPT have already reported bad experiences: when asked to recommend books, it sometimes produces a list of books that do not exist at all. Worse, because of the lossy process, even when it makes things up, it still sounds like it is telling the truth.
It is undoubtedly risky to let ChatGPT replace traditional search engines before the authenticity of its information sources can be ensured.
ChatGPT vs human beings
“knowledge” and “belief”
ChatGPT is so powerful, so versatile, and so useful that it often makes us forget that a bare-bones LLM doesn’t “really” know anything, because all it does, at a fundamental level, is sequence prediction.
People may say that it is more than possible for LLMs like ChatGPT to develop higher-level capacities such as “knowledge” and “belief” during the learning process, even though LLMs are essentially just prediction models based on the statistics they have learned. Neural networks can approximate any computable function to an arbitrary degree of accuracy, so, given enough data, computing power, and better-designed models, perhaps stochastic gradient descent will discover such mechanisms.
Pearl’s Ladder of Causation
However, I still want to point out that there is an essential difference between an LLM’s learning process and a baby’s.
Judea Pearl, one of my favorite computer scientists, frequently refers to what he calls the “ladder of causation”, a hierarchy of three types of problems of increasing difficulty: (1) association (prediction), (2) intervention, and (3) counterfactuals. This is exactly how a baby learns. A baby who hits his head on the table corner feels pain and concludes that hitting the table corner goes together with pain (association). He may then lightly hit the table again to see whether it really causes the pain (intervention). Eventually, he can reason: had I walked carefully around the table corner, I would not have been hurt (counterfactuals).
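The gap between the first two rungs can be shown with a toy simulation. The scenario and all the probabilities below are invented: a hidden trait (“clumsiness”) makes a child both more likely to bump the table and more sensitive to pain, so the observed correlation (rung 1) overstates what actually happens when we force the bump to occur (rung 2, Pearl’s do-operator).

```python
import random

random.seed(0)

def sample(do_hit=None):
    clumsy = random.random() < 0.5                        # hidden confounder
    if do_hit is None:
        hit = random.random() < (0.8 if clumsy else 0.2)  # clumsy kids bump more
    else:
        hit = do_hit                                      # intervention: force it
    p_pain = {(True, True): 0.9, (True, False): 0.6,      # invented probabilities
              (False, True): 0.3, (False, False): 0.05}[(hit, clumsy)]
    return hit, random.random() < p_pain

# Rung 1, association: P(pain | hit observed) -- confounded by clumsiness.
obs = [sample() for _ in range(100_000)]
p_obs = sum(pain for hit, pain in obs if hit) / sum(1 for hit, _ in obs if hit)

# Rung 2, intervention: P(pain | do(hit)) -- forcing the hit breaks the confounding.
itv = [sample(do_hit=True) for _ in range(100_000)]
p_do = sum(pain for _, pain in itv) / len(itv)

print(f"P(pain | hit) = {p_obs:.2f}   P(pain | do(hit)) = {p_do:.2f}")  # ~0.84 vs ~0.75
```

A model trained only on observational data can, at best, estimate the first quantity; the second requires acting on the world, and the third requires imagining worlds that never happened.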
Despite its powerful algorithms, today’s ChatGPT still sits on the first rung: it can easily capture the correlations in a given dataset, but it cannot understand the causal relations behind them. From this point of view, I believe it is still impossible for ChatGPT to make decisions the way human beings do, or to interact with any external reality.