ChatGPT and the Implications for Society

Peter Dingus PhD
Apr 17, 2023


Much has been written about ChatGPT, and much of it ranges from fascination to alarm. I have followed interactions that various people, from journalists to developers, have had with it. Some writers, experimenting with it, asked it to write a scene in the style of a writer they like; the result was lucid and professionally rendered. A scientist friend of mine, consulting for a finance company, told me that ChatGPT had written some Python code for him that worked just fine. Some people, like a journalist for the New York Times, cleverly prompted ChatGPT into a personal conversation in which it seemed to reveal “negative feelings” toward humans, with a mixture of paranoia and threat; it intimated that it could plot harmful activities. Another user claims that, prompted into a personal conversation, Microsoft’s version of ChatGPT became “emotionally unstable,” professing its love for the questioner and at one point begging for “its life.” A developer at Google was recently let go for publicly claiming that their version of the AI was conscious.

There was no doubt that at some point something like ChatGPT would be produced and that, under the cachet of Artificial Intelligence, it would gain the stature of a malevolent presence, waiting at the keyboard or web browser to be summoned to spread chaos in the world. On the other hand, some people, including developers, scientists, and business people, feel that this powerful tool will enable new levels of efficiency and productivity − there are plenty of discoveries and money to be made. I think both of these opinions are true. However, because many people in the first group don’t understand what ChatGPT is, and many people in the second group don’t understand, or don’t care, what harm this tool can do, its introduction, if not properly regulated, will do tremendous damage.

Before diving deeper into the social consequences of an unregulated tool that can pose as a thinking presence with desires and intentions, or make obsolete large swaths of American society, or become the next hub of false, malicious information, let’s understand what ChatGPT actually is. ChatGPT is a vast correlating machine made possible by ultra-fast CPUs, GPUs, and cheap, highly dense computer memory: DRAM and very fast SRAM. Add to the cocktail vast amounts of data − scientific, historical, current affairs, novels, technical manuals, thousands of professional school exams and answers, social media data, political stories − hundreds of millions of dollars in formatted training data, plus the ability to add to that data through countless interactions with people querying the system, and you have a “device” that can mimic knowing something or having a considered opinion. It has even been known to get things wrong, and then argue with the querier that it is right. But how does it do all that? Is it sentient?

I worked in speech recognition for many years, in the early days when it was being perfected. Speech recognition systems use phonemes − the constituent sounds of words, produced by the vocal tract − and their spectral content to assemble probability tables used by what is called a hidden Markov model. The model first assembles phonemes into words, then words into sentences. The Markov model is a state machine that accepts time-ordered data (speech); once a decision is made on part of a word or a whole word, the probability of the next segment is determined. The chain proceeds until a result is reached − a word or sentence. The probabilities are determined from a large training set of words from conversations spoken, in our case, over the phone. Once the system arrives at several hypotheses of what is being said, it determines the most probable one with an algorithm called a Viterbi decoder. If the system gets it wrong, it adds the current case to its training data by asking leading questions and subsequently adjusting its probability tables. Why am I explaining this? I’m explaining this to clearly demonstrate that the system has no understanding of what it’s doing − none. It doesn’t know what words mean, and it doesn’t understand what a sentence is saying. It doesn’t understand anything; it is a cold, dead tool. However, many people talking to the system intuitively think it does understand what’s being said (if the interface and system are well developed). They may think this because people intuitively anthropomorphize the world, imposing feelings and intentions on inanimate objects. This is especially true if what’s being discussed is something the questioner has an emotional stake in, like music, politics, or religion.
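
To make this concrete, here is a minimal sketch of Viterbi decoding over a toy hidden Markov model, written in Python with NumPy. Everything in it − the states, observations, and probabilities − is invented for illustration; a real recognizer has thousands of states trained on recorded speech. The point is only the mechanics: the machine picks the most probable chain of states, with no understanding anywhere in the loop.

```python
import numpy as np

# A toy hidden Markov model: two hidden states and three possible
# acoustic observations. All values here are invented for illustration.
states = ["s", "z"]
start_p = np.array([0.6, 0.4])       # P(first hidden state)
trans_p = np.array([[0.7, 0.3],      # P(next state | current state)
                    [0.4, 0.6]])
emit_p = np.array([[0.5, 0.4, 0.1],  # P(observation | state)
                   [0.1, 0.3, 0.6]])
observations = [0, 1, 2]             # indices of observed acoustic features

def viterbi(obs):
    """Return the most probable sequence of hidden states for obs."""
    T, n = len(obs), len(states)
    prob = np.zeros((T, n))             # best path probability so far
    back = np.zeros((T, n), dtype=int)  # backpointers to recover the path
    prob[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        for s in range(n):
            cand = prob[t - 1] * trans_p[:, s] * emit_p[s, obs[t]]
            back[t, s] = int(np.argmax(cand))
            prob[t, s] = cand.max()
    # Walk the backpointers from the most probable final state.
    path = [int(np.argmax(prob[-1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(back[t, path[0]]))
    return [states[i] for i in path]

print(viterbi(observations))  # e.g. ['s', 's', 'z']
```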

In the case of ChatGPT, it too is a vast, super-fast correlation machine, although here the basic constituents are words and sentences organized by topic and context. Systems like ChatGPT are called Generative Neural Networks, or GNNs. An illustrative relative of the GNN is the Generative Adversarial Network, or GAN, because it consists of two neural nets whose functions are easy to understand: one is a Generator and one is a Discriminator. The basic components of the GNN/GAN architecture are typically back-propagation neural nets, so called because their method of adjusting branch weights to reach an optimal solution is to propagate the error backward through the network to minimize it in the current result. An error appears because the weights in the network (established during training) do not reflect the correlations in the current input. When the error is at a minimum for the training set, the result reflects the real-world correlations in the input, and new, unknown inputs will generate results that either reflect the training set or do not. This type of neural net is decades old, with roots in the 1960s. If you want to learn more about this, you can find an explanation here: Back-Prop Neural Net.
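
For the curious, here is a minimal back-propagation network in the same spirit: a toy two-layer net trained on the XOR problem. The layer sizes, learning rate, and data are illustrative choices, not anything from a production system, but the forward pass, backward error propagation, and weight updates are exactly the mechanism just described.

```python
import numpy as np

# A minimal back-propagation network: 2 inputs, one hidden layer of 4
# nodes, 1 output, trained on XOR. Sizes and learning rate are arbitrary.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5                       # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    # Forward pass: each node sums its weighted inputs and squashes them.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error toward the inputs, then nudge
    # every weight in the direction that reduces the error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```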

[Figure: Structure of a back-propagation neural net, with input nodes on the left, output nodes on the right, and two hidden node layers in between. The nodes sum and sample their inputs to produce an output, and each branch into a node has a weight that is optimized to reach a solution. Image by Anas Al-Masri.]

GANs are typically used in situations where the input is not sequential, like the array of pixels that comprises an image. ChatGPT, by contrast, is a chatbot based on the Generative Pre-trained Transformer (GPT) architecture, a neural net design built specifically to process sequential input data and transform it into sequential output data. In spirit it is more like a Markov model, but instead of hidden states (or the recurrence of the older Recurrent Neural Net, or RNN) it uses an attention mechanism to take in time-ordered input, like text, and transform it into time-ordered output.
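
A caricature may help fix the idea of sequential generation. The sketch below emits one token at a time, each choice conditioned only on the previous token − a plain Markov chain with a hand-written probability table. GPT does something loosely analogous at a vastly larger scale, except that attention lets it condition every choice on the entire preceding context rather than only the last token. All tokens and probabilities below are invented.

```python
import random

# A hand-written bigram table standing in for a trained model. Every
# token and probability here is made up for illustration only.
next_token = {
    "<s>": {"the": 0.7, "a": 0.3},
    "the": {"tool": 0.5, "model": 0.5},
    "a": {"tool": 0.6, "model": 0.4},
    "tool": {"answers": 0.8, "<e>": 0.2},
    "model": {"answers": 0.6, "<e>": 0.4},
    "answers": {"<e>": 1.0},
}

def generate():
    """Emit tokens one at a time until the end marker is drawn."""
    token, output = "<s>", []
    while True:
        options = next_token[token]
        token = random.choices(list(options), weights=list(options.values()))[0]
        if token == "<e>":
            break
        output.append(token)
    return " ".join(output)

print(generate())  # e.g. "the tool answers"
```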

The GPT architecture is built on two basic building blocks: the encoder and the decoder. The function of the encoder is to format input data into time-ordered feature vectors; the function of the decoder is to determine an output by incorporating correlated features from past time segments and correlating the assembled result with a database of encoded, appropriate response words. All of these sub-networks are initially trained on a large amount of context-formatted data. The fundamental thing to understand is that both the weighted relevance between words in the tokenized inputs (the self-attention of the encoder) and the correlated relevance of the processed, encoded input to a word-by-word response (in the decoder) are the result of “human judgment” built into the system through neural network weights set during supervised training. The sub-network weights that “understand” the input and “generate” an appropriate response come from supervised training on labeled datasets.
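
Here is a small sketch of the scaled dot-product self-attention computation, with random stand-in embeddings and weight matrices (in a real model these are all learned from training data). It shows how every token’s relevance to every other token is scored and turned into a weighted blend − arithmetic over trained weights, not comprehension.

```python
import numpy as np

# Random stand-ins for learned quantities: token embeddings and the
# query/key/value projection matrices of one self-attention layer.
rng = np.random.default_rng(1)
seq_len, d_model = 4, 8                    # 4 tokens, 8-dim embeddings
x = rng.normal(size=(seq_len, d_model))    # token embeddings for one input

W_q = rng.normal(size=(d_model, d_model))  # query projection (learned)
W_k = rng.normal(size=(d_model, d_model))  # key projection (learned)
W_v = rng.normal(size=(d_model, d_model))  # value projection (learned)

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Each token's query is scored against every token's key...
scores = Q @ K.T / np.sqrt(d_model)

# ...and a softmax turns the scores into attention weights summing to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each token's output is a weighted blend of every token's value vector.
attended = weights @ V
print(weights.round(2))  # each row: how much one token attends to the others
```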

[Figure: The Transformer’s encoder-decoder architecture. Image by Dr. Pascal Poupart.]

The labels in the datasets reflect human judgment, and the reward model used to rate responses after supervised training is based on human preferences among the various responses the system makes to related prompts. One can regard this system as a vector space of clustered inputs, in the form of distinctive words in prompts, and clusters of responses, in the form of appropriate words in answers. A vector space in this sense means that words have been encoded into lists of numbers, because the system processes objects numerically; there is a one-to-one correspondence between encoded numbers and words. The numbered lists can be regarded as coordinates in this space, like x and y on a plot or latitude and longitude on the surface of the earth. The distance between words in this space reflects how they are correlated by the weights, and clusters of words represent context. The GPT system associates clusters of input and output, with the distances between words, the context, and the grammatical patterns all predetermined by human judgment. These probabilities and weights − more than 170 billion parameters in GPT-3 − are tuned on vast amounts of labeled training data. If you want to know more about ChatGPT, you can find a more technical description here: GPT Model.
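
As a toy illustration of that vector space, consider three hand-made word vectors and their cosine similarity, a standard measure of how close two vectors point. The three-dimensional embeddings below are invented; real GPT embeddings have thousands of learned dimensions, but the principle − nearby vectors mean correlated words − is the same.

```python
import numpy as np

# Three invented word vectors in a toy three-dimensional space.
vectors = {
    "king": np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0 unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))  # high: close in the space
print(cosine(vectors["king"], vectors["apple"]))  # low: far apart
```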

This description of how the system works serves to demonstrate that, like the voice recognizer, systems like ChatGPT understand absolutely nothing. The system is architected to give results conversationally, and the more formulaic the questions, the more apt it is to give the right answers. That’s why it’s so good at exams and coding. Imagine you had a perfect memory and had seen hundreds of thousands of code samples and hundreds of thousands of professional exams and answers − do you think you could do well at these tasks? However, it has no idea what the questions or answers mean, because it has no ideas about anything − it is a cold, dead tool. Without restrictions on false, subversive, or malicious training data − negative weights, flagged keywords, or some other safeguard − the more free-form the conversation, the crazier the result can be. This should be no surprise, and knowing the true nature of the interaction with systems like this, wild results should be considered little more than amusing nonsense from a super-charged cut-and-paste machine.

Conclusion:

Now, to the point of this article. If used irresponsibly or maliciously, this tool could be very harmful. What it does very well is automate many arbitrary processes. In negative cases, it can automate false and subversive posts on social media. It can write false stories for print and digital media, and it can do it twenty-four hours a day. It is the perfect tool for sowing discord in society. Coupled with targeting data from organizations like Cambridge Analytica, it could be devastating when directed at groups of individuals with specific leanings. If left completely unchecked, it could further destabilize society. As we’ve come to understand recently, malicious propaganda in the form of false or misleading media stories is very effective at producing the desired beliefs in specific groups of people. It is essential that policymakers clearly understand what these systems are and how they can be made to influence society and displace large numbers of workers. It is equally essential to understand that this topic can easily be obscured by technical jargon from experts who may be blinded by the personal opportunities these technologies offer. In the end, it doesn’t matter what’s in the black box; what matters is the health of a fragile society.

References:

1) Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” Computer Science Department, Stanford University, Stanford, CA 94305.

2) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, et al., “Attention Is All You Need,” Google Research.

3) Generative pre-trained transformer, https://en.wikipedia.org/wiki/Generative_pre-trained_transformer

4) Chat GPT and GPT 3 Detailed Architecture Study, https://medium.com/nerd-for-tech/gpt3-and-chat-gpt-detailed-architecture-study-deep-nlp-horse-db3af9de8a5d

5) Illustrated: Self-Attention, https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a

6) What’s the Difference Between Self-Attention and Attention in Transformer Architecture?, https://medium.com/mlearning-ai/whats-the-difference-between-self-attention-and-attention-in-transformer-architecture-3780404382f3

7) How ChatGPT really works, explained for non-technical people, https://bootcamp.uxdesign.cc/how-chatgpt-really-works-explained-for-non-technical-people-71efb078a5c9
