Will ChatGPT Steal Your (UX) Job? — Part 1

Introduction to AI/ML for UX designers

Baltimore UX
10 min read · Mar 10, 2023

Written by: Boris Volfson

Snap from the BUX meetup (at Fearless HQ), photo credit: Richard Gray

This is part one of a two-part article summarizing a recent Baltimore UX presentation. This part introduces UXers to the world of Artificial Intelligence (AI)/Machine Learning (ML)/Deep Learning (DL). If you’re already well informed on AI/ML/DL, skip to Part 2, where I discuss when and how UXers should and should NOT use DL models in their products and/or in their everyday work.

BUX has also made the presentation audio/transcript and slides available to anyone. You may use the slides for any non-commercial educational purpose (but please give attribution).

The information presented here can be found in much more detail in a variety of sources. In preparing the presentation, I struggled to find the right balance between giving too little and too much information for a UX audience. I would really appreciate feedback on how useful you found the information presented in this article.

Disambiguation of terminology

The way most introductions to AI/ML/DL start (or should start) is with a visualization of how the different terms/jargon relate to each other.

diagram which describes how AI, ML and DL relate to each other.
Source: https://www.simplilearn.com/tutorials/artificial-intelligence-tutorial/ai-vs-machine-learning-vs-deep-learning

Artificial Intelligence (AI) refers to the ability of machines to imitate human abilities or to represent human knowledge. While usually digital, AI just needs to come from inanimate objects. Geek out on the history of AI.

Machine Learning (ML): This is a subset of AI. It refers to the ability of a system to automatically learn and improve. The basis for the field predates digital computers. Geek out on the history of ML.

Deep Learning (DL): This is a subset of ML. It relies on complex artificial neural networks that represent data at increasing levels of complexity. Geek out on the history of DL.

💡 Practical tip: AI, ML, and DL are used interchangeably in common discourse. However, most AI solutions that have gotten recent media attention (including ChatGPT) are DL models.

Difference between ML and DL

The difference between ML and DL is subtle. The key difference is that DL models extract features (or categories) automatically, while ML models generally rely on human experts to help identify the features that the model should focus on. Another way of thinking about the differences is that the deeper the model is, the less guidance from humans is needed. This automatic extraction of features is accomplished through the use of Artificial Neural Networks (ANN). These networks contain “hidden” (deep) layers that do the feature extraction.

difference between Machine and Deep Learning models. The important part is that Machine Learning has human driven feature extraction, while Deep Learning has the features extracted by the model.
Source: https://www.quora.com/What-is-the-difference-between-deep-learning-and-usual-machine-learning

💡 Practical tip: Regardless of definitions, DL models WILL still involve some level of manual human input, and models that are called “ML” will very often include some automatic feature extraction.
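
To make the distinction concrete, here is a minimal sketch (assuming scikit-learn is installed; the dataset, feature choices, and model sizes are mine, not from the presentation) contrasting the two approaches on the small handwritten-digit dataset that ships with the library. In the first model a human picks the features; in the second a small neural network is handed the raw pixels and its hidden layer learns features on its own.

```python
# A hedged sketch: "classic ML" with human-chosen features vs. a small
# neural network that learns its own features from raw pixels.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                     # 8x8 grayscale digit images
X_raw, y = digits.data, digits.target      # raw pixels: 64 values per image

# --- "ML" approach: a human decides which features matter ---
# Here we (the human experts) pick two summary features per image.
mean_intensity = X_raw.mean(axis=1, keepdims=True)
top_vs_bottom = (X_raw[:, :32].mean(axis=1) - X_raw[:, 32:].mean(axis=1)).reshape(-1, 1)
X_handmade = np.hstack([mean_intensity, top_vs_bottom])

Xh_tr, Xh_te, y_tr, y_te = train_test_split(X_handmade, y, random_state=0)
ml_model = LogisticRegression(max_iter=1000).fit(Xh_tr, y_tr)
print("Hand-picked features:", ml_model.score(Xh_te, y_te))

# --- "DL" approach: the hidden layer extracts features automatically ---
Xr_tr, Xr_te, y_tr, y_te = train_test_split(X_raw, y, random_state=0)
dl_model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
dl_model.fit(Xr_tr, y_tr)
print("Learned features:    ", dl_model.score(Xr_te, y_te))
```

The toy MLP has only a single hidden layer, so it is not “deep” in any meaningful sense, but the division of labour is the same one shown in the diagram above: human-driven feature extraction versus features extracted by the model.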

Artificial Neural Networks (ANNs) and Their Complexity

Artificial Neural Networks are modeled on biological neural networks, also known as brains. Artificial neurons and biological neurons share many similarities (see diagram below): inputs (dendrites in biological neurons), a summation/calculation/activation function applied to the input (nucleus and cell body), and outputs (axon and axon terminals).

visualization about the similarities between biological and artificial neurons
Source: https://www.datacamp.com/community/tutorials/deep-learning-python

Each individual (biological or artificial) neuron contains a very limited amount of data. However, the fact that they connect to many other neurons (and those neurons connect to many other neurons) leads to an emergent property (often called intelligence).

The complexity of ANNs has been increasing by orders of magnitude in recent years and is now approaching the complexity of the human brain. For example, our brains have about 85 billion neurons, which together have roughly 150 trillion connections. In ANNs, parameters serve a similar function to biological neural connections. Advances in training algorithms and GPU computation have led to some staggering numbers: GPT-3 (first released in 2020) has ~175 billion parameters. GPT-4 (expected in 2023) is estimated to have ~100 trillion parameters. 🤩

⚠️ Important: ANNs are NOT biological brains. Their complexity is different. This means that having an ANN with 100,000 trillion parameters would NOT automatically result in an AI that matches or exceeds the intelligence of an average person (in the ways we quantify intelligence).
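
To get a feel for what a “parameter” actually is, here is a minimal PyTorch sketch (PyTorch is my choice for illustration; the layer sizes are arbitrary) that builds a tiny network and counts its learnable weights and biases. The headline numbers above are this same count taken to an extreme scale.

```python
# A toy network; each weight and bias is one "parameter" (a learnable number).
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),   # 64*128 weights + 128 biases
    nn.ReLU(),
    nn.Linear(128, 10),   # 128*10 weights + 10 biases
)

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 9,610 parameters; GPT-3 has roughly 175,000,000,000
```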

How Models Learn (ELI5 version)

You could write many articles about just how ANNs learn. The topic is complex, but the common thread is that these models learn through a feedback loop that gradually improves the model. To illustrate this type of feedback loop, let’s consider the following simple scenario:

  1. Mishka is a hound.
two dogs
Mishka and his friend

2. Let’s try to train him to NOT bark

3. We introduce a stimulus (an input) — this cute dog walking by.

4. Mishka decides whether to bark or not… there is a lot of calculation that happens in his brain. Different parts of his brain give signals to either bark or not bark. Interestingly, in this scenario he is EQUALLY likely to bark (50% of the time) as he is to not bark (50% of the time).

5. Let’s assume that this time (through chance) he decides to NOT bark. This is our “model” output.

6. NOT barking is our desired behaviour. It matches what we think the model (Mishka) should be doing when the input is a dog walking by.

7. We give Mishka a treat (a reward) for being “correct”.

8. The reward changes Mishka’s brain. It makes the parts of the brain that led him to “decide” to not bark a little “closer” to each other. This in turn makes it more likely that, in the future, when a dog walks by he will NOT bark.

The same training is repeated with different dogs walking by… And Mishka gradually learns not to bark at nearby dogs… Or so we hope. *wink*
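
Translating the Mishka story into (heavily simplified) code, here is a toy sketch in plain Python: a single “bark probability” gets nudged down every time the no-bark choice is rewarded. Real models adjust millions or billions of parameters using gradients rather than one number, but the feedback loop has the same shape.

```python
import random

bark_probability = 0.5          # step 4: initially a coin flip
learning_rate = 0.1

for trial in range(50):         # many dogs walk by (steps 3-8, repeated)
    barked = random.random() < bark_probability   # Mishka "decides"
    if not barked:
        # desired behaviour -> treat -> make not-barking a bit more likely
        bark_probability -= learning_rate * bark_probability
    # (no treat when he barks; nothing is reinforced)

print(round(bark_probability, 3))  # close to 0: he has mostly learned not to bark
```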

Introduction to Large Language Models

But Boris… what if I want my DL model to do a little more than just decide whether to bark or not bark?

Well, you’re in luck, because Large Language Models (LLMs) are having their day in the sun. LLMs are a special type of DL model that specializes in understanding natural language.

These models are trained on VERY large text datasets. For example, GPT-3, one of the largest LLMs, was trained on about 45 TB of text data. These models are trained through advanced training methods such as Masked Language Modeling (MLM) and Causal Language Modeling (CLM). The linked article gives a lot of detail about how they are trained. But, to illustrate, let’s consider the following example (a small code sketch follows the list):

  • A model is given the following sentence: “Snow [______] are expected tomorrow in Eastern Canada”
  • It is then asked to predict what word (token) is likely to be in the hidden part
  • The model might predict words like, 1) storms, 2) showers, 3) angels
  • The feedback loop gradually improves the model to become better at predicting the missing word (token)
  • As the model gets better at predicting these words/tokens… it gains semantic knowledge — knowledge about the world
  • This semantic knowledge is derived from the fact that the model uses context (ie the entire sentence) to learn that “snow” is associated with “storms” and “showers”… but also that “snow” is associated with “Canada”… it will eventually even learn that “showers” is associated with “are”, but “shower” is associated with “is”… You can imagine how powerful this can be on aggregate!
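
This masked-word game is something you can try yourself. The sketch below is my own illustration using the Hugging Face transformers library and a small pre-trained model (not GPT-3 itself), asking it to fill in the blank from the example above.

```python
# Requires: pip install transformers torch
from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert-base-uncased")

for guess in fill("Snow [MASK] are expected tomorrow in Eastern Canada."):
    print(f'{guess["token_str"]:>12}  {guess["score"]:.3f}')
```

The top predictions are typically weather-related words such as “storms” or “showers”, which is exactly the kind of semantic knowledge described above.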

It’s worth noting that LLMs come in a variety of types, sizes, and specializations. Some are much faster to train and execute (ie get a prediction from) because they have far fewer parameters and are trained on much less data. Others are “heavier” but contain a lot more knowledge.

LLMs are used for a variety of tasks (people are finding creative new ways to expand this list all the time). These include (a short code example follows the list):

  • Sentiment analysis
  • Summarization
  • Answering questions and text generation
  • Translation
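
To make a couple of these tasks tangible, here is a small sketch using the Hugging Face transformers pipeline API (my choice of library for illustration; the small default models are downloaded on first use):

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Sentiment analysis: label a piece of text as positive or negative.
sentiment = pipeline("sentiment-analysis")
print(sentiment("The new onboarding flow is delightful.")[0])
# e.g. {'label': 'POSITIVE', 'score': 0.99...}

# Summarization: condense a longer passage into a short summary.
summarizer = pipeline("summarization")
long_text = (
    "Baltimore UX hosted a talk introducing designers to AI, machine learning, "
    "and deep learning, covering how neural networks learn, what large language "
    "models are, and where tools like ChatGPT fit into everyday UX work."
)
print(summarizer(long_text, max_length=30, min_length=10)[0]["summary_text"])
```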

What is ChatGPT

There is one LLM that is currently getting the majority of our attention: ChatGPT. It is a specially trained/fine-tuned version of OpenAI’s GPT-3. The fine-tuning is special in that it optimizes the output to be conversational. It was trained by having humans teach it how best to act as a “chatbot”. This training method is called Reinforcement Learning from Human Feedback (RLHF). During the training process, people played the roles of both the human and the bot. This process was used as scaffolding, allowing the model to learn how to generate output that is both conversational and acceptable to a real person. OpenAI wrote a brilliant article about how ChatGPT was trained.

One advantage of this training method is that it introduced a lot of guardrails into ChatGPT. These guardrails are “knowledge” of the types of questions that ChatGPT “knows” it should not answer (because of the harm that might be caused by saying racist, illegal, or untrue things). It also learns how to guide the conversation back towards fulfilling a task (this keeps the conversation from getting stuck). More guardrails are constantly being added to the model (from the experience of the millions of people who are interacting with it).

All in all, ChatGPT sounds almost human. You cannot help but be impressed when you converse with it. This conversational nature, combined with the immense amount of information it has about the real world (from one of the most advanced versions of GPT-3), has created a lot of (justified) hype.

I believe that the conversational nature does the following:

  • Helps the model give better responses because (in real conversations) we give context in smaller chunks — contrast this with a single query where you might struggle to give the full context of what you are asking about.
  • The conversational nature encourages the user to ask simpler questions
  • If an answer doesn’t quite make sense, the question can easily be refined

Combined, these factors improve the perceived performance that a human judge will assign to the model’s output.
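
To show what “context in smaller chunks” looks like in practice, here is a minimal sketch of a multi-turn call to the model behind ChatGPT using OpenAI’s Python API as it existed in early 2023 (the openai package, an API key, and the example conversation are all assumptions on my part, and the interface has since evolved):

```python
# Requires: pip install openai  (and an OPENAI_API_KEY in your environment)
import openai

messages = [
    {"role": "system", "content": "You are a helpful UX research assistant."},
    {"role": "user", "content": "I'm redesigning a sign-up form."},                    # context, chunk 1
    {"role": "assistant", "content": "Great! What problem are users having with it?"},
    {"role": "user", "content": "Drop-off on the password step. Any quick wins?"},     # chunk 2 / refinement
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])
```

Each new user message only has to add the next chunk of context or a refinement; the model sees the whole message history, which is what makes the exchange feel like a conversation.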

Strengths of DL Models

Remember that ChatGPT is just one example of a DL model. However, all DL models have some common strengths/advantages.

  1. Given sufficient data and the right use cases, the performance of DL models is generally very GOOD. In many cases it approaches human performance.
  2. DL models don’t require human experts to explicitly program or identify specific features. For example, a DL Real Estate model would be able to learn from the data that the distance of a house from a nearby school is an important factor in its price. You wouldn’t need a Real Estate expert to tell the model that.
  3. DL models can learn largely without human intervention. This means that these models can “economically” scale and consume very large amounts of data. (It is just not practical for humans to manually annotate terabytes of text data, because 1 TB ≈ 75,000,000 pages of text.)
  4. DL models can learn from all types of data. They can take in and learn from structured data (tables) as well as unstructured data (text and images).
  5. DL models can be tuned for specific (but related) tasks. This means that the training (work) you have done with one model can provide value for other models (see the sketch after this list).
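
Strength #5 is usually called transfer learning or fine-tuning. As a rough sketch (Hugging Face transformers is my stand-in here; any pre-trained backbone works the same way), you start from a model someone else has already trained and only teach it your narrower task:

```python
# Requires: pip install transformers torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a backbone that already "knows" English from its original training...
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,   # ...and bolt on a fresh 2-class head for OUR narrower task,
)                   # e.g. "useful" vs "not useful" survey feedback.

# From here you would train briefly on a (comparatively tiny) labelled dataset
# of your own, instead of repeating the original large-scale training.
inputs = tokenizer("The checkout flow confused me.", return_tensors="pt")
print(model(**inputs).logits)  # untrained head -> essentially a coin flip for now
```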

Weakness of DL Models

The strengths come with some very important caveats. Let’s go through some of them:

  1. DL models require an immense amount of good data (what counts as good data is a very complex topic, but think of it as reliable, representative, and unbiased). If you lack sufficient data, then simpler non-ANN models or even rule-based heuristics might be much better for the task.
  2. A DL model will be just as “good” and just as “bad” as the data that is used to train it. If the data has inherent bias, so will the model. We need to be extra careful to select training data that doesn’t perpetuate biases/systemic inequalities.
  3. Training these models takes a lot of computational power… There are financial and environmental costs to this type of training. As the amount of data used to train the models grows (the current growth rate is exponential!), so will the computational cost.
  4. It is challenging to know when you should stop training a model. You need to avoid over-training it: when you over-train, the model becomes good only at solving problems it has already seen. This is also known as the overfitting problem (see the sketch after this list).
  5. DL models are opaque. It is difficult to understand why a model comes up with a certain output, and it is often unclear what part of the input it found “important” in making a decision. This challenge will become increasingly important if (when) we give models the ability to make important decisions (ie not just film recommendations, but financial or health decisions).
  6. DL models are not humans. When they are presented with problems that are very different from their training data, they often struggle. This inability to solve problems outside of their domain of expertise is known as brittleness… For example, if you attempt to train a self-driving car, the model will really struggle with problems such as an oncoming car driving the wrong way (or an elephant on the highway 🐘).
  7. DL models are biased towards giving a response or making a prediction. This results in a phenomenon known as hallucination: DL models will sometimes give responses that are simply not true. The issue is actually more challenging in LLMs like ChatGPT, because these hallucinations feel (on first impression) like true statements because of how good they sound.
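
Weakness #4 is easy to see even without a neural network. The sketch below (scikit-learn again, and a deliberately tiny synthetic dataset of my own) fits polynomials of increasing complexity to noisy data. Over-training and over-sized models produce the same symptom: error on the data the model has already seen keeps falling, while error on held-out data gets worse.

```python
# Fit polynomials of increasing degree to a small noisy dataset and compare
# error on the data the model saw (train) vs. data it did not (test).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, size=40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(X_tr))
    test_err = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:2d}  train {train_err:.3f}  test {test_err:.3f}")

# Typically the degree-15 fit has the lowest training error but the worst
# test error: it has memorized the noise (overfitting).
```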

Here ends the first part of the article. It was meant to give a brief introduction to the world of AI/ML/DL. In Part 2 we discuss how UXers should apply these technologies in their professional lives.

Author note: Boris Volfson is an employee of Nuance (a Microsoft company). The ideas and thoughts in this writeup are his opinions (they do NOT represent any official position of Nuance or Microsoft).
