Demystifying Large Language Models and Generative AI (Part 1): A Family-Friendly Guide

Ray Mi
7 min read · Jun 2, 2023


Introduction

Recently, I published a blog post about how to use traditional machine learning techniques to understand how large language models (LLMs) make decisions.

I was very excited about this and showed it to my mother the next day. As she skimmed through my post, her face lit up with a proud smile, and she said, “Son, this looks very professional, but I’m not really sure what you’re saying.”

Her honest feedback gave me a new perspective. I suddenly realized the depth of technical understanding required to embrace and explore this very new technology.

However, this understanding should not be limited to a select few. If we cannot explain the significance of these technologies and their working mechanisms at a high level, we’re missing the essence of the joy of discovery that we are so eagerly trying to share.

Large Language Models (LLMs) and Generative AI have revolutionized accessibility to knowledge for a vast majority of people. However, the intricate details of how these technologies work shouldn’t remain the exclusive domain of advanced data scientists. Rather, they should be accessible, comprehensible, and appreciated by all, thus fostering a more inclusive tech community.

Seeing how fast this field has grown in the last six months, I’d like to take a moment to talk about the big changes in this area. So, let’s take 10 minutes to make some of these tech ideas less mysterious. We’ll look at why this area is growing so fast and uncover the simple ideas that make these technologies work.

Disclaimer: This is an attempt to explain this complex topic in simpler terms, primarily for my family. Therefore, some parts might not be entirely accurate. Please let me know if there are any fundamental issues. Also, I would be happy to hear how you explain this to others.

Sense and Learn

Sense

Artificial neural networks and deep learning have indeed been around for decades, but you may be wondering why there’s been such an explosion of powerful language models appearing on your news feed in the past six months.

Understanding and generating human language — a field known as Natural Language Processing (NLP) — has always been a challenging topic in the deep learning arena. Curiously, in the last decade, we’ve seen much more progress in computer vision and image recognition than in language models. But why is this so?

To understand this, let’s take a look at two images:

What do you infer from these two images and how long does it take for you to understand them?

  • On the left side, in less than a second, you can identify a runner in a race, exhibiting a confident posture, sprinting on a street encircled by buildings. Indeed, that’s me running through Times Square during the NYC Half Marathon 2023.
  • On the right side is a paragraph describing the race from New York Road Runners. How long does it take you to read it? I’m a slow reader, so it’s about a minute for me. Regardless of your reading speed, it’s virtually impossible to absorb all the information in that text in 1 second, unlike how quickly you can interpret the picture.

This is understandable as the eye/brain response to visual stimuli is typically much faster than the process of reading. With the picture, you don’t even need to speak any language to perceive the runner’s joy. However, to process textual information, you need to know the language, possess some common sense, and read it line by line from left to right.

This is why the advancement of natural language processing has lagged behind computer vision — traditional learning mechanisms were designed to mimic exactly how we see and how we read, but there’s a considerable difference in efficiency between the two.

But what if we thought outside the box and processed text in the same way we perceive images? This question was brilliantly asked and answered by some researchers at Google Brain, who invented a magic box called ‘Transformer’.

The Transformer employs a mechanism known as ‘Attention’ to grasp context from a broader perspective, rather than processing line by line. This is similar to how you view a picture — you don’t inspect it pixel by pixel; instead, you glance at it and understand its content.
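To make the “glance at the whole picture” idea concrete, here is a toy sketch of self-attention in Python. It is a deliberately simplified illustration: real Transformers use learned query, key, and value projections and many attention heads, none of which appear here. The point is only that every word compares itself to every other word in a single step, instead of reading left to right.

```python
import numpy as np

def softmax(x):
    # Subtract the row max for numerical stability, then normalize each row.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    """Toy scaled dot-product self-attention over a sequence of word vectors.

    Each row of X stands for one word. Every word attends to every other
    word at once, like glancing at a whole picture, rather than scanning
    it pixel by pixel.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # how strongly each word relates to each other word
    weights = softmax(scores)      # attention weights: each row sums to 1
    return weights @ X             # each word becomes a context-aware blend of all words

# Three made-up "words", each a 4-dimensional vector.
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
out = self_attention(X)
print(out.shape)  # same sequence length, but context mixed into every word
```

Notice that nothing in this computation happens “line by line”: the whole sequence is processed in one matrix operation, which is also why this design parallelizes so well on GPUs.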

The Transformer is a genuinely disruptive technology that alters the way machines understand text.

The pioneering paper from Google Brain is called Attention Is All You Need. Interestingly, in today’s social media landscape, content creators strive to capture more attention from their audience, and breakthroughs in language models are also driven by this same concept of attention. Perhaps, ATTENTION IS THE PRIMARY PRODUCTIVE FORCE.

Learn

Now that we have a better method for understanding text, the next challenge is teaching our ‘magic box’ to learn.

Going back to the runner’s image, even though it’s a single frame, you can intuit that he’s passed the start line, experienced some highs and lows during the race, managed to put on a happy face for the picture even though it’s the last two miles of a half marathon, and likely completed the race (though this hadn’t happened when the picture was taken). How do you know all this? You’re essentially projecting that image over time.

Everything we experience is a series of events over time. A video is a series of images, text is a series of words, and music is a series of notes.

Consider the following sequence of images of me running in the Bronx. From the first three frames, you can probably predict my next action. Why? Because you’ve likely seen similar scenarios before.

Similarly, for the Transformer, if we expose it to more text, it might be able to predict the next word based on what it’s read before.
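The “predict the next word from what came before” idea can be sketched with something far simpler than a Transformer: just counting which word tends to follow which. The example text below is made up, and real models learn vastly richer patterns through attention, but the training objective is the same.

```python
from collections import Counter, defaultdict

# A tiny stand-in for "reading a lot of text": count which word
# tends to follow which word.
text = ("i ran the race . i ran the bronx 10 mile . "
        "i finished the race .").split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(text, text[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    # Pick the word that most often followed `word` in the training text.
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "race" -- it followed "the" most often
print(predict_next("i"))    # "ran"
```

Expose the counter to more text and its guesses improve; that, at a cartoon level, is what “training on more data” means for the Transformer too.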

Let’s get Transformers to learn!

Talent and Hard Work

‘GPT-3 has more than 175 billion parameters’

You might have come across this statement in your news feed, but what exactly are parameters?

Not all Transformers are created equal. Each has a fixed number of ‘dials’ within it to sense and learn. Think of parameters as the brain’s neurons: the more neurons an animal has, the larger its brain tends to be, and the more intelligent it is likely to be.

Note: I use an elephant to represent GPT-4, not only because it’s the ‘elephant’ in the Generative AI space, but also because elephants are among the few species with more neurons than humans. And this is just for illustration purposes; I’m not saying GPT-1 is as bad as a fish.

Consider the learning capacity of these animals throughout their lives. Their brain’s capacity sets an upper limit on what they can achieve. Thus, Transformers with more parameters are born with higher potential, or talent.
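To make a number like “175 billion parameters” less abstract, here is how you would count the ‘dials’ in a tiny toy network. The layer sizes below are invented for illustration; they are not the dimensions of any real GPT model.

```python
# Each layer connects n_in inputs to n_out outputs: one dial (weight)
# per connection, plus one dial (bias) per output neuron.
layer_sizes = [512, 2048, 512]  # made-up sizes, purely illustrative

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out  # weights
    total += n_out         # biases

print(total)  # about 2.1 million dials for this toy network
```

Even this three-layer toy has millions of dials; GPT-3’s reported 175 billion parameters is roughly 80,000 times larger, which hints at why training it demands so much compute.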

But is talent alone sufficient? No. Consider twin baby boys: one learns very little, while the other spends a considerable amount of time in the library. Who would have more knowledge and potentially be more intelligent? The latter, of course.

How many books can a Transformer learn from? Theoretically, it can learn from everything that exists on the Internet. Is this feasible? Perhaps, but it would require a great deal of time and computational resources (especially GPUs) to properly train a Transformer.

In theory, we could spend our entire lives at school, but usually, we graduate in our early twenties with a good enough understanding of the world to embark on our individual life paths. The first 20 years set up the foundation for most people’s lives. That’s why these pre-trained large language models are also referred to as ‘Foundation Models’: even though such a model hasn’t acquired all existing knowledge, it’s sufficiently knowledgeable to leverage what it has learned to discover new information.

Specialization

‘It (Morgan Stanley) has “fine-tune trained” GPT-4 on these issues with the 100,000 documents as a training corpus’

But what exactly does ‘fine-tuning’ mean in this context?

Much like in everyone’s professional experience, the knowledge we gain from our work and professional networks diverges significantly from what we learned in college. Unique information, experiences, and interactions collectively form a specialized understanding of a specific domain. My experience in financial services and AI/ML is quite different from that of my classmate who works as an art designer. Fine-tuning is essentially a tailored career path: the ongoing education and industry experience that shape who we are in our current positions.

And how do we continue to improve? Keep fine-tuning!
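The pre-train-then-fine-tune idea can be sketched with a simple next-word counter: first ‘pre-train’ it on broad text, then keep training it on a small specialized corpus. Both corpora below are invented, and real fine-tuning adjusts billions of learned weights rather than word counts, but the shape of the process is the same.

```python
from collections import Counter, defaultdict

def learn(text, counts=None):
    """Count next-word statistics; pass in existing counts to keep learning."""
    counts = counts if counts is not None else defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, word):
    return counts[word].most_common(1)[0][0]

# "Pre-training": broad, general-purpose text.
model = learn("the sky is blue . the grass is green .")

# "Fine-tuning": continue training the SAME model on a small,
# specialized (here, finance-flavored) corpus.
model = learn("the analyst studied yield curves . yield curves shifted today .",
              counts=model)

print(predict_next(model, "yield"))  # "curves" -- domain knowledge
                                     # the general text never contained
```

The key design point is that fine-tuning starts from the foundation model’s existing knowledge instead of from scratch, which is why 100,000 documents can specialize a model that originally needed a large slice of the Internet to train.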

Summary

None of the concepts mentioned above were created by me. There are countless talented individuals in this field who continually create new, impressive innovations that captivate anyone interested in this realm. I feel incredibly humbled that I’m living in an era characterized by profound and disruptive transformations in how we perceive, think, and act.

I will continue this series with a few more posts, delving into topics such as how ChatGPT can chat, what ‘zero-shot’ means, and other interesting aspects of Generative AI.

© 2023 Ray Mi. All rights reserved. Unauthorized use or reproduction of this blog post without express written permission from the author is strictly prohibited. Information is provided as-is, without warranties. Views are those of the author alone.
