Unveiling the Intuition behind LLMs and Gen AI: Demystifying the Hype

Karthik Vadhri · Published in Intuition Matters · 8 min read · Nov 2, 2023

ChatGPT reached 100 million users in just 2 months, faster than any other consumer app before it. This is a remarkable achievement, and it raises the question: is ChatGPT a passing fad, or a transformative new technology that is here to stay?

My previous article, NLP by Intuition, talks about understanding text analytics intuitively, and how text analytics has become more sophisticated and powerful over the years with the development of contextual embeddings, all built on the foundations of NLP.
A lot has changed since then, with a huge paradigm shift in the domain of NLP/NLU after OpenAI spilled the beans with ChatGPT!

Leaders drop GPT in chats to seem smart, though they don’t know their part

Image Source — marketoonist.com

The hype around large language models (LLMs) is real, and everyone is jumping on the bandwagon. But what exactly are LLMs, and how do they work? This article will demystify the hype and explain how LLMs are built on a foundation of traditional NLP techniques.

LLMs are not just a passing fad; they are based on well-established principles of NLP.

This article will focus on text embeddings, rather than image or video embeddings. While the core concepts are the same, there are differences in how the embeddings are created and used.

Text embeddings are the foundation of all NLP tasks, because they are how we convert a sentence into a format a machine understands. It's not too complex to grasp that text embeddings are probabilistic representations of text; the only distinction among different embedding methods lies in how they are calculated.
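To make that concrete, here is a minimal sketch of the earliest style of embedding, a simple bag-of-words count vector, built with scikit-learn (an assumption; any counting code would do). Modern contextual embeddings are far richer, but the goal is the same: turn text into numbers.

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "I went to the river bank",
    "I went to the bank to make a deposit",
]

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(vectors.toarray())                   # one count vector per sentence
```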

The infographic below gives a brief overview of the buzzwords in the NLP space, how text embeddings are created, and how they have evolved over time. Each of the algorithms mentioned made news when it was introduced and was considered a groundbreaking innovation in the field of NLP.

Evolution Of Text Embeddings

What’s Driving the Hype Around ChatGPT?

Two major factors, among others, are driving the hype around GPT:

  1. Cost of Compute
    Since the 2000s, the semiconductor industry has claimed to keep pace with Moore’s law, which predicts a doubling of computing power approximately every two years. This consistent increase in computational capability per unit cost has played a major role in reducing computing costs. To put this in perspective, a GPU that cost $1000 in 2012 cost around $150 in 2023.
  2. Availability of a Huge Text Corpus to Train On
    Digitisation trends in the last decade saw a significant increase in data collected, copied, and consumed. As per a Statista report, global data volume grew from 9 zettabytes to 120 zettabytes in 2023.

Other noteworthy advancements include cloud computing, the availability of open-source software and libraries, and distributed computing frameworks (like Hadoop and Spark).

Before we get into the details of how text embeddings are probabilistic, let's shift gears and understand probability in the context of the Indian public transport system, renowned for its timeliness.

Imagine you are waiting for a bus and want to predict its arrival time, having observed the bus schedule over the past week.

Initial Probability: based on your observations for a week, you find that the bus arrives on time (within a 5-minute window) on about 6 of the 7 days. This implies roughly an 86% likelihood of the bus arriving on time.

Surprised by this, considering the context of the Indian transport system, you decide to dig for more data and manage to get records for the past 100 days.

Out of these 100 days, the bus arrived on time on about 60, implying a 60% chance of on-time arrival. That seems more reasonable.
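The arithmetic behind those two estimates is just a ratio of counts, and a tiny Python sketch makes the point:

```python
def on_time_probability(on_time_days, total_days):
    """Empirical probability: the fraction of observed on-time days."""
    return on_time_days / total_days

print(on_time_probability(6, 7))     # ~0.86 after one week of observations
print(on_time_probability(60, 100))  # 0.60 after 100 days, a steadier estimate
```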

In summary, probability is about predicting whether something will happen again based on what happened before. The more information you collect, the better your predictions get, helping you make decisions with confidence. Now let's get back to text embeddings.

In the context of text, a language model is a probability distribution over sequences of words. The model assigns a probability to an entire sentence by combining the conditional probabilities of its individual tokens.

The probability of a sentence can be defined as the product of the probability of each word given the words before it. Language models learn these probabilities by training on text corpora in one or many languages, and they can be used for text classification, text understanding, or text generation depending on the underlying architecture.
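As a toy illustration of that product rule, here is a hypothetical bigram model (the tiny corpus is made up): each word is conditioned only on the word before it, and the conditional probabilities are simply ratios of counts.

```python
from collections import Counter

corpus = "the bus arrived on time . the bus arrived late .".split()

# Count each word and each adjacent word pair in the corpus.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def sentence_probability(sentence):
    """P(w1..wn) ~= product of P(wi | wi-1), estimated from counts."""
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

print(sentence_probability("the bus arrived on time"))  # 1 * 1 * 0.5 * 1 = 0.5
```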

An LLM is simply a language model trained on a huge corpus.

This LinkedIn article by Ivan Reznikov outlines how LLMs generate text in great detail. To put things in perspective, BERT, which is also an LLM, was trained on a corpus of roughly 3.3 billion words, while GPT-3 was trained on a corpus of hundreds of billions of tokens.

The other important piece of innovation to talk about is the Transformer architecture, which contains two main components: an encoder and a decoder. This is what enables LLMs to learn complex relationships between words and phrases, and to process and generate long sequences of text efficiently and effectively.

The encoder component uses a self-attention mechanism to learn long-range dependencies in the input sequence. This means that the encoder can learn how words and phrases relate to each other, even if they are far apart in the sequence. This is important for LLMs, which need to be able to understand the context of long sentences and paragraphs in order to generate accurate and informative responses.

The decoder component is responsible for generating the output text, one token at a time. It combines self-attention and cross-attention to generate the output sequence. Self-attention lets the decoder attend to the output it has generated so far, while cross-attention lets it attend to the encoder's representation of the input sequence. This allows the decoder to generate output that is coherent and consistent with the input.
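Here is a minimal numpy sketch of the scaled dot-product attention at the heart of both components (shapes and inputs are made up for illustration; real models add multiple heads, masking, and learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key; the scores weight the values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of values

tokens = np.random.rand(3, 4)  # a made-up sequence of 3 tokens, dimension 4

# Self-attention: Q, K, V all come from the same sequence.
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (3, 4)

# Cross-attention is the same function, with Q taken from the decoder and
# K, V taken from the encoder's output.
```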

Transformer Architecture. Image Ref: Medium, MLearning-AI

Use cases of LLMs & the underlying architectures:

Despite the hype surrounding ChatGPT, most enterprises are still trying to figure out how to use large language models (LLMs) and generative AI to create business impact. The most common use cases, the low-hanging fruit and primary targets for enterprises, are question answering over internal knowledge, writing code and comments, content classification, content summarisation, natural language processing, language translation, content generation, and so on. However, these use cases fall into one of the categories below.
1. Content Understanding
2. Content Generation
3. Machine Translation / Summarisation
Here, we take a closer look at each of the categories listed above and understand their underlying architectures.

Text Classification/Segregation

Consider a scenario where you have a million reviews for a smartphone, and these reviews talk about a variety of topics: camera, battery, performance, and so on. What would you do if you wanted to segregate these reviews without having to read all of them?

LLMs in this scenario play the role of a super-smart assistant, helping you discover the key themes in a sea of reviews without reading each one individually.

For an LLM to do that, it needs to understand the context of every word and sentence. The best way to capture the context of a word is by looking at the words before and after it, as illustrated in the image below.

Auto Encoder in Action

This bidirectional architecture helps BERT embeddings differentiate between the word "bank" in the two sentences below:

  1. I went to the river bank
  2. I went to the bank to make a deposit.

This bidirectional architecture is also referred to as an autoencoder transformer: the model learns to reconstruct masked inputs based on the surrounding context, taking into account both the preceding and the following tokens.
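A minimal sketch of that idea using the Hugging Face transformers library (assuming it and PyTorch are installed): pull the contextual vector for "bank" out of each sentence and compare them.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return the contextual embedding of the token 'bank'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

river = bank_vector("I went to the river bank")
money = bank_vector("I went to the bank to make a deposit")

# The two vectors differ because BERT read the surrounding words.
print(torch.nn.functional.cosine_similarity(river, money, dim=0))
```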

Text Generation

Text generation is a game of probability: generate the word most likely to occur next. To predict the next word, a unidirectional architecture works just fine; all the algorithm needs is the probability of the next word occurring, given the previous words. Since these models are trained on huge corpora of data, the probabilities are also more accurate.

Most of the LLMs making the news (GPT, PaLM, LLaMA) belong to this category.
This class of models is referred to as autoregressive models. They employ probabilistic inference to generate text, relying primarily on the decoder component of the transformer. One thing to note about this architecture: these models don't really have to understand the underlying text to generate it, which is a primary cause of hallucinations. However, the scope for fine-tuning is drawing a lot of attention to these models.
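A hedged sketch of that next-word loop, using the small public GPT-2 checkpoint via the transformers library (greedy decoding here for simplicity; real systems usually sample):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The bus arrived", return_tensors="pt").input_ids
for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # scores for the next token
    next_id = logits.argmax()              # greedy: pick the most probable
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```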

Text Translation

This is the typical task the encoder-decoder architecture was originally created for. Consider an English-to-French translation task: the encoder takes an input sentence in English and produces a vector representation of that sentence, and the decoder takes that vector representation and generates an output sentence in French. The reverse direction works the same way, as depicted in the image below.

Seq-to-Seq Model in action

This category of models requires a sufficient number of translated examples for validation before it can translate text accurately in real time.
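A minimal sketch with a public English-to-French checkpoint via the transformers pipeline (the model name is an assumption; any pretrained seq2seq translation model would do):

```python
from transformers import pipeline

# Helsinki-NLP/opus-mt-en-fr is one public English-to-French model.
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

print(translator("I went to the bank to make a deposit.")[0]["translation_text"])
```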

In summary, sequence-to-sequence models are powerful for language translation by mapping sequences between languages, autoregressive models help with text generation, and autoencoders excel at language understanding and classification.

Generative AI has revolutionised the way machines learn, understand, and create. Unlike traditional AI systems that are based on rules and predefined datasets, generative AI possesses the incredible ability to autonomously generate new content: images, text, music, and even entire scenarios. It's like having a machine that can dream and create.

Just a decade ago, you might have had to hire a statistician or computer science graduate to build ML models, but now Gen AI has democratised the usage of ML, and anyone, irrespective of their technical background, can create models and drive data-driven decision making.

2024 is going to be the year to watch, as companies begin to see financial returns on their investments. Gen AI has the potential to revolutionise many industries by helping businesses save money and increase revenue! We are likely to see a paradigm shift in UI/UX design to incorporate Gen AI capabilities.

Hope you enjoyed reading this article. Stay tuned to Intuition Matters for building intuition on the latest trends.

About Intuition Matters:
Intuitive understanding can help everything else snap into place. Learning becomes difficult when we emphasize definitions over understanding. The modern definition is the most advanced step of thought, not necessarily the starting point. Intuition Matters in everything, and it matters the most!
