Revolutionizing Natural Language Understanding: GPT-3’s In-Context Learning

Fanghua (Joshua) Yu
8 min read · Mar 9, 2023


The Key to Practical Applications of GPT-3 for Organizations

Street sculptures in Malmö, Sweden. Photo by the author

The GPT-3 Revolution

With the popularity of ChatGPT, jargon like NLP/NLU, Generative AI, LLM (Large Language Model), Zero/Few-Shot Learning, etc. has become familiar to many. In fact, none of these things is new. What really makes ChatGPT remarkable is its ability to understand and generate responses that are both informative and easy to understand. It can also adapt responses to different levels of complexity and provide insightful explanations and examples to help clarify difficult concepts.

The language model behind ChatGPT, GPT-3 (Generative Pre-trained Transformer 3), is a highly advanced natural language processing model that sets itself apart from other NLP models in several ways:

  1. Scale: GPT-3 is one of the largest NLP models in existence, with 175 billion parameters. This large scale enables it to model complex patterns in language and generate highly coherent responses that fit within a given context.
  2. Generative Pre-training: GPT-3 is pre-trained on a massive corpus of text data using a generative pre-training approach. This means that the model is trained to generate language as opposed to just predicting it. This approach allows the model to capture the nuances and intricacies of language and to generate highly fluent responses.
  3. Few-shot Learning: GPT-3 is highly effective at few-shot learning, which means it can learn and generate responses from just a few examples. This is particularly useful for tasks where labeled data is scarce or expensive to obtain.
  4. In-context Learning: GPT-3 is highly proficient at in-context learning, which means it can generate responses that fit within a given context or situation. This is achieved through its ability to understand the context of a given text and generate responses that are relevant to that context.
  5. Versatility: GPT-3 is a highly versatile model that can be adapted to a wide range of NLP tasks. It can be fine-tuned on specific tasks or domains, which enhances its ability to learn and generate responses that are tailored to the specific task.
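To make few-shot learning concrete, here is a minimal sketch of how a few-shot prompt is assembled (the demonstrations and labels below are made up for illustration): the model receives a handful of labeled examples followed by the query, and completes the pattern without any weight updates.

```python
# A minimal few-shot prompt for sentiment classification.
# The demonstrations and query are hypothetical examples.
demonstrations = [
    ("The plot was gripping from start to finish.", "Positive"),
    ("I walked out halfway through.", "Negative"),
]
query = "A warm, well-acted film."

# Each demonstration becomes a "Review: ... / Sentiment: ..." pair;
# the prompt ends mid-pattern so the model fills in the final label.
prompt = "\n".join(f"Review: {t}\nSentiment: {s}" for t, s in demonstrations)
prompt += f"\n\nReview: {query}\nSentiment:"

print(prompt)
```

The same template works for any labeled task: only the demonstrations change, not the model.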

To me, in-context learning is the most incredible feature that brings a revolution to NLP, and it fundamentally changes the way we process data.

This shouldn’t be a surprise, as unstructured data comprises the vast majority of data found in an organization; some estimates run as high as 80%. [1]

[1] Shilakes, Christopher C.; Tylman, Julie (16 Nov 1998). “Enterprise Information Portals” (PDF). Merrill Lynch.

What Is In-Context Learning?

In-context learning refers to the model’s ability to generate highly coherent responses that fit within a given context or situation. In other words, it is the ability of the model to understand the context of a given text and to generate responses that are relevant to that context.

For example, if a user asks GPT-3 “What’s the weather like today?”, the model can generate a response that takes into account the user’s location and the current date and time, provided that information is supplied in the prompt, to produce an accurate weather report. Similarly, if a user asks a question about a specific topic, GPT-3 can generate a response that draws on its vast knowledge of the topic to provide a highly relevant and informative answer.
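The model only “knows” facts like the user’s location and the date if they are placed in the context. A tiny sketch of injecting such grounding information (the field names and values are hypothetical):

```python
# Contextual facts (hypothetical values) injected into the prompt so the
# model can ground its answer; GPT-3 has no access to them otherwise.
context = {"location": "Malmö, Sweden", "date": "2023-03-09"}
question = "What's the weather like today?"

prompt = (
    f"Location: {context['location']}\n"
    f"Date: {context['date']}\n"
    f"Question: {question}\n"
    f"Answer:"
)
print(prompt)
```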

Such a capability opens up many new technical possibilities that were previously considered unique to humans.

For example, with in-context learning, systems can be developed to do tasks such as expanding emails, extracting entities from text, and generating code based on natural language instructions. All of this can be done with just a few demonstration examples, which means that the NLP system can learn and generalize to new cases on its own without requiring further fine-tuning.
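For entity extraction, for instance, the demonstrations can ask for output in a simple “Name (TYPE)” format, which is then trivial to parse downstream. A sketch (the output format and example strings are assumptions, not a fixed API):

```python
def parse_entities(completion: str) -> list:
    """Parse a completion like 'Apple (ORG); Berlin (LOC)' into (name, type) pairs."""
    entities = []
    for part in completion.split(";"):
        part = part.strip()
        if "(" in part and part.endswith(")"):
            name, etype = part.rsplit("(", 1)
            entities.append((name.strip(), etype.rstrip(")").strip()))
    return entities

# A completion in the demonstrated format (simulated here, not a real model call).
print(parse_entities("Google (ORG); DeepMind (ORG); London (LOC)"))
# → [('Google', 'ORG'), ('DeepMind', 'ORG'), ('London', 'LOC')]
```

Because the demonstrations fix the output format, no task-specific model or fine-tuning is needed to consume the results.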

In-Context Learning: Magic or Not?

GPT-3’s in-context learning capability is a result of its pre-training on a massive corpus of text data and the use of advanced neural network architectures, specifically Transformers. The model is pre-trained on a wide range of tasks, such as predicting the next word in a sentence, language translation, and summarization, which allows it to capture the nuances and intricacies of language.

Transformers are neural network architectures that excel at modeling long-range dependencies in sequences of data, such as text. They use attention mechanisms to identify important parts of the text and can weigh the importance of each part when generating a response.
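The core attention operation can be sketched in a few lines of NumPy. This is a simplified single-head version, omitting the learned projections, masking, and multi-head machinery of a real Transformer:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weight each value by how well
    its key matches the query (single head, no masking)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values

Q = np.array([[1.0, 0.0]])                # one query
K = np.array([[1.0, 0.0], [0.0, 1.0]])    # two keys
V = np.array([[10.0, 0.0], [0.0, 10.0]])  # two values
print(attention(Q, K, V))                 # output leans toward the first value
```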

In-context learning is a significant breakthrough because it means that machines can now understand and work with language in a more human-like way. This has many implications for industries such as customer service, healthcare, and education, where NLP systems can be used to automate repetitive tasks, provide better customer experiences, and improve the overall efficiency of operations. Additionally, it has also opened up new avenues for research in fields such as linguistics, psychology, and artificial intelligence.

However, magic doesn’t come without cost. The real challenge relates to the construction of context for the model. In the original paper [2], task-relevant examples are randomly sampled from the training set to construct the context. In practice, however, subsequent tests observed that the performance of GPT-3 tends to fluctuate with different choices of in-context examples from the same dataset.

Source: Liu et al. (2022), What Makes Good In-Context Examples for GPT-3?

This means that the model’s performance can be highly sensitive to the specific examples that are used to provide context. This variance can be a significant challenge for practical applications in enterprises and governments, where consistency and reliability are essential.

[2] T. Brown et al. (2020), Language Models are Few-Shot Learners

Strategies to Select Proper Contexts

There are several existing approaches to selecting proper contexts:

  1. Keyword-based Strategy: This strategy involves selecting a context that includes specific keywords that are relevant to the task or problem you are trying to solve. For example, if you are trying to generate responses related to the topic of “artificial intelligence”, you could select a context that includes keywords such as “machine learning”, “neural networks”, and “deep learning”.
  2. Topic Modeling Strategy: This strategy involves using topic modeling techniques to identify the most relevant topics and themes in a given text corpus. You can then use these topics and themes to select a context that is most relevant to the task or problem you are trying to solve.
  3. Domain-specific Strategy: This strategy involves selecting a context that is specific to a particular domain or industry. For example, if you are trying to generate responses related to the healthcare industry, you could select a context that includes information about medical terminology, procedures, and treatments.
  4. User-generated Strategy: This strategy involves collecting data and feedback from users to identify the most relevant context for a given task or problem. You can use surveys, focus groups, or other methods to gather feedback and input from users.
  5. Embedding Strategy: This approach involves representing the text corpus as a set of vectors in a high-dimensional space, where each vector represents a specific word or concept. Word embeddings identify semantic relationships between words, such as synonyms or antonyms, which makes it possible to look for the most relevant context based on the similarity of the vectors.
  6. Hybrid Strategy: This strategy involves combining multiple strategies to identify the most relevant context for a given task or problem. For example, you could use a keyword-based strategy to identify relevant keywords and then use a topic modeling strategy to identify the most relevant topics and themes based on those keywords.
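As a toy illustration of the keyword-based strategy (the candidate passages and keyword list below are made up), candidates can be scored by how many target keywords they contain:

```python
# Hypothetical keywords for an "artificial intelligence" task.
KEYWORDS = ["machine learning", "neural networks", "deep learning"]

def keyword_score(passage: str) -> int:
    """Count how many target keywords appear in a candidate passage."""
    lowered = passage.lower()
    return sum(kw in lowered for kw in KEYWORDS)

candidates = [
    "Neural networks learn representations from data.",
    "The museum opens at nine on weekdays.",
    "Deep learning is a branch of machine learning.",
]
best = max(candidates, key=keyword_score)
print(best)  # the passage matching the most keywords
```

Real systems would combine this with stemming, synonym lists, or the embedding strategy below, but the selection principle is the same.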

In the real-world data ecosystem of an enterprise or government, a knowledge graph can provide perfect support for all of the strategies above.

A Retrieval-based Context Selection Approach

To address the sensitivity issue of GPT-3’s few-shot capabilities during the selection of in-context examples, an additional retrieval module can be introduced. This module finds semantically-similar examples of a test instance to construct its context, which can improve the model’s performance in few-shot learning tasks.
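Such a retrieval module can be approximated with nearest-neighbour search over sentence embeddings. A minimal NumPy sketch, where the 2-D vectors are toy stand-ins for real sentence embeddings:

```python
import numpy as np

def top_k_examples(query_vec, example_vecs, k=2):
    """Indices of the k examples most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    sims = E @ q                   # cosine similarity per example
    return np.argsort(-sims)[:k]   # best first

query = np.array([1.0, 0.0])
examples = np.array([[0.9, 0.1],    # similar to the query
                     [0.0, 1.0],    # orthogonal
                     [1.0, 0.0]])   # identical direction
print(top_k_examples(query, examples))  # → [2 0]
```

The retrieved examples are then formatted as demonstrations and prepended to the prompt, exactly as in a hand-written few-shot prompt.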

In my previous blog post on Improving GPT-3 Q&A Experiences with In-context Learning over Knowledge Graph, I demonstrated a simple project which prepares context based on the entity mentioned in the original question.

The results of the project show that this retrieval-based approach produces highly satisfactory results. This improvement in performance is likely due to the retrieval module’s ability to find more relevant and semantically similar examples, which helps the model better understand and generate responses that fit within a given context.

To further enhance its effectiveness, the retrieval model can be fine-tuned on task-related datasets, which can lead to even stronger empirical results. This is because the fine-tuned retrieval model is better equipped to identify in-context examples that are specifically relevant to the task at hand (i.e. in-context learning).

The Retrieval-based Text Generation approach isn’t new. In much existing research, it involves retrieving text samples as examples (by a Retriever) and then editing them (by an Editor) to generate new text (by a Generator). However, this approach requires the decoder network to be trained from scratch, which can be task- and data-specific.

A Retrieval-based Text Generation Approach[3]

[3] Source: Joint Retrieval and Generation Training for Grounded Text Generation

On the other hand, GPT-3 can be seen as a universal editor that can adapt to a wide range of tasks without the need for fine-tuning. The more semantically similar context we provide to GPT-3, the better results the model can generate.

Summary

Overall, GPT-3’s ability to adapt to a wide range of tasks and generate responses that fit within a given context makes it a powerful tool for natural language processing tasks. By using GPT-3, we can generate high-quality text without the need for fine-tuning, which saves time and resources.

Considering the cost and life cycle of fine-tuning a GPT-3 model, in-context learning represents an important step towards improving the practical applications of GPT-3 in natural language processing tasks. By fine-tuning the retrieval model on relevant datasets, GPT-3 can achieve even stronger performance in tasks such as sentiment analysis, table-to-text generation, and open-domain question answering.


Fanghua (Joshua) Yu

I believe our lives become more meaningful when we are connected, so is data. Happy to connect and share: https://www.linkedin.com/in/joshuayu/