Generative AI, OpenAI, and ChatGPT: What are they?

Francesca Lazzeri
Data Science at Microsoft
10 min read · May 30, 2023

by Francesca Lazzeri, Ph.D.

In the years since its wide deployment, Artificial Intelligence (AI) has demonstrated impact across several industries, with accomplishments such as accurate medical imaging analysis, high-resolution weather forecasting, and faster drug discovery. A 2022 McKinsey survey shows that AI adoption has more than doubled over the past five years, and investment in AI is increasing rapidly. Moreover, generative AI tools like ChatGPT have the potential to change how a range of jobs across multiple industries are performed. While the full scope of that impact and its potential risks are still unknown, there are some questions we can already answer and some concepts we can learn now to better understand this fascinating field of applied AI and leverage it in a successful and responsible way, such as:

  1. What are generative AI models, and what is their history?
  2. What are OpenAI and Azure OpenAI?
  3. What is ChatGPT, and what kinds of problems is it best suited to solve?

Figure 1: Generative AI, OpenAI, and AI services.

1. What are generative AI models, and what is their history?

Generative AI models are a subset of Deep Learning models that can produce new content based on the input they are given. It is interesting to look back at the history of generative models to see how they have been developed by mathematicians, statisticians, and computer scientists over time. Here is a summary of a few key milestones (Figure 2):

Figure 2: Generative AI history and key milestones.
  • 1948: Scientist Claude Shannon publishes A Mathematical Theory of Communication, in which he describes communication in terms of five basic components: a source, a transmitter, a channel, a receiver, and a destination. The resulting Shannon–Weaver model is one of the first and most influential models of communication, and it has shaped how artificial language models are structured.
  • 1950: Scientist Alan Turing publishes Computing Machinery and Intelligence, the first paper to introduce the Turing test as a way of addressing the question of whether machines can think.
  • 1964–1966: ELIZA, an early natural language processing computer program, is created at MIT by Joseph Weizenbaum. ELIZA can simulate a conversation with a human by using simple pattern-matching rules to generate text responses to questions.
  • 1966: Scientists Leonard Baum and Ted Petrie introduce the Hidden Markov Model (HMM), a statistical model of a Markov process with unobserved (hidden) states. In language applications, HMMs can encode an internal representation of the grammatical structure of sentences (nouns, verbs, and so on) and use that knowledge when predicting new words.
  • 1982: Scientist John Hopfield introduces the Hopfield network, a form of recurrent neural network (RNN) that can learn and remember patterns. These networks provide a model for understanding human memory and language. RNNs later become popular because they can accept a much larger number of input tokens (i.e., text fragments with some metadata for the model to use) than earlier models.
  • 1997: Scientists Sepp Hochreiter and Jürgen Schmidhuber introduce long short-term memory (LSTM), a recurrent neural network architecture. Unlike standard feedforward neural networks, LSTMs have feedback connections that allow them to process not only single data points (such as images) but also entire sequences of data (such as text, speech, and video).
  • 2003: Scientists Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin publish A Neural Probabilistic Language Model, in which they introduce an early language modeling architecture: a feedforward network that takes as input vector representations (i.e., word embeddings) of the previous words in the sentence.
  • 2014: Kyunghyun Cho et al. publish Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, in which they introduce the Gated Recurrent Unit (GRU), a gating mechanism in RNNs similar to an LSTM unit but without an output gate. GRUs help mitigate the vanishing gradient problem that affects standard recurrent neural networks, making language models faster and more accurate.
  • 2014: Scientists Ian Goodfellow et al. write Generative Adversarial Networks and introduce a new class of Machine Learning frameworks that can generate new data based on a given training set.
  • 2017: Google researchers publish Attention Is All You Need and introduce the Transformer, a new architecture that allows a significant increase in the number of input tokens, eliminates the gradient instability issues seen in RNNs, and is highly parallelizable, meaning it can take full advantage of GPUs. Transformers are based on the “attention mechanism,” by which the model can pay more attention to some inputs than others, regardless of where they appear in the input sequence.
  • 2018: OpenAI publishes Improving Language Understanding by Generative Pre-Training, which introduces the Generative Pre-trained Transformer (GPT) for the first time and shows how a generative language model can acquire knowledge and learn long-range dependencies through unsupervised pre-training on a large and diverse dataset.
  • 2019: OpenAI writes Language Models are Unsupervised Multitask Learners and releases the complete version of its GPT-2 language model, which translates text, answers questions, summarizes passages, and generates text output. The ability of GPT-2 to perform these tasks is an extension of its general ability to accurately synthesize the next item in an arbitrary sequence.
  • 2020: OpenAI releases GPT-3, which is an extension and updated version of GPT-2. The main difference between GPT-2 and GPT-3 is the number of parameters used in the model. GPT-2 has 1.5 billion parameters while GPT-3 has 175 billion parameters.
  • 2022: OpenAI releases ChatGPT, a tool that uses GPT models to generate human-like text. It is a product of OpenAI that allows users to generate text in natural language. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. ChatGPT is trained using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup.
  • 2023: OpenAI releases GPT-4, with a paid subscription giving users access to it through ChatGPT. The biggest difference between GPT-3 and GPT-4 lies in scale: GPT-3 was trained with 175 billion parameters, while OpenAI has not disclosed the parameter count for GPT-4, which is widely believed to be substantially larger. GPT-4 was also trained on more data than GPT-3, which is thought to significantly improve its performance.
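The attention mechanism behind the 2017 Transformer milestone can be made concrete with a few lines of code. The following is a toy, pure-Python sketch of scaled dot-product attention, written for intuition only: real implementations are vectorized and add learned projection matrices and multiple attention heads.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Toy scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # how much to "attend" to each position
        # Each output is a weighted mix of all value vectors, regardless
        # of where those values appear in the input sequence.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
out = attention(tokens, tokens, tokens)
print(len(out), len(out[0]))  # 3 2: one output vector per input token
```

Because every query attends to every key in one step, there is no long chain of recurrent updates for gradients to travel through, which is why this design sidesteps the instability issues seen in RNNs and parallelizes so well.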

2. What are OpenAI and Azure OpenAI?

OpenAI is a company founded in 2015 and focused on AI research and development. The OpenAI models are a collection of generative AI models that can produce language, code, and images. Microsoft has partnered with OpenAI to deliver on three main goals:

  • To utilize Azure infrastructure, including security, compliance, and regional availability, to help users build enterprise-grade AI applications.
  • To deploy OpenAI AI model capabilities across Microsoft products, including and beyond Azure AI products.
  • To use Azure to power all OpenAI workloads.

Azure OpenAI Service is a new Azure Cognitive Service that provides REST API access to OpenAI’s powerful language models. The capabilities of the OpenAI models fall into three main categories:
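To make the REST API access concrete, here is a minimal sketch of how a completions request to Azure OpenAI Service is addressed: calls go to your own resource's endpoint and name a specific model deployment. The resource name, deployment name, and key below are placeholders you would replace with your own; the sketch only builds the URL and body rather than sending anything.

```python
import json

# Placeholder values -- substitute your own resource, deployment, and key.
resource = "my-resource"        # the Azure OpenAI resource name
deployment = "my-deployment"    # the name you gave your model deployment
api_version = "2023-05-15"

# Azure OpenAI REST calls are addressed per deployment:
url = (f"https://{resource}.openai.azure.com/openai/deployments/"
       f"{deployment}/completions?api-version={api_version}")
headers = {"api-key": "<YOUR-API-KEY>", "Content-Type": "application/json"}
body = json.dumps({"prompt": "Write a tagline for an ice cream shop.",
                   "max_tokens": 32})

print(url)
```

Note that, unlike the public OpenAI API, authentication uses an `api-key` header tied to your Azure resource, and the model is selected by your deployment name rather than a model name in the request body.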

Table 1: Capabilities and examples offered by Azure OpenAI Service.

When working with Azure OpenAI Service, a firm grasp of a few core concepts is essential to getting the most out of the API calls. The key concepts are:

Table 2: Azure OpenAI Service key concepts.

Today, one of the most well-known examples of Generative AI is GPT-3. GPT-3 is a language generation model developed by OpenAI that can generate human-like text. It has been used to create chatbots, content for social media, and even short stories.

Another popular example of Generative AI is Codex, which is the model that powers GitHub Copilot. Codex can interpret simple commands in natural language and execute them on the user’s behalf, making it possible to build a natural language interface to existing applications. Codex is a descendant of GPT-3, and its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories. It is most capable in Python, but it is also proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, and even Shell.

Lastly, we have DALL·E, which is an AI system that can create realistic images and art from a description in natural language. It was developed by OpenAI and introduced in January 2021. DALL·E is a simple decoder-only transformer that receives both the text and the image as a single stream of 1280 tokens (256 for the text and 1024 for the image) and models all of them autoregressively. The attention mask at each of its 64 self-attention layers allows each image token to attend to all text tokens.

Figure 3: Generative AI systems offered by OpenAI.

3. What is ChatGPT, and what kinds of problems is it best suited to solve?

ChatGPT is an artificial intelligence (AI) chatbot developed by OpenAI and released in November 2022. It is built on top of OpenAI’s GPT-3.5 and GPT-4 foundational large language models (LLMs) and has been fine-tuned using both supervised and reinforcement learning techniques. ChatGPT is a member of the generative pre-trained transformer (GPT) family of language models.

GPT-3 consists of a series of models that can understand and generate natural language. These are completion-style models: if we give them a few words as input, they generate a few more words that are likely to follow them in the training data. ChatGPT, on the other hand, is a conversation-style model, which means it performs best when we communicate with it as if we’re having a conversation. It’s based on the same transformer base model as GPT-3, but it’s fine-tuned with conversation data and then further refined using Reinforcement Learning from Human Feedback (RLHF), a technique that OpenAI introduced in its 2022 InstructGPT paper. Previous models were text-in and text-out: they accepted a prompt string and returned a completion to append to the prompt. The ChatGPT model, however, is conversation-in and message-out: it expects a prompt formatted in a specific chat-like transcript format, and it returns a completion that represents a model-written message in the chat.
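The conversation-in, message-out pattern can be pictured with the chat-style message format these models use, where each turn is a role-tagged message. The content strings below are invented examples; only the `system`/`user`/`assistant` role structure is the point.

```python
# Each turn in the transcript is a dict with a "role" and "content".
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a completion-style model?"},
]

# The service replies with a single assistant message; appending it (and the
# next user turn) preserves the conversation context for the following call.
assistant_reply = {
    "role": "assistant",
    "content": "A completion-style model continues the text it is given.",
}
messages.append(assistant_reply)
messages.append({"role": "user", "content": "And a conversation-style model?"})

print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```

The model itself is stateless: it sees the whole transcript on every call, which is why the caller keeps appending messages to carry the conversation forward.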

In this technique, the model is given the same input twice, produces two different outputs, and a human ranker is asked which output is preferable. That choice is then fed back into the model through fine-tuning. This technique aligns the outputs of the model with human expectations and is critical to the success of OpenAI’s latest models. GPT-4, on the other hand, can be used both for completion and conversation, and has its own entirely new base model. This base model is also fine-tuned with RLHF for better alignment with human expectations.
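One way to picture the ranking step is as a dataset of comparison records, each pairing two sampled outputs for the same prompt with a human preference label; a reward model is then trained to score the preferred output higher, and the language model is fine-tuned against that learned reward. This is a simplified sketch with invented field names, not OpenAI's actual data format.

```python
# Simplified sketch of an RLHF comparison record: the same prompt, two
# sampled model outputs, and a human label saying which one is preferred.
comparison = {
    "prompt": "Explain overfitting in one sentence.",
    "completion_a": "Overfitting is when a model memorizes its training data "
                    "instead of learning patterns that generalize.",
    "completion_b": "Overfitting is bad.",
    "preferred": "a",  # the human ranker's choice
}

def agrees_with_human(score_a, score_b, record):
    """Check whether a reward model's scores match the human ranking.
    A reward model is trained so that checks like this come out True."""
    return (score_a > score_b) == (record["preferred"] == "a")

print(agrees_with_human(0.9, 0.4, comparison))  # True
```

Collecting many such comparisons is what lets the reward model stand in for a human ranker at training time, so the fine-tuning loop can run at scale.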

The main differences between GPT models and ChatGPT can be summarized as follows:

Table 3: Main differences between GPT models and ChatGPT.

To use ChatGPT and GPT-4 with Azure OpenAI Service, you need the following prerequisites:

  • An Azure subscription: Create one for free.
  • Access granted to Azure OpenAI in the desired Azure subscription. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.
  • An Azure OpenAI Service resource with either the gpt-35-turbo (preview), or the gpt-4 (preview) models deployed. For more information about model deployment, see the resource deployment guide.
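Once those prerequisites are in place, a deployed chat model can also be called directly over REST rather than through the Studio. The sketch below builds (but does not send) a chat completions request using only the standard library; the resource and deployment names are placeholders to replace with your own.

```python
import json
import urllib.request

# Placeholders -- replace with your own resource, deployment, and key.
endpoint = "https://my-resource.openai.azure.com"
deployment = "gpt-35-turbo-deployment"
api_version = "2023-05-15"

url = (f"{endpoint}/openai/deployments/{deployment}"
       f"/chat/completions?api-version={api_version}")
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Azure OpenAI Service?"},
    ],
    "max_tokens": 100,
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"api-key": "<YOUR-API-KEY>", "Content-Type": "application/json"},
    method="POST",
)
# With real credentials you would send it with:
#   with urllib.request.urlopen(request) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
print(request.get_full_url())
```

In practice you would more likely use the `openai` Python package with its Azure settings, but seeing the raw request makes clear that the playground is just a friendly front end over this same API.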

Navigate to Azure OpenAI Studio at https://oai.azure.com and sign in with credentials that have access to your OpenAI resource. During or after the sign-in workflow, select the appropriate directory, Azure subscription, and Azure OpenAI resource.

From the Azure OpenAI Studio landing page, select ChatGPT playground (Preview):

Figure 4: How to select the ChatGPT playground from the Azure OpenAI Studio landing page.

After this step, you can start exploring OpenAI capabilities with a no-code approach through the Azure OpenAI Studio ChatGPT playground. From this page, you can quickly iterate and experiment with various capabilities:

Figure 5: OpenAI capabilities with a no-code approach through the Azure OpenAI Studio.

I hope this article has helped you gain a better understanding of the various generative AI models and their histories, the difference between OpenAI and Azure OpenAI, what ChatGPT is and the kinds of problems it is best suited to solve, and how to get started with these tools in this revolutionary new computing era.

Francesca Lazzeri, Ph.D., is on LinkedIn.


Written by Francesca Lazzeri

Principal Data Scientist Director @Microsoft ~ Adjunct Professor @Columbia University ~ PhD
