An introduction to developing with ChatGPT and GPT-4 on Azure

Roberto Font
SDG Group
Apr 10, 2023

At this point, you are probably well aware of what ChatGPT and GPT-4 are. They have generated countless headlines, motivated a call for a moratorium on AI experiments, and created very high expectations. Whether you think they are a first step towards AGI, an existential threat, or just some overhyped language models, what’s clear is that integrating these models into your solutions can be an invaluable tool to solve complex problems.

Although OpenAI has been offering an API to access their models for some time now, it was not a real option for enterprise applications due to the absence of SLAs or compliance guarantees (see for example the Italian government concerns). This, however, changed when Microsoft announced the general availability of the OpenAI models as an Azure offering.

In this post we will go through all the necessary steps to leverage the power of ChatGPT and GPT-4 in an Azure environment:

  • Requesting access to the Azure OpenAI service.
  • Deploying models.
  • Setting up the OpenAI Python library.
  • Using the Completion and ChatCompletion APIs.
  • Further learning resources.

Getting started

When you log in to your Azure portal, you can find Azure OpenAI just like any other service. However, before being able to deploy it, you’ll need to request access.

Creating an Azure OpenAI resource (image by author)

Requesting access to Azure OpenAI service

As you can see in the image, you will be prompted to request access before being able to deploy the service. When you click on the link, you’ll be redirected to a form to file your request. There you’ll be asked to enter your corporate email (don’t even bother with a Gmail account), provide some basic information about your company, enter your subscription ID, and select which services you want to request access to (text and code models, and/or the text-to-image DALL·E model). Finally, you’ll select the use cases you want to use Azure OpenAI for. This surprised me, since I was expecting some kind of free-text field to describe your intended use case. In the case of the text and code models, these are the available options:

  • Chat and conversation interaction.
  • Chat and conversation creation.
  • Code generation or transformation scenarios.
  • Journalistic content.
  • Most Valuable Professional (MVP) or Regional Director (RD) Demo Use.
  • Question-answering.
  • Reason over structured and unstructured data.
  • Search.
  • Summarization.
  • Writing assistance on specific topics.

You can select as many as you want, but I’d recommend selecting just the one or two use cases you are most interested in. You can always request additional use cases at a later point.

Once you send your request, you should receive an answer from Microsoft in about 7 days. In my case, the request was initially rejected and then approved a few days later. I expect the entry barriers to lower over time and eventually disappear.

Azure OpenAI services

The Azure OpenAI service currently comprises four families of models:

  • Text-to-image models (DALL·E 2): Models that can generate images from a natural language text description.
  • Text generation models: Language models that can perform a variety of natural language understanding and generation tasks, like text generation, question answering, or sentiment classification. ChatGPT, GPT-4, and the like belong to this category.
  • Code generation models (Codex): Language models specifically designed to generate computer code. (NOTE: Recently, OpenAI announced that the support for these models will be discontinued since the latest text generation models have equal or better code generation abilities)
  • Embedding models: Language models that, given a text, compute an embedding: a mathematical representation of that text. By comparing these embeddings, it is possible to measure how similar texts are. This can be used for semantic search or cluster analysis. You can find more information here.

You can read about all these models in the official documentation.
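To make the embedding idea concrete, here is a minimal sketch of how two embedding vectors could be compared with cosine similarity. The vectors are made-up toy values for illustration, not real model outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings"; real Azure OpenAI embeddings
# have hundreds or thousands of dimensions.
emb_cat = [0.9, 0.1, 0.0, 0.2]
emb_kitten = [0.85, 0.15, 0.05, 0.25]
emb_finance = [0.0, 0.9, 0.4, 0.1]

print(cosine_similarity(emb_cat, emb_kitten))   # close to 1.0: similar texts
print(cosine_similarity(emb_cat, emb_finance))  # much lower: unrelated texts
```

Semantic search then reduces to computing the embedding of a query and ranking documents by this similarity score.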

Deploying our model

As we said above, these models can be deployed just like any other Azure service. You can find a detailed guide here. First, you create the cognitive service by selecting a subscription, resource group, and region, and choosing a name for your resource. Then, you can start deploying models. As usual, each one will have an associated endpoint and pair of keys.

Deploying an OpenAI model (image by author)

Note that not all models and features are supported in all regions. See the compatibility matrix here.

Developing with ChatGPT and GPT-4

Now that we have our model of choice deployed, it’s time to start developing with it. First, we’ll see how to set up the OpenAI library and then how to use its Completion and ChatCompletion APIs. We’ll be using Python throughout this tutorial.

Setting up the OpenAI Python library

The first step is to install the OpenAI library (to access the Azure-specific functionality you’ll need version 0.27.0 or above):

pip install openai==0.27.0

Then, we need to go to our Azure deployment and take note of our endpoint, API key, and the name of our deployment. Assuming that we have all these values stored in environment variables, we can initialize the library the following way:

import os  
import openai

openai.api_key = os.getenv('OPENAI_API_KEY')
openai.api_base = os.getenv('OPENAI_ENDPOINT')
openai.api_type = 'azure'
openai.api_version = '2023-03-15-preview'

DEPLOYMENT_NAME = os.getenv('OPENAI_DEPLOYMENT')

Here we are:

  • Telling the openai library that we want to use its ‘azure’ flavor. (It differs mainly in how authentication is performed and that we’ll need to specify an endpoint).
  • Specifying that, in particular, we want to use the ‘2023-03-15-preview’ version of the API. This is needed because Azure OpenAI is quite new and the API is still subject to frequent changes. I hope it will no longer be needed once things settle down.
  • Providing our API key and endpoint.
  • Storing the name of our deployment in the DEPLOYMENT_NAME variable. We will need it later.
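The steps above can be wrapped in a small helper that fails fast when a variable is missing. This is just a convenience sketch of my own (the function name and error message are not part of the openai library):

```python
import os

REQUIRED_VARS = ['OPENAI_API_KEY', 'OPENAI_ENDPOINT', 'OPENAI_DEPLOYMENT']

def load_azure_openai_config():
    """Read the Azure OpenAI settings from the environment,
    raising a clear error if any of them is missing."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        'api_key': os.environ['OPENAI_API_KEY'],
        'api_base': os.environ['OPENAI_ENDPOINT'],
        'api_type': 'azure',
        'api_version': '2023-03-15-preview',
        'deployment': os.environ['OPENAI_DEPLOYMENT'],
    }
```

You would then assign the returned values to `openai.api_key`, `openai.api_base`, and so on, exactly as in the snippet above.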

Using the Completion and ChatCompletion APIs

There are two main ways to interact with the OpenAI models:

  • Completions: Used with those models that receive a single input and generate an output, like Codex or the GPT-3 family of models.
  • ChatCompletions: Used with the models optimized for chat (multi-turn interactions), like gpt-35-turbo (the model behind ChatGPT) or gpt-4.

Let’s see an example of each:

Completions:

prompt = """You are a virtual assistant for geography questions.
You can answer any geography question always providing accurate information
Question: What is the capital of Ecuador?"""

response = openai.Completion.create(
    engine=DEPLOYMENT_NAME,
    prompt=prompt,
    max_tokens=200,
)

Here, we see that we are providing a prompt (an initial text our model will generate a continuation for), the name of our Azure OpenAI deployment, and max_tokens, the maximum length of the generated response. There are some other parameters that you can find in the API reference.

The output should be something like this:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nAnswer: The capital of Ecuador is Quito."
    }
  ],
  "created": 1680513952,
  "id": "cmpl-71AjgNTKaGDNTBBVaBzPW1QiXHwWY",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 12,
    "prompt_tokens": 30,
    "total_tokens": 42
  }
}
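Given a response with the shape above, the generated text lives under `choices[0]["text"]`. A minimal sketch of pulling it out, using a hard-coded dictionary that mimics the API response so it runs without an Azure account:

```python
# A stand-in for the response object returned by openai.Completion.create
response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": None,
            "text": "\n\nAnswer: The capital of Ecuador is Quito.",
        }
    ],
    "usage": {"completion_tokens": 12, "prompt_tokens": 30, "total_tokens": 42},
}

# Strip the leading newlines the model tends to prepend
answer = response["choices"][0]["text"].strip()
print(answer)  # Answer: The capital of Ecuador is Quito.
```

The `usage` field is also worth keeping around, since billing is per token.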

Using the Completion API with the chat models

You can use the Completions API with chat models by encoding the different conversation turns into a single prompt using a notation called ChatML:


prompt = """<|im_start|>system
You are a virtual assistant for geography questions.
You can answer any geography question always providing accurate information<|im_end|>
<|im_start|>user
What is the capital of Ecuador?<|im_end|>
<|im_start|>assistant
Quito is the capital of Ecuador.<|im_end|>
<|im_start|>user
What is its population?<|im_end|>
<|im_start|>assistant
"""

response = openai.Completion.create(
    engine=DEPLOYMENT_NAME,
    prompt=prompt,
    max_tokens=200,
    stop=["<|im_end|>"],
)

As you can see, we are encoding the different conversation turns using the special tokens <|im_start|> and <|im_end|>. Although this is apparently how gpt-35-turbo and gpt-4 work under the hood, it is not the recommended way to interact with the chat models; the ChatCompletion API should be used instead. First, it provides an easier and more convenient interface. Second, the ChatML language is still experimental and subject to change. To give you an example, while working with the Completion API a few weeks ago I suddenly started finding the token <|im_sep|>, which was completely undocumented, in the responses. The purpose of the ChatCompletion API is precisely to shield the user from these particularities and behavior changes.
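To see how little work the chat abstraction actually does, here is a sketch of a helper that turns a list of role/content messages into a ChatML prompt like the one above. This is my own utility, not part of the openai library, and it is tied to a token format that may change:

```python
def to_chatml(messages):
    """Encode a list of {'role', 'content'} messages as a ChatML prompt,
    leaving the prompt open for the assistant's next turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a virtual assistant for geography questions."},
    {"role": "user", "content": "What is the capital of Ecuador?"},
]
print(to_chatml(messages))
```

Passing the result to the Completion API with `stop=["<|im_end|>"]` reproduces the behavior shown above.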

ChatCompletion

In the ChatCompletion API, instead of encoding all the conversation turns in a single prompt, we provide the model with a list of interactions:

messages = [
    {"role": "system", "content": """You are a virtual assistant for geography questions.
You can answer any geography question always providing accurate information"""},
    {"role": "user", "content": "What is the capital of Ecuador?"},
    {"role": "assistant", "content": "Quito is the capital of Ecuador."},
    {"role": "user", "content": "What is its population?"},
]

response = openai.ChatCompletion.create(
    engine=DEPLOYMENT_NAME,
    messages=messages,
)

which gives us:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "According to the latest estimates, the population of Quito, Ecuador's capital, is approximately 2.7 million people.",
        "role": "assistant"
      }
    }
  ],
  "created": 1680513679,
  "id": "chatcmpl-71AfHTKVhZxqTxrgiahjzLT9rujKe",
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 27,
    "prompt_tokens": 61,
    "total_tokens": 88
  }
}

You probably noticed that, apart from the user and assistant conversation turns, there is a special system message. That’s where you can provide context and instructions, or modulate the tone of the assistant. According to the OpenAI documentation, gpt-35-turbo was not explicitly optimized to follow this system message, so it can sometimes overlook it or not pay enough attention (although, in my experience, it tends to work pretty well). gpt-4, on the other hand, has been optimized to always follow the instructions in this initial message.
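Because the ChatCompletion API is stateless, continuing the conversation is just a matter of appending the assistant’s reply and the next user turn to the list before calling it again. A sketch of that bookkeeping (pure list manipulation, no API call):

```python
messages = [
    {"role": "system", "content": "You are a virtual assistant for geography questions."},
    {"role": "user", "content": "What is the capital of Ecuador?"},
]

# Suppose this came back in response["choices"][0]["message"]
assistant_reply = {"role": "assistant", "content": "Quito is the capital of Ecuador."}
messages.append(assistant_reply)

# The next user turn goes on the end; the whole list is sent again
messages.append({"role": "user", "content": "What is its population?"})

print(len(messages))  # 4
```

Keep in mind that the full list counts towards the prompt tokens on every call, so long conversations need some truncation or summarization strategy.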

A note about pricing

All the pricing details can be found here. If you have used the OpenAI API before, you’ll see that prices are exactly the same. In the case of ChatGPT (gpt-35-turbo) the cost is $0.002 per 1,000 tokens (an order of magnitude less than the regular GPT-3 models!), while for GPT-4 the cost is $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens.

With this pricing, you should be really sure that you need it before choosing GPT-4, since gpt-35-turbo is much cheaper and equally capable for most use cases.
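Using the usage field returned with every response, estimating the cost of a call is straightforward. A sketch with the prices quoted above (current as of April 2023 and likely to change):

```python
def estimate_cost(prompt_tokens, completion_tokens, model="gpt-35-turbo"):
    """Rough cost estimate in USD based on the April 2023 prices."""
    prices = {  # (prompt, completion) $ per 1,000 tokens
        "gpt-35-turbo": (0.002, 0.002),
        "gpt-4": (0.03, 0.06),
    }
    p, c = prices[model]
    return (prompt_tokens * p + completion_tokens * c) / 1000

# The ChatCompletion example above used 61 prompt and 27 completion tokens
print(estimate_cost(61, 27))                 # ≈ $0.000176
print(estimate_cost(61, 27, model="gpt-4"))  # ≈ $0.00345, roughly 20x more
```

Running this on your actual usage data makes the gpt-35-turbo versus GPT-4 trade-off easy to quantify.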

Where to go from here

That’s all for today. I hope you found this post useful. We have only scratched the surface, and there is still a lot we had to leave out. Some ideas to go a bit deeper: experiment with the system message and prompt design, compare gpt-35-turbo and GPT-4 on your own use cases, and explore the embedding models for semantic search.
