Streamlining Data Integration: Connecting Azure OpenAI APIs Using Azure Databricks!

Teja Bogavalli
3 min read · Apr 6, 2024


In today’s data-driven world, businesses rely on a plethora of external services for valuable insights and information. OpenAI’s GPT APIs serve as a gateway to this wealth of information, offering a seamless means of integration.

Azure Databricks, a unified analytics platform built on Apache Spark, provides a robust framework for processing and analyzing large volumes of data.

In this post, we explore how to leverage Azure Databricks to connect to Azure OpenAI APIs (GPT-4) for streamlined data integration and analysis.

[Image: generated by DALL·E 3]

First, let’s set up environment variables. For this you need your Azure OpenAI endpoint and API key, which you can find on your Azure OpenAI resource in the Azure portal under Keys and Endpoint.

export AZURE_OPENAI_API_KEY="<your_api_key>"
export AZURE_OPENAI_ENDPOINT="<your_azure_openai_endpoint>"

These commands set environment variables that store the API key and endpoint URL; the client uses them for authentication and endpoint configuration in the commands and scripts that follow.
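
On Azure Databricks specifically, a common alternative to plain environment variables is to store credentials in a secret scope and read them with dbutils.secrets.get. A minimal sketch, assuming you have already created a scope and secrets (the names "openai-scope", "azure-openai-key", and "azure-openai-endpoint" below are placeholders); the resulting values can then be passed to the client directly instead of os.getenv:

# Read credentials from a Databricks secret scope rather than env vars.
# Scope and key names below are placeholders for whatever you configured.
api_key = dbutils.secrets.get(scope="openai-scope", key="azure-openai-key")
azure_endpoint = dbutils.secrets.get(scope="openai-scope", key="azure-openai-endpoint")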

Let’s look at the code for a single prompt-to-response call:

Import the libraries below; if they are not already available, install them first with a quick pip install.
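
For example, in a Databricks notebook you can install the SDK for the current session with the %pip magic:

%pip install openai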

import os
from openai import AzureOpenAI

Now let’s create an instance of the AzureOpenAI class, initializing it with parameters such as the API key, API version, and Azure endpoint.

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),  # reads the env var set earlier
    api_version="<your_api_version>",           # e.g. "2024-02-01"
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

We can send a request to the Azure OpenAI service to generate a response to a user query using the deployed chat model, as below:

prompt = "What is Generative AI?"

response = client.chat.completions.create(
    model="<your_model_name>",  # model = your Azure deployment name
    messages=[
        {"role": "user", "content": prompt}
    ],
    max_tokens=100
)

print(f"Prompt: {prompt}")
print("Response:", response.choices[0].message.content)
print()

We can restrict the output tokens to a desired length: adjusting the max_tokens parameter influences the length and detail of the generated response.
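
If you want to check whether a reply was cut short by the token limit, each choice in the response exposes a finish_reason: it is "length" when max_tokens truncated the output and "stop" when the model finished naturally. Notice that the sample response below ends mid-sentence; that is the token limit at work.

# "length" means max_tokens truncated the output; "stop" means a natural end
if response.choices[0].finish_reason == "length":
    print("Note: response was truncated by max_tokens.")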

The code generates the following response to the prompt “What is Generative AI?”:

Response: Generative AI is a type of artificial intelligence that involves creating new data that imitates or simulates the data it was trained on. The technology enables AI systems to create content such as images, text, and audio based on patterns and features it learned from the input data. The goal of generative AI is to produce outputs that are as close as possible to the real-life examples it was trained on. Examples of generative AI include deepfake technology, auto-completion of sentences, chatbots

Now let’s look at the code for a list of prompts:

We can do the same for more than one prompt by iterating over a list of prompts.

# List of prompts
prompts = [
    "What is Generative AI?",
    "How does machine learning work?",
    "Explain the concept of neural networks."
]

# Iterate over prompts
for prompt in prompts:
    response = client.chat.completions.create(
        model="<your_model_name>",
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=100
    )

    # Print the response for each prompt
    print(f"Prompt: {prompt}")
    print("Response:", response.choices[0].message.content)
    print()

In this code:

  1. The list prompts contains multiple prompts you want to send to the chat model.
  2. The for loop iterates over each prompt in the list.
  3. Inside the loop, the client.chat.completions.create() method is called for each prompt, and the response is stored in the response variable.
  4. The response for each prompt is then printed to the console, including the prompt itself and the generated response.
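
When iterating over many prompts, API rate limits can interrupt the run. As a minimal sketch (assuming the openai v1 SDK, which raises openai.RateLimitError), each call can be wrapped in a simple retry with backoff; the helper name complete_with_retry is purely illustrative:

import time
import openai

def complete_with_retry(client, prompt, retries=3):
    # Retry the chat call with exponential backoff on rate-limit errors
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="<your_model_name>",
                messages=[{"role": "user", "content": prompt}],
                max_tokens=100
            )
        except openai.RateLimitError:
            time.sleep(2 ** (attempt + 1))  # back off 2s, 4s, 8s
    raise RuntimeError(f"Rate limited on prompt: {prompt}")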

Finally, let’s iterate over a pandas DataFrame:

import pandas as pd

# Assuming your DataFrame contains prompts under a column named 'Prompt'
# Replace 'your_dataframe.csv' with the path to your file
df = pd.read_csv('your_dataframe.csv')

# Extract prompts from the DataFrame
prompts = df['Prompt'].tolist()

# Iterate over prompts
for prompt in prompts:
    response = client.chat.completions.create(
        model="<your_model_name>",
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=100
    )

    # Print the response for each prompt
    print(f"Prompt: {prompt}")
    print("Response:", response.choices[0].message.content)
    print()
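
To keep the generated responses alongside their prompts and continue the analysis in Spark, you can collect them into a new column and convert the pandas DataFrame to a Spark DataFrame. A minimal sketch, assuming the spark session and display function that Databricks notebooks provide:

# Collect responses in order, one per prompt
responses = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="<your_model_name>",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100
    )
    responses.append(response.choices[0].message.content)

# Attach responses to the pandas DataFrame
df['Response'] = responses

# Hand the result back to Spark for downstream processing in Databricks
spark_df = spark.createDataFrame(df)
display(spark_df)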

Overall, these code snippets automate the process of generating responses to multiple prompts using the Azure OpenAI service, enabling efficient interaction with the model and extraction of insights from the provided prompts.


Teja Bogavalli

I currently lead a team of data scientists at one of India's largest NBFCs, with extensive experience in data science, machine learning, and analytics.