Unleashing the Power of Generative AI with Databricks SQL

Databricks SQL SME
DBSQL SME Engineering
8 min read · Feb 22, 2024
AI-centric warehouse on DBSQL

Author: Yatish Anand, Solutions Architect at Databricks

Introduction

With the advancement of generative AI, large language models (LLMs) have become extremely popular. According to McKinsey, generative AI could add $2.6 trillion to $4.4 trillion annually in value to the global economy. Thanks to LLMs like OpenAI's ChatGPT, Meta AI's Llama, Mosaic's MPT, and more, organizations across domains (finance, retail, healthcare, energy) are using generative AI to power innovation across use cases like customer service chatbots, fraud prevention, coding, and information summarization. As organizations aim to harness the power of generative AI, the need to enable SQL analysts to use it in their day-to-day work is apparent.

Enter AI Functions — built-in Databricks SQL functions that allow you to access large language models (LLMs) directly from SQL. Previously, data scientists used Databricks Model Serving to deploy and manage all AI models, e.g. foundation models (Llama, MPT, Mixtral) or external models (OpenAI, AWS Bedrock, Anthropic, AI21, etc.).

Now, with the introduction of AI Functions, analysts and business users without an ML background can harness the power of these world-class LLMs within the normal flow of writing and running SQL queries, and share the results in BI tools. ETL analysts can also call LLMs in data pipelines to enrich their datasets.

In this blog we are going to assume the role of an ETL analyst at an e-commerce company who is tasked with automating the analysis of customer reviews and creating a response draft that the CRM team can use. In the end our workflow will look like the diagram below:

End to End AI Functions Pipeline on DBSQL

Let’s get started!

Setup Overview

  1. Foundation model provisioned in Databricks Model Serving: Databricks Model Serving now supports Foundation Model APIs, which allow you to access and query state-of-the-art open models from a serving endpoint. For this blog we will use provisioned throughput mode and deploy a Mixtral-8x7b model. The model endpoint is called “ya-MIXTRAL8x7b-provthro”.
  2. ai_query function — The ai_query() function allows you to serve your machine learning models and large language models using Databricks Model Serving and query them. It invokes an existing Databricks Model Serving endpoint, then parses and returns its response. You can also use ai_query() with Foundation Model APIs and external models. Syntax: ai_query(endpointName, modelInput, returnType). A minimal example follows this list.
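
For example, a minimal call against the endpoint deployed for this blog could look like the following (the prompt text is only illustrative; the two-argument form shown here mirrors the calls used later in this post, with the optional return type omitted):

SELECT ai_query(
  'ya-MIXTRAL8x7b-provthro',
  'Classify the sentiment of this sentence as POSITIVE, NEUTRAL or NEGATIVE: The checkout process was quick and painless.'
) AS model_response;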

Step 1 — Analyzing Customer Data in Raw Zone

In a typical production pipeline the raw zone is a replica of the source system. In our example let's assume we have two main driving tables: a Customer table and a Customer Review table. For our scenario, we have generated a fake customer review dataset in the Databricks AI Playground using the prompt below against the Mixtral-8x7b foundation model:

Prompt to Generate Synthetic data set

Below are the details of our two tables, generated with fake data and stored in Unity Catalog; a sketch of their definitions follows the samples:

  1. Customer table — stores the customer information, including how many orders a customer has placed. A sample dataset is shown below:
Fake Customer Table Example Set

  2. Customer Review table — customer reviews are collected across multiple channels in a myriad of formats and are stored as unstructured raw text. Our raw sample dataset, containing the customer id and the respective review, is shown below:

Fake Customer Review Table Set
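
For reference, the two raw tables could be defined along these lines; the column names below are inferred from the samples above and the queries later in this post, except order_count, which is an assumed name for the order tally:

-- Assumed shape of the raw customer table
CREATE TABLE IF NOT EXISTS hls_yatish.open_ai_sql.fake_customers (
  customer_id BIGINT,
  firstname   STRING,
  lastname    STRING,
  order_count INT      -- number of orders placed (assumed column name)
);

-- Assumed shape of the raw customer review table
CREATE TABLE IF NOT EXISTS hls_yatish.open_ai_sql.fake_reviews (
  customer_id BIGINT,
  review      STRING   -- unstructured raw review text
);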

Step 2 — Silver Table: Analyzing Customer Reviews and Performing Sentiment Analysis and Entity Recognition

Step 2.a. Deploy Mixtral-8x7b endpoint:

The first step is to deploy the Mixtral-8x7b endpoint. As discussed earlier, Databricks Model Serving now supports deploying foundation models like Mixtral with either pay-per-token or provisioned throughput pricing. For production workloads that require performance guarantees, or if you want to serve models fine-tuned on your domain data, we recommend provisioned throughput mode.

To deploy the Mixtral-8x7b endpoint, please follow this link. Once your model is successfully deployed, it should look like the screenshot below:

Serving a foundation model in Databricks Model Serving

Step 2.b. Use the ai_query function: Sentiment Analysis and Entity Recognition:

We can now use the ai_query function from DBSQL to run inference against our deployed endpoint. Let's explore how we can use ai_query to perform sentiment analysis and entity recognition. In the example below we call the endpoint deployed in Step 2.a, pass it a prompt, and analyze the customer review. The LLM output should be JSON containing the customer sentiment, the product name, whether a follow-up is required (in case of a negative review), and the reason for the follow-up. Below is a sample call to the LLM using ai_query:

SELECT FROM_JSON(ai_query(
'ya-MIXTRAL8x7b-provthro', "A customer left a review on a product. We want to follow up with anyone who appears unhappy.
Extract all entities mentioned. For each entity:
- classify sentiment as [POSITIVE,NEUTRAL,NEGATIVE]
- whether customer requires a follow-up: Y or N
- reason for requiring followup

Return JSON ONLY. No other text outside the JSON. JSON format:

product_name: <product name>,
category: <product category>,
sentiment: <review sentiment, one of [POSITIVE,NEUTRAL,NEGATIVE]>,
followup: <Y or N for follow up>,
followup_reason: <reason for followup>

### Instruction:"||"I was really disappointed with the quality of the Kraft Macaroni & Cheese I purchased.
The cheese sauce was clumpy and lacked flavor. I would not recommend this product to anyone. "||
"### Response:"),"ARRAY<STRUCT<product_name: STRING, sentiment: STRING, followup: STRING, followup_reason: STRING>>")

Below is the sample output of the above query:

AI Query Result from Engineered Prompt
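
Because FROM_JSON returns an ARRAY<STRUCT<...>>, individual fields can be pulled out with ordinary element and field access, which is exactly what the silver table step relies on later. Here is a small self-contained sketch; the JSON literal is only illustrative, not actual model output:

SELECT parsed.product_name[0] AS product_name,
       parsed.sentiment[0]    AS sentiment,
       parsed.followup[0]     AS followup
FROM (
  SELECT FROM_JSON(
    '[{"product_name":"Kraft Macaroni & Cheese","sentiment":"NEGATIVE","followup":"Y","followup_reason":"Poor product quality"}]',
    'ARRAY<STRUCT<product_name: STRING, sentiment: STRING, followup: STRING, followup_reason: STRING>>'
  ) AS parsed
) t;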

Step 2.c. Wrap ai_query in a wrapper function:

Having to specify all of these parameters each time is cumbersome, especially for a data analyst who should focus on crafting a proper prompt rather than on endpoint details or secret management. So, in the next step we will follow a best practice and make the function call reusable by putting it inside a wrapper function.

To simplify the process, we will create a wrapper function and register it in Unity Catalog as “Test_Silver_mistral”; it takes a “review” string as its input parameter, as shown below:

CREATE OR REPLACE FUNCTION hls_yatish.open_ai_sql.Test_Silver_mistral(review STRING)
RETURNS ARRAY<STRUCT<product_name: STRING, sentiment: STRING, followup: STRING, followup_reason: STRING>>
RETURN FROM_JSON(ai_query(
'ya-MIXTRAL8x7b-provthro', "A customer left a review on a product. We want to follow up with anyone who appears unhappy.
Extract all entities mentioned. For each entity:
- classify sentiment as [POSITIVE,NEUTRAL,NEGATIVE]
- whether customer requires a follow-up: Y or N
- reason for requiring followup

Return JSON ONLY. No other text outside the JSON. JSON format:

product_name: <product name>,
category: <product category>,
sentiment: <review sentiment, one of [POSITIVE,NEUTRAL,NEGATIVE]>,
followup: <Y or N for follow up>,
followup_reason: <reason for followup>

### Instruction:"||review||"
### Response:"),"ARRAY<STRUCT<product_name: STRING, sentiment: STRING, followup: STRING, followup_reason: STRING>>")
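
With the function registered, an analyst can call it like any other SQL function, for example against the fake review table from Step 1:

SELECT review,
       hls_yatish.open_ai_sql.Test_Silver_mistral(review) AS review_annotated
FROM hls_yatish.open_ai_sql.fake_reviews
LIMIT 5;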

Registering our prompt engineering into Unity Catalog SQL functions allows for much simpler distribution of modelling capabilities to analysts as well as allowing enterprises to properly govern access to these endpoints, which is critical in the AI space. This allows DBSQL admins to govern access to models in the same way they govern tables, source data, and everything else in Databricks Unity Catalog.
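
For example, access to the wrapper function can be granted to an analyst group just like any other Unity Catalog securable (the group name below is a placeholder):

-- Allow members of a (hypothetical) sql_analysts group to call the wrapper function
GRANT EXECUTE ON FUNCTION hls_yatish.open_ai_sql.Test_Silver_mistral TO `sql_analysts`;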

Step 2.d. Create the silver table:

In this step we will save our dataset as a silver table. We will pass the data from the “review” column of the raw customer review table to the wrapper function created in Step 2.c. The silver table is a denormalized table created by joining the raw zone customer table and the customer review table.

Please see the code below to create the silver table:

CREATE OR REPLACE TABLE hls_yatish.hls_interop_yatish.silver_Review_exploded
AS
SELECT *,
       review_annotated.product_name[0]    AS product_name,
       review_annotated.sentiment[0]       AS sentiment,
       review_annotated.followup[0]        AS followup,
       review_annotated.followup_reason[0] AS followup_reason
FROM (
  SELECT *, hls_yatish.open_ai_sql.Test_Silver_mistral(review) AS review_annotated
  FROM hls_yatish.open_ai_sql.fake_reviews
  LIMIT 10
) reviews
INNER JOIN hls_yatish.open_ai_sql.fake_customers USING (customer_id);

Below is a sample output of the silver table showing the customer details and the review given by the customer, along with the fields extracted by running inference against the Mixtral-8x7b endpoint using ai_query: product name, customer review sentiment, followup flag, and reason for followup.

AI function output natively modeled into DBSQL Delta tables

Step 3 — Gold Table: Create a draft response for the customer success team where a follow-up is required

Step 3.a. Create a wrapper function using ai_query:

Following similar steps as in Step 2, we will create a wrapper function that takes as input the customer's first name, last name, the product name, and the follow-up reason explaining why the customer left a negative review, and sends them to our LLM endpoint (‘ya-MIXTRAL8x7b-provthro’) for inference using the ai_query function.

CREATE OR REPLACE FUNCTION hls_yatish.open_ai_sql.Gold_mistral_GenerateResponse(firstname STRING, lastname STRING, product_name STRING, followup_reason STRING)
RETURNS STRING
RETURN ai_query(
'ya-MIXTRAL8x7b-provthro', "A customer left a negative review of the product. Look at the customer firstname, lastname, product name and followup reason
and prepare an empathetic response draft that can be sent to the customer.

### Instruction:"||concat(firstname,lastname,product_name,followup_reason)||"
### Response:","STRING")
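
For instance, the wrapper can be invoked directly with literal values to test it; the name below is a placeholder, and the product and reason echo the earlier example review:

SELECT hls_yatish.open_ai_sql.Gold_mistral_GenerateResponse(
  'Jane',                                                      -- placeholder first name
  'Doe',                                                       -- placeholder last name
  'Kraft Macaroni & Cheese',
  'Customer found the cheese sauce clumpy and lacking flavor'
) AS draft_response;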

Below is the sample output of the above wrapper function:

Output automated customer service response

Step 3.b. Create Gold table:

In the Gold table step we will pass the customer name, along with the product details and the review follow-up reason, to the wrapper function created in the step above. Below is the sample code:

CREATE OR REPLACE TABLE hls_yatish.hls_interop_yatish.Gold_draft_response
AS
SELECT *,
       hls_yatish.open_ai_sql.Gold_mistral_GenerateResponse(firstname, lastname, product_name, followup_reason) AS Generate_Response
FROM hls_yatish.hls_interop_yatish.silver_Review_exploded
WHERE followup = 'Y'
LIMIT 10;

Below is the sample output of our Gold table:

End to End Customer Triage and follow up with AI functions on DBSQL!
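
The drafts can then be handed to the CRM team by querying the Gold table directly, or by wiring it into a dashboard or alert, for example:

SELECT firstname,
       lastname,
       product_name,
       followup_reason,
       Generate_Response
FROM hls_yatish.hls_interop_yatish.Gold_draft_response;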

Conclusion

In this end-to-end example, we have shown how users can directly embed AI-centric workflows into real use cases and democratize them for SQL analysts. Not only that, but admins and users can govern, monitor, and manage the models, prompt engineering functions, and output results in a single pane of glass in Unity Catalog on Databricks. In just a few short lines of code, we have created an end-to-end automated customer service triage and response system, all in SQL!

Ultimately, ai_query provides a powerful and convenient entry point for prompt engineering LLMs with your domain-specific data through Databricks SQL. Analysts can use ai_query to automate many steps of an end-to-end ETL process. LLMs can apply both automation and creativity across a wide range of language-related tasks, and ai_query opens up endless possibilities for them in SQL.

