Simplify Tableau data modeling with GPT-4 based SQL generation

Gunther Hagleitner
Waii
Published in
4 min readMay 9, 2023

In this blog we take a look at how to leverage new advances ushered in by LLMs and GPT-4 with Tableau. In particular we will show how you can simplify the modeling step and thus provide insights faster than ever before.

We will do this by using GPT-4 to generate SQL that can then be used as the base model to build visualizations on. Since GPT-4 by itself doesn’t know how to connect your warehouse information to the questions you’re asking, we use a tool we’ve built for this job. We call it, TweakIt.

We’ve looked in detail at the techniques required to get high-accuracy SQL-generation capabilities from LLMs. Those techniques are implemented by TweakIt, and are detailed here and here.

Let’s start with the problem statement. Let’s say you have sales and customer information about your US retail stores in a Snowflake database. A business analyst comes to you and says they would like to generate a heat map of which states are growing the fastest by sales — this heat map is needed tomorrow for an executive briefing.

After logging into TweakIt and connecting it to your Snowflake instance, you type the following question:

Note, there is no other modeling involved. No need to specify tables, joins, relationships, etc. Just point TweakIt at the database and ask the question.

It comes back with:

WITH sales_per_year AS
(SELECT s.S_STORE_NAME AS store_name,
s.S_STATE AS state,
d.D_YEAR AS YEAR,
SUM(ss.SS_NET_PAID) AS total_sales
FROM store_sales ss
JOIN store s ON ss.SS_STORE_SK = s.S_STORE_SK
JOIN date_dim d ON ss.SS_SOLD_DATE_SK = d.D_DATE_SK
GROUP BY s.S_STORE_NAME,
s.S_STATE,
d.D_YEAR),
sales_growth AS
(SELECT store_name,
state,
YEAR,
total_sales,
(total_sales - LAG(total_sales)
OVER (PARTITION BY store_name, state
ORDER BY YEAR))
/ NULLIF(LAG(total_sales)
OVER (PARTITION BY store_name, state
ORDER BY YEAR), 0)
AS sales_growth
FROM sales_per_year)
SELECT *
FROM sales_growth
ORDER BY store_name, state, YEAR;

Looks reasonable, but how do you trust it? Let’s take a look at the explanation that was produced with it:

That seems correct. Except it should not use NET PAID, but a different computation. You tweak the original question with:

That will change the query to use the requested definition for total sales. After that you execute the query and take a look at the result:

And that looks good as well. Here is what the whole page looks like:

It’s time to take the query to the BI tool. In Tableau you create a new data source (or edit an existing one if you’re already connected to the database you want.)

Then once you are connected you pick, “New Custom SQL”

And enter the query you have generated. Like so:

Once that is done you can use the generated model to build the visualization like you are used to. In this case we are filtering by year, using the state names as location information and encode the median of the year-over-year growth as the color of the heat-map. The result is this:

Voila. Ship it.

To recap: Without knowing the database in any detail, we were able to build a model with confidence and in just a few seconds. From there we were able to use Tableau as to and deliver the requested visuals and insights. Life will never be the same.

If you’re interested in seeing this in action for yourself, please join our Slack channel: Wai’i community, or request access at waii.ai

This post was written by Wangda Tan and Gunther Hagleitner.

--

--