Using Vanna.AI’s demo sales data effectively

Ashish Singal
Vanna AI
Published in
2 min readMay 18, 2023

On Vanna.AI, we just introduced (as of May 17, 2023) a new demo dataset — the TPC-H dataset that resembles a business, that comes default with each installation of Snowflake.

What’s in the sales data (tables & columns)

Let’s take a quick peak at what’s inside this dataset.

Here’s the structure, nicely provided by Snowflake. Customers create orders, which have line items, which are parts that are supplied by suppliers.

Asking some questions

Ok, let’s ask a few questions. First, let’s try a basic question to find the countries with the most customers.

what top 10 countries have the most customers? show the name of the country and number of customers.

Here’s the SQL generated, joining 3 tables together —

SELECT n.n_name AS country_name, COUNT(c.c_custkey) AS number_of_customers
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION n
JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER c ON n.n_nationkey = c.c_nationkey
GROUP BY country_name
ORDER BY number_of_customers DESC
LIMIT 10

And here’s the table —

Next, let’s try a much more complex question —

What are the top 10 customers that have the highest value of returned brass parts where each part is more than $10 in Germany?

This SQL is extremely complex, touching five tables and using several nested queries.

SELECT 
c.c_name AS customer_name,
SUM(l.l_extendedprice * l.l_discount) AS returned_parts_gross_value
FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM l
JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS o
ON l.l_orderkey = o.o_orderkey
JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER c
ON o.o_custkey = c.c_custkey
JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.SUPPLIER s
ON l.l_suppkey = s.s_suppkey
JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION n
ON s.s_nationkey = n.n_nationkey
WHERE
l.l_returnflag = 'R'
AND s.s_nationkey = (
SELECT
n.n_nationkey
FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION n
WHERE
n.n_name = 'GERMANY'
)
AND l.l_extendedprice * (1 - l.l_discount) > 10
AND l.l_partkey IN (
SELECT
p.p_partkey
FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.PART p
WHERE
p.p_type LIKE '%BRASS%'
)
GROUP BY
customer_name
ORDER BY
returned_parts_gross_value DESC
LIMIT
10

But the AI got it right and here are the results —

--

--