AI Dev Tips #7: Advanced Database Queries with AI/Claude/ChatGPT

AI SQL insanity: Window functions, Common Table Expressions (CTEs), JSON functions, and subqueries.

Published in

AI Dev Tips

11 min readSep 4, 2024

In this step-by-step tutorial #7 in the series, we will tackle more complex scenarios and SQL queries, such as using window functions, Common Table Expressions (CTEs), JSON functions, and subqueries.

By the end, you’ll see how ChatGPT or Claude (Anthropic) can significantly simplify these tasks, making your development process faster and more efficient.

After this AI Dev Tips #7, I plan to explore some other non-SQL topics, this will be a pause (not a complete end) of this particular SQL topic exploration.

AI Dev Tips #8 will be a different AI Dev topic. But then we may return to this again (I have some ideas already): so it’s a VERY GOOD IDEA to keep your notes, like the code you generate for your schema and fake data — it will make it easier for you later!

Coming up in this article:

Quick recap of links in this series — setup for turbocharging your databases with AI like Claude 3.5 and ChatGPT.
Advanced SQL queries, including window functions and CTEs.
Using ChatGPT to generate JSON functions and subqueries.
Review refining and validating complex SQL queries.

Notes:

Examples shown are with ChatGPT but Claude 3.5 can also handle these.
You may get slight variations on answers,
Also it will obviously reflect changes in your schema. I am using my book store example. You may want to use a different one, so the input/output would be different.

Recap: Getting Started

In our previous tutorials, we set up a basic database schema using ChatGPT and populated it with fake data and then learned a lot of interesting things about generating queries — from very basic to more complex to creating whole LISTS of queries (this is awesome for brainstorming or learning more about what you can do with your data)

This may look familiar to “regulars” here so you can skip this “recap” if you been doing these already, but we have a lot of new people too!

Also you should go find your old chats because you will need the schema and fake data.

QUICK TIP: I have one whole schema that you can use in AI Dev Tips 1: Business Idea to PostgreSQL Database Schema in 5 Minutes (scroll down and find it) then just skip to to the fake data making: AI Dev Tips #4 — ChatGPT Creates Fake Data for our DB Schema

If you are new to AI Dev Tips and want to follow along — it’s a good idea to walkthrough the items below, it takes only a few minutes, everything will make more sense:

Schema Creation: ChatGPT created a PostgreSQL database schema. I used a book store (and for query examples here), but you could use something different. See: AI Dev Tips 1: Business Idea to PostgreSQL Database Schema in 5 Minutes
Schema Validation: We copied these statements into pg-sql.com, a tool that helps us validate SQL queries and schemas online.
Mock Data for Populating the Tables: We used AI-generated fake data to populate our tables. This helped us create realistic test scenarios to validate our SQL queries. AI Dev Tips #4 — ChatGPT Creates Fake Data for our DB Schema
Queries: We tried some techniques for basic queries including both with and without gherkin syntax blocks. See: AI Dev Tips #5: AI/ChatGPT Postgres Query Generation from Schema and more complex quries AI Dev Tips #6: AI Generates Complex DB Queries and Lists

If you missed these, check them out to get up to speed! (tutorial resumes after unobtrusive quick promo)

Quick promo: Big Data & 2-Packs Available now:

By popular demand, I combined some existing books into a discounted 2-pack, see all the offerings here: SystemsArchitect Store — example below too. You get immediate PDF digital access (1,600 pages of Cloud Engineering checklists). Metrics + AWS Essentials is also available.

Avanced SQL Queries with ChatGPT

Understanding advanced SQL concepts
Creating advanced SQL Queries with ChatGPT
Working with JSON functions
Generating and testing subqueries

1. Understanding Advanced SQL Concepts

Before we dive into generating queries with ChatGPT, let’s cover a few advanced SQL concepts:

Window Functions: These are SQL functions that perform calculations across a set of table rows related to the current row. They are often used for analytics. You will see examples below.

SELECT
  employee_id,
  department,
  salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM
  employees;

Running totals: Calculate cumulative sums over a partition of data, such as running totals of sales over months.
Ranking: Rank rows within a result set based on specific criteria, like finding the top N salespersons in each region.
Moving averages: Compute averages over a sliding window of rows, useful for analyzing trends over time.
Percentiles: Calculate percentiles to understand the distribution of data, such as determining the median salary in a department.
Lag and lead functions: Compare current row data with previous or next rows without using self-joins, useful in financial calculations like finding changes in stock prices.

Common Table Expressions (CTEs): A CTE is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. They can also be recursive.

WITH SalesCTE AS (
  SELECT
    salesperson_id,
    SUM(sales_amount) AS total_sales
  FROM
    sales
  GROUP BY
    salesperson_id
)
SELECT
  salesperson_id,
  total_sales
FROM
  SalesCTE
WHERE
  total_sales > 50000;

Simplify complex queries: Break down complex queries into simpler subqueries that are easier to understand and maintain.
Recursive queries: Handle hierarchical or recursive data structures, like organizational charts or file system directories, by referencing the CTE within itself.
Temporary data storage: Store intermediate results temporarily without creating a new table, useful in reporting where multiple transformations are needed.
Multiple references: Reuse the same result set within a single query, avoiding the need to rewrite subqueries.
Modular SQL code: Create modular, reusable SQL code for different parts of a query to enhance maintainability.

JSON Functions: Used to manipulate JSON data within SQL.

SELECT
  customer_id,
  json_data->>'name' AS customer_name
FROM
  customers
WHERE
  json_data->>'status' = 'active';

Querying JSON data: Extract specific data from JSON fields, such as finding customers based on JSON attributes like addresses or preferences.
Updating JSON fields: Modify JSON objects directly within the database without needing to rewrite the entire object.
Aggregating JSON data: Summarize or aggregate JSON data to generate reports or analyze trends.
Transforming JSON data: Convert JSON data into relational formats for integration with other SQL tables or systems.
Storing dynamic data: Store variable or evolving schema data in JSON format, useful for applications where data structures change frequently.

Subqueries: Nested queries within a larger SQL query. These can be used to perform more complex operations.

SELECT
  employee_id,
  salary
FROM
  employees
WHERE
  salary > (SELECT AVG(salary) FROM employees);

Filtering results: Use subqueries to filter results based on criteria that depend on other queries, like finding all employees who earn more than the average salary.
Calculating aggregates: Perform aggregate calculations that influence other query results, such as getting products with sales above a certain threshold.
Correlated subqueries: Execute a subquery for each row processed by the outer query, useful for row-wise comparisons like finding customers with orders exceeding their average order amount.
Data transformation: Transform data within subqueries before using it in the main query, such as formatting or concatenating strings.
Hierarchical queries: Handle parent-child relationships within a single query, useful in querying organizational hierarchies or product categories.

2. Creating Advanced SQL Queries with ChatGPT

Now, let’s create some advanced SQL queries using ChatGPT based on different scenarios.

Scenario: Rank users based on their total expenditure.

FIRST: Make sure you have already entered your schema into the AI prompt, as we did in the prior article. Populate it with the fake data. Otherwise ChatGPT or your AI will have no idea what you’re talking about.

After making sure ChatGPT is aware of your schema… we’ll ask for a Window function query:

ChatGPT 4o prompt:
I need a query that does the following: When I request a total expenditure for each user, I should receive a ranked list where each entry contains a user ID and their rank based on total expenditure from highest to lowest. Use a window function.

Output:

SELECT 
    user_id,
    total_expenditure,
    RANK() OVER (ORDER BY total_expenditure DESC) AS rank
FROM (
    SELECT 
        user_id, 
        SUM(total_amount) AS total_expenditure
    FROM 
        Orders
    GROUP BY 
        user_id
) AS subquery;

Subquery: The inner query calculates the total_expenditure for each user by summing up the total_amount from the Orders table and grouping the results by user_id.
RANK() OVER: The outer query uses the RANK() window function to assign a rank to each user based on their total expenditure in descending order.
RANK(): This is a window function that assigns a rank to each row within a partition of a result set.
OVER (ORDER BY total_expenditure DESC): The OVER clause defines the window for the RANK() function. In this case, it specifies that rows should be ordered by total_expenditure in descending order.
ORDER BY total_expenditure DESC: This part of the window function indicates that the ranking should be based on the total_expenditure column, with the highest values receiving the lowest rank numbers (starting at 1).
AS rank: This aliases the result of the RANK() function as rank in the final output.

Testing: Paste the query into pg-sql.com and run it to check the results.

Input and Output (see Query Result) from pg-sql:

Scenario: Retrieve books priced above a specific amount using a CTE.

ChatGPT 4o prompt:
I need a query that does the following: Retrieve a list of books where the price is greater than $80 — please do this using a CTE.

Input and Output of ChatGPT:

CTE (Common Table Expression): The WITH clause defines a temporary result set named ExpensiveBooks that contains all books with a price greater than $80.
Main Query: The SELECT * FROM ExpensiveBooks; retrieves all columns from the CTE, effectively listing all books that meet the price criteria.

Testing: Adjust the price as needed and run the query in your SQL tool.

My highest price was only $10.99 so I adjusted it down from 80 to 10

See “Query Result”:

Scenario: Count the number of books added between specific dates.

ChatGPT 4o prompt:
Count the number of books added between January 1, 2022, and January 10, 2022.

Important: Make sure to change the date for your data.

I had to change mine to 1950 as my dates were all old.

Step 3: Working with JSON Functions

JSON functions allow you to return SQL results in JSON format. Here’s how to create a JSON object of user data:

Scenario: Return user data as a JSON object.

ChatGPT 4o prompt:
Output user data as a JSON object.

Output:

Testing: Run the query in pg-sql.com to see the JSON output for each user.

Step 4: Generating and Testing Subqueries

Subqueries are useful for complex data retrieval scenarios:

Scenario: Find the top three authors with the most expensive average book prices.

ChatGPT 4o prompt:
List the top three authors with the most expensive average book prices and their most expensive book.

Output code from ChatGPT

WITH AuthorAveragePrices AS (
    SELECT 
        author,
        AVG(price) AS avg_price
    FROM 
        Books
    GROUP BY 
        author
),
TopThreeAuthors AS (
    SELECT 
        author,
        avg_price,
        RANK() OVER (ORDER BY avg_price DESC) AS rank
    FROM 
        AuthorAveragePrices
    WHERE avg_price IS NOT NULL
    ORDER BY avg_price DESC
)
SELECT 
    b.author,
    tta.avg_price,
    b.title AS most_expensive_book,
    b.price AS most_expensive_book_price
FROM 
    Books b
JOIN 
    TopThreeAuthors tta ON b.author = tta.author
WHERE 
    tta.rank <= 3
ORDER BY 
    tta.avg_price DESC, b.price DESC;

Testing: Using pg-sql.com, execute the query and check if the output matches the expected list of authors and their books.

Step 5: Refining Your Queries

While ChatGPT can generate SQL queries quickly, always review and test the code:

Review the code: Ensure the query logic aligns with your requirements.
Test thoroughly: Run the queries in different environments and with various data sets to validate their accuracy.
Seek expert advice: If you’re not a SQL expert, get a second opinion from a database specialist.
Go nuts. Try different ideas!

Tips and Best Practices

Iterate and improve: Start with simple queries and gradually introduce complexity. This helps in understanding how ChatGPT generates SQL code.
Experiment with variations: Ask ChatGPT to generate different variations of the same query to compare performance and readability.
Collaborate with experts: Work with database experts to fine-tune your queries and ensure they meet enterprise-level requirements.
Always validate: Never use AI-generated code directly in production without thorough testing and review.

Important: This tutorial is for prototyping not production, unless you have simple production databases and fully test it. Validate and refine these queries to ensure they perform optimally and meet your specific needs.

I did not use Gherkins to shorten this, but also use Gherkins when possible: Why use “Gherkins” in Software Development?

Using ChatGPT to generate advanced SQL queries can significantly speed up the development process, allowing you to focus more on the creative and strategic aspects of your project.

We looked at various scenarios like ranking users based on expenditures, retrieving books priced above a certain value, and aggregating books by authors into JSON arrays.

Keep an eye on AI Dev Tips and follow/subscribe: https://medium.com/ai-dev-tips

Thanks for reading ! 🥰 If you like this topic, please give some claps (it’s free), bookmark, add to your list and share with colleagues. I follow back. I’m here to meet people, learn, and contribute. 🥰

I have a couple new Ebooks you may want to check out:

Ebook: Check out my Cloud Audit ebook / store: SystemsArchitect Store and I also have the new Big Data book, check for it in the store.

About me

I’m a cloud architect, senior developer and tech lead who enjoys solving high-value challenges with innovative solutions.

I’m always open to discussing projects. If you need help, have an opportunity or simply want to chat, you can reach me on X/Twitter @csjcode and same username gmail.

I’ve worked 20+ years in software development, both in an enterprise setting such as NIKE and the original MP3.com, as well as startups like FreshPatents, SystemsArchitect.io, API.cc, and Instantiate.io.

My experience ranges from cloud ecommerce, API design/implementation, serverless, AI integration for development, content management, frontend UI/UX architecture and login/authentication. I give tech talks, tutorials and share documentation of architecting software. Also previously held AWS Solutions Architect certification.

Cloud Ebook Store — check for cloud architect and engineering books at a great value, “Cloud Metrics” (800 pages+) and “Cloud Audit” (800 pages+) and more — https://store.systemsarchitect.io

And my website:

Recently I’m working on Instantiate.io, a value creation experiment tool to help startup planning with AI. I write a reference manual on cloud metrics.

Also, an enthusiast of blockchain, I’m active working on applications in the innovative Solana blockchain ecosystem.