SQL For Business People With ChatGPT Assistance

Speak the language of data in a smarter way

Mike Elahi
6 min readJun 11, 2023
Photo by KOBU Agency on Unsplash

With the rise of “big data” and most companies having access to increasing amounts of data about their customers, most wise business decisions are usually driven not by trial and error but by what the data shows to be happening in the real world.

Whether you want to know how many people are using that cat food calculator on your pet empire website, you want to know how many of your customers opted to receive your newsletter, or you’d like to know about the time of day you get the most orders, this data is usually stored on your business’s database.

The Problem With Existing Dashboards

Photo by Luke Chesser on Unsplash

When the questions are simple, finding the answers can be simple as well. You can check the dashboard made for this purpose or ask the person with access to that data to do a “query” for you and you’ll get the answer. But when the requirements get more complicated, so does the hassle of extracting the data. It may be that:

  • The dashboards at your disposal are inflexible or are internal, therefore requiring development time to add new insights and data.
  • The element of asking for data may be inefficient, it can take up time for multiple people in the process and it discourages exploration since every tweak needs a back-and-forth.

It is for this purpose that businesses usually prefer those who need to ask questions about data, remove the middleman and learn how to speak directly to the database. This is where SQL comes in.

What Is A Database? And What Is SQL?

There’s a chance that you’re already familiar with what SQL is, in that case, feel free to skip this section!

Most of the data regarding a business is usually stored in a kind of software called a database. This database software (professionally called Database Management System) is made specifically to store the data and later get that data in an efficient way (or allow you to perform actions on that data such as deleting and updating).

Throughout most of the past decades, the most popular system has been RDMBS, R standing for Relational. This basically means that data is stored in “tables” (similar to sheets in spreadsheet software) that have columns and rows and that the data in one table can point to another table.

Is your business using an RDBMS database? if you hear the mention of software like PostgreSQL, MySQL, MariaDB or SQL Server, the answer is yes!

Spreadsheets Analogy

Let’s make this clear with an example using spreadsheets software, in a spreadsheet, you may have an Orders sheet that has the columns “Order Number”, “Customer Number”, “Total”, “Time and Date”.

You may also have a Customers sheet, in which you store “Customer Number”, “First Name”, “Last Name”, “Phone Number”, and more.

Now imagine that you’re trying to get all orders alongside the phone number of each customer. In the sense of a database, this is a “relation” because every row in your orders sheet is related in some way to the customers sheet.

While the spreadsheet analogy is an oversimplification of what databases do, it is a great way to begin understanding them.

Talk to Me In SQL

Now that we have established what databases are, it is time to talk about what you may have wanted to learn: SQL. SQL (or structured query language) is a “language” for talking to the database, it is not as overwhelming as a programming language either:

  • if you’re trying to ask for information, you will start your query with SELECT. (as opposed to DELETE, UPDATE, etc.)
  • you will then tell the database what you want with a <table>.<column> like “orders.order_number”
  • you will tell it about the main table you want the database to look in: “FROM orders”
  • finally, you will narrow it down by telling it your criteria using WHERE: “WHERE order.customer_number = 2”

therefore, to get all order numbers for customer number 2, we can do:

SELECT orders.order_number FROM orders WHERE orders.customer_number = 2;

this will give us a result which is one column containing all order numbers for that customer.

The Challenge of Mind To SQL

By knowing the basics of SQL you should be able to ask simple questions directly, however, once the questions get more complicated, it can prove difficult to intuitively find what you need to do. This is true for spreadsheets as well, you’ll have to know the intricate details and functions that do what you want. This is where you’ll probably learn and get results faster with the help of ChatGPT.

Prompting ChatGPT The Right Way

Now you may ask how to properly ask ChatGPT for what I want? Here’s what I would do:

You are a helpful SQL assistant, I will give you the structure of my SQL database, then I will ask questions about the data which you should answer using a SQL query. After the SQL query, you may explain the bits of the query and how they function.

table: orders, columns: order_number, customer_number, total (numeric), time_and_date

table: customers, columns: customer_number, first_name, last_name, phone_number

I would like to know the last name and phone number of every order that was placed after January 1st, 2023.

Basically, by telling ChatGPT what I want and what (relevant) tables I have, I can simply ask it about the data, and it will come back with the results:

SELECT c.last_name, c.phone_number
FROM orders o
JOIN customers c ON o.customer_number = c.customer_number
WHERE o.time_and_date > '2023-01-01';

and the following is the explanation by ChatGPT:

Let’s break down the query step by step:

  1. The SELECT statement is used to specify the columns we want to retrieve from the database. In this case, we want to select the last name (c.last_name) and phone number (c.phone_number) from the customers table.
  2. The FROM clause specifies the tables involved in the query. We need to retrieve data from both the orders and customers tables. We assign aliases to these tables using the o and c aliases, respectively.
  3. The JOIN keyword is used to combine rows from two or more tables based on a related column between them. In this case, we want to join the orders table (o) with the customers table (c) using the customer_number column.
  4. The ON keyword is used to specify the join condition, which is the equality between the customer_number column in the orders table and the customer_number column in the customers table.
  5. The WHERE clause is used to filter the results based on a specified condition. In this case, we want to retrieve only the orders that were placed after January 1st, 2023. The condition o.time_and_date > '2023-01-01' ensures that only the relevant orders are included in the result set.

While you can probably ask for further clarification, ChatGPT has already provided a working query that can be used to get the data you want.

In the further steps that you’ll be taking, you can also ask ChatGPT for modifications, ask about the details of the query, or ask about performance improvements.

  • Can you give me the first name of the customer as well?
  • What does join mean?
  • This query is taking over 20 minutes to run, can you provide samples of queries that may provide the same results but are faster?

Final Thoughts

While ChatGPT must be used cautiously as it can oftentimes look authoritative yet give incorrect results, it can greatly help reduce the barrier to entry for learning how to interact more directly with the data, all you will need to do then instead of writing the queries is engineering the prompt!

if you liked this work, please make sure to follow me for more on technology and software. and if you have similar experiences, prompts or questions, feel free to ask them in the comments!

--

--