Supercharge Your Starbucks Analysis: ChatGPT + SQL for Powerful Results

Amal
ScrapeHero
Published in
4 min readFeb 28, 2024

Struggling to write complex SQL queries or interpret their results? Look no further! This exciting collaboration between ChatGPT, the powerful language model, and SQL, the essential data querying language, empowers you to:

  • Craft SQL queries effortlessly: ChatGPT understands your natural language requests and generates accurate SQL code, saving you time and effort.
  • Gain deeper insights: Analyze your data with ease, as ChatGPT helps interpret complex results and translate them into actionable insights.

This revolutionary duo simplifies data analysis, making it accessible to all, from beginners to advanced professionals. Let’s start diving into the blog.

Installation

Create a Virtual Environment on Python or Anaconda:

python -m venv myenv

Activate the Virtual Environment:

On Windows:

myenv\Scripts\activate

On macOS and Linux:

source myenv/bin/activate

Install dependency libraries:

pip install -q langchain openai pandas

Data:

Data is about location reviews and ratings of Starbucks stores in the USA region.

The data has been collected from ScrapeHero, one of the leading web-scraping companies in the world. Here’s the Data Source that we used for analysis!

Columns:

ID, Name, Address, Street, Zip_Code, State, City, Author, Review, Rating

Sample Data:

Next step would be storing data in SQL Database for analysis.

import sqlite3
import pandas as pd

starbucks_df = pd.read_csv("Starbucks_reviews.csv")
connection = sqlite3.connect("starbucks.db")

# convert df to sql table named starbucks_review
starbucks_df.to_sql("starbucks_review", connection, if_exists="replace")

Data Analysis on Starbucks Reviews:

Now let’s start by loading the data from SQL using Langchain SQL toolkit and later use SQL agents from Langchain to fully automate the workflow.

The SQL Agent in LangChain is like a special helper that talks to databases. It’s better than writing direct commands because it can:

  • Do more than just grab information: It can answer your questions about the database, fix mistakes, and ask the database multiple times to fully answer your question.
  • Be more efficient: It only gets the data you need, making things faster.
  • Handle errors: If something goes wrong, it can try again and tell you what happened.

Overall, the SQL Agent makes it easier and more powerful to use databases in LangChain, allowing for better and more user-friendly applications.

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.agents.agent_types import AgentType
from langchain.sql_database import SQLDatabase
from langchain.chat_models import ChatOpenAI


my_db = SQLDatabase.from_uri("sqlite:///starbucks.db")
llm = ChatOpenAI(openai_api_key = "YOUR_API_KEY", model_name = "gpt-3.5-turbo-1106")


# SQL Agent
lc_agent = create_sql_agent(
llm = llm,
toolkit= SQLDatabaseToolkit(db= my_db, llm=llm),
agent_type= AgentType.ZERO_SHOT_REACT_DESCRIPTION,
handle_parsing_error = True,
verbose= True,
return_intermediate_steps = True
)

Let’s start by asking questions to the database:

query = “what is overall rating of starbucks across all locations?”

lc_agent.run(query)

This is what is happening under the hood. It first creates an action and converts the query to SQL query, if it identifies an error, it later tries to fix the error.

> Entering new SQL Agent Executor chain…
I need to retrieve the overall rating of Starbucks across all locations from the database.
Action: sql_db_query
Action Input: SELECT AVG(rating) AS overall_rating FROM starbucks_locationsError: (sqlite3.OperationalError) no such table: starbucks_locations
[SQL: SELECT AVG(rating) AS overall_rating FROM starbucks_locations]
(Background on this error at: https://sqlalche.me/e/20/e3q8)I need to check if the table “starbucks_locations” exists in the database.
Action: sql_db_list_tables
Action Input: starbucks_reviewI made a mistake in the table name. I need to use “starbucks_review” instead of “starbucks_locations”.
Action: sql_db_query
Action Input: SELECT AVG(rating) AS overall_rating FROM starbucks_review[(4.032099819603127,)]I now know the final answer
Final Answer: The overall rating of Starbucks across all locations is 3.57.

> Finished chain.
‘The overall rating of Starbucks across all locations is 4.03.’

Answer:

‘The overall rating of Starbucks across all locations is 4.03.’

Question: “List all the column names with data type in this table”

Answer:

‘The column names with data types in the starbucks_review table are: 
\n- index: INTEGER\n- id: TEXT\n- author: TEXT\n- address: TEXT\n-
name: TEXT\n- rating: REAL\n- review: TEXT\n- zip_code: INTEGER\n-
state: TEXT\n- city: TEXT\n- street: TEXT’

Question: “Which state has the highest average rating for Starbucks?”

Answer:

‘North Dakota has the highest average rating for Starbucks’

Question: “Which state has the highest number of reviews?”

Answer:

‘California (CA)

Question: “How many reviews are positive (rating > 3.5), neutral (rating between 3 and 3.5), and negative (rating < 3)?”

Answer:

There are 6188 negative reviews, 1481 neutral reviews, and 12287 positive reviews.

Conclusion:

In this blog, we’ve explored the potential of combining the power of ChatGPT’s natural language processing with the structured data analysis capabilities of SQL to gain deeper insights into Starbucks’ success.

This is only the beginning, as these technologies evolve, we can expect more powerful ways to extract meaningful insights from the data leading to effective decision-making and greater success.

Hope you learned something new today, Happy Learning!

--

--