Simple — Really Simple — Sentiment Analysis in Snowflake (Part 1)

Peter Beck
5 min readMay 6, 2024

--

“Sentiment Analysis” generally refers to the machine-learning assisted process of determining the emotional tone of a given body of text. Think of comments left by users on a product — they can range from ecstatic (“This product is so great it changed my life for the better”) to the harsh (“This product is so useless it ruined my life forever”) and with an infinite number of shades in between. Sentiment analysis using LLMs promises to enable the near instant “scoring” of very large numbers of bodies of text. Think of the hundreds or thousands of reviews a given type of product might get — trying to parse them and rate them manually could be an onerous task, and yet at the same time those reviews provide valuable customer feedback. Highly automated sentiment analysis can also be applied in real time to chat interactions and customer support calls.

I recently spent some time using Snowflakes SQL sentiment analysis function, SNOWFLAKE.CORTEX.SENTIMENT(), currently in public preview, and came away very impressed. SENTIMENT() takes a body of text as its input parameter, and returns a value between -1 and 1, where -1 is the most negative possible, and 1 is the most positive possible. That’s it!

This function is incredibly easy to use. In general, this is where Snowflake really excels — ease of use, and the associated cost savings that go with it. Instead of setting up a complicated ML pipeline, I can just load my data into Snowflake and run a simple query over it to provide the sentiment analysis I am looking for. Basic SQL skills are all that is required. With a little more effort I can easily engineer a scenario where data is being flowed via a pipeline from AWS to Snowflake and being scored in near real time.

For the following exercise I used ChatGPT to generate a reviews of fictional sporting goods products. I then ran a simple query to rate the sentiment of the reviews, and then finished with a simple Streamlit app to present the results of the analysis. Being able to load the data, query it, and present the results in single, simple interface really speeds up the delivery of this kind of analysis. No python scripting or linking to an external process is required for the data analysis — the solution is 100% SQL on the data side. The python-based Streamlit app we will build in part 2 to display the results is similarly simple.

Prerequisites:

1) Snowflake account (you can get a 30-day free trial account here, without a credit card)

2) Some basic familiarity with Snowflake, SQL and Python

3) Some exposure to Streamlit is a plus.

Note: the example below uses the SENTIMENT function which is part of Snowflake Cortex. The user that executes the function must have the SNOWFLAKE.CORTEX_USER database role.

First, get the data set I prepared from github. It’s relatively small, only 145 reviews, and it can be loaded easily using the Snowsight interface, so there is no need load the data from SnowSQL.

The file Sporting_Goods_Reviews.csv can be downloaded from github here: https://github.com/peterjohnbeck/snowflake_sentiment_data

To load and score the data, perform the following steps:

1) Log into your Snowflake account and create a new database called SENTIMENT_ANALYSIS.

2) In the Public schema, click on the Create button in the upper right and load the data using the Create Table… From File option.

3) Name the new table SPORTING_GOODS_REVIEWS:

On the next page click “Options” and select “First Line Contains Headers”:

… Then click load. You should get 145 rows loaded into a new table called SPORTING_GOODS_REVIEWS.

4) Now that we have the table loaded, we are going to need a column to record the relative sentiment — positive or negative — of each review. We will also add a column to “bin” the sentiment values from 1 to 10.

ALTER TABLE SENTIMENT_ANALYSIS.PUBLIC.SPORTING_GOODS_REVIEWS
ADD SENTIMENT FLOAT,
SENTIMENT_CAT NUMBER(2)

5) Now the simplicity of Snowflake becomes clear: simply run an update statement to score each review:

UPDATE SENTIMENT_ANALYSIS.PUBLIC.SPORTING_GOODS_REVIEWS
SET SENTIMENT = SNOWFLAKE.CORTEX.SENTIMENT(REVIEW)

That’s it! To derive the sentiment values for our reviews is as simple as running an update statement, and on a X-Small warehouse took 644 ms.

6) Lets put each sentiment value into a bucket from 1 to 10, where 1 is the worst and 10 is the best, using the powerful WIDTH_BUCKET function:

UPDATE PRODUCT_SENTIMENT_ANALYSIS
SET SENTIMENT_CAT = WIDTH_BUCKET(SENTIMENT,-1,1,10)

In this case, WIDTH_BUCKET will group the SENTIMENT values into 10 buckets, where -1 is the lowest value and 1 is the highest value in our SENTIMENT column.

Run some SELECTs over the data — you can see that reviews with negative language have sentiment values well under 0, while those with positive language have sentiment values well over 0:

Negative reviews get values from -1 t 0
Positive reviews get values from 0 to 1

In part 2 of this series we will create a small Streamlit app to display the results of the sentiment scoring.

--

--

Peter Beck

Peter Beck is an analytics and data warehousing specialist based in Ottawa, Canada.