How to use the OpenAI API with Snowflake to analyze sentiment in Reddit comments

Hi folks — Gilberto here! I’m part of the Developer Relations team at Snowflake. As a Developer Advocate, something that is constantly on my mind is how we can improve the Snowflake product for end users. We’re always launching new features at Snowflake, and I’m always curious to learn how our end users feel or think about what we launch. How do I go about doing this?

There are all sorts of places you could go to understand your end users’ sentiments. You could conduct user interviews, dig through past survey responses, connect with product engineering teams to learn about the feedback they’ve collected, and so on. Then comes the task of synthesizing that information and drawing insights from it, which can be tedious depending on your method(s).

Personally, I really enjoy interacting with our end users in our online developer communities, and I lean a bit more toward understanding our users’ opinions, pain points, and general sentiments on — drum roll — social media (Reddit in particular)! And thanks to generative AI, I can do this really quickly on large datasets using Snowflake.

In this blog post, I’ll show you how to analyze sentiment (on just about anything!) using Reddit comments, OpenAI, and, of course, Snowflake!

Let’s get to it!

Prerequisites

If you don’t plan on following along step-by-step, you can skip this section. Otherwise, there are a few prerequisites you’ll need:

Step 1 — Ingest data to analyze

The first thing you’ll need to do is gather a lot of public user data to analyze! By querying the Reddit API, you can pull all sorts of comments that users have posted on a particular subreddit. I won’t demonstrate how to do this step-by-step in this tutorial, but if you’re interested in learning how to ingest that data into Snowflake, this phenomenal blog post by

will show you exactly how to do that.

Step 2 — Configure Snowflake to call OpenAI’s API using AWS API Gateway

Now that you have data to analyze in Snowflake, you’ll need to configure Snowflake to call the OpenAI API so that you can derive sentiment from a given string (i.e., a Reddit comment). There are a couple of ways to do this, but I’ve found that the easiest is to use AWS’s API Gateway as a proxy for calling the OpenAI API directly from Snowflake. Here’s a summary of what you’ll need to do to configure Snowflake accordingly:

  1. Create a CloudFormation Stack within AWS
  2. Upload a CloudFormation Template (“CFT”) to the stack
  3. Configure the trust policy between Snowflake and AWS
  4. Create an API integration object within Snowflake, describing the integration with AWS
  5. Create translators within Snowflake to parse responses from the OpenAI API
  6. Create an external function using the API integration and translators

That may sound like a lot, but this incredibly detailed and helpful post from

walks you through how to do this in AWS and Snowflake. It took me about 15 minutes (at most) and I had never used CloudFormation within AWS before.
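To give you a feel for the Snowflake side, here is a rough sketch of what steps 4 and 6 from the list above can look like. Treat it as an illustration rather than the exact setup from that post: the integration name, the two translator names, and everything in angle brackets are placeholders I made up, and I’m assuming the external function returns a VARIANT.

```sql
-- Step 4: API integration that points at your API Gateway deployment.
-- The role ARN and URL are placeholders; substitute the values from your own stack.
CREATE OR REPLACE API INTEGRATION openai_api_integration
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<role-name>'
  API_ALLOWED_PREFIXES = ('https://<api-id>.execute-api.<region>.amazonaws.com/<stage>')
  ENABLED = TRUE;

-- Shows API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID, the values the
-- AWS trust policy (step 3) refers to.
DESCRIBE INTEGRATION openai_api_integration;

-- Step 6: external function that routes calls through the integration.
-- The translator names are hypothetical; they stand in for the UDFs from step 5.
CREATE OR REPLACE EXTERNAL FUNCTION OPENAI_EXT_FUNC(prompt STRING)
  RETURNS VARIANT
  API_INTEGRATION = openai_api_integration
  REQUEST_TRANSLATOR = openai_request_translator
  RESPONSE_TRANSLATOR = openai_response_translator
  AS 'https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/<resource>';
```

The URL in the AS clause has to start with one of the prefixes listed in API_ALLOWED_PREFIXES, otherwise Snowflake will refuse to create the function.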

The Snowflake documentation on this topic is also incredibly helpful. In particular, be sure to check out the following pages:

Step 3 — Derive sentiment by calling the external function on the data

With the data in Snowflake, and with Snowflake properly configured to call the OpenAI API, we can now run sentiment analysis on the Reddit comments. To do this, I opened a worksheet within Snowflake, set my context to the correct database and schema, and ran the following query:

SELECT
    body AS comment,
    OPENAI_EXT_FUNC('Classify this sentiment: ' || body)::VARIANT:choices[0]:text AS sentiment
FROM REDDIT_COMMENTS
WHERE subreddit = 'target-subreddit';

This query calls the external function on every Reddit comment in the REDDIT_COMMENTS table whose subreddit matches the filter, and pulls the model’s answer out of choices[0]:text in the response. You can replace target-subreddit with a subreddit of your choice, assuming it exists in the data that you ingested in Step 1.
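If you want to go beyond eyeballing individual rows, one option is to save the results to a table and then count comments per sentiment. The sketch below is my own suggestion, not part of the setup above: the REDDIT_COMMENT_SENTIMENT table name and the 100-row sample size are arbitrary choices. Saving the results means you can re-query them without calling the OpenAI API again.

```sql
-- Classify a small sample first; the inner LIMIT is meant to keep the
-- number of API calls low while you experiment.
CREATE OR REPLACE TABLE REDDIT_COMMENT_SENTIMENT AS
SELECT
    body AS comment,
    -- Cast to STRING and trim surrounding spaces from the model's reply.
    TRIM(OPENAI_EXT_FUNC('Classify this sentiment: ' || body)::VARIANT:choices[0]:text::STRING) AS sentiment
FROM (
    SELECT body
    FROM REDDIT_COMMENTS
    WHERE subreddit = 'target-subreddit'
    LIMIT 100
);

-- Count comments per sentiment label, ignoring case differences.
SELECT
    LOWER(sentiment) AS sentiment,
    COUNT(*) AS comment_count
FROM REDDIT_COMMENT_SENTIMENT
GROUP BY LOWER(sentiment)
ORDER BY comment_count DESC;
```

Keep in mind that the model returns free-form text rather than a fixed set of labels, so the groups may be noisy (for example, "Positive" and "Positive." would count separately). Asking for one of a few specific labels in the prompt may help.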

The results? See for yourself below! If you wanted to, you could run this on tons more rows to get even more sentiment data — this is just a demonstration of what’s possible!

Sentiment analysis of a Reddit comment 😎

Conclusion

And there you have it! In just a few steps, you saw how Snowflake can call the OpenAI API to analyze sentiment from large sets of data. Try it out! Perhaps you have some large body of user feedback lying around and you’d like to analyze it to understand trends in your product. Visualizing and sharing the results could be fun (and perhaps influential!). I’ll leave that as an exercise for you to experiment with!

Signing off — I’m Gilberto, Developer Advocate at Snowflake — thank you for reading and be sure to check out our other developer resources:
