Analyzing customer feedback at scale — from Google Sheets to BigQuery!

Alicia Williams
Google Cloud - Community
10 min readFeb 6, 2024

--

Unstructured customer feedback is a goldmine of insights, but analyzing open-ended comments manually is time-consuming and unscalable. A few years ago, I tackled this challenge by creating a solution in Google Sheets which was a great start, but it hit limits as data grew.

So in this solution update, I’m bringing in BigQuery! In this post, I’ll walk you through analyzing sentiment at scale without writing a single line of code.

High-level overview of analysis workflow

The solution still utilizes Cloud Natural Language and works much like the original: it analyzes the sentiment of feedback by extracting the “entities” (or things people are talking about), assigns a sentiment score (the likelihood the “entity” is being discussed in a positive or negative way), and aggregates the results in a visualization to make these insights more accessible.

Analyzing feedback at scale — with BigQuery

BigQuery is a cloud data warehouse (part of Google Cloud) that can store an almost unlimited amount of data and allows you to analyze that data using SQL with incredibly fast processing speeds due to its query engine.

So why’d I use BigQuery for this solution?

  • BigQuery is integrated: Cloud Natural Language (as well as other Vertex pre-trained models) is available directly within BigQuery for analyzing data without writing any code — just a SQL statement.
  • BigQuery is fast: Analyzing just over 400 vacation property reviews using BigQuery took 98 seconds versus 301 seconds using my original solution built with Apps Script.
  • BigQuery is scalable: Apps Script’s execution time limit is currently 360 seconds, meaning the 400 rows of test data put the original solution at its limit, while BigQuery can send over 100,000 rows to Cloud Natural Language for analysis (with standard quota).

Step-by-step sentiment analysis in BigQuery

I’ll outline how to perform this sentiment analysis workflow in BigQuery by walking through a few setup steps (which you only need to do once), and then taking you through performing the sentiment analysis itself.

Setup steps (one-time):

  1. Set up your Google Cloud environment: this includes creating a project and enabling APIs required for the workflow
  2. Load sample data into BigQuery: this step walks you through uploading local CSV data into a BigQuery table (using the same dataset from my original demo)
  3. Create a BigQuery Connection: this connection is required for BigQuery to access the translation and sentiment models
  4. Create Natural Language and Translation functions: this step creates functions in BigQuery that directly access the Natural Language (ML.UNDERSTAND_TEXT) and Translation (ML.TRANSLATE) models

Once you’ve completed the setup steps, you’ll have two functions in BigQuery that you can use on an ongoing basis to access any of the features available in the Translation API and the Natural Language API, including entity sentiment analysis, and use them to analyze your BigQuery data.

Sentiment analysis steps (ad-hoc):

  1. Translate feedback text to English: this step uses the Translation function in BigQuery (ML.TRANSLATE)
  2. Analyze sentiment of feedback: this step uses the Cloud Natural Language function in BigQuery (ML.UNDERSTAND_TEXT) to process translated feedback for sentiment analysis
  3. Format sentiment analysis response: this step uses SQL to transform the raw results sent from Cloud Natural Language in order to enable analysis
  4. Visualize results for insights: this step exports the transformed data to Google Sheets to summarize and visualize the results

Set up Google Cloud environment

BigQuery, Cloud Natural Language, and Cloud Translation are all part of Google Cloud and require a Google Cloud environment with billing enabled.

A note on costs: The Free Tier of Google Cloud covers many of the services used in this workflow, such as storage and querying data in BigQuery and analyzing text in Cloud Natural Language. I ran through this workflow one time using the sample dataset, and my cost was $3.29 (attributed completely to the Cloud Translation API, which does not have a free tier). The cost of Cloud Translation cost can be covered using the Free Trial, or by skipping the translation step in the workflow).

To set up your environment:

  1. Create a Google Cloud project and ensure billing is enabled. (These instructions are a nice, concise view of these processes, as well.)
  2. Click this link to enable the services in your project that are required for this solution: BigQuery API, BigQuery Connection API, Cloud Translation API and the Cloud Natural Language API.

Load sample data into BigQuery

This section will take you through loading sample vacation property review data into BigQuery. After you’ve tested this solution with the sample data, you can follow the documentation to load your own data into BigQuery, as well, to analyze it with this same workflow.

  1. Navigate to the BigQuery Console
  2. Create a dataset by clicking on the three dots next to your project ID in the Explorer panel, and selecting Create dataset.

3. Name the dataset feedback_analysis.

4. Download the sample vacation property reviews dataset from Kaggle. Expand the zip file and save the reviews.csv file to your desktop.

5. Create a new table within the dataset by clicking on the three dots next to the dataset name in the explorer panel, and selecting Create table.

6. Complete the Create table page as outlined in the screenshots below (substituting your own project ID in the relevant field) and then click the Create Table button:

7. Highlight your new table in the explorer panel, and you can see details on the table including a preview of data.

8. Instead of analyzing all of the properties, this demo will only analyze the reviews of one property (listing_id = 66288), so let’s create an intermediary table with just the data we want to further analyze. Click the + button to open a new tab.

Copy and paste the following SQL statement into the query editor. Then click Run.

CREATE OR REPLACE TABLE feedback_analysis.reviews_66288 AS (
SELECT
id,
comments AS text_content
FROM `feedback_analysis.reviews`
WHERE listing_id = 66288 )

The completion of this query creates a new table named reviews_66288 within your feedback_analysis dataset that contains 404 rows of reviews for the specified property. The resulting table also includes only the columns we need for analysis, which are the ID of the review and the review content itself (renamed text_content, as this is the naming convention that the ML.UNDERSTAND_TEXT and ML.TRANSLATE functions require to designate the column containing the target text to analyze).

Create a BigQuery Connection

Creating a BigQuery Connection is a one-time step that you’ll need in order for BigQuery to access the pre-trained models, like Cloud Natural Language, that sit within the Vertex AI product.

  1. Go to the BigQuery page.
  2. Click add Add data, and then click Connections to external data sources.
  3. In the Connection type list, select BigLake and remote functions (Cloud Resource).
  4. In the Connection ID field, enter bq_conn as a name for your connection.
  5. Click Create connection.
  6. Click Go to connection.
  7. In the Connection info pane, copy the service account ID for use in a later step.

8. Head to the IAM page.

9. Click Grant Access and enter the service account ID from step 7 in the New principals box.

10. From the Assign Roles drop-down menu, choose these three roles:

Service Usage Consumer

BigQuery Connection User

Cloud Translation API User

11. Click Save, and your service account now has all the permissions it needs.

Create Natural Language and Translation models in BigQuery

Now you’re ready to use the BigQuery Connection to create remote models in BigQuery, which are direct connections to the pre-trained Natural Language and Translate models in Vertex AI. By creating these remote models, you will then be able to use the ML.TRANSLATE and ML.UNDERSTAND_TEXT functions in your BigQuery queries to access the powerful features of these pre-trained models.

  1. Go back to the BigQuery page.
  2. Click Create SQL query.
  3. Copy and paste the following code block containing two SQL statements into the query editor and click Run.
CREATE OR REPLACE MODEL `feedback_analysis.vertex_nl_model`
REMOTE WITH CONNECTION `us.bq_conn`
OPTIONS (remote_service_type ='cloud_ai_natural_language_v1');

CREATE OR REPLACE MODEL `feedback_analysis.vertex_translate_model`
REMOTE WITH CONNECTION `us.bq_conn`
OPTIONS (remote_service_type ='cloud_ai_translate_v3');

Once the queries complete, you will see two models in your Explorer panel.

Translate text data to English

Some of the reviews from the sample dataset are in languages other than English. As part of my original demo, I first translated all non-English reviews before analyzing them with Cloud Natural Language. You can do this in BigQuery using the ML.TRANSLATE function and referencing the translation model feedback_analysis.vertex_translate_model that you created in the last section.

  1. Click the + button to open a new query editor tab.
  2. Copy and paste the following SQL statement into the query editor and click Run.
CREATE OR REPLACE TABLE `feedback_analysis.reviews_66288_english` AS (
SELECT
id,
STRING(ml_translate_result.translations[0].detected_language_code) AS `Original Language`,
text_content AS `original_text`,
STRING(ml_translate_result.translations[0].translated_text) AS text_content,
ml_translate_status as `Status`
FROM ML.TRANSLATE(
MODEL `feedback_analysis.vertex_translate_model`,
TABLE `feedback_analysis.reviews_66288`,
STRUCT('translate_text' AS translate_mode, 'en' AS target_language_code)));

The query statement results in a new table reviews_66288_english for which the text_content column now contains the review text translated into English. If you are curious, you can learn more about the syntax of this query in the ML.TRANSLATE documentation.

Analyze sentiment of text data

Now it’s time to take the translated text_content, and send it to Cloud Natural Language for analysis.

  1. Click the + button to open a new query editor tab.
  2. Copy and paste the following SQL statement into the query editor and click Run.
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288_results AS (
SELECT * FROM ML.UNDERSTAND_TEXT(
MODEL `feedback_analysis.vertex_nl_model`,
TABLE feedback_analysis.reviews_66288_english,
STRUCT('analyze_entity_sentiment' AS nlu_option)))

This query runs the translated reviews through the Analyze Entity Sentiment feature of Cloud Natural Language and collects the responses into a new table, reviews_66288_results. If you want to learn more about the query syntax, check out the ML.UNDERSTAND_TEXT documentation.

We can see the responses from Cloud Natural Language aren’t quite ready to interpret for insights; the entities and their sentiment scores are nested inside JSON objects in the ml_understand_text_result column.

Format sentiment analysis response

This final step will unpack the entities and their sentiment scores from each JSON response object, and create a new table that has one line per entity mentioned.

  1. Click the + button to open a new query editor tab.
  2. Copy and paste the following SQL statement into the query editor and click Run.
CREATE OR REPLACE TABLE feedback_analysis.reviews_66288_results_entities AS (
WITH json_data AS (
SELECT
id,
JSON_EXTRACT(ml_understand_text_result, '$.entities') AS entities_json,
text_content
FROM
`feedback_analysis.reviews_66288_results`
)
SELECT
id,
CAST(JSON_VALUE(mention.text, '$.content') AS STRING) AS mention_content,
CAST(COALESCE(JSON_VALUE(mention.sentiment, '$.magnitude'), '0') AS FLOAT64) AS sentiment_magnitude,
CAST(COALESCE(JSON_VALUE(mention.sentiment, '$.score'), '0') AS FLOAT64) AS sentiment_score
FROM
json_data,
UNNEST(JSON_EXTRACT_ARRAY(entities_json, '$')) AS entity,
UNNEST(JSON_EXTRACT_ARRAY(entity, '$.mentions')) AS mention
)

Executing this query results in a new table called reviews_66288_results_entities, which you can see now has grown from 404 rows of reviews, to almost 6,000 rows of entities which were contained within those reviews, along with their sentiment scores.

Visualize results for insights

Now let’s take 6,000 rows of data and visualize it to help interpret and understand the results. BigQuery is integrated with a few visualization tools, including connecting with one-click to Looker Studio and Google Sheets. Since my original demo utilized a Google Sheets chart, we’ll follow the same path here.

You can click the Export button and choose Explore with Sheets.

Once the data was connected into Google Sheets, I recommend pulling the full set of data into an extract so that you can work with the full functionality of Sheets (and without incurring any BigQuery costs).

For the below visualization, I further aggregated the data into a pivot table to get an average sentiment score and total number of mentions by entity, and then created a scatter chart to plot each entity on these metrics.

We can immediately see common vacation rental topics such as “location”, “noise”, and “bed” and understand their average sentiment was likely positive (you can read more about interpreting sentiment score values in the documentation).

Next steps

Whew! I know that was quite a walkthrough, but now you know how to use BigQuery with Cloud Natural Language and Cloud Translation to analyze open-ended feedback! You have the setup steps complete, so you can continue on your journey and try out…

Once you’re done, make sure you shut down your Google Cloud project. While the amount of data you have stored should fit within the BigQuery free tier, it is safest to prevent any further charges.

I’d love to build more low-code/no-code, scalable analysis solutions in the future — let me know if there are any particular challenges you are having with this solution or in your own business!

--

--

Alicia Williams
Google Cloud - Community

Google Cloud Developer Relations. Learning how to do cool things with @GoogleCloudTech and @GoogleWorkspace. Opinions are my own, not that of my company.