Analyze Customer Survey Data using Google Forms & Big Query

Naveenkoundinya
8 min readOct 7, 2021

--

Google Forms

Google Forms is a survey administration software included in Google Docs editors doc suite along with Google Sheets, Google Docs & Google Slides. It collects information about users through surveys and automatically stored in Google Sheets. You can either create Personal forms or Business (Additional Security features) as per the requirement. One can create beautiful, effortless & polished forms using the below link https://www.google.com/intl/en-GB/forms

Below is the sample Google Form for reference.

Fig1:https://forms.gle/9GVGupW65yXRfUZS9

In this write up we have not covered Introduction to Big Query as there are many other articles which cover these in detail. To elucidate these two robust products in less than four words, I would add these products as below

BigQuery:- Data warehouse/Analytics.

Let us understand the flow from Google Forms to Big Query, When a Google Survey/Feedback form is created, It automatically stores the responses and the owner who created the form can easily visualize the responses as below. The data gets stored in the Google Drive by default, the location settings and other preferences are provided as per the initial settings.

As we wanted to read data from Google Forms, there is an option to automatically save the responses to Google Sheets . One can easily save up to 2 Million Cells of data in Google Sheets.

Open the Responses button on the Google Forms and choose storage option as Google Sheets. Data gets stored on the Google Sheets

Querying Drive data using Big Query

Big Query supports queries against both personal Drive files and shared files. You can query files in Drive in the following formats:

  • Comma-separated values (CSV)
  • Newline-delimited JSON
  • Avro
  • Sheets

In this case study we will use Google Sheets as our source for querying drive data.

To query a Google Sheets source, provide the Drive URI path to your data and create an external table that references that data source. The table used to reference the Drive data source can be a permanent table or a temporary table.

Retrieving the Drive URI

To create an external table for a Sheets data source, you must provide the Sheets URI.

For example, to retrieve the URI for a Sheets file:

  1. Go to Sheets.
  2. Open your spreadsheet and copy the URI in the browser’s address bar.

Below is the sample Google Sheets for reference

Fig2. https://docs.google.com/spreadsheets/d/1YnzgXkMCzqfnJaB1X2qrM6EdTtOONv5yo0r-ITDxLlU

Enabling Drive access

Accessing data hosted within Drive requires an additional OAuth scope, both when defining the federated source as well as during query execution. While not enabled by default, it can be included in the Cloud Console, in the bq command-line tool, or via the API via the following mechanisms:

Let us discuss the methods to enable access while using Cloud console

Follow the web-based authentication steps when creating a permanent table in the Cloud Console. When you are prompted, click Allow to give BigQuery Client Tools access to Drive.

Permanent versus temporary external tables

You can query an external data source(Google Sheets)in Big Query by using a permanent table or a temporary table. A permanent table is a table that is created in a dataset and is linked to your external data source. Because the table is permanent, you can use access controls to share the table with others who also have access to the underlying external data source, and you can query the table at any time.

When you query an external data source(Google Sheets) using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source.

When you use a temporary table, you do not create a table in one of your Big Query datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

Querying Google Sheets data using permanent external tables

Required permissions and scopes

When you query external data in Drive (Google Sheets) using a permanent table, you need permissions to run a query job at the project level or higher, you need permissions that allow you to create a table that points to the external data, and you need permissions that allow you to access the table data. When your external data is stored in Drive, you also need permissions to access the Drive file that is linked to your external table.

BigQuery permissions

At a minimum, the following permissions are required to create and query an external table in BigQuery.

  • bigquery.tables.create
  • bigquery.tables.getData
  • bigquery.jobs.create

The following predefined IAM roles include both bigquery.tables.create and bigquery.tables.getData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to create external tables in the dataset, but bigquery.jobs.create permissions are still required to query the data.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Drive permissions

At a minimum, to query external data in Drive you must be granted View access to the Drive file linked to the external table.

Creating and querying a permanent external table

You can create a permanent table linked to your Google Sheets data source by

To query an external data source using a permanent table, you create a table in a BigQuery dataset that is linked to your external data source. The data is not stored in the BigQuery table. Because the table is permanent, you can use access controls to share the table with others who also have access to the underlying external data source.

To create an external table:
1. Go to the BigQuery page in the cloud console

2. In the navigation panel, in the Resources section, expand your project and select a dataset. Click Create table on the right side of the window

Fig3a. Table Creation using BigQuery Console

3. On the Create table page, in the Source section:

  • For Create table from, select Drive.
  • In the Select Drive URI field, enter the Drive URI. Note that wildcards are not supported for Drive URIs.
  • For File format, select sheets as the format of your data. Valid formats for Drive data include:
  • Provide the Table name, Dataset name , select Auto Detect Schema to select the filed names automatically and look at the below screenshot for reference.
Fig 3b. Table Creation using BigQuery Console

You can then run a query against the table as if it were a native BigQuery table, subject to the limitations on external data sources.

Below is the sample query using BigQuery. As we have this table now in Big query , we can join with any of our existing tables and perform additional analytics on the Customer Survey data.

Fig 4. Querying Google Sheets data using permanent table

After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Sheets. See Download, save, and export data for more information.

Querying Drive data using temporary tables

To query an external data source without creating a permanent table, you run a command to combine:

  • A table definition file with a query
  • An inline schema definition with a query
  • A JSON schema definition file with a query

The table definition file or supplied schema is used to create the temporary external table, and the query runs against the temporary external table. Querying an external data source using a temporary table is supported by the bq command-line tool and the API.

When you use a temporary external table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

Required permissions

When you query external data in Drive using a temporary table, you need permissions similar to querying data using a permanent table mentioned in the above section

Creating and querying a temporary table

You can create and query a temporary table linked to an external data source by using the bq command-line tool, the API, or the client libraries.

You query a temporary table linked to an external data source using the bq query command with the --external_table_definition flag. When you use the bq command-line tool to query a temporary table linked to an external data source, you can identify the table's schema using:

  • A table definition file (stored on your local machine)
  • An inline schema definition
  • A JSON schema file (stored on your local machine)

To query a temporary table linked to your external data source using a table definition file, enter the following command.

bq --location=LOCATION query \
--external_table_definition=TABLE::DEFINITION_FILE \
'QUERY'

Where:

  • LOCATION is your location. The --location flag is optional.
  • TABLE is the name of the temporary table you're creating.
  • DEFINITION_FILE is the path to the table definition file on your local machine.
  • QUERY is the query you're submitting to the temporary table.

To query a temporary table linked to your external data source using an inline schema definition, enter the following command.

bq --location=LOCATION query \
--external_table_definition=TABLE::SCHEMA@SOURCE_FORMAT=DRIVE_URI \
'QUERY'

Where:

  • LOCATION is your location. The --location flag is optional.
  • TABLE is the name of the temporary table you're creating.
  • SCHEMA is the inline schema definition in the format FIELD:DATA_TYPE,FIELD:DATA_TYPE.
  • SOURCE_FORMAT is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS.
  • DRIVE_URI is your Drive URI.
  • QUERY is the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named customer_survey_data linked to a Google Sheets stored in Drive with the following schema definition: Timestamp:TIMESTAMP,Email_address:STRING.

bq --location=US query --external_table_definition=customer_survey_data::Timestamp:TIMESTAMP,Email_address:STRING@CSV=https://docs.google.com/spreadsheets/d/1YnzgXkMCzqfnJaB1X2qrM6EdTtOONv5yo0r-ITDxLlU 'SELECT Timestamp,Email_address FROM customer_survey_data'

To query a temporary table linked to your external data source using a JSON schema file, enter the following command.

bq --location=LOCATION query \
--external_table_definition=SCHEMA_FILE@SOURCE_FORMT=DRIVE_URI \
'QUERY'

Where:

  • LOCATION is your location. The --location flag is optional.
  • SCHEMA_FILE is the path to the JSON schema file on your local machine.
  • SOURCE_FILE is CSV, NEWLINE_DELIMITED_JSON, AVRO, or GOOGLE_SHEETS.
  • DRIVE_URI is your Drive URI.
  • QUERY is the query you're submitting to the temporary table.

--

--