Analyze Customer Survey Data using Google Forms & Big Query
Google Forms
Google Forms is a survey administration software included in Google Docs editors doc suite along with Google Sheets, Google Docs & Google Slides. It collects information about users through surveys and automatically stored in Google Sheets. You can either create Personal forms or Business (Additional Security features) as per the requirement. One can create beautiful, effortless & polished forms using the below link https://www.google.com/intl/en-GB/forms
Below is the sample Google Form for reference.
In this write up we have not covered Introduction to Big Query as there are many other articles which cover these in detail. To elucidate these two robust products in less than four words, I would add these products as below
BigQuery:- Data warehouse/Analytics.
Let us understand the flow from Google Forms to Big Query, When a Google Survey/Feedback form is created, It automatically stores the responses and the owner who created the form can easily visualize the responses as below. The data gets stored in the Google Drive by default, the location settings and other preferences are provided as per the initial settings.
As we wanted to read data from Google Forms, there is an option to automatically save the responses to Google Sheets . One can easily save up to 2 Million Cells of data in Google Sheets.
Open the Responses button on the Google Forms and choose storage option as Google Sheets. Data gets stored on the Google Sheets
Querying Drive data using Big Query
Big Query supports queries against both personal Drive files and shared files. You can query files in Drive in the following formats:
- Comma-separated values (CSV)
- Newline-delimited JSON
- Avro
- Sheets
In this case study we will use Google Sheets as our source for querying drive data.
To query a Google Sheets source, provide the Drive URI path to your data and create an external table that references that data source. The table used to reference the Drive data source can be a permanent table or a temporary table.
Retrieving the Drive URI
To create an external table for a Sheets data source, you must provide the Sheets URI.
For example, to retrieve the URI for a Sheets file:
- Go to Sheets.
- Open your spreadsheet and copy the URI in the browser’s address bar.
Below is the sample Google Sheets for reference
Enabling Drive access
Accessing data hosted within Drive requires an additional OAuth scope, both when defining the federated source as well as during query execution. While not enabled by default, it can be included in the Cloud Console, in the bq
command-line tool, or via the API via the following mechanisms:
Let us discuss the methods to enable access while using Cloud console
Follow the web-based authentication steps when creating a permanent table in the Cloud Console. When you are prompted, click Allow to give BigQuery Client Tools access to Drive.
Permanent versus temporary external tables
You can query an external data source(Google Sheets)in Big Query by using a permanent table or a temporary table. A permanent table is a table that is created in a dataset and is linked to your external data source. Because the table is permanent, you can use access controls to share the table with others who also have access to the underlying external data source, and you can query the table at any time.
When you query an external data source(Google Sheets) using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source.
When you use a temporary table, you do not create a table in one of your Big Query datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.
Querying Google Sheets data using permanent external tables
Required permissions and scopes
When you query external data in Drive (Google Sheets) using a permanent table, you need permissions to run a query job at the project level or higher, you need permissions that allow you to create a table that points to the external data, and you need permissions that allow you to access the table data. When your external data is stored in Drive, you also need permissions to access the Drive file that is linked to your external table.
BigQuery permissions
At a minimum, the following permissions are required to create and query an external table in BigQuery.
bigquery.tables.create
bigquery.tables.getData
bigquery.jobs.create
The following predefined IAM roles include both bigquery.tables.create
and bigquery.tables.getData
permissions:
bigquery.dataEditor
bigquery.dataOwner
bigquery.admin
The following predefined IAM roles include bigquery.jobs.create
permissions:
bigquery.user
bigquery.jobUser
bigquery.admin
In addition, if a user has bigquery.datasets.create
permissions, when that user creates a dataset, they are granted bigquery.dataOwner
access to it. bigquery.dataOwner
access gives the user the ability to create external tables in the dataset, but bigquery.jobs.create
permissions are still required to query the data.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Drive permissions
At a minimum, to query external data in Drive you must be granted View
access to the Drive file linked to the external table.
Creating and querying a permanent external table
You can create a permanent table linked to your Google Sheets data source by
- Using the Cloud Console
- Using the
bq
command-line tool'smk
command - Creating an
ExternalDataConfiguration
when you use thetables.insert
API method - Using the client libraries
To query an external data source using a permanent table, you create a table in a BigQuery dataset that is linked to your external data source. The data is not stored in the BigQuery table. Because the table is permanent, you can use access controls to share the table with others who also have access to the underlying external data source.
To create an external table:
1. Go to the BigQuery page in the cloud console
2. In the navigation panel, in the Resources section, expand your project and select a dataset. Click Create table on the right side of the window
3. On the Create table page, in the Source section:
- For Create table from, select Drive.
- In the Select Drive URI field, enter the Drive URI. Note that wildcards are not supported for Drive URIs.
- For File format, select sheets as the format of your data. Valid formats for Drive data include:
- Provide the Table name, Dataset name , select Auto Detect Schema to select the filed names automatically and look at the below screenshot for reference.
You can then run a query against the table as if it were a native BigQuery table, subject to the limitations on external data sources.
Below is the sample query using BigQuery. As we have this table now in Big query , we can join with any of our existing tables and perform additional analytics on the Customer Survey data.
After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Sheets. See Download, save, and export data for more information.
Querying Drive data using temporary tables
To query an external data source without creating a permanent table, you run a command to combine:
- A table definition file with a query
- An inline schema definition with a query
- A JSON schema definition file with a query
The table definition file or supplied schema is used to create the temporary external table, and the query runs against the temporary external table. Querying an external data source using a temporary table is supported by the bq
command-line tool and the API.
When you use a temporary external table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.
Required permissions
When you query external data in Drive using a temporary table, you need permissions similar to querying data using a permanent table mentioned in the above section
Creating and querying a temporary table
You can create and query a temporary table linked to an external data source by using the bq
command-line tool, the API, or the client libraries.
You query a temporary table linked to an external data source using the bq query
command with the --external_table_definition
flag. When you use the bq
command-line tool to query a temporary table linked to an external data source, you can identify the table's schema using:
- A table definition file (stored on your local machine)
- An inline schema definition
- A JSON schema file (stored on your local machine)
To query a temporary table linked to your external data source using a table definition file, enter the following command.
bq --location=LOCATION query \
--external_table_definition=TABLE::DEFINITION_FILE \
'QUERY'
Where:
LOCATION
is your location. The--location
flag is optional.TABLE
is the name of the temporary table you're creating.DEFINITION_FILE
is the path to the table definition file on your local machine.QUERY
is the query you're submitting to the temporary table.
To query a temporary table linked to your external data source using an inline schema definition, enter the following command.
bq --location=LOCATION query \
--external_table_definition=TABLE::SCHEMA@SOURCE_FORMAT=DRIVE_URI \
'QUERY'
Where:
LOCATION
is your location. The--location
flag is optional.TABLE
is the name of the temporary table you're creating.SCHEMA
is the inline schema definition in the formatFIELD:DATA_TYPE,FIELD:DATA_TYPE
.SOURCE_FORMAT
isCSV
,NEWLINE_DELIMITED_JSON
,AVRO
, orGOOGLE_SHEETS
.DRIVE_URI
is your Drive URI.QUERY
is the query you're submitting to the temporary table.
For example, the following command creates and queries a temporary table named customer_survey_data
linked to a Google Sheets stored in Drive with the following schema definition: Timestamp:TIMESTAMP,Email_address:STRING
.
bq --location=US query --external_table_definition=customer_survey_data::Timestamp:TIMESTAMP,Email_address:STRING@CSV=https://docs.google.com/spreadsheets/d/1YnzgXkMCzqfnJaB1X2qrM6EdTtOONv5yo0r-ITDxLlU 'SELECT Timestamp,Email_address FROM customer_survey_data'
To query a temporary table linked to your external data source using a JSON schema file, enter the following command.
bq --location=LOCATION query \
--external_table_definition=SCHEMA_FILE@SOURCE_FORMT=DRIVE_URI \
'QUERY'
Where: