Trying out Data QnA on BigQuery and Google Sheets

Watch as my natural language queries get translated into SQL

Lak Lakshmanan
DAAS Labs

--

The BigQuery team dropped a surprise — natural language querying in BigQuery! Because it’s still in private alpha, you may not be able to try it out yourself. So … next best thing, watch me try it out:

How to use Data QnA

This is what I did:

  • Enable the Data QnA feature (I didn’t show you this, but it’s from the IAM console, similar to any other Google Cloud feature)
  • If the table you want to query is already in your own project, you are done. Otherwise, create a view in your project to reflect the table you want to query.
  • Go to the Data QnA part of the console and create a table.
  • Correct metrics/dimensions if needed (I didn’t need to)
  • Add synonyms (I added cases and county instead of confirmed_cases and county_name)
  • Click Save
  • On the BigQuery console, click “Ask a question”
  • Ask a question, generate sql, run the sql. Try a few natural language queries (see below).
  • Open sheets.new
  • Connect to the BigQuery table through the Data | Data Connectors menu
  • Ask a question or two in sheets.

Data QnA is in alpha, so if you want to try it out, ask your friendly Google Cloud rep to get your project whitelisted.

Autogenerated SQL

Data QnA takes natural language text and autogenerates the SQL. Here are a few examples on the Covid dataset. In each case, I am listing the natural language question and the autogenerated SQL.

Q: Which county has most number of cases

SELECT
county_name AS county_name,
(SUM(confirmed_cases)) AS SUM_confirmed_cases
FROM
`ai-analytics-solutions.advdata.covid19_usafacts`
GROUP BY county_name
ORDER BY SUM_confirmed_cases DESC
LIMIT 10;

Q: how many cases in kansas in March

SELECT
(SUM(confirmed_cases)) AS SUM_confirmed_cases
FROM
`ai-analytics-solutions.advdata.covid19_usafacts`
WHERE
state = 'KS' AND (date BETWEEN '2020-03-01' AND '2020-03-31');

Q: which state and county had most cases in last 7 days

SELECT
state AS state,
county_name AS county_name,
(SUM(confirmed_cases)) AS SUM_confirmed_cases
FROM
`ai-analytics-solutions.advdata.covid19_usafacts`
WHERE
(date BETWEEN '2020-07-03' AND '2020-07-09')
GROUP BY state, county_name
ORDER BY SUM_confirmed_cases DESC
LIMIT 10;

Q: how many counties are named Washington

SELECT
(COUNT(*)) AS COUNT__ROWS_
FROM
`ai-analytics-solutions.advdata.covid19_usafacts`
WHERE
county_name = 'Washington';

Q: which state’s washington county had least number of cases

SELECT
state AS state,
(SUM(confirmed_cases)) AS SUM_confirmed_cases
FROM
`ai-analytics-solutions.advdata.covid19_usafacts`
WHERE
county_name = 'Washington County'
GROUP BY state
ORDER BY SUM_confirmed_cases
LIMIT 10;

Q: cases by month in Kansas

SELECT
TIMESTAMP(FORMAT_DATE('%Y-%m-%d', DATE_ADD(DATE_ADD(DATE_TRUNC(SAFE_CAST(date AS DATE), MONTH), INTERVAL 1 MONTH),
INTERVAL - 1 DAY))) AS date,
(SUM(confirmed_cases)) AS SUM_confirmed_cases
FROM
`ai-analytics-solutions.advdata.covid19_usafacts`
WHERE
state = 'KS'
GROUP BY date
ORDER BY date;

Enjoy!

--

--

Lak Lakshmanan
DAAS Labs

articles are personal observations and not investment advice.