Write SQL with natural language using Vertex AI and BigQuery

Israel Herraiz
Google Cloud - Community
2 min readNov 26, 2023

--

Picture from devfest, authored by the GDG Cloud Madrid (https://twitter.com/gdgcloudmadrid/status/1728392858778898587)

The last weekend I had the opportunity to speak at the devfest of the GDG Cloud Madrid, presenting an example of how query a dataset in BigQuery using natural language. This post shares some details about the workshop and all the materials to replicate it or run a demo without having to write any code.

The example is using langchain, PaLM and Codey, and Vertex AI embeddings, to get a question from the user, transform it into a SQL query, run it in BigQuery, get the result in CSV, and interpret all that information to provide an answer to the question. The coordination of all the activities is done with a langchain Agent, that is also using a mathematical tool to be able to do some calculations with the results.

The prompts are using the schema of the tables in the dataset as context. These schemas are converted into embeddings using Vertex AI and ingested into a local ChromaDb. Each table's schema is a single document, and only one document is ingested into the prompt.

The agent is using some memory and the prompts are prepared to get that memory ingested into the prompt, so you can make follow-up questions, and the agent will remember the context of the previous question to provide new answers.

The BigQuery public dataset used in this example contains two tables, with very detailed annotations, that really helps the model in the generation of correct queries. If you want to use any other table or dataset, I strongly recommend adding annotations to the schema, to improve the quality of the generated queries.

What do you need to run this workshop? You will need a Google Cloud project with Vertex AI and BigQuery enabled, and permissions to use the PaLM, Codey and embedding APIs, as well as to run BigQuery queries. The cost of running the example and a couple of question is <0.10 USD, so you can use the free credits offered for new accounts to run the notebooks.

If you want to replicate this workshop, you can use the following materials:

--

--