How GenAI saved days of development — an SQL query parser for BigQuery usage statistics

Yunus Durmuş
Google Cloud - Community

--

If you have a data hub where you match data producers and consumers, wouldn’t be awesome to tell producers what are the most accessed parts of their data? Then maybe you recommend them what to share more by comparing producers to each other?

That was a recent project that I worked on. During a coffee chat, my colleague told me about his new project-benefits of returning to office ;). They needed to create a data sharing hub on BigQuery with all the security features like row and column based access control. But as a side feature, they also wanted to see who reads which columns and with which filters. Then they would be able to tell the data publishers the most accessed pieces of their data.

Who has all the time to learn Google SQL parser just for a prototype?

We brainstormed briefly about Google SQL parser, it does the job but it takes time to learn it. Moreover, the SQL statements vary a lot. So it might take days to come up with a working prototype. Then of course we considered the buzz of the year, GenAI, to parse the SQL!

30 minute prompting and 2 hours BigQuery ML — Done!

After the coffee chat, I spent 30 minutes to try a few prompts, and it worked! Then the next steps were:

  • Collecting all the SQL queries that consumers run
  • Parsing all of them with GenAI
  • Persisting the results into another table where we can use Looker to create a dashboard
  • Finally, scheduling all the above steps for regularly parsing new consumer queries.

In 2 hours, I finished it. I used:

Below is the code:

Ok, what is the catch?

GenAI does hallucinate and can miss sometimes. On top of that, unlike the Google SQL library, it doesn’t generate alerts when it can’t parse. It just hallucinates. So if you want to have exact statistics on data usage, for instance to charge your data consumers, then GenAI is not the solution to depend on. But in our case, our customer is fine with some approximate stats. Based on the demand for this feature in the future, we can go back and spend days to implement a better solution.

So, that was the most productive day for me this year!

--

--