BigQuery Breaking News — Generative AI

Artem Nikulchenko
Google Cloud - Community
4 min read · Jun 18, 2023


In the past several months, the attention of the whole world (or at least the tech world) has been drawn to Generative AI models (and LLMs specifically). Our GDE and Cloud Champion communities are no exception: we have been playing with the new GCP Vertex AI models and getting ready to share the details. It even feels like I have recently been spending more time in Vertex AI than in my favorite part of GCP, BigQuery. The list of wonderful BigQuery topics I still need to cover (including such interesting things as fail-safe, time travel, safe operations, clones, and many more) looks at me from a page of my notepad as silent confirmation of that.

That is why today’s news is worth sharing! The text-bison model (an LLM) from Vertex AI is now available directly in BigQuery via SQL (link).

Let’s recap what BigQuery already offered for ML before this. Since this was not a planned post, it will be just a short overview.

BigQuery ML trained models

First of all, BigQuery has many built-in models for workflows such as regression, classification, clustering, collaborative filtering, dimensionality reduction, and time-series forecasting.

The following models are built into BigQuery:

Supervised Learning

Unsupervised Learning

Special cases

Note: Why are time-series forecasting, anomaly detection, and recommendations special cases? As they are currently built in BigQuery, you need to use the special functions ML.FORECAST, ML.DETECT_ANOMALIES, and ML.RECOMMEND, respectively, instead of ML.PREDICT, which is the standard function for the other models.
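To make the difference concrete, here is a minimal Python sketch that only assembles the query text for each inference function; it does not talk to BigQuery. The model and table names are hypothetical placeholders, and the option values in the STRUCT arguments (horizon, confidence_level, anomaly_prob_threshold) are illustrative choices, so check the BigQuery ML reference for what each model type accepts. The returned string could then be executed with the google-cloud-bigquery client, e.g. bigquery.Client().query(sql).result().

```python
from typing import Optional


def inference_query(model: str, kind: str, input_table: Optional[str] = None) -> str:
    """Return a SELECT statement calling the BigQuery ML inference function for `kind`.

    `model` and `input_table` are hypothetical names such as "mydataset.my_model".
    """
    model_ref = f"MODEL `{model}`"
    if kind == "predict":
        # Standard path for regression, classification, clustering, etc.
        if input_table is None:
            raise ValueError("ML.PREDICT needs an input table")
        return f"SELECT * FROM ML.PREDICT({model_ref}, (SELECT * FROM `{input_table}`))"
    if kind == "forecast":
        # Time-series models forecast forward from the end of their training data.
        return (f"SELECT * FROM ML.FORECAST({model_ref}, "
                "STRUCT(30 AS horizon, 0.8 AS confidence_level))")
    if kind == "detect_anomalies":
        # Without input rows, an ARIMA_PLUS model checks its own training data.
        return (f"SELECT * FROM ML.DETECT_ANOMALIES({model_ref}, "
                "STRUCT(0.95 AS anomaly_prob_threshold))")
    if kind == "recommend":
        # Matrix factorization models: scores for every user-item pair.
        return f"SELECT * FROM ML.RECOMMEND({model_ref})"
    raise ValueError(f"unknown inference kind: {kind!r}")


print(inference_query("mydataset.sales_forecast", "forecast"))
```

Keeping the choice of function in one place mirrors the point of the note: the model type, not the caller, decides which inference function is valid.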

If your use case requires one of those models, you can train your model and then get real-time inferences from it using only SQL statements, without leaving the BigQuery console.
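As a minimal sketch of that SQL-only workflow, the Python helper below just assembles a CREATE MODEL statement; it is not an official client API. The model_type values (such as 'logistic_reg' or 'kmeans') and the input_label_cols option come from BigQuery ML's CREATE MODEL syntax, while the dataset, table, and column names are hypothetical. Once the statement has run (in the console or through the google-cloud-bigquery client), inference is simply another SQL query against the same model.

```python
from typing import Sequence


def create_model_sql(model: str, model_type: str, training_query: str,
                     label_cols: Sequence[str] = ()) -> str:
    """Build a BigQuery ML CREATE MODEL statement that trains on `training_query`."""
    options = [f"model_type = '{model_type}'"]
    if label_cols:
        # Supervised models name their target column(s); unsupervised ones (e.g. kmeans) do not.
        labels = ", ".join(f"'{col}'" for col in label_cols)
        options.append(f"input_label_cols = [{labels}]")
    return (f"CREATE OR REPLACE MODEL `{model}`\n"
            f"OPTIONS ({', '.join(options)}) AS\n"
            f"{training_query}")


# Hypothetical dataset, table, and column names.
print(create_model_sql("mydataset.churn_model", "logistic_reg",
                       "SELECT * FROM `mydataset.customers`", ["churned"]))
```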

BigQuery imported models

If neither the BigQuery built-in models nor the Vertex AI APIs cover your use case, another option is to use imported models. In this case, you train your model outside of BigQuery and then import the trained model into BigQuery. After that, you can use the model inside BigQuery in SQL statements for inference.
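An import is itself a CREATE MODEL statement that points BigQuery at the exported model files instead of at a training query. The sketch below only builds that statement; it assumes a TensorFlow SavedModel exported to Cloud Storage, and the bucket, path, and dataset names are hypothetical. After the import, the model is queried with ML.PREDICT like a built-in one.

```python
def import_model_sql(model: str, model_type: str, model_path: str) -> str:
    """Build a CREATE MODEL statement that imports an externally trained model."""
    # model_path points at the exported model files in Cloud Storage.
    return (f"CREATE OR REPLACE MODEL `{model}`\n"
            f"OPTIONS (model_type = '{model_type}', model_path = '{model_path}')")


# Hypothetical bucket and dataset; '/*' picks up the SavedModel directory contents.
print(import_model_sql("mydataset.imported_tf_model", "TENSORFLOW",
                       "gs://my-bucket/exported_model/*"))
```

Note that this copies the model into BigQuery at import time, which is exactly why a retrained model has to be re-imported.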

BigQuery ML supports the following types of imported models:

BigQuery remote models

But what if your model was trained and deployed using Vertex AI? Since both products are part of GCP, it feels redundant to export the model from Vertex AI and import it into BigQuery (and don’t forget that if your model is retrained later, you will need to refresh it in BigQuery). Luckily, you don’t actually need to do that…

You can access models deployed in Vertex AI from inside BigQuery using remote models. This is also a great option if you want to run your model on GPU hardware (which is available in Vertex AI) or, more generally, need more control over your model.
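Here is a minimal sketch of what such a remote model definition looks like, again as a Python helper that only builds the SQL text. It assumes a BigQuery Cloud resource connection already exists and is allowed to call the endpoint; the INPUT/OUTPUT schema, connection name, and endpoint URL are hypothetical and must match your deployed model. Inference then goes through ML.PREDICT, which forwards the rows to the Vertex AI endpoint.

```python
from typing import Mapping


def remote_endpoint_model_sql(model: str, inputs: Mapping[str, str],
                              outputs: Mapping[str, str],
                              connection: str, endpoint: str) -> str:
    """Build a CREATE MODEL statement for a Vertex AI endpoint used as a remote model."""

    def columns(fields: Mapping[str, str]) -> str:
        # Render {"name": "TYPE"} pairs as "name TYPE, ..." for the INPUT/OUTPUT clauses.
        return ", ".join(f"{name} {sql_type}" for name, sql_type in fields.items())

    return (f"CREATE OR REPLACE MODEL `{model}`\n"
            f"INPUT ({columns(inputs)})\n"
            f"OUTPUT ({columns(outputs)})\n"
            f"REMOTE WITH CONNECTION `{connection}`\n"
            f"OPTIONS (endpoint = '{endpoint}')")


# Hypothetical project, connection, schema, and endpoint ID.
print(remote_endpoint_model_sql(
    "mydataset.remote_model",
    {"feature_1": "FLOAT64", "feature_2": "STRING"},
    {"score": "FLOAT64"},
    "my-project.us-central1.my_connection",
    "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/"
    "locations/us-central1/endpoints/1234567890",
))
```

Because BigQuery only stores a pointer to the endpoint, retraining and redeploying in Vertex AI needs no refresh on the BigQuery side.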

Have we covered all the options now? Not yet…

Vertex AI API-based models

As you know, Vertex AI is an umbrella over several services. Yes, there are services for training and deploying your own custom models. But there are also standard APIs for use cases that are near and dear to Google, including:

  • Cloud Natural Language API
  • Cloud Translation API
  • Cloud Vision API

Behind those APIs are sophisticated models that Google trained on lots of data and exposed to us as APIs. Those APIs can be accessed directly in BigQuery using remote models and special functions:
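As one illustration, here is a sketch for the Natural Language API, written as Python helpers that only build the SQL. It assumes the preview syntax as I understand it (remote_service_type = 'CLOUD_AI_NATURAL_LANGUAGE_V1' and the ML.UNDERSTAND_TEXT function with an nlu_option setting), so verify it against the current BigQuery ML reference; the model, connection, table, and column names are hypothetical.

```python
def nl_remote_model_sql(model: str, connection: str) -> str:
    """Build a CREATE MODEL statement backed by the Cloud Natural Language API."""
    return (f"CREATE OR REPLACE MODEL `{model}`\n"
            f"REMOTE WITH CONNECTION `{connection}`\n"
            "OPTIONS (remote_service_type = 'CLOUD_AI_NATURAL_LANGUAGE_V1')")


def sentiment_sql(model: str, table: str, text_col: str) -> str:
    """Build an ML.UNDERSTAND_TEXT query that runs sentiment analysis on `text_col`."""
    # The function reads its input from a column named text_content, hence the alias.
    return (f"SELECT * FROM ML.UNDERSTAND_TEXT(MODEL `{model}`, "
            f"(SELECT {text_col} AS text_content FROM `{table}`), "
            "STRUCT('analyze_sentiment' AS nlu_option))")


# Hypothetical names.
print(nl_remote_model_sql("mydataset.nl_model", "my-project.us.my_connection"))
print(sentiment_sql("mydataset.nl_model", "mydataset.reviews", "review_text"))
```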

With that covered, we are now ready for the latest addition to the family: Generative AI.

Generative AI

Now the Vertex AI text-bison natural language foundation model has been added to the list. You can use it directly inside BigQuery for the following use cases:

  • Classification
  • Sentiment Analysis
  • Entity extraction
  • Extractive Question Answering
  • Summarization
  • Rewriting text in a different style
  • Ad copy generation
  • Concept ideation
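All of these use cases go through the same model; what changes is the instruction in the prompt. Here is a minimal sketch, assuming the prompt is built per row with SQL CONCAT and exposed as a column named prompt (the input column I understand ML.GENERATE_TEXT reads in the preview). The prompt wording and column names are hypothetical examples, not templates from the documentation.

```python
# Hypothetical prompt prefixes for a few of the use cases listed above.
PROMPT_TEMPLATES = {
    "classification": "Classify this support ticket as billing, technical, or other: ",
    "sentiment": "Is the sentiment of this review positive, negative, or neutral? ",
    "summarization": "Summarize the following text in one sentence: ",
}


def prompt_expression(task: str, text_col: str) -> str:
    """Return a SQL select-list item that prepends the task's instruction to each row's text."""
    # Escape single quotes so the prefix stays a valid GoogleSQL string literal.
    prefix = PROMPT_TEMPLATES[task].replace("'", "\\'")
    return f"CONCAT('{prefix}', {text_col}) AS prompt"


print(prompt_expression("sentiment", "review_text"))
```

Keeping the instruction in SQL means switching from, say, sentiment analysis to summarization is a change to one string rather than a new model.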

After you create a model (a process similar to creating remote models for Vertex AI API-based models), you can use the special function ML.GENERATE_TEXT to perform generative natural language tasks on text data stored in BigQuery tables.
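Putting it together, here is a minimal sketch of the two statements, again as Python helpers that only assemble SQL. It assumes the preview syntax as I understand it at the time of writing (remote_service_type = 'CLOUD_AI_LARGE_LANGUAGE_MODEL_V1', a prompt input column, and the temperature, max_output_tokens, and flatten_json_output settings), which may change before general availability; the connection, dataset, and table names are hypothetical.

```python
def llm_model_sql(model: str, connection: str) -> str:
    """Build the CREATE MODEL statement for a remote text-bison model (preview syntax)."""
    return (f"CREATE OR REPLACE MODEL `{model}`\n"
            f"REMOTE WITH CONNECTION `{connection}`\n"
            "OPTIONS (remote_service_type = 'CLOUD_AI_LARGE_LANGUAGE_MODEL_V1')")


def generate_text_sql(model: str, prompt_query: str,
                      temperature: float = 0.2, max_output_tokens: int = 128) -> str:
    """Build an ML.GENERATE_TEXT query; `prompt_query` must return a column named prompt."""
    # flatten_json_output asks for the generated text as a plain column
    # instead of only the raw JSON response.
    return (f"SELECT * FROM ML.GENERATE_TEXT(MODEL `{model}`, ({prompt_query}), "
            f"STRUCT({temperature} AS temperature, "
            f"{max_output_tokens} AS max_output_tokens, "
            "TRUE AS flatten_json_output))")


# Hypothetical names: summarize each article body in the articles table.
print(llm_model_sql("mydataset.llm_model", "my-project.us.my_connection"))
print(generate_text_sql(
    "mydataset.llm_model",
    "SELECT CONCAT('Summarize in one sentence: ', body) AS prompt "
    "FROM `mydataset.articles`",
))
```

A low temperature is a reasonable default for extractive tasks such as entity extraction or question answering; raise it for ad copy or concept ideation, where more varied output is the goal.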

Two tutorials are available to get you started: one using a public dataset and one using your own data.

Note: The feature is still in preview and requires enrollment. It is also not yet clear whether it will later be possible to use fine-tuned models. But it is a great step!


Chief Software Architect with 10+ years of experience, PhD, Associate Professor, IT Univer co-organizer, GDG Organizer