Setting Up Looker’s Semantic Search Block (In-Depth Tutorial)

Alice Bui
Joon Solutions Global
6 min read · Jun 6, 2024

The Looker BigQuery Semantic Search block is a new feature that Looker has added to its open-source Looker x GenAI solution. It provides real-time & customizable search capability, enabling fast analysis and decision-making.

In this blog, I will give detailed guidance on setting up the feature and building a semantic search model in a Looker instance.

I. Setting Up

1. Enable APIs in Google Cloud Project

Enable the following APIs:

  • BigQuery
  • BigQuery Connection
  • Vertex AI

Option 1: In the Google Cloud console, go to APIs & Services > Enabled APIs & Services > + Enable APIs and Services

Option 2: In Cloud Shell, run the following command:

gcloud services enable bigquery.googleapis.com bigqueryconnection.googleapis.com aiplatform.googleapis.com

2. Create a remote connection to Vertex AI in BigQuery

2.1. Required permissions

Ask your administrator to grant you the following IAM roles and permissions:

To create a connection:

  • roles/bigquery.connectionAdmin

To grant permissions to the connection’s service account:

  • resourcemanager.projects.setIamPolicy

To create the model using BigQuery ML, you need the following IAM permissions:

  • bigquery.jobs.create
  • bigquery.models.create
  • bigquery.models.getData
  • bigquery.models.updateData
  • bigquery.models.updateMetadata

Ref: Generate text by using the ML.GENERATE_TEXT function | BigQuery | Google Cloud

2.2. Create a remote connection

In Cloud Shell or a command-line environment authenticated to gcloud, run the following commands:

  • Export environment variables:
export REGION=&&
export PROJECT_ID=&&
export CONNECTION_ID=&&

Replace && with the following:

  • REGION: your connection region, e.g. us
  • PROJECT_ID: your Google Cloud project ID, e.g. joon-sandbox
  • CONNECTION_ID: an ID for your connection, e.g. bqml_semantic_search_block

With the example values above, the commands are:

export REGION=us
export PROJECT_ID=joon-sandbox
export CONNECTION_ID=bqml_semantic_search_block
  • Create a connection:
bq mk --connection --location=$REGION --project_id=$PROJECT_ID \
--connection_type=CLOUD_RESOURCE $CONNECTION_ID

When you create a connection resource, BigQuery creates a unique system service account and associates it with the connection.

  • Retrieve and copy the service account ID for use in a later step:
bq show --connection $PROJECT_ID.$REGION.$CONNECTION_ID

The output is similar to the following:

Connection joon-sandbox.US.bqml_semantic_search_block

  name:          136420034762.us.bqml_semantic_search_block
  Last modified: 09 May 08:42:18
  type:          CLOUD_RESOURCE
  hasCredential: False
  properties:    {"serviceAccountId": "bqcx-136420034762-2pvn@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}
  • Grant the service account the Vertex AI User and BigQuery Connection User roles:

Replace MEMBER with the serviceAccountId from the output above (e.g. bqcx-136420034762-2pvn@gcp-sa-bigquery-condel.iam.gserviceaccount.com) and run the following commands:

gcloud projects add-iam-policy-binding "$PROJECT_ID" --member='serviceAccount:MEMBER' --role='roles/aiplatform.user'
gcloud projects add-iam-policy-binding "$PROJECT_ID" --member='serviceAccount:MEMBER' --role='roles/bigquery.connectionUser'

3. Create a BQML model in BigQuery

From your BigQuery SQL editor, run the following to provision the BQML model. Note: replace PROJECT_ID, DATASET_ID, MODEL_NAME, REGION, and CONNECTION_ID with the values for your environment.

CREATE OR REPLACE MODEL `PROJECT_ID.DATASET_ID.MODEL_NAME`
REMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (ENDPOINT = 'gemini-pro');

Replace the following:

  • PROJECT_ID: your project ID, e.g. joon-sandbox
  • DATASET_ID: the ID of the dataset to contain the model, e.g. looker_genai
  • MODEL_NAME: the name of the model, e.g. bqml_semantic_search_block
  • REGION: the region used by the connection, e.g. us
  • CONNECTION_ID: the ID of your BigQuery connection, e.g. bqml_semantic_search_block; this is the last segment of the fully qualified connection ID (projects/joon-sandbox/locations/us/connections/bqml_semantic_search_block)
  • ENDPOINT: the name of the supported Vertex AI model to use, e.g. 'gemini-pro'
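
Once the model is created, a quick way to confirm that the remote connection works end to end is to send a trivial prompt through ML.GENERATE_TEXT. A minimal smoke test, assuming the example project, dataset, and model names used above:

-- Send a one-line prompt through the remote Gemini model to verify the connection and permissions
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `joon-sandbox.looker_genai.bqml_semantic_search_block`,
  (SELECT 'Say hello in one short sentence' AS prompt),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output)
);

If the query returns a generated sentence, the connection, service account permissions, and model are all wired up correctly.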

4. Create a database connection in the Looker instance

  • In the Looker instance, go to Admin > Database > Connections and create a new connection
  • Download the service account key from Google Cloud and upload the service account file in the connection settings
  • Enable PDTs in Optional Settings
  • Add the connection to the Looker model file

II. Building the Model

1. Data modelling

To reproduce the insights mentioned in my previous blog HERE, you can clone the GitHub repo: joon-solutions/bqml_semantic_search_block

The project contains the following views:

  • Views with the business data: products, order_items, users
  • View with the semantic search model embedded: product_semantic_search

2. Understanding the Semantic Search view

2.1. What does it do?

This Looker view utilizes BigQuery ML functions & Vertex AI ML models to define a semantic product search based on product data. Here’s a breakdown of the process:

  • Step 1: Connect to the foundational embeddings model on Vertex AI using the remote connection established earlier (product_embeddings_model); a sketch of this remote model follows this list
  • Step 2: Create a table to hold the product embeddings. An embedding is a compressed numeric representation of the product data produced by the BQML model, like a unique fingerprint for each product based on its text information (product_embeddings)
  • Step 3: Create a vector index on the embedding data, which speeds up searching by allowing BigQuery to quickly find similar embeddings (product_embeddings_index)
  • Step 4: Perform the vector search and return the matched values (product_semantic_search)
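
The product_embeddings_model referenced in Step 1 is a BigQuery remote model pointing at a Vertex AI text-embedding endpoint through the connection created in Part I. A minimal sketch of the underlying SQL, assuming the example project, dataset, and connection names from this guide; the endpoint name is illustrative, so check the repo for the exact model the block ships with:

-- Remote model over a Vertex AI text embedding endpoint, consumed by ML.GENERATE_EMBEDDING below
CREATE OR REPLACE MODEL `joon-sandbox.looker_genai.product_embeddings_model`
REMOTE WITH CONNECTION `joon-sandbox.us.bqml_semantic_search_block`
OPTIONS (ENDPOINT = 'text-embedding-004');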

2.2. How can you replicate the model?

  • Configure @{BQML_REMOTE_CONNECTION_ID} in the manifest.lkml file
  • Replace the products view name in product_embeddings (lines 12 and 13) with your desired view name.
view: product_embeddings {
  derived_table: {
    datagroup_trigger: ecomm_daily
    publish_as_db_view: yes
    sql_create:
      -- This SQL statement creates embeddings for all the rows in the given table (in this case the products LookML view) --
      CREATE OR REPLACE TABLE ${SQL_TABLE_NAME} AS
      SELECT ml_generate_embedding_result AS text_embedding
           , *
      FROM ML.GENERATE_EMBEDDING(
        MODEL ${product_embeddings_model.SQL_TABLE_NAME},
        (
          SELECT *, name AS content
          FROM ${products.SQL_TABLE_NAME}
        )
      )
      WHERE LENGTH(ml_generate_embedding_status) = 0 ;;
  }
}
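
The product_embeddings_index view (Step 3) is not shown above, but it corresponds to BigQuery's CREATE VECTOR INDEX statement over the embeddings table. A minimal sketch with an illustrative table name; in the block the index is built on the derived table above, so the actual table reference and options may differ:

-- Build an IVF vector index on the embedding column so VECTOR_SEARCH can prune candidate lists
CREATE VECTOR INDEX IF NOT EXISTS product_embeddings_index
ON `joon-sandbox.looker_genai.product_embeddings`(text_embedding)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE');

Note that BigQuery requires the base table to have at least 5,000 rows before a vector index can be created; VECTOR_SEARCH still works without an index, just by brute-force comparison.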
  • Replace the selected columns in the SQL statement of product_semantic_search with the columns from your own view
view: product_semantic_search {
  derived_table: {
    sql:
      -- This SQL statement performs the vector search --
      -- Step 1. Generate the embedding from the natural language question --
      -- Step 2. Specify the text_embedding column from the embeddings table that was generated for each product in this example --
      -- Step 3. Use BQML's native vector search functionality to match the nearest embeddings --
      -- Step 4. Return the matched products --
      SELECT query.query
           , base.name AS matched_product
           , base.id AS matched_product_id
           , base.sku AS matched_product_sku
           , base.category AS matched_product_category
           , base.brand AS matched_product_brand
           , 1
      FROM VECTOR_SEARCH(
        TABLE ${product_embeddings.SQL_TABLE_NAME}, 'text_embedding',
        (
          SELECT ml_generate_embedding_result, content AS query
          FROM ML.GENERATE_EMBEDDING(
            MODEL ${product_embeddings_model.SQL_TABLE_NAME},
            (SELECT {% parameter product_description %} AS content)
          )
        ),
        top_k => {% parameter product_matches %},
        options => '{"fraction_lists_to_search": 0.5}'
      ) ;;
  }
}
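
To verify the search logic outside Looker, you can run the equivalent query directly in the BigQuery console, replacing the two Looker parameters with literal values. A minimal sketch assuming illustrative table and model names; substitute the names of the embeddings PDT and remote model from your own environment:

-- Product Description parameter -> a literal search phrase; Product Matches parameter -> top_k
SELECT query.query
     , base.name AS matched_product
FROM VECTOR_SEARCH(
  TABLE `joon-sandbox.looker_genai.product_embeddings`, 'text_embedding',
  (
    SELECT ml_generate_embedding_result, content AS query
    FROM ML.GENERATE_EMBEDDING(
      MODEL `joon-sandbox.looker_genai.product_embeddings_model`,
      (SELECT 'warm waterproof winter jacket' AS content)
    )
  ),
  top_k => 5,
  options => '{"fraction_lists_to_search": 0.5}'
);

If this returns sensible matches, the same logic will work in the Explore once the Product Description and Product Matches filters are set.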

2.3. What should you pay attention to?

  • The quality of the search results depends on the quality of the product data, so make sure product names are clean and descriptive.
  • The model can only run successfully when the following filters are applied: Product Description and Product Matches.

The Product Description filter allows us to search for products matching a given description, all done through the Looker Frontend without hardcoding a variable or running the raw SQL itself.

Similarly, Product Matches allows an end user to specify how many recommendations/matches they want to be returned.

If you have any questions related to this setup guide or want to know how to fit this into your organization’s use cases, please leave your contact info HERE and we will get back to you right away!

Besides this, we also offer other extensions in the Looker x GenAI package: Data Actions, Explore Assistant, and Dashboard Summary.

You can also read our other blogs on Medium and visit our website to learn more about us.
