Setting Up Looker’s Semantic Search Block (In-Depth Tutorial)
The Looker BigQuery Semantic Search block is a new feature that Looker has added to its open-source Looker x GenAI solution. It provides real-time & customizable search capability, enabling fast analysis and decision-making.
In this blog, I will give detailed guidance to set up the feature and build a semantic search model in the Looker instance.
I. Setting Up
1. Enable APIs in Google Cloud Project
Enable the following APIs:
- BigQuery
- BigQuery Connection
- Vertex AI
Option 1: In the Google Cloud Project, Go to APIs & Services
> Enabled APIs & Services
> + Enable APIs and Services
Option 2: In Cloud Shell, run the following commands
gcloud services enable bigquery.googleapis.com bigqueryconnection.googleapis.com aiplatform.googleapis.com
2. Create a remote connection to Vertex AI in BigQuery
2.1. Required permissions
Ask your administrator to grant you the following IAM roles:
To create a connection:
roles/bigquery.connectionAdmin
To grant permissions to the connection’s service account:
resourcemanager.projects.setIamPolicy
To create the model using BigQuery ML, you need the following IAM permissions:
bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
bigquery.models.updateMetadata
Ref: Generate text by using the ML.GENERATE_TEXT function | BigQuery | Google Cloud
2.2. Create a remote connection
In Cloud Shell or a command line environment authenticated to gcloud
run the following commands:
- Export environment variables:
export REGION=&&
export PROJECT_ID=&&
export CONNECTION_ID=&&
Replace && the following:
REGION
: your connection region, e.g. us
PROJECT_ID
: your Google Cloud project ID, e.g. joon-sandbox
CONNECTION_ID
: an ID for your connection, e.g. bqml_semantic_search_block
With the values above, I will have a command as below:
export REGION=us
export PROJECT_ID=joon-sandbox
export CONNECTION_ID=bqml_semantic_search_block
- Create a connection:
bq mk --connection --location=REGION --project_id=PROJECT_ID \
--connection_type=CLOUD_RESOURCE CONNECTION_ID
When you create a connection resource, BigQuery creates a unique system service account and associates it with the connection.
- Retrieve and copy the service account ID for use in a later step:
bq show --connection $PROJECT_ID.$REGION.$CONNECTION_ID
The output is similar to the following:
Connection joon-sandbox.US.bqml_semantic_search_block
name friendlyName description Last modified type hasCredential properties
-------------------------------------------- -------------- ------------- ----------------- ---------------- --------------- -----------------------------------------------------------------------------------------------
136420034762.us.bqml_semantic_search_block 09 May 08:42:18 CLOUD_RESOURCE False {"serviceAccountId": "bqcx-136420034762-2pvn@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}
- Grant service account the Vertex AI User, BigQuery Connection User role:
Replace MEMBER
with Service account ID (member='serviceAccount:serviceAccount:bqcx-136420034762-zru6@gcp-sa-bigquery-condel.iam.gserviceaccount.com'
) and run the following commands:
gcloud projects add-iam-policy-binding '$PROJECT_NUMBER' --member='serviceAccount:MEMBER' --role='roles/aiplatform.user'
gcloud projects add-iam-policy-binding '$PROJECT_NUMBER' --member='serviceAccount:MEMBER' --role='roles/bigquery.connectionUser'
3. Create a BQML model in BigQuery
From your BigQuery SQL Editor Run the following to provision the BQML Model. To note, please replace the PROJECT_ID
, DATASET_ID
, MODEL_NAME
, REGION
, & CONNECTION_ID
with those of your environment.
CREATE OR REPLACE MODEL
`PROJECT_ID.DATASET_ID.MODEL_NAME`
REMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (ENDPOINT = 'gemini-pro');
Replace the following:
PROJECT_ID
: your project ID, e.g.joon-sandbox
DATASET_ID
: the ID of the dataset to contain the model, e.g.looker_genai
MODEL_NAME
: the name of the model, e.g.bqml_semantic_search_block
REGION
: the region used by the connection, e.g.us
CONNECTION_ID
: the ID of your BigQuery connection (projects/myproject/locations/connection_location/connections/myconnection
), e.g.projects/joon-sandbox/locations/us/connections/bqml_semantic_search_block
ENDPOINT
: the name of the supported Vertex AI model to use. e.g.ENDPOINT='gemini-pro'
4. Create a database connection in Looker instance
- In the Looker instance, Go to
Admin
>Connections
inDatabase
section, and create a connection. For example:
- Download the service account key and upload the
Service Account file
- Enable PDTs in
Optional Settings
:
- Add the connection to the Looker model file:
II. Building Model
1. Data modelling
To have the insights mentioned in my previous blog HERE, you can navigate and clone the Github Repo: joon-solutions/bqml_semantic_search_block
In the project, we have the following models, in which:
- View with the business data:
products, order_items, users
- View with semantic search model embedded:
product_semantic_search
2. Understanding Semantic Search view
2.1. What it does?
This Looker view utilizes BigQuery ML functions & Vertex AI ML models to define a semantic product search based on product data. Here’s a breakdown of the process:
- Step 1: Connect to the foundational Embeddings Model on Vertex AI using the remote connection established earlier (
product_embeddings_model
) - Step 2: Create a table to hold our product embeddings. An embedding is a compressed representation of the product data captured by the BQML model. It’s like a unique fingerprint for each product based on its text information (
product_embeddings
) - Step 3: Create an index on the embedding data, which speeds up searching by allowing BigQuery to quickly find similar embeddings (
product_embeddings_index
) - Step 4: Perform the vector search and return matched values (
product_semantic_search
)
2.2. How can you replicate the model?
- Configure
@{BQML_REMOTE_CONNECTION_ID}
in themanifest.lkml
file - Replace
products
view name inproduct_embeddings
(line 12, 13) with your desired view name.
view: product_embeddings {
derived_table: {
datagroup_trigger: ecomm_daily
publish_as_db_view: yes
sql_create:
-- This SQL statement creates embeddings for all the rows in the given table (in this case the products lookml view) --
CREATE OR REPLACE TABLE ${SQL_TABLE_NAME} AS
SELECT ml_generate_embedding_result as text_embedding
, * FROM ML.GENERATE_EMBEDDING(
MODEL ${product_embeddings_model.SQL_TABLE_NAME},
(
SELECT *, name as content
FROM ${products.SQL_TABLE_NAME}
)
)
WHERE LENGTH(ml_generate_embedding_status) = 0; ;;
}
}
- Replace selected columns in SQL statement of
product_semantic_search
view: product_semantic_search {
derived_table: {
sql:
-- This SQL statement performs the vector search --
-- Step 1. Generate Embedding from natural language question --
-- Step 2. Specify the text_embedding column from the embeddings table that was generated for each product in this example --
-- Step 3. Use BQML's native Vector Search functionality to match the nearest embeddings --
-- Step 4. Return the matche products --
SELECT query.query
,base.name as matched_product
,base.id as matched_product_id
,base.sku as matched_product_sku
,base.category as matched_product_category
,base.brand as matched_product_brand
, 1
FROM VECTOR_SEARCH(
TABLE ${product_embeddings.SQL_TABLE_NAME}, 'text_embedding',
(
SELECT ml_generate_embedding_result, content AS query
FROM ML.GENERATE_EMBEDDING(
MODEL ${product_embeddings_model.SQL_TABLE_NAME},
(SELECT {% parameter product_description %} AS content)
)
),
top_k => {% parameter product_matches %}
,options => '{"fraction_lists_to_search": 0.5}'
)
;;
}
2.3. What should you pay attention to?
- The quality of the search results depends on the quality of the product data. So make sure product names are clean and relevant enough.
- The model can only run successfully when the following filters are applied
The Product Description filter allows us to search for products matching a given description, all done through the Looker Frontend without hardcoding a variable or running the raw SQL itself.
Similarly, Product Matches allows an end user to specify how many recommendations/matches they want to be returned.
If you have any questions related to this setup guide or want to know how to fit this into your organization’s use cases, please leave your contact info HERE and we will get back to you right away!
Besides this, we also offer other extensions in the Looker x GenAI package, which are Data Actions, Explore Assistant and Dashboard Summary.
You can also read our other blogs on Medium and visit our website to know more about us.