Instant IPL Facts with AlloyDB Omni: Bringing Data and Insights Closer with Natural Language to SQL.
With the rapid advancements in Large Language Models (LLMs), the way we accomplish tasks with technology is transforming in ways that are nothing short of extraordinary! The idea that we can bring data closer to users through Natural Language to SQL, powered by foundational LLMs, is not only innovative but incredibly exciting.
As we know, one of the things we Indians love is cricket. So, I thought, why not merge this passion with the power of natural language queries and databases? This led me to create an IPL Trivia app that taps into database tables and fetches interesting facts based on user input. While I can’t promise it’ll handle every query perfectly, I’m thrilled by how this new approach makes data more accessible. It has the potential to revolutionize how we traditionally retrieve insights, moving beyond additional data access layers or ORMs.
Let’s get started and see it in action!
Step 1 — Generate a Service Account to integrate Vertex AI and AlloyDB Omni
To run AlloyDB Omni within Docker, we need to load a private JSON key with the necessary permissions to call foundational or third-party models through Vertex AI.
Replace <<project_name>> with your GCP project name:
gcloud config set project <<project_name>>
export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create vertex-ai-connect-omni
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:vertex-ai-connect-omni@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
gcloud iam service-accounts keys create vertex-ai-connect-omni.json \
--iam-account=vertex-ai-connect-omni@$PROJECT_ID.iam.gserviceaccount.com \
--key-file-type=json
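Before wiring this key into Docker, you can optionally sanity-check that it is able to mint an access token. Below is a minimal Python sketch of such a check; it assumes the google-auth package is installed and is not part of the original setup.
# Optional sanity check: confirm the service account key can obtain an access token.
# Assumes `pip install google-auth`; purely illustrative, not required for the setup.
from google.oauth2 import service_account
from google.auth.transport.requests import Request

creds = service_account.Credentials.from_service_account_file(
    "vertex-ai-connect-omni.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
creds.refresh(Request())
print("Token acquired for:", creds.service_account_email)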
Step 2 — Run the AlloyDB Omni Docker container on your local laptop
For details on the overall installation and setup, check the official docs here.
We will mount the private key JSON from the previous step and map a local directory to the PostgreSQL data directory so that data is retained across Docker container restarts.
Arguments:
-v <<PATH ON LOCAL DIRECTORY TO MOUNT POSTGRES DATA>>
-v <<PRIVATE KEY PATH, GENERATED IN STEP 1>>
We will modify the PostgreSQL configuration to enable support for calling models from the database and to point the integration at the private key used for API calls.
docker run -i --rm google/alloydbomni cat /usr/share/postgresql/postgresql.conf.sample > alloydb-omni-postgres.conf
echo "omni_enable_ml_agent_process = 'on'" >> alloydb-omni-postgres.conf
echo "google_ml_integration.enable_model_support= 'on'" >> alloydb-omni-postgres.conf
echo "omni_google_cloud_private_key_file_path = '/etc/postgresql/private-key.json'" >> alloydb-omni-postgres.conf
Docker Run Command
docker run --name alloydbomni \
-e POSTGRES_PASSWORD=<<PG_PASSWORD>> \
-v /Users/deepakmahto/Desktop/deepak/gcp/alloydbomnidata:/var/lib/postgresql/data \
-v /Users/deepakmahto/Desktop/deepak/gcp/IPL_Trivia/vertex-ai-connect-omni.json:/etc/postgresql/private-key.json \
-v "/Users/deepakmahto/Desktop/deepak/gcp/alloydb-omni-postgres.conf":/etc/postgresql/postgresql.conf \
-p 5434:5432 -d google/alloydbomni
Verify the connection to AlloyDB Omni using PostgreSQL’s psql tool:
# non-Windows command; on Windows, omit the PGPASSWORD prefix
PGPASSWORD=<<PG_PASSWORD>> psql -h localhost -U postgres -d postgres -p 5434
Step 3 — Create a new database and extensions, and verify the Vertex AI integration
We will create a new ipltrivia database and the necessary extensions (google_ml_integration) to enable API interaction with Google Cloud foundation models. To confirm everything works as expected, we test with the default embedding model textembedding-gecko.
create database ipltrivia;
\c ipltrivia
ipltrivia=# create extension google_ml_integration cascade;
ipltrivia=# create extension alloydb_ai_nl;
ipltrivia=# SELECT array_dims(embedding( 'textembedding-gecko@001', 'AlloyDB AI')::real[]);
array_dims
------------
[1:768]
(1 row)
Both extensions should now be available in the AlloyDB Omni database (you can check with \dx in psql).
Step 4 — Load the normalized IPL dataset and app code from the repo
Clone the git repo, which includes the database SQL dump (normalized and fine-tuned for the NL use case) and the necessary app code.
Load the SQL dump into the newly created database using psql:
cd ipl-trivia-alloydb
# non-Windows command; on Windows, omit the PGPASSWORD prefix
PGPASSWORD=<<PG_PASSWORD>> psql -h localhost -U postgres -d ipltrivia -p 5434 -f ipltrivia.sql
Step 5 — Register the gemini-1.5-pro model in AlloyDB Omni
We’ll use the alloydb_ai_nl functions to extract metadata from the database and pass it as context, along with internally auto-generated prompts, to the foundation model gemini-1.5-pro:
PGPASSWORD=******** psql -h localhost -U postgres -d ipltrivia -p 5434
--Replace <<project_name>> with your project name
ipltrivia=# CALL google_ml.create_model
(model_id => 'gemini-pro-text-to-sql',
model_request_url => 'https://us-central1-aiplatform.googleapis.com/v1/projects/<<project_name>>/locations/us-central1/publishers/google/models/gemini-1.5-pro-001:streamGenerateContent',
model_provider => 'google',
model_auth_type => 'alloydb_service_agent_iam'
);
ipltrivia=# select * from google_ml.model_info_view where model_id = 'gemini-pro-text-to-sql';
-[ RECORD 1 ]--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------
model_id | gemini-pro-text-to-sql
model_request_url | https://us-central1-aiplatform.googleapis.com/v1/projects/*******/locations/us-central1/publishers/google/models/gemini-1.5-pro-001:streamGenerateContent
model_provider | google
model_type | generic
model_qualified_name |
model_auth_type | alloydb_service_agent_iam
model_auth_id |
input_transform_fn |
output_transform_fn |
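Before building the app layer, it helps to confirm that the registered model actually turns a question into SQL. The snippet below is not part of the original walkthrough; it is a rough Python sketch that assumes psycopg2-binary is installed and reuses the connection settings from the psql commands above.
# Hypothetical sanity check: translate one natural language question into SQL
# using the model registered above. Assumes `pip install psycopg2-binary`.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5434, dbname="ipltrivia",
                        user="postgres", password="<<PG_PASSWORD>>")
with conn, conn.cursor() as cur:
    cur.execute(
        """SELECT google_get_sql_current_schema
             FROM alloydb_ai_nl.google_get_sql_current_schema(
                    sql_text => %s,
                    model_id => 'gemini-pro-text-to-sql')""",
        ("Top 5 bowlers with the most wickets in the IPL",),
    )
    print(cur.fetchone()[0])  # the generated SQL text
conn.close()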
Step 6 — Build the App Layer using Streamlit
We’ll use Streamlit to create the app layer, which calls the database function public.alloydbnl2sql. This function wraps the call to alloydb_ai_nl.google_get_sql_current_schema, available as part of the alloydb_ai_nl extension.
CREATE FUNCTION public.alloydbnl2sql(text) RETURNS text
    LANGUAGE plpgsql
    AS $_$
DECLARE
    sqltext text;
BEGIN
    -- Ask the registered model to generate SQL for the natural language question
    SELECT google_get_sql_current_schema INTO sqltext
      FROM alloydb_ai_nl.google_get_sql_current_schema(sql_text => $1,
                                                       model_id => 'gemini-pro-text-to-sql');
    RETURN sqltext;
EXCEPTION
    WHEN OTHERS THEN
        -- error handling is elided in the original; returning NULL here keeps the app layer simple
        RETURN NULL;
END;
$_$;
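The actual app code lives in the repo and is set up below; purely for illustration, a stripped-down version of such a Streamlit front end could look like the sketch that follows. The environment variable names and layout here are my assumptions for the example, not necessarily what the repo’s app.py uses.
# Illustrative, minimal Streamlit front end (not the repo's app.py).
# Assumes streamlit, psycopg2-binary and python-dotenv are installed, and that .env
# defines DB_HOST, DB_PORT, DB_NAME, DB_USER and DB_PASSWORD (hypothetical names).
import os
import psycopg2
import streamlit as st
from dotenv import load_dotenv

load_dotenv()

def get_conn():
    return psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=os.getenv("DB_PORT", "5434"),
        dbname=os.getenv("DB_NAME", "ipltrivia"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD"),
    )

st.title("IPL Trivia with AlloyDB Omni")
question = st.text_input("Ask an IPL question")

if question:
    with get_conn() as conn, conn.cursor() as cur:
        # Ask the database function to translate the question into SQL...
        cur.execute("SELECT public.alloydbnl2sql(%s)", (question,))
        generated_sql = cur.fetchone()[0]
        st.code(generated_sql, language="sql")
        # ...then run the generated SQL and show the result.
        if generated_sql:
            cur.execute(generated_sql)
            st.table(cur.fetchall())
To run the actual app from the repo, set up a virtual environment and install the dependencies: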
cd IPL_Trivia
python3 -m venv ipl
source ipl/bin/activate
pip3 install -r requirement.txt
After setting up, modify the .env file with your database credentials and launch IPL Trivia:
python3 -m streamlit run app.py
IPL Trivia Time with AlloyDB Omni.
Here are some of the trivia facts I was able to extract using the foundation model:
Batsman with the most runs scored off no-balls
Top 5 bowlers with the most wickets in the IPL
Team that has played the most finals
Batsman with the most threes run
Bowler with the most runs conceded in a single match
Conclusion: And That’s a Wrap!
If you’ve made it this far, you’re now equipped with a trivia machine that not only tests your IPL knowledge but shows off the power of LLM-driven data queries. Whether you’re an IPL fanatic or just someone who loves cool tech, this project combines the best of both worlds. So, fire it up, challenge your friends, and discover those hidden IPL gems! Who knows — maybe you’ll uncover a trivia fact no one else knew! Ready to dive into the world of cricket data and outscore the competition? Let the games begin! 🎉
References :-
The IPL dataset was built using Kaggle data, which was later normalized, with additional comments added to the columns and tables.