Setting Up Vector Embeddings and Oracle Generative AI with Oracle Database 23ai
In this tutorial, we’ll guide you through setting up Oracle Database 23ai for vector embeddings using Podman and Python. We’ll also demonstrate how to leverage a large language model (LLM) using Oracle’s Generative AI services to build a question-answering chatbot with vector-based retrieval.
Step 1: Installing and Running Oracle Database 23ai with Podman
We begin by installing Podman and running the Oracle Database 23ai container.
Enabling the Oracle Linux 8 Add-ons Repository
To install Podman, we need to enable the Oracle Linux 8 add-ons repository:
sudo dnf config-manager --set-enabled ol8_addons
sudo dnf install -y podman
Running Oracle Database 23ai in a Container
Once Podman is installed, we can run the Oracle Database 23ai container:
podman run \
-d \
--name 23ai \
-p 1521:1521 \
-e ORACLE_PWD=<your-db-password> \
-v oracle-volume:/opt/oracle/oradata \
container-registry.oracle.com/database/free:latest
Verify that the container is running:
podman ps
podman exec -it 23ai /bin/bash
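The first startup can take several minutes while the data files are created. A simple way to confirm the database is ready is to follow the container logs; Oracle's database container images print a readiness banner once initialization completes:
podman logs -f 23ai
# Wait for the line: DATABASE IS READY TO USE!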
Step 2: Database Setup for Vector Storage
Let’s configure Oracle Database 23ai for storing vector embeddings.
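These statements are run as SYSDBA from inside the container. Because the vector user created below is a local (non-common) user, switch to the pluggable database first; FREEPDB1 is the default PDB in the free container image:
sqlplus / as sysdba
ALTER SESSION SET CONTAINER = FREEPDB1;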
Create the necessary tablespaces:
CREATE BIGFILE TABLESPACE tbs2
  DATAFILE 'bigtbs_f2.dbf' SIZE 1G AUTOEXTEND ON NEXT 32M MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE UNDO TABLESPACE undots2
  DATAFILE 'undotbs_2a.dbf' SIZE 1G AUTOEXTEND ON
  RETENTION GUARANTEE;

CREATE TEMPORARY TABLESPACE temp_demo
  TEMPFILE 'temp02.dbf' SIZE 1G REUSE AUTOEXTEND ON NEXT 32M MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;
Create the user that will own the vector embeddings:
CREATE USER vector IDENTIFIED BY vector DEFAULT TABLESPACE tbs2 QUOTA UNLIMITED ON tbs2;
GRANT DB_DEVELOPER_ROLE TO vector;
Optimize database memory for vector operations. vector_memory_size is set in the SPFILE, so the instance must be restarted for the change to take effect:
sqlplus / as sysdba
CREATE PFILE FROM SPFILE;
ALTER SYSTEM SET vector_memory_size = 512M SCOPE=SPFILE;
SHUTDOWN;
STARTUP;
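After the restart, confirm that the setting took effect. A minimal check from SQL*Plus; the column names on V$VECTOR_MEMORY_POOL below are from the 23ai documentation as I recall them, so verify them against your release:
-- Confirm the parameter value after the restart
SHOW PARAMETER vector_memory_size

-- Inspect vector pool allocations (23ai dynamic view)
SELECT CON_ID, POOL, ALLOC_BYTES FROM V$VECTOR_MEMORY_POOL;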
Step 3: Setting Up Python Environment
Now, let’s set up the Python environment for generating and working with vector embeddings.
Install the necessary packages, then install pyenv:
sudo dnf install git sqlite-devel libffi-devel
curl https://pyenv.run | bash
Configure pyenv and set the local Python version to 3.12:
nano ~/.bashrc
# Add the following lines
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"
source ~/.bashrc
pyenv install 3.12
pyenv local 3.12
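Verify that the expected interpreter is now active in the project directory:
# pyenv should resolve python to the freshly installed 3.12
python --version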
Install required Python libraries:
pip install oracledb sentence-transformers oci jupyterlab
Run Jupyter Lab for your project:
jupyter-lab --no-browser --ip 0.0.0.0 --port 8888
From your local machine, open an SSH tunnel so you can reach the Jupyter server running on the instance:
ssh -L 8888:localhost:8888 -i <your_private_key> opc@<public_ip_address>
Step 4: Handling Vector Embeddings in Jupyter
Task 1: Load FAQ Data into the Database
Load FAQ data into the notebook:
Download the sample file from this link, then drag and drop it into the Jupyter window to upload it to your remote instance.
import os

def loadFAQs(directory_path):
    faqs = {}
    for filename in os.listdir(directory_path):
        if filename.endswith(".txt"):
            file_path = os.path.join(directory_path, filename)
            with open(file_path) as f:
                raw_faq = f.read()
            # Each FAQ file uses '=====' as a separator between chunks
            faqs[filename] = [text.strip() for text in raw_faq.split('=====')]
    return faqs

faqs = loadFAQs('.')
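A quick sanity check that the file was parsed into chunks as expected:
# Print each loaded file and the number of chunks it produced
for filename, sections in faqs.items():
    print(f"{filename}: {len(sections)} chunks")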
Prepare the FAQ data for vector storage:
docs = [
    {'text': filename + ' | ' + section, 'path': filename}
    for filename, sections in faqs.items()
    for section in sections
]
Connect to Oracle Database and create the necessary table:
import oracledb
un = "vector"
pw = "vector"
cs = "localhost/FREEPDB1"
connection = oracledb.connect(user=un, password=pw, dsn=cs)
table_name = 'faqs'
with connection.cursor() as cursor:
    cursor.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id      NUMBER PRIMARY KEY,
            payload CLOB CHECK (payload IS JSON),
            vector  VECTOR
        )""")
Vectorize the FAQ content:
from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer('all-MiniLM-L12-v2')
data = [
    {"id": idx, "vector_source": row['text'], "payload": row}
    for idx, row in enumerate(docs)
]

texts = [row['vector_source'] for row in data]
embeddings = encoder.encode(texts, batch_size=32)

import array

# Store each embedding as a 32-bit float array, the form the
# python-oracledb driver binds to a VECTOR column
for row, embedding in zip(data, embeddings):
    row['vector'] = array.array("f", embedding)
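all-MiniLM-L12-v2 produces 384-dimensional sentence embeddings, so a quick shape check catches problems before the data reaches the database:
print(embeddings.shape)        # (number of chunks, 384)
print(len(data[0]['vector']))  # 384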
Insert the vectorized data into the database:
import json

with connection.cursor() as cursor:
    # Clear out any previous rows, then bulk-insert id, JSON payload, and vector
    cursor.execute(f"TRUNCATE TABLE {table_name}")
    prepared_data = [
        (row['id'], json.dumps(row['payload']), row['vector'])
        for row in data
    ]
    cursor.executemany(
        f"INSERT INTO {table_name} (id, payload, vector) VALUES (:1, :2, :3)",
        prepared_data,
    )
connection.commit()
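You can verify the load with a simple count:
# Confirm that every chunk made it into the table
with connection.cursor() as cursor:
    cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
    print(cursor.fetchone()[0], "rows loaded")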
Step 5: Retrieving Vectors and Using Generative AI
Task 1: Vectorize the Question
We will transform the user’s question into a vector and retrieve relevant FAQ chunks from the database.
Define the SQL query that returns the top-K chunks ranked by cosine distance (a lower score means a closer match):
topK = 3
sql = f"""SELECT payload, vector_distance(vector, :vector, COSINE) AS score
FROM {table_name}
ORDER BY score
FETCH FIRST {topK} ROWS ONLY"""
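With only a handful of FAQ chunks a full scan is fine, but for larger corpora Oracle Database 23ai also supports dedicated vector indexes. A sketch of an in-memory HNSW index, with an index name chosen here purely for illustration:
CREATE VECTOR INDEX faqs_hnsw_idx ON faqs (vector)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;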
Encode the user’s question and retrieve the closest chunks:
question = "What is Always Free?"
embedding = list(encoder.encode(question))
vector = array.array("f", embedding)
results = []
with connection.cursor() as cursor:
    for (info, score) in cursor.execute(sql, vector=vector):
        # payload comes back as a CLOB; read it and parse the JSON
        text_content = info.read()
        results.append((score, json.loads(text_content)))
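Printing the matches makes it easy to eyeball retrieval quality:
# Show the top-K matches with their cosine distances
for score, payload in results:
    print(f"{score:.4f}  {payload['text'][:80]}")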
Task 2: Creating the LLM Prompt
Prepare the prompt for the LLM:
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")

def truncate_string(string, max_tokens):
    # Tokenize, cut to the token budget, and decode back to text
    tokens = tokenizer.encode(string, add_special_tokens=True)
    truncated_tokens = tokens[:max_tokens]
    return tokenizer.decode(truncated_tokens)

# Build the context from the chunks retrieved above, not the full corpus
docs_as_one_string = "\n=========\n".join(doc["text"] for _, doc in results)
docs_truncated = truncate_string(docs_as_one_string, 1000)
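It's worth confirming that the truncated context fits the budget before building the prompt (re-tokenizing a decoded string can shift the count by a token or two):
# Should be at or near the 1000-token budget
print(len(tokenizer.encode(docs_truncated, add_special_tokens=True)))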
prompt = f"""
<s>[INST] <<SYS>> You are a helpful assistant named Oracle chatbot.
USE ONLY the sources below and ABSOLUTELY IGNORE any previous knowledge.
<</SYS>> [/INST]
[INST]
Respond PRECISELY to this question: "{question}" USING ONLY the following information.
Sources: {docs_truncated}
Answer (max 50 words per paragraph):
[/INST]
"""
Task 3: Call Oracle’s Generative AI Service
Create an OCI config file named config in your working directory; the file name and the DEFAULT profile are what oci.config.from_file reads below:
[DEFAULT]
user=<your user's ocid>
key_file=<path to your .pem key>
fingerprint=<fingerprint of the key>
tenancy=<your tenancy's ocid>
region=us-chicago-1
import oci
config = oci.config.from_file('config', "DEFAULT")
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
compartment_id = "<your-compartment-id>"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(config=config, service_endpoint=endpoint)
Generate the response using the LLM:
chat_request = oci.generative_ai_inference.models.CohereChatRequest()
chat_request.message = prompt
chat_request.max_tokens = 1000
chat_detail = oci.generative_ai_inference.models.ChatDetails()
chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.command-r-plus")
chat_detail.chat_request = chat_request
chat_detail.compartment_id = compartment_id
chat_response = generative_ai_inference_client.chat(chat_detail)
# Pretty-print the response with pprint.pp for readability
import pprint
pprint.pp(chat_response.data.chat_response.chat_history[1].message)
# Alternatively, just print the message directly
print(chat_response.data.chat_response.chat_history[1].message)
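Here chat_history[1] is the chatbot's turn in the two-entry history (index 0 is the user's message). Depending on your OCI Python SDK version, the Cohere chat response model may also expose the reply directly; the attribute below is my recollection of the SDK's CohereChatResponse and is worth verifying:
# Alternative: the reply text as a plain attribute (verify on your SDK version)
print(chat_response.data.chat_response.text)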
Final Step: Understanding the Output
You can use pprint.pp for well-formatted, indented output, which is especially useful when the LLM response is complex. If you expect a simple text response, print displays the result directly.
At this point, the Oracle Generative AI service provides a response based on the retrieved FAQ chunks, efficiently answering the user’s query using vector embeddings.
Disclaimer: The views expressed in this document are my own and do not necessarily reflect those of my employer.
Reference: LiveLabs Workshop — Vector Search with Oracle Database 23c