Setting Up Vector Embeddings and Oracle Generative AI with Oracle Database 23ai
In this tutorial, we’ll guide you through setting up Oracle Database 23ai for vector embeddings using Podman and Python. We’ll also demonstrate how to leverage a large language model (LLM) using Oracle’s Generative AI services to build a question-answering chatbot with vector-based retrieval.
Step 1: Installing and Running Oracle Database 23ai with Podman
We begin by installing Podman and running the Oracle Database 23ai container.
Enabling the Oracle Linux 8 Add-ons Repository
To install Podman, we need to enable the Oracle Linux 8 add-ons repository:
sudo dnf config-manager --set-enabled ol8_addons
sudo dnf install -y podman
Running Oracle Database 23ai in a Container
Once Podman is installed, we can run the Oracle Database 23ai container:
podman run \
-d \
--name 23ai \
-p 1521:1521 \
-e ORACLE_PWD=<your-db-password> \
-v oracle-volume:/opt/oracle/oradata \
container-registry.oracle.com/database/free:latest
Verify that the container is running:
podman ps
podman exec -it 23ai /bin/bash
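The first startup can take several minutes while the data files are created. A simple way to confirm the database is ready is to follow the container logs; Oracle's database container images print a readiness banner once initialization completes:
podman logs -f 23ai
# Wait for the line: DATABASE IS READY TO USE!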
Step 2: Database Setup for Vector Storage
Let’s configure Oracle Database 23ai for storing vector embeddings.
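These statements are run as SYSDBA from inside the container. Because the vector user created below is a local (non-common) user, switch to the pluggable database first; FREEPDB1 is the default PDB in the free container image:
sqlplus / as sysdba
ALTER SESSION SET CONTAINER = FREEPDB1;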
Create the necessary tablespaces:
CREATE BIGFILE TABLESPACE tbs2
  DATAFILE 'bigtbs_f2.dbf' SIZE 1G AUTOEXTEND ON NEXT 32M MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

CREATE UNDO TABLESPACE undots2
  DATAFILE 'undotbs_2a.dbf' SIZE 1G AUTOEXTEND ON
  RETENTION GUARANTEE;

CREATE TEMPORARY TABLESPACE temp_demo
  TEMPFILE 'temp02.dbf' SIZE 1G REUSE AUTOEXTEND ON NEXT 32M MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;
Create the user that will own the vector embeddings:
CREATE USER vector IDENTIFIED BY vector DEFAULT TABLESPACE tbs2 QUOTA UNLIMITED ON tbs2;
GRANT DB_DEVELOPER_ROLE TO vector;
Optimize database memory for vector operations. vector_memory_size is set in the SPFILE, so the instance must be restarted for the change to take effect:
sqlplus / as sysdba
CREATE PFILE FROM SPFILE;
ALTER SYSTEM SET vector_memory_size = 512M SCOPE=SPFILE;
SHUTDOWN;
STARTUP;
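After the restart, confirm that the setting took effect. A minimal check from SQL*Plus; the column names on V$VECTOR_MEMORY_POOL below are from the 23ai documentation as I recall them, so verify them against your release:
-- Confirm the parameter value after the restart
SHOW PARAMETER vector_memory_size

-- Inspect vector pool allocations (23ai dynamic view)
SELECT CON_ID, POOL, ALLOC_BYTES FROM V$VECTOR_MEMORY_POOL;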
Step 3: Setting Up Python Environment
Now, let’s set up the Python environment for generating and working with vector embeddings.
Install the necessary packages, then install pyenv:
sudo dnf install git sqlite-devel libffi-devel
curl https://pyenv.run | bash
Configure pyenv and set the local Python version to 3.12:
nano ~/.bashrc
# Add the following lines
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"
source ~/.bashrc
pyenv install 3.12
pyenv local 3.12
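Verify that the expected interpreter is now active in the project directory:
# pyenv should resolve python to the freshly installed 3.12
python --version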
Install required Python libraries:
pip install oracledb sentence-transformers oci jupyterlab
Run Jupyter Lab for your project:
jupyter-lab --no-browser --ip 0.0.0.0 --port 8888
From your local machine, open an SSH tunnel so you can reach the Jupyter server running on the instance:
ssh -L 8888:localhost:8888 -i <your_private_key> opc@<public_ip_address>
Step 4: Handling Vector Embeddings in Jupyter
Task 1: Load FAQ Data into the Database
Load FAQ data into the notebook:
Download the sample file from this link, then drag and drop it into the Jupyter window to upload it to your remote instance.
import os

def loadFAQs(directory_path):
    faqs = {}
    for filename in os.listdir(directory_path):
        if filename.endswith(".txt"):
            file_path = os.path.join(directory_path, filename)
            with open(file_path) as f:
                raw_faq = f.read()
            # Each FAQ file uses '=====' as a separator between chunks
            faqs[filename] = [text.strip() for text in raw_faq.split('=====')]
    return faqs

faqs = loadFAQs('.')
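A quick sanity check that the file was parsed into chunks as expected:
# Print each loaded file and the number of chunks it produced
for filename, sections in faqs.items():
    print(f"{filename}: {len(sections)} chunks")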
Prepare the FAQ data for vector storage:
docs = [
    {'text': filename + ' | ' + section, 'path': filename}
    for filename, sections in faqs.items()
    for section in sections
]
Connect to Oracle Database and create the necessary table:
import oracledb
un = "vector"
pw = "vector"
cs = "localhost/FREEPDB1"
connection = oracledb.connect(user=un, password=pw, dsn=cs)
table_name = 'faqs'
with connection.cursor() as cursor:
    cursor.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id      NUMBER PRIMARY KEY,
            payload CLOB CHECK (payload IS JSON),
            vector  VECTOR
        )""")
Vectorize the FAQ content:
from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer('all-MiniLM-L12-v2')
data = [
    {"id": idx, "vector_source": row['text'], "payload": row}
    for idx, row in enumerate(docs)
]

texts = [row['vector_source'] for row in data]
embeddings = encoder.encode(texts, batch_size=32)

import array

# Store each embedding as a 32-bit float array, the form the
# python-oracledb driver binds to a VECTOR column
for row, embedding in zip(data, embeddings):
    row['vector'] = array.array("f", embedding)
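all-MiniLM-L12-v2 produces 384-dimensional sentence embeddings, so a quick shape check catches problems before the data reaches the database:
print(embeddings.shape)        # (number of chunks, 384)
print(len(data[0]['vector']))  # 384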
Insert the vectorized data into the database:
import json

with connection.cursor() as cursor:
    # Clear out any previous rows, then bulk-insert id, JSON payload, and vector
    cursor.execute(f"TRUNCATE TABLE {table_name}")
    prepared_data = [
        (row['id'], json.dumps(row['payload']), row['vector'])
        for row in data
    ]
    cursor.executemany(
        f"INSERT INTO {table_name} (id, payload, vector) VALUES (:1, :2, :3)",
        prepared_data,
    )
connection.commit()
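You can verify the load with a simple count:
# Confirm that every chunk made it into the table
with connection.cursor() as cursor:
    cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
    print(cursor.fetchone()[0], "rows loaded")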
Step 5: Retrieving Vectors and Using Generative AI
Task 1: Vectorize the Question
We will transform the user’s question into a vector and retrieve relevant FAQ chunks from the database.
Define the SQL query that returns the top-K chunks ranked by cosine distance (a lower score means a closer match):
topK = 3
sql = f"""SELECT payload, vector_distance(vector, :vector, COSINE) AS score
FROM {table_name}
ORDER BY score
FETCH FIRST {topK} ROWS ONLY"""
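With only a handful of FAQ chunks a full scan is fine, but for larger corpora Oracle Database 23ai also supports dedicated vector indexes. A sketch of an in-memory HNSW index, with an index name chosen here purely for illustration:
CREATE VECTOR INDEX faqs_hnsw_idx ON faqs (vector)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;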
Encode the user’s question and retrieve the closest chunks:
question = "What is Always Free?"
embedding = list(encoder.encode(question))
vector = array.array("f", embedding)
results = []
with connection.cursor() as cursor:
    for (info, score) in cursor.execute(sql, vector=vector):
        # payload comes back as a CLOB; read it and parse the JSON
        text_content = info.read()
        results.append((score, json.loads(text_content)))
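Printing the matches makes it easy to eyeball retrieval quality:
# Show the top-K matches with their cosine distances
for score, payload in results:
    print(f"{score:.4f}  {payload['text'][:80]}")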
Task 2: Creating the LLM Prompt
Prepare the prompt for the LLM:
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")

def truncate_string(string, max_tokens):
    # Tokenize, cut to the token budget, and decode back to text
    tokens = tokenizer.encode(string, add_special_tokens=True)
    truncated_tokens = tokens[:max_tokens]
    return tokenizer.decode(truncated_tokens)

# Build the context from the chunks retrieved above, not the full corpus
docs_as_one_string = "\n=========\n".join(doc["text"] for _, doc in results)
docs_truncated = truncate_string(docs_as_one_string, 1000)
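It's worth confirming that the truncated context fits the budget before building the prompt (re-tokenizing a decoded string can shift the count by a token or two):
# Should be at or near the 1000-token budget
print(len(tokenizer.encode(docs_truncated, add_special_tokens=True)))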
prompt = f"""
<s>[INST] <<SYS>> You are a helpful assistant named Oracle chatbot.
USE ONLY the sources below and ABSOLUTELY IGNORE any previous knowledge.
<</SYS>> [/INST]
[INST]
Respond PRECISELY to this question: "{question}" USING ONLY the following information.
Sources: {docs_truncated}
Answer (max 50 words per paragraph):
[/INST]
"""
Task 3: Call Oracle’s Generative AI Service
Create an OCI config file named config in your working directory; the file name and the DEFAULT profile are what oci.config.from_file reads below:
[DEFAULT]
user=<your user's ocid>
key_file=<path to your .pem key>
fingerprint=<fingerprint of the key>
tenancy=<your tenancy's ocid>
region=us-chicago-1
import oci
config = oci.config.from_file('config', "DEFAULT")
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
compartment_id = "<your-compartment-id>"
generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(config=config, service_endpoint=endpoint)
Generate the response using the LLM:
chat_request = oci.generative_ai_inference.models.CohereChatRequest()
chat_request.message = prompt
chat_request.max_tokens = 1000
chat_detail = oci.generative_ai_inference.models.ChatDetails()
chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="cohere.command-r-plus")
chat_detail.chat_request = chat_request
chat_detail.compartment_id = compartment_id
chat_response = generative_ai_inference_client.chat(chat_detail)
# Pretty-print the response with pprint.pp for readability
import pprint
pprint.pp(chat_response.data.chat_response.chat_history[1].message)
# Alternatively, just print the message directly
print(chat_response.data.chat_response.chat_history[1].message)
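Here chat_history[1] is the chatbot's turn in the two-entry history (index 0 is the user's message). Depending on your OCI Python SDK version, the Cohere chat response model may also expose the reply directly; the attribute below is my recollection of the SDK's CohereChatResponse and is worth verifying:
# Alternative: the reply text as a plain attribute (verify on your SDK version)
print(chat_response.data.chat_response.text)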
Final Step: Understanding the Output
You can use pprint.pp for well-formatted, indented output, which is especially useful when the LLM response is complex. If you expect a simple text response, print displays the result directly.
At this point, the Oracle Generative AI service provides a response based on the retrieved FAQ chunks, efficiently answering the user’s query using vector embeddings.
Disclaimer: The views expressed in this document are my own and do not necessarily reflect those of my employer.
Reference: LiveLabs Workshop — Vector Search with Oracle Database 23c