Better Together: OpenAI Embeddings API With PostgreSQL pgvector Extension

Denis Magda
8 min read · Sep 28, 2023


OpenAI’s platform provides several models and APIs for natural language processing. Among these, the OpenAI Embeddings API stands out. It transforms various text inputs — be it user prompts, descriptions, or titles — into vectorized embeddings. These embeddings, once generated, pave the way for rapid similarity searches within a PostgreSQL database.

To optimize similarity searches in PostgreSQL, the pgvector extension is key. It efficiently stores vectors (embeddings), supports essential distance and similarity operations, and introduces special indexes for high performance, even with extensive multi-dimensional embeddings.

This article shows how to use the OpenAI Embeddings API and Postgres pgvector extension programmatically. We’ll create an app to recommend San Francisco lodgings, starting with the basic OpenAI Chat API. Then, we’ll enhance responsiveness using the OpenAI Embeddings API with pgvector, and later, optimize with the HNSW index and scale with YugabyteDB.

Lodging Recommendation App

The sample application assumes that a user is planning to visit San Francisco and assists with places to stay based on user preferences.

The app receives a user prompt via the React frontend and forwards the user’s input to the Node.js backend. The server-side logic then uses one of the modes below to come up with the most suitable lodging recommendations:

  • OpenAI Chat Mode: In this mode, the Node.js backend leverages the OpenAI Chat Completion API and the GPT-4 model to generate lodging recommendations based on the user’s input.
  • Postgres Embeddings Mode: Initially, the backend employs the OpenAI Embeddings API to generate an embedding from the user’s input. Subsequently, the server utilizes the PostgreSQL pgvector extension to perform a vector search among the sample Airbnb properties stored in the database. We can use either PostgreSQL or YugabyteDB (a distributed PostgreSQL-compatible database); both support pgvector.

You can start the app by following its GitHub instructions. Once started, play with the app by asking for recommendations in San Francisco.

Make sure to switch between the OpenAI Chat and Postgres Embeddings mode to compare results and responsiveness.

Now, let’s dive into details to understand how the app leverages the OpenAI API and pgvector extension.

OpenAI Chat Mode

The application’s OpenAI Chat mode fully relies on the Chat Completion APIs.

This is an OpenAI-all-in strategy. The app doesn’t interact with Postgres at all. Instead, it sends a request to the GPT-4 model, similar to how we interact with the ChatGPT service by exchanging messages with the AI:

const chatCompletion = await this.#openai.chat.completions.create({
  messages: [
    {
      role: "system",
      content:
        "You're a helpful assistant that helps to find lodging in San Francisco. Suggest three options. Send back a JSON object in the format below." +
        "[{\"name\": \"<hotel name>\", \"description\": \"<hotel description>\", \"price\": <hotel price>}]" +
        "Don't add any other text to the response. Don't add the new line or any other symbols to the response, just the raw JSON."
    },
    {
      role: "user",
      content: prompt
    }
  ],
  model: "gpt-4",
});
  • The API call includes a system message that provides the GPT-4 model with context on its expected behavior. For example, the AI is instructed to assist with lodging in San Francisco and to return recommendations in a specific JSON format that the app’s React frontend recognizes.
  • Next comes the user message, which carries the user’s prompt. Finally, the model parameter selects gpt-4.

Once the AI provides several lodging options, the app forwards the result to the React frontend:

const places = JSON.parse(chatCompletion.choices[0].message.content);

return places;
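The model is instructed to return raw JSON, but GPT models occasionally wrap the payload in extra text anyway, which would make a bare JSON.parse throw. A defensive parser can guard the frontend against that; this is a hypothetical sketch (parsePlaces is not part of the sample app):

```javascript
// Hypothetical helper: parse the model's reply defensively instead of
// trusting it to be raw JSON, since GPT models sometimes ignore
// formatting instructions.
function parsePlaces(content) {
  try {
    const places = JSON.parse(content);
    // Expect an array of objects with the fields the frontend renders.
    return Array.isArray(places) ? places : [];
  } catch (err) {
    // Fall back to extracting the first JSON array embedded in the text.
    const match = content.match(/\[[\s\S]*\]/);
    return match ? JSON.parse(match[0]) : [];
  }
}
```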

The Chat Completion API is effective, but it has a drawback. Typically, it takes over 10 seconds to receive recommendations from the AI. To expedite this process, we can transition to the OpenAI Embeddings API combined with the Postgres pgvector extension.

Postgres Embeddings Mode

To get the OpenAI Embeddings API working hand in hand with pgvector, the app needs to follow several steps.

Step 1 📥: Load an initial data set into Postgres and enable the pgvector extension.

To speed things up, we need our own data set. The OpenAI Embeddings API can generate a vectorized representation of any text string using its large, pre-trained model, but the text itself has to be ours.

The sample lodging app uses the Airbnb data set for San Francisco: a table with over 7,500 listings.

CREATE TABLE airbnb_listing(
  id bigint NOT NULL,
  listing_url text NOT NULL,
  scrape_id DECIMAL(14, 0) NOT NULL,
  last_scraped timestamp without time zone NOT NULL,
  name text,
  summary text,
  space text,
  description text,
  experiences_offered varchar(4) NOT NULL,
  neighborhood_overview text,
  notes text,
  transit text,
  access text,
  interaction text,
  house_rules text,
  thumbnail_url varchar,
  medium_url varchar,
  picture_url text NOT NULL,
  xl_picture_url text,
  host_id integer NOT NULL,
  host_url text NOT NULL,

  -- ... remaining columns
);
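The listing table alone isn’t enough: as this step’s title says, the pgvector extension also has to be enabled, and the table needs a column to hold the vectors. A sketch of that DDL, assuming the description_embedding column name and the 1536 dimensions used later in the article:

```sql
-- Enable the extension (one-time, per database).
CREATE EXTENSION IF NOT EXISTS vector;

-- Add a column for the description embeddings. text-embedding-ada-002
-- produces 1536-dimensional vectors, so the column is sized accordingly.
ALTER TABLE airbnb_listing ADD COLUMN description_embedding vector(1536);
```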

Step 2 🧠: Utilize the OpenAI Embeddings API to generate embeddings for the necessary columns in your data set and store them in the database.

Which columns should we generate embeddings for? Choose those whose content will facilitate natural language processing for the application’s use case.

In the case of our sample app, the content of the description column is extensive and descriptive, making it an excellent choice for similarity search. Here’s a sample description of one of the Airbnb properties:

Nice and good public transportation.  
7 minutes walk to UCSF. 15 minutes walk to USF, ST Mary
Hospital Room rental-sunny view room/sink/Wi Fi (inner sunset / UCSF)
(map) The place is located within walking distance to Golden Gate Park,
UCSF & USF Room rental Sunny view Rm/Wi-Fi/TV/sink/large eat in Kitchen
- location (UCSF/COLE Valley/ Haight Ashbury/Downtown) Settle down,
S.F. resident, student, hospital, job relocated, rotation.
Monthly rental, short term staying - Nice public transportation location.
No tourist or transient. No pet no smoking no party inside the building.
Medium private lock room(130 square feet) single person.
Large private lock room(190 square feet) single person.
Extra large private room(230 square feet) single person.
Additional bed $12 per night. Surround by sunny view and good
transportation. Room with sink(apartment style - shared large eat
in kitchen/4 bathrooms). Quiet house under management.
Washer and Dryer (coins operated machine).

To generate the embeddings, we can use the following logic (full source code):

let id = 0;
let length = 0;
let totalCnt = 0;

do {
  const res = await client.query(
    "SELECT id, description FROM airbnb_listing " +
    "WHERE id >= $1 and description IS NOT NULL ORDER BY id LIMIT 200", [id]);

  length = res.rows.length;
  let rows = res.rows;

  if (length > 0) {
    for (let i = 0; i < length; i++) {
      const description = rows[i].description.replace(/\*|\n/g, ' ');

      id = rows[i].id;

      const embeddingResp = await openai.embeddings.create({
        model: "text-embedding-ada-002",
        input: description,
      });

      if (!checkEmbeddingValid(embeddingResp))
        return;

      await client.query(
        "UPDATE airbnb_listing SET description_embedding = $1 WHERE id = $2",
        ['[' + embeddingResp.data[0].embedding + ']', id]);

      totalCnt++;
    }

    id++;

    console.log(`Processed ${totalCnt} rows`);
  }
} while (length != 0);
  • The logic iterates through all the Airbnb listings — a batch of 200 listings at a time.
  • It generates a vectorized representation for each listing’s description column using the openai.embeddings API and the text-embedding-ada-002 model.
  • The generated embeddings are then stored in the description_embedding column of the Airbnb table.
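The loop above calls a checkEmbeddingValid helper that isn’t shown here. A minimal hypothetical sketch of what such a check might verify before anything is written to the database (the actual helper lives in the app’s source code):

```javascript
// Hypothetical sketch of the checkEmbeddingValid helper: confirm the API
// returned a well-formed 1536-dimensional vector before storing it.
function checkEmbeddingValid(embeddingResp) {
  return Array.isArray(embeddingResp?.data) &&
    embeddingResp.data.length > 0 &&
    Array.isArray(embeddingResp.data[0].embedding) &&
    embeddingResp.data[0].embedding.length === 1536;
}
```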

Step 3 ⚡: When a user submits a prompt, employ the OpenAI Embeddings API to generate its vectorized representation.

With the database populated with embeddings, we’re set to handle user prompts. Each prompt must be converted into an embedding using the OpenAI Embeddings API with the same model we used in Step 2 (embeddings produced by different models live in different vector spaces and can’t be compared).

const embeddingResp = await this.#openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: prompt
});
  • Simply pass the user prompt to the text-embedding-ada-002 model and initiate the API call.

Step 4 🔍: Use the newly generated user prompt’s embedding to perform a similarity search across the embeddings stored in Postgres.

Once OpenAI produces an embedding for the user prompt (typically in less than a second), search for Airbnb listings in Postgres that align with the user’s preferences:

let userPromptEmbedding = embeddingResp.data[0].embedding;
let matchThreshold = 0.7;
let matchCnt = 3;

const res = await this.#client.query(
  "SELECT name, description, price, 1 - (description_embedding <=> $1) as similarity " +
  "FROM airbnb_listing WHERE 1 - (description_embedding <=> $1) > $2 " +
  "ORDER BY description_embedding <=> $1 LIMIT $3",
  ['[' + userPromptEmbedding + ']', matchThreshold, matchCnt]);

let places = [];

for (let i = 0; i < res.rows.length; i++) {
  const row = res.rows[i];

  places.push({
    "name": row.name,
    "description": row.description,
    "price": row.price,
    "similarity": row.similarity
  });
}

return places;
  • Postgres will recommend up to 3 places (determined by the matchCnt parameter) with a similarity to the user prompt's embedding of 0.7 or above (defined by the matchThreshold parameter).
  • The database calculates similarity using the formula 1 - (description_embedding <=> $1), where description_embedding <=> $1 represents the cosine distance between an Airbnb listing's description and the user prompt. A higher calculated similarity indicates a more relevant Airbnb listing for the user.
  • The recommended places are converted into JSON format and sent back to the React frontend in the places array.
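To build intuition for what the <=> operator computes, here is the same cosine math in plain JavaScript. This is illustrative only; in the app, Postgres performs this calculation, not Node.js:

```javascript
// The math behind pgvector's cosine distance operator (<=>):
// similarity = dot(a, b) / (|a| * |b|), distance = 1 - similarity.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> returns the distance, which is why the query above
// computes 1 - (description_embedding <=> $1) to get the similarity.
function cosineDistance(a, b) {
  return 1 - cosineSimilarity(a, b);
}
```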

Supporting the Postgres Embeddings mode for our sample application required more steps compared to the initial OpenAI Chat mode. However, the responsiveness of the application improves significantly — from over 10 seconds in OpenAI Chat mode to around 1 second in the Postgres Embeddings mode.

More Ways To Scale

Currently, the sample app stores description embeddings for 7,500 Airbnb listings. Each embedding is a 1536-dimensional vector (the text-embedding-ada-002 model produces vectors with 1536 dimensions). It takes milliseconds for Postgres to perform a full table scan, comparing the user prompt’s embedding against the Airbnb listing embeddings.

However, the full table scan has its limits. As the data set grows, it will take more time for Postgres to perform the similarity search.

Indexes are the primary solution to maintain scalability and performance. The pgvector extension supports several index types, including the HNSW index (Hierarchical Navigable Small World). This index allows for low-latency similarity searches on large data sets, albeit with a trade-off in accuracy since it neither indexes nor traverses all the data.

For example, here’s how we can create the HNSW index for our Airbnb embeddings:

CREATE INDEX ON airbnb_listing
USING hnsw (description_embedding vector_cosine_ops)
WITH (m = 4, ef_construction = 10);
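The accuracy/speed trade-off can also be tuned at query time: pgvector exposes an hnsw.ef_search setting (default 40) that controls how many candidates are examined during a search. For example:

```sql
-- Examine more candidates per search: better recall, some extra latency.
SET hnsw.ef_search = 100;
```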

For a deeper understanding of the HNSW index and its various parameters, refer to this comprehensive article: HNSW Indexes with Postgres and pgvector.

Another viable option is YugabyteDB, a distributed database built on PostgreSQL. It enables scaling beyond the capacity of a single database server by distributing the data and load across multiple database nodes.

YugabyteDB announced early support for pgvector in September 2023. To integrate it:

  • Launch a database instance with the necessary configuration using version 2.19.2.0 or later.
  • Update the database connectivity settings in the application.properties.ini file of the sample application.
# Configuration for a locally running YugabyteDB instance with defaults.
DATABASE_HOST=localhost
DATABASE_PORT=5433
DATABASE_NAME=yugabyte
DATABASE_USER=yugabyte
DATABASE_PASSWORD=yugabyte

No other code-level changes are necessary because YugabyteDB is source-code and runtime compatible with Postgres.
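As an illustration of that compatibility, the same node-postgres client the app already uses connects to YugabyteDB with nothing database-specific; only the connection settings differ. buildConnectionConfig below is a hypothetical helper (the sample app reads application.properties.ini instead):

```javascript
// Hypothetical helper: derive node-postgres connection settings from the
// environment, falling back to the local YugabyteDB defaults shown above.
function buildConnectionConfig(env) {
  return {
    host: env.DATABASE_HOST || "localhost",
    port: Number(env.DATABASE_PORT) || 5433, // YugabyteDB's default port
    database: env.DATABASE_NAME || "yugabyte",
    user: env.DATABASE_USER || "yugabyte",
    password: env.DATABASE_PASSWORD || "yugabyte",
  };
}

// Usage (requires the "pg" package and a running instance):
//   const { Client } = require("pg");
//   const client = new Client(buildConnectionConfig(process.env));
//   await client.connect();
```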

Summary

In this walkthrough, we dug into the mechanics of pairing the OpenAI Embeddings API with the PostgreSQL pgvector extension. By converting text to vectorized embeddings with OpenAI and leveraging pgvector for efficient storage and search, we found a solid setup for building snappy AI applications. If you’re coding up projects with hefty natural language processing demands, this combo can be a real game-changer, especially when wrestling with large datasets.
