Power Up Your Postgres: Unlocking Vector Embeddings with pgVector on AlloyDB and Cloud SQL

Eden Rorabaugh
Google Cloud for Startups
4 min readMay 29, 2024

--

Vector databases are revolutionizing how AI-powered applications handle similarity-based queries and recommendations. By focusing on similarity, meaning, and context, vector databases handle information more like humans do, making them ideal for AI workloads. One way to start easily leveraging vector databases in your infrastructure is with pgvector. Pgvector is an extension that transforms PostgreSQL into a vector database, and GCP offers two ways to integrate it: AlloyDB for PostgreSQL and Cloud SQL for PostgreSQL. In this blog, we’ll get into the differences between these two options, their ideal use cases, and how to get started. Whether you’re an early or later stage startup, AI is top of mind. pgVector is a great avenue to ramp up Postgres instances and start leveraging the power of vector databases.

Firstly, what is AlloyDB for PostgreSQL?

AlloyDB for PostgreSQL is a fully-managed, cloud-native database service optimized for transactional workloads and superior PostgreSQL compatibility. It’s ideal for vector use cases because of its performance, AI-ML focus, and seamless setup. It’s built for computationally intensive vector search operations, and often offers a quick start with pgvector pre-installed.

Now, what is CloudSQL for PostgreSQL?

Cloud SQL for PostgreSQL is a fully-managed service providing a familiar PostgreSQL environment on GCP. When it comes to pgvector, some of the major benefits of using Cloud SQL include its ability to be customized, integration options, and granular control. You choose the pgvector version, giving you more authority. CloudSQL also integrates easily with other GCP services, and provides granular control over configurations needed for vector workloads. It provides a flexible and familiar way to create and manage PostgreSQL instances in GCP.

When do you use AlloyDB vs CloudSQL?

Keeping the strengths of each managed service in mind, it’s important to note that you don’t need to choose one over the other! It’s completely possible to have a hybrid set up; however, CloudSQL will be a better fit for workloads that are smaller or workloads that are pre existing PostgreSQL workloads that you want to migrate. AlloyDB is a powerful tool, more effective for larger and more intensive workloads. Running a cost/benefit ratio can be helpful to determine which solution is the most affordable. CloudSQL is typically more cost effective compared to AlloyDB, but it is less equipped than AlloyDB to handle performance sensitive scenarios.

Now that you know the differences, how do you get started?

Provisioning:

Enable pgvector:

  • AlloyDB: It might be pre-enabled. If not, use CREATE EXTENSION pgvector;
  • Cloud SQL: You’ll need to install it with CREATE EXTENSION pgvector;

Data and Embeddings:

  • Load your data into your PostgreSQL database.
  • Use an embedding model (TensorFlow, Vertex AI, etc.) to generate embeddings.
  • Create a column with the ‘vector’ data type in your table.

Example: CREATE TABLE products (p_id bigserial PRIMARY KEY, embeddings vector(5));

  • Store the embeddings.

Similarity Search:

Leverage pgvector’s functions and operators for vector search queries. Operations include:

  • Element-wise addition (+)
  • Element-wise subtraction (-)
  • L2/Euclidean distance (<->)
  • Negative inner product (<#>)
  • Cosine distance (<=>)

By default, pgvector performs exact nearest neighbor search, which provides perfect recall. Increase recall success by:

  • Creating the index after the table has some data
  • Choosing an appropriate number of lists- a good place to start is rows/1000 for up to 1M rows and swrt(rows) for over 1M rows.
  • When querying, specify an appropriate number of probes (higher is better for recall, lower is better for speed)

Key Takeaways

Both AlloyDB for PostgreSQL and Cloud SQL for PostgreSQL let you integrate vector search with the pgvector extension in GCP, but choosing the right option for your startup can give your company a competitive edge. The right choice depends on your project scale, performance needs, and customization preferences, but these are not your only two options! Explore Vertex AI Search for a vector database option that offers auto-scaling and high recall to explore other avenues of implementation.

If you are a startup owner who is actively seeking to propel your business to new heights with GCP, we invite you to join our exclusive virtual live workshops, where you’ll gain hands-on guidance from Google Cloud experts and discover how to seamlessly integrate GCP into your operations. Register now and secure your spot!

Disclaimer: This article is not recommended practice for building and deploying production ready applications, it’s simply a way to learn more about the Google Cloud Platform. Any opinions in this article are my own and not those of Google!

--

--