Vector Databases: a VC buzzword or the future of AI?

P. Kopyrski
Published in B8125-Spring2023
May 5, 2023

During a recent Digital Literacy class, we delved into relational, semi-structured, and unstructured databases, exploring the diverse data architectures behind them. As our world becomes increasingly driven by data, particularly in the digital realm, understanding the nuances of data storage and accessibility is paramount for building robust cyber infrastructures. As organizations across industries strive to harness data for informed decision-making, advanced analytics, and AI applications, the need for innovative database solutions has become apparent. Among these emerging technologies, vector databases have captured the attention of venture capitalists and technologists alike, leading us to question whether they are merely a fleeting buzzword or a key component in the future of AI.

What are vector databases?

Vector databases are specialized databases designed to efficiently store, manage, and query high-dimensional vector data.[1] In this context, vector data refers to multi-dimensional numerical data represented as vectors or ordered lists of numerical values. These databases are particularly well-suited for processes that involve searching for similar vectors, performing mathematical operations on vectors, or retrieving vectors based on specific criteria.[2]
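
To make the definition concrete, the following is a minimal sketch in plain Python of vector data stored as ordered lists of numbers, together with two basic mathematical operations on them. The item IDs, the values, and the helper names are all invented for illustration; real embeddings usually have hundreds or thousands of dimensions, and an actual vector database would not keep them in a Python dictionary.

```python
import math

# Hypothetical 4-dimensional vectors keyed by an item ID.
# The IDs and values are made up for illustration only.
vectors = {
    "doc_a": [0.1, 0.8, 0.3, 0.0],
    "doc_b": [0.2, 0.7, 0.4, 0.1],
    "doc_c": [0.9, 0.0, 0.1, 0.6],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length (L2 norm) of a vector."""
    return math.sqrt(dot(u, u))


def dimension(store):
    """Return the shared dimension, checking that every vector agrees."""
    dims = {len(v) for v in store.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must share one dimension")
    return dims.pop()


if __name__ == "__main__":
    print(dimension(vectors))
    print(dot(vectors["doc_a"], vectors["doc_b"]))
    print(norm(vectors["doc_c"]))
```

The dimension check reflects one practical constraint: vectors can only be compared with each other when they all have the same number of components.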

Vector databases have a wide range of applications, particularly in areas with high-dimensional data, such as image and video recognition, natural language processing, recommendation systems, or anomaly detection.[3] Traditional databases, such as relational or NoSQL databases, are not optimized for handling high-dimensional vector data, which is common in AI and machine learning applications. Vector databases, on the other hand, incorporate several features and optimizations to address the unique challenges associated with high-dimensional vector data, including:

· Efficient storage: Vector databases use specialized data structures and compression techniques to store large volumes of vector data in a compact and efficient manner.

· Indexing: Advanced index structures that support approximate nearest neighbor (ANN) search enable fast retrieval of similar vectors, typically trading a small amount of accuracy for a large gain in speed.

· Query optimization: Vector databases are designed to handle complex, multi-dimensional queries, often supporting operations like nearest neighbor search, which finds the vectors most similar to a given query vector under a similarity measure such as cosine similarity or Euclidean distance.

· Scalability: Many vector databases are built to scale horizontally, allowing them to handle growing datasets and distribute the workload across multiple nodes or clusters.
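
The indexing and query features above can be sketched in a few lines of plain Python. The example below shows the two similarity measures just mentioned, an exact (brute-force) nearest neighbor search, and a toy approximate index based on random-hyperplane hashing, which is only one of several approximate techniques. The class and function names are hypothetical, and this is a sketch under simplifying assumptions rather than how any particular product works; libraries such as FAISS implement far more sophisticated index structures.

```python
import math
import random


def euclidean_distance(u, v):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; larger means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def exact_nearest(store, query, k=1):
    """Brute-force search: score every stored vector against the query."""
    ranked = sorted(
        store, key=lambda key: cosine_similarity(store[key], query), reverse=True
    )
    return ranked[:k]


class HyperplaneIndex:
    """Toy approximate index using random hyperplanes.

    Vectors that fall on the same side of every hyperplane share a bucket,
    and a query only scores the vectors in its own bucket. This is faster
    than scanning everything but can miss true neighbors in other buckets.
    """

    def __init__(self, dim, num_planes=4, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
        self.planes = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)
        ]
        self.buckets = {}
        self.store = {}

    def _signature(self, v):
        # One boolean per hyperplane: which side of the plane v lies on.
        return tuple(
            sum(p_i * v_i for p_i, v_i in zip(plane, v)) >= 0
            for plane in self.planes
        )

    def add(self, key, v):
        self.store[key] = v
        self.buckets.setdefault(self._signature(v), []).append(key)

    def query(self, q, k=1):
        candidates = self.buckets.get(self._signature(q), [])
        ranked = sorted(
            candidates,
            key=lambda key: cosine_similarity(self.store[key], q),
            reverse=True,
        )
        return ranked[:k]


if __name__ == "__main__":
    items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
    index = HyperplaneIndex(dim=2)
    for key, vec in items.items():
        index.add(key, vec)
    print(exact_nearest(items, [1.0, 0.0], k=2))
    print(index.query([1.0, 0.0], k=2))
```

The exact search always returns the true nearest neighbors but touches every stored vector, while the hashed index inspects only one bucket. That difference is the accuracy-for-speed trade-off that approximate indexing makes.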

Some of the more mature options include FAISS (Facebook AI Similarity Search), a similarity-search library,[4] and Milvus, a vector database.[5]

Why are VCs interested in vector databases?

Venture capital firms are always on the lookout for promising technologies that have the potential to disrupt industries or create new markets. Vector databases have garnered interest from VC firms as an example of the “picks & shovels” strategy for the AI industry. This strategy is based on the analogy of the Gold Rush, where those who provided the miners with essential tools (picks and shovels) profited greatly, even if they didn’t mine for gold themselves. In the context of the AI industry, the “picks & shovels” strategy refers to investing in the fundamental infrastructure, tools, and services that enable AI development and deployment rather than investing directly in AI applications or companies.

There are several reasons why VC firms find vector databases an attractive investment opportunity as part of this strategy. As AI and machine learning continue to permeate various sectors, the need for efficient ways to manage high-dimensional vector data, which is critical for these applications, increases. By enabling faster and more accurate search and retrieval of high-dimensional data, vector databases can accelerate AI development and improve the overall performance of the related infrastructure. This, in turn, can lead to faster time-to-market and improved AI-driven products and services. By investing in vector database technologies early, VC firms can position themselves to capitalize on the increasing demand.

While the idea of vector databases is a decade old, most investments in startups in this space have only happened over the past three years. Some recent examples include:

· Zilliz, which in August 2022 received $60 million in Series B funding;[6]

· Pinecone, which in March 2022 received $28 million in Series A funding from investors including Tiger Global Management and Menlo Ventures;[7]

· Weaviate, which in April 2023 received $50 million in Series B funding from investors including Index Ventures, NEA, and Battery Ventures;[8]

· FirstBatch, which in October 2022 received pre-seed funding from investors including Coinbase;[9]

· Qdrant, which in April 2023 received $7.5 million in Seed funding.[10]

What does the future hold for vector databases?

The future of vector databases appears promising, as their significance in the AI landscape becomes increasingly apparent. Despite headwinds for VC funding and startups, the five startups listed above collectively raised roughly $150 million; the four rounds with amounts given above alone total $145.5 million. As the demand for efficient handling of high-dimensional data in AI and machine learning applications grows, VCs are likely to continue investing in this burgeoning technology. By supporting the fundamental infrastructure that underpins AI development, VC firms can capitalize on the overall growth of the AI industry and contribute to the advancement of a technology that has the potential to revolutionize various sectors. As more organizations recognize the value of vector databases in driving AI innovation, we can expect further VC investment aimed at their development and adoption, solidifying their role in the future of AI and machine learning.

[1] Vector Database | Microsoft Learn (accessed April 22, 2023).

[2] What is a Vector Database? | Pinecone (accessed April 22, 2023); A Gentle Introduction to Vector Databases | Frank’s Ramblings (frankzliu.com) (accessed April 22, 2023).

[3] The vector database is a new kind of database for the AI era | VentureBeat (accessed April 23, 2023); Vector Database Use Cases — Qdrant (accessed April 22, 2023).

[4] Faiss: A library for efficient similarity search — Engineering at Meta (fb.com) (accessed April 23, 2023).

[5] Not All Vector Databases Are Made Equal | by Dmitry Kan | Towards Data Science (accessed April 23, 2023).

[6] Series B — Zilliz — 2022–08–24 — Crunchbase Funding Round Profile (accessed April 23, 2023).

[7] Series A — Pinecone — 2022–03–29 — Crunchbase Funding Round Profile (accessed April 23, 2023).

[8] Series B — Weaviate — 2023–04–20 — Crunchbase Funding Round Profile (accessed April 23, 2023).

[9] Pre Seed Round — FirstBatch — 2022–10–30 — Crunchbase Funding Round Profile (accessed April 23, 2023).

[10] Seed Round — Qdrant — 2023–04–19 — Crunchbase Funding Round Profile (accessed April 23, 2023).
