Key-Value Databases Are Sufficient Infrastructure for Semantic Search at Scale with NeuralDB

Anshu · ThirdAI Blog
Feb 10, 2024 · 4 min read

Search forms the foundation of data storage and management systems, and the effectiveness and scalability of search largely determine the utility of modern data lakes and warehouses. Key-value databases are popular because their simple design provides scalability and flexibility across data types, as detailed in [Link1, Link2]. These infrastructures have undergone decades of engineering and compliance hardening, making them stand out across applications. With GenAI’s recent success, database vendors are now contemplating a fundamental design transformation to support natural-language queries over complex, multi-modal unstructured data.

The Holy Grail: Efficiency with Complete Data Residency

Integrating AI and a new ‘vector database’ into the traditional stack introduces complications, requiring a major software-stack redesign to support vector search at scale. Serving embedding models at scale is a particular concern: generating embeddings means moving data out of the core database environment to externally managed services or GPU-hosted microservices, so data is constantly shuttled between systems. Any significant modification to a fundamental design inevitably takes a long time to mature, be tested, vetted, and optimized, not to mention the substantial associated costs.

Fortunately, with NeuralDB’s AI-first approach to semantic retrieval, we find that the existing key-value database ecosystem is already sufficient. We eliminate the need for an infrastructure overhaul to support embeddings and vector databases, guaranteeing complete data residency, and we can capitalize on decades of optimization in the existing setup to facilitate semantic search over unstructured data at scale, starting today!

Figure: Top panel: pipeline for existing semantic search with embeddings and a VectorDB; data residency is hard to ensure due to heavy movement of data across services. Bottom panel: NeuralDB with a traditional key-value store for semantic search with complete data residency.

The Unifying View: Everything Simplified into Key-Value Pairs

In a previous blog post [Link], we discussed how embeddings and vector databases enable semantic retrieval. At a high level, a text sentence or ‘chunk’ [TC], such as “Museums of Paris,” is converted into a vector [V] by a popular embedding model like OpenAI-Ada. This vector [V] is then passed to a vector database, typically via a graph traversal (an HNSW-style graph data structure), which yields a set of KEYs pointing to relevant text like “Louvre,” “Musée d’Orsay,” etc. Multiple KEYs can point to the same text chunk (and vice versa), and we retrieve and rank multiple candidates from the VectorDB.
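For concreteness, here is a minimal sketch of that conventional pipeline, assuming a placeholder embed() function that stands in for a hosted model like OpenAI-Ada, and using the open-source hnswlib library in place of the vector database’s HNSW index (all names are illustrative):

```python
import numpy as np
import hnswlib

def embed(text: str) -> np.ndarray:
    """Placeholder for a call to an external embedding service (e.g., OpenAI-Ada)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384).astype(np.float32)

# Index the text chunks: every chunk is embedded and inserted into an HNSW graph.
chunks = ["Louvre", "Musée d'Orsay", "Centre Pompidou"]
vectors = np.stack([embed(c) for c in chunks])

index = hnswlib.Index(space="cosine", dim=384)
index.init_index(max_elements=len(chunks), ef_construction=200, M=16)
index.add_items(vectors, ids=np.arange(len(chunks)))

# Query: text chunk [TC] -> vector [V] -> graph traversal -> KEYs -> text.
labels, _ = index.knn_query(embed("Museums of Paris"), k=2)
print([chunks[i] for i in labels[0]])
```

Note that both indexing and querying depend on an embedding call, which in practice means shipping every chunk and every query to a separate, often GPU-hosted, service.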

Ultimately, the goal is to compute KEYs from the Text Chunk [TC], where the KEYs point to various relevant pieces of text. In the current pipeline, going from text to KEYs is the challenging part that demands specialized infrastructure: an AI embedding model followed by a complex graph traversal finally reaches the reference to the desired text. After all, everything is a data structure.

Enter NeuralDB: Augment the database with a CPU-only Large Language Model (LLM) that generates “semantic keys.”

The optimal solution for retrieval is a specialized AI, a Large Language Model (LLM), explicitly designed to process a Text Chunk [TC] and directly generate KEYs. LLMs excel at modeling intricate language and can be tailored to the specific purpose of producing KEYs, ensuring that relevant texts share most of these KEYs, possibly in a weighted manner. Developing such LLMs eliminates the need for vector databases and embeddings altogether! This AI is meticulously crafted for retrieval, adhering to an AI-first approach.
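To make this concrete, below is a toy sketch of key-based retrieval. The generate_keys() function is a crude hash-based stand-in for the trained key-generating model (it is not NeuralDB’s actual mechanism); the point is that once keys exist, indexing and retrieval reduce to ordinary key-value operations, with weighted key overlap for scoring:

```python
from collections import defaultdict

def generate_keys(text: str) -> set[int]:
    """Stand-in for a learned model: hash words into a fixed key space.
    A trained LLM would instead produce keys that relevant texts share."""
    return {hash(w) % 10_000 for w in text.lower().split()}

kv_index: dict[int, set[str]] = defaultdict(set)  # semantic key -> doc ids

def index_chunk(doc_id: str, text: str) -> None:
    for key in generate_keys(text):
        kv_index[key].add(doc_id)

def search(query: str) -> list[tuple[str, int]]:
    scores: dict[str, int] = defaultdict(int)
    for key in generate_keys(query):
        for doc_id in kv_index.get(key, ()):  # one KV lookup per key
            scores[doc_id] += 1               # weighted key overlap
    return sorted(scores.items(), key=lambda kv: -kv[1])
```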

The idea of “learning to index,” as outlined in a prominent database paper [Link], is rooted in a similar observation. Our founding team’s sequence of papers [NeurIPS 2019, ICLR 2022, and KDD 2022] demonstrates the superiority of such an index, which has a significantly smaller memory footprint than embedding-based approaches. Not needing to store any embeddings or the associated bulky HNSW index represents a substantial leap forward in performance and scalability. We describe the cost and memory comparison in the next blog post.

The Technological Disruption: Build and Deploy LLMs to Generate KEYs on Existing CPUs

ThirdAI has consistently emphasized the significance of training, deploying, and refining AI on CPUs. The BOLT software library by ThirdAI, a purely software-based solution, enables the complete AI/LLM lifecycle on standard CPUs. Within this framework, NeuralDB specializes the BOLT library to efficiently convert a “text chunk” into discrete KEYs using a large language model for indexing purposes. This can be integrated with any key-value database without compromising data compliance or other constraints.
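As an illustration of that integration, the sketch below wires the hypothetical generate_keys() from the previous example into an off-the-shelf key-value store, Redis via the redis-py client; the text is indexed and queried without ever leaving the database environment for an external embedding service:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def index_chunk_redis(doc_id: str, text: str) -> None:
    for key in generate_keys(text):
        r.sadd(f"semkey:{key}", doc_id)  # semantic key -> set of doc ids

def search_redis(query: str, top_k: int = 5) -> list[str]:
    scores: dict[str, int] = {}
    for key in generate_keys(query):
        for doc_id in r.smembers(f"semkey:{key}"):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```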

NeuralDB takes on the heavy lifting by managing and automating the entire lifecycle of Large Language Models (LLMs). This encompasses pre-training on CPUs for zero-shot text understanding and extends to deployment, ensuring load balancing to prevent overcrowding of keys. Overcrowding can adversely impact search performance, making load balancing a crucial aspect.
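One simple way to see why balance matters: if a single key points to half the corpus, every query touching that key must scan half the corpus. A toy mitigation, illustrative only and not NeuralDB’s internal mechanism, is to down-weight keys whose posting sets have grown too large, continuing the toy index from earlier:

```python
import math

def balanced_search(query: str) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for key in generate_keys(query):
        docs = kv_index.get(key, set())
        if not docs:
            continue
        weight = 1.0 / math.log2(2 + len(docs))  # crowded keys count less
        for doc_id in docs:
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    return sorted(scores.items(), key=lambda kv: -kv[1])
```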

NeuralDB features numerous proprietary built-in enhancements honed over years to improve retrieval efficiency. In essence, ThirdAI’s NeuralDB library signifies a technological breakthrough, simplifying the conversion of “text chunks” into keys. All computations take place on standard CPUs and scale horizontally with the number of CPU machines and cores. Consequently, the solution effortlessly scales within any database built on distributed CPUs.

NeuralDB Enhances the Existing Semantic Search Ecosystem

VectorDBs aren’t the sole search strategy; keyword-based (Lucene) search is integral to almost every database. Moreover, modern search systems are hybrids, incorporating various candidate generators followed by constraint filtering and ranking, as sketched below. NeuralDB improves the ecosystem with its unique advantage as a perpetually fine-tunable, AI-first search system that, like Google Search, continuously improves with usage. This unlocks domain specialization and hyper-personalization while maintaining scale, efficiency, and data residency. Discover more about NeuralDB Enterprise [Here].
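To illustrate the hybrid pattern, here is a sketch that merges keyword candidates with semantic-key candidates from the earlier toy search() using reciprocal rank fusion, a common fusion heuristic (the document IDs are made up for illustration):

```python
def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: combine ranked lists from multiple generators."""
    fused: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g., from a Lucene/BM25 engine
semantic_hits = [d for d, _ in search("Museums of Paris")]
candidates = rrf_merge([keyword_hits, semantic_hits])
# Constraint filtering and final ranking would then run over `candidates`.
```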

Anshu

Professor of Computer Science specializing in Deep Learning at Scale and Information Retrieval. Founder and CEO of ThirdAI. More: https://www.cs.rice.edu/~as143/