Kickstart Your GenAI Applications with Milvus Lite and WhyHow.AI’s open-source rule-based-retrieval

Chia Jeng Yang
May 31, 2024

We’re excited to partner with Milvus to bring you Milvus Lite, a newly available, lightweight, and easy-to-install vector database. With a simple “pip install”, you can start building GenAI applications with vector search and run them anywhere — on laptops, Jupyter Notebooks, and edge devices. Milvus Lite is integrated with WhyHow.AI’s open-source Rule-based Retrieval, making the development of GenAI applications easier than ever.

“Contributing more powerful, granular tooling for the community to get more precise and accurate retrieval is what WhyHow.AI is here for. We aim to let developers stand up production RAG systems faster and with less hassle.” — Tom Smoker, Co-Founder at WhyHow.AI

Simplify RAG with Milvus Lite and WhyHow.AI’s open-source Rule-based Retrieval package

Milvus is a leading vector database known for its superior performance and scalability, and Milvus Lite now makes it easier than ever to get started with. We aim to bring its capabilities, together with WhyHow.AI’s graph tooling ecosystem, to GenAI builders who are early in their development journey. Milvus Lite installs with a single pip command, sparing you the hassle of setting up Docker, so you can start building within seconds. Because it shares the same API as the more scalable Milvus on Kubernetes and Zilliz Cloud, you only need to write your client-side code once, and your app can scale from a prototype to production with billions of vectors.
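Getting started takes one pip command for each piece (package names as of this writing; Milvus Lite ships inside the pymilvus client):

pip install pymilvus
pip install rule-based-retrieval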

Developers frequently tell us that they know exactly where to find the answer to a question within their raw data, but for some reason, their RAG solution is not pulling in the right chunks. This is a frustrating problem that is especially challenging to fix given the black-box nature of retrieval and LLM response generation.

Although vector similarity searches return highly relevant data, developers must still contend with LLM hallucinations and their RAG systems sometimes failing to return the data that is most relevant to the problem they are trying to solve. Perhaps a user’s query is poorly phrased and it yields bad results from a vector database, as the index may store a lot of semantically similar data. Or maybe you want to include response data that is semantically dissimilar to the query embedding, but is still contextually relevant to building a complete, well-rounded response to a particular user query.

In these cases, it helps to have more determinism and control in the chunks of raw data that you are retrieving in your RAG pipeline. Thus, we developed a rule-based retrieval solution whereby developers can define rules and map them to a set of chunks they care about, giving them more control in their retrieval workflow.

WhyHow.AI and Milvus Lite together simplify the development of RAG apps. Here’s how our integrated solution makes it happen:


Imagine you’re a legal professional at an investment fund, and you’re trying to understand the rights and obligations of your various investors. To do this, you want to perform RAG over two key documents: a limited partnership agreement (LPA), which details the general terms of the partnership between you and your investors, and a side letter, which details additional investor rights and privileges not mentioned in the LPA.

When you search ‘what are the rights of client 3?,’ your RAG system returns some relevant results, but it also returns raw data from other investors’ side letters and from irrelevant pages of the LPA, causing the LLM to hallucinate and return an inaccurate response.

INFO:querying:Index whyhow exists
INFO:whyhow.rag:Raw rules: []
INFO:querying:Answer: The provided contexts do not contain specific
information about the rights of client 3.

To fix this, you can write rules that mimic the workflow domain experts typically follow for this use case. First, you want to check the specific rights and privileges that have been explicitly granted to the investor in the side letter, which are mentioned on page 2. Then, you want to check the LPA pages that pertain to the specific investor rights you care about; in this case, page 41 of the LPA.

INFO:querying:Index whyhow exists

INFO:whyhow.rag:Raw rules: [Rule(filename='side_letter_client_3.pdf',
uuid=None, page_numbers=[1], keywords=[]),
Rule(filename='LPA.pdf', uuid=None, page_numbers=[41], keywords=[])]

INFO:querying:Answer: Access to key financial and operational data
related to their investments, access to a list of the Partnership
Investments valued at fair value, extended reporting, advisory…

When we add these rules, our queries are scoped to a much narrower set of chunks, which increases the likelihood of retrieving relevant data and generating an accurate response. With some more tuning of the prompt and query, we can continue to improve the output generated by the SDK.

It may not be necessary to create a rule for every extraction. Rules can be implemented only for particularly tricky questions, or for questions that repeatedly face failed retrievals. However, some of our design partners have implemented rule extraction for all questions, even simple ones, for the peace of mind that a deterministic system in production brings.

See It in Action

How does it work? For more detailed information about how the package works, check out the GitHub repo or our longer article on the topic.

The rule-based retrieval SDK does a few things for the user:

Index & namespace creation — the SDK creates a vector database index and namespace on your behalf. This is where chunk embeddings will be stored.

Splitting, chunking, and embedding — when you upload PDF documents, the SDK will automatically split, chunk, and create embeddings of the document before upserting them into the vector database index.

We’re leveraging LangChain’s PyPDFLoader and RecursiveCharacterTextSplitter for PDF processing, metadata extraction, and chunking. For embedding, we’re using OpenAI’s text-embedding-3-small model.

Auto-filtering — using a set of rules defined by the user, we automatically build a metadata filter that narrows the query being run against the vector database index. (An end-to-end sketch follows below.)
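Put together, the flow looks roughly like this. This is a minimal sketch: the import path, class, and method names are illustrative assumptions, so check the GitHub repo for the exact API.

# Minimal sketch; names below are assumptions, see the repo for the exact API
from whyhow_rbr import ClientMilvus  # hypothetical import path

# Milvus Lite runs from a local file, no Docker required
client = ClientMilvus(milvus_uri="milvus_demo.db")

# 1. Create the index/collection that will hold chunk embeddings
#    (1536 is the default dimension of OpenAI's text-embedding-3-small)
client.create_collection(collection_name="whyhow", dimension=1536)

# 2. Split, chunk, embed, and upsert your PDFs
client.upload_documents(documents=["LPA.pdf", "side_letter_investor_1.pdf"])

# 3. Query; rules and auto-filtering are covered below
answer = client.search(question="what are the rights of client 1?")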

When you upload your documents, the PDFs are automatically split and chunked, and high-level document information is extracted and added to each chunk in the form of vector metadata (chunk text, page number, sequential chunk IDs, and document filename).
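For instance, a single chunk’s metadata might look like the following (the field names are illustrative rather than the SDK’s exact schema):

{
    "text": "<chunk text>",
    "page_number": 2,
    "chunk_id": 0,
    "filename": "side_letter_investor_1.pdf"
}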

Now that this information is part of the Milvus vector, we can use metadata filters to run pointed queries against these vectors. We define metadata filter rules by specifying the filename and the page numbers we want to include in a given query, and we add them to a list to be passed into the client. This abstraction is meant to offer a simple, intuitive way of building and grouping rules so they can be managed and applied to different types of queries.

We allow you to add optional keywords that will automatically trigger a rule if any of them are detected in the question, as long as keyword_trigger is set to ‘true.’ In the code snippet below, if the question is ‘what are the rights of client 1?,’ both rules will be triggered and their filters applied. If keyword_trigger is set to ‘false,’ then all of the specified rules will be applied by default. This is a very simple application of keyword detection and triggering, but you can easily extend this type of rule automation using the semantic reasoning capabilities provided by LLMs and supporting solutions like knowledge graphs.

rules = [
    Rule(
        filename="side_letter_investor_1.pdf",
        page_numbers=[2],
        keywords=["rights of client 1"]
    ),
    Rule(
        filename="LPA.pdf",
        page_numbers=[41, 42, 43],
        keywords=["rights", "client 1", "rights of client 1"]
    )
]
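A query using these rules might then look like the following. The method name and parameters here follow the behavior described above; treat the exact signature as an assumption and check the repo.

# Hypothetical call shape based on the behavior described above
response = client.search(
    question="what are the rights of client 1?",
    rules=rules,
    keyword_trigger=True,  # only rules whose keywords appear in the question fire
)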

When the query is run, the client will generate an embedding for the question text and run a vector similarity search using the embedding and the filters. Depending on whether or not you have enabled include_all_rules, the filters will be applied in one of two ways.

If ‘false,’ all the filters will be concatenated into a single filter, which will be applied to a single Milvus vector search. In this case, you’re asking Milvus, “find me the most semantically relevant chunks, given that they meet one of the conditions defined in this filter.”

{'$or': [
    {'$and': [
        {'filename': {'$eq': 'LPA.pdf'}},
        {'page_number': {'$in': [41, 42, 43]}}
    ]},
    {'$and': [
        {'filename': {'$eq': 'side_letter_investor_1.pdf'}},
        {'page_number': {'$in': [2]}}
    ]}
]}

Now, although you may have asked Milvus to retrieve information from a certain set of pages, depending on the number of rules you have, the results of the vector similarity search, and the size of your top_k, it is possible that your vector similarity search will not return results from some of the pages you’ve specified. But what if you want to ensure that data from all pages is sent to the LLM?

If include_all_rules is set to ‘true,’ a separate Milvus query will be run for each filter. In this case, for each rule you’ve built, you’re asking Milvus, “find me the most semantically relevant chunks on these pages…done?…cool…now do it for the next set of pages,” and so on. Instead of concatenating all the rules into a single filter, we run multiple vector similarity searches (one per rule) and concatenate all the matches into a single output to be sent to the LLM. With this strategy, we can guarantee that the results will include information from each set of pages in your rule set. The downside is that you may pull in less semantically relevant information than if you had queried the entire vector database using the concatenated filter.

if include_all_rules:
    texts = []
    for filter in filters:
        # One vector search per rule filter; parameters and response shape are illustrative
        query_response = index.query(
            vector=question_embedding,
            filter=filter,
            top_k=top_k,
        )
        texts.extend(m.metadata["text"] for m in query_response.matches)

Depending on the number of rules you use in your query, you may return more chunks than your LLM’s context window can handle. For example, with include_all_rules enabled, ten rules with a top_k of 5 can return up to 50 chunks. Be mindful of your model’s token limits and adjust your top_k and rule count accordingly.

Getting Started Now

Want to know more about what Milvus Lite and WhyHow.AI can do for your projects? Check out our documentation. Start building GenAI applications with Milvus Lite and WhyHow.AI today!

WhyHow.AI is building tools to help developers bring more determinism and control to their RAG pipelines using graph structures. If you’re thinking about, in the process of, or have already incorporated knowledge graphs in RAG for accuracy, memory, and determinism, we’d love to chat; you can also follow our newsletter at WhyHow.AI. Join our discussions about rules, determinism, and knowledge graphs in RAG on our Discord.