How to Select Properties to Return in Weaviate BM25

Asaki Sakamoto
7 min read · Sep 25, 2024

Understanding Weaviate and Its BM25 Algorithm

Weaviate is an open-source vector search engine that lets you build semantic search applications. It can handle a variety of data types, which makes it a good fit for searching unstructured data such as text and multimedia. Alongside vector search, Weaviate supports keyword search with BM25, a probabilistic ranking function that builds on the ideas behind term frequency-inverse document frequency (TF-IDF). BM25 calculates a relevance score for each document so that the documents most pertinent to the user’s query are returned first.

BM25 weighs how often a query term appears in a document against how common that term is across all documents in the dataset, and it normalizes for document length so that long documents are not favored simply for containing more words. In Weaviate, BM25 can also be combined with vector-based search (hybrid search), which pairs semantic understanding with traditional keyword matching. Knowing how to use BM25 efficiently in Weaviate, and in particular how to select which properties a query returns, is key to getting the most out of it.
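To make the scoring idea concrete, here is a minimal pure-Python sketch of a textbook BM25 score. It is an illustration only: Weaviate’s internal implementation may differ in its details, and the function name bm25_score, the tiny corpus, and the whitespace tokenization are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Textbook BM25 score of one tokenized document for a list of query terms."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: documents longer than average are penalized.
        norm = 1 - b + b * len(doc_tokens) / avg_len
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
    return score


# Hypothetical three-document corpus, tokenized by whitespace.
corpus = [
    "bm25 is a ranking function for keyword search".split(),
    "semantic search uses vector embeddings".split(),
    "bm25 rewards rare terms and penalizes long documents".split(),
]
scores = [bm25_score(["bm25"], doc, corpus) for doc in corpus]
print(scores)
```

The second document never mentions the query term, so it scores zero, while the other two receive a positive score.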

The Importance of Selecting Properties

When querying a database, it is often necessary to specify which properties or fields you wish to return in the result set. In Weaviate, this is especially important when working with large datasets that may contain numerous attributes. Returning only the relevant properties can improve performance, reduce bandwidth, and enhance the user experience by delivering concise and focused results.

Selecting properties also structures your output in a way that makes the information easier to work with. For example, if you’re developing an application that displays search results to users, you might only want to return the title, summary, and URL of documents, rather than the entire content of each document. Weaviate’s GraphQL query language handles this directly: a query returns only the properties you list.

Step-by-Step Guide to Selecting Properties in Weaviate with BM25

To effectively use BM25 in Weaviate while selecting which properties to return, follow this step-by-step guide.

Step 1: Setting Up Weaviate

Before you can run any queries, you need to ensure that you have a running instance of Weaviate. You can deploy Weaviate using Docker, which encapsulates everything in a container to allow for easy setup.

docker run -d -p 8080:8080 semitechnologies/weaviate:latest

After this command, Weaviate should be accessible at http://localhost:8080.
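Before moving on, it can help to confirm that the instance is actually up. The sketch below polls Weaviate’s readiness probe using only the Python standard library. The helper names ready_url and is_ready are hypothetical, and the base URL assumes the default port mapping from the command above.

```python
import urllib.request


def ready_url(base_url: str) -> str:
    """Build the readiness-probe URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the instance answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


print(ready_url("http://localhost:8080"))
```

Calling is_ready() in a short retry loop after starting the container is usually enough to avoid sending requests too early.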

Step 2: Create a Schema

In Weaviate, all interactions are structured around a schema that describes the classes and properties of your data. For instance, if you’re searching through a collection of articles, your schema might look like this:

{
  "classes": [
    {
      "class": "Article",
      "properties": [
        {
          "dataType": ["string"],
          "name": "title"
        },
        {
          "dataType": ["text"],
          "name": "content"
        },
        {
          "dataType": ["string"],
          "name": "url"
        },
        {
          "dataType": ["string[]"],
          "name": "tags"
        }
      ]
    }
  ]
}

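You can create this class with a single HTTP POST request to Weaviate’s /v1/schema endpoint. The endpoint accepts one class definition per request, so you send the Article object itself rather than the surrounding classes array. Note that recent Weaviate releases deprecate the string data type in favor of text.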
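As a rough sketch, the request for the Article class could be prepared in Python like this. The helper build_schema_request is hypothetical, the base URL assumes a local instance, and the example uses text in place of string on the assumption that you are running a recent Weaviate release.

```python
import json
import urllib.request

# One class definition; "text" is used here instead of the older "string" type.
ARTICLE_CLASS = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "url", "dataType": ["text"]},
        {"name": "tags", "dataType": ["text[]"]},
    ],
}


def build_schema_request(base_url: str, class_def: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that creates one class."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/schema",
        data=json.dumps(class_def).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_schema_request("http://localhost:8080", ARTICLE_CLASS)
print(req.get_method(), req.full_url)
# To send it against a running instance:
#   urllib.request.urlopen(req)
```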

Step 3: Importing Data

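Once the schema is set, the next step is to populate your Weaviate instance with data. This can be done using the batch endpoint (/v1/batch/objects) to send multiple objects at once. Each object in the request names its class and carries its values under properties. For instance, following our Article schema, the property values for two articles might look like this: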

[
  {
    "title": "Understanding BM25",
    "content": "BM25 is one of the most effective algorithms ...",
    "url": "https://example.com/bm25",
    "tags": ["BM25", "search", "algorithm"]
  },
  {
    "title": "Exploring Semantic Search",
    "content": "Semantic search employs advanced algorithms ...",
    "url": "https://example.com/semantic-search",
    "tags": ["semantic", "search", "AI"]
  }
]
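The rows above still need to be wrapped into the shape the batch endpoint expects. Here is a minimal sketch of that step. The helper names to_batch_payload and build_batch_request are hypothetical, and the URL assumes a local instance.

```python
import json
import urllib.request

# The two sample articles, with content left elided as in the article.
ROWS = [
    {
        "title": "Understanding BM25",
        "content": "BM25 is one of the most effective algorithms ...",
        "url": "https://example.com/bm25",
        "tags": ["BM25", "search", "algorithm"],
    },
    {
        "title": "Exploring Semantic Search",
        "content": "Semantic search employs advanced algorithms ...",
        "url": "https://example.com/semantic-search",
        "tags": ["semantic", "search", "AI"],
    },
]


def to_batch_payload(class_name: str, rows: list) -> dict:
    """Wrap plain property dicts into the batch body: a list under 'objects'."""
    return {"objects": [{"class": class_name, "properties": row} for row in rows]}


def build_batch_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the batch import request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/batch/objects",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = to_batch_payload("Article", ROWS)
batch_req = build_batch_request("http://localhost:8080", payload)
print(len(payload["objects"]), "objects ->", batch_req.full_url)
```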

Step 4: Running BM25 Queries with Selected Properties

After importing the data, it’s time to perform queries. Weaviate allows you to execute BM25 queries while specifying which properties you want to return. The syntax for such a query is straightforward.

For example, to search for articles containing the term BM25 while returning only the title and url, you can use the following GraphQL query:

{
  Get {
    Article(bm25: {
      query: "BM25"
    }) {
      title
      url
    }
  }
}

This search returns only the specified properties for the documents that match the query, ranked by their BM25 score. If you also want to see the score itself, you can request it through the _additional { score } field.
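If you build such queries from application code, a small helper keeps the property list in one place. The following is a sketch under a few assumptions: the helper names build_bm25_query and build_graphql_request are hypothetical, and the request targets the /v1/graphql endpoint of a local instance.

```python
import json
import urllib.request


def build_bm25_query(class_name: str, search_text: str, return_props: list) -> str:
    """Build a GraphQL Get query that runs a BM25 search and returns only return_props."""
    selection = " ".join(return_props)
    # json.dumps quotes and escapes the search text as a string literal.
    return (
        f"{{ Get {{ {class_name}(bm25: {{ query: {json.dumps(search_text)} }}) "
        f"{{ {selection} }} }} }}"
    )


def build_graphql_request(base_url: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) the GraphQL POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = build_bm25_query("Article", "BM25", ["title", "url"])
gql_req = build_graphql_request("http://localhost:8080", query)
print(query)
```

Changing the returned fields is then a matter of editing the list passed as return_props, with no change to the search itself.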

Step 5: Understanding the Query Response

When executing your queries, it’s essential to understand the structure of the response you receive. The returned JSON structure will encapsulate your data based on the specified properties. For example, the response from the query might look like this:

{
  "data": {
    "Get": {
      "Article": [
        {
          "title": "Understanding BM25",
          "url": "https://example.com/bm25"
        }
      ]
    }
  }
}
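In application code you will usually want to unwrap this nesting once and work with a plain list. A minimal sketch, assuming the response has already been parsed from JSON into a dict and that failures are reported under a top-level errors key, as is conventional for GraphQL (extract_hits is a hypothetical helper name):

```python
def extract_hits(response: dict, class_name: str) -> list:
    """Return the list of result objects for class_name from a GraphQL response."""
    if response.get("errors"):
        # Surface GraphQL-level errors instead of silently returning nothing.
        raise RuntimeError(f"query failed: {response['errors']}")
    data = response.get("data") or {}
    return (data.get("Get") or {}).get(class_name) or []


# The sample response from the article, as a Python dict.
sample = {
    "data": {
        "Get": {
            "Article": [
                {"title": "Understanding BM25", "url": "https://example.com/bm25"}
            ]
        }
    }
}

hits = extract_hits(sample, "Article")
for hit in hits:
    print(hit["title"], "->", hit["url"])
```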

This compact output is more efficient for your application to handle than returning the complete article content.

Step 6: Fine-tuning Your Queries

Weaviate also allows you to fine-tune BM25 parameters for better relevance and retrieval accuracy. You can adjust k1 and b, which control term frequency saturation and document length normalization, respectively.

These parameters are not query arguments. They are set per class in the schema, under invertedIndexConfig. For example, to tune them for the Article class:

{
  "class": "Article",
  "invertedIndexConfig": {
    "bm25": {
      "k1": 1.5,
      "b": 0.75
    }
  }
}

The query itself carries no tuning arguments:

{
  Get {
    Article(bm25: {
      query: "BM25"
    }) {
      title
      url
    }
  }
}
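To get a feel for what the two parameters do before changing them, you can look at the per-term weight from the textbook BM25 formula in isolation. This is an illustrative sketch, not Weaviate’s internal code, and term_weight is a hypothetical helper name.

```python
def term_weight(tf: int, doc_len: int, avg_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Per-term BM25 weight (without the IDF factor)."""
    # b scales how strongly the document length, relative to the average, matters.
    norm = 1 - b + b * doc_len / avg_len
    # k1 controls how quickly repeated occurrences of a term stop adding weight.
    return tf * (k1 + 1) / (tf + k1 * norm)


# Saturation: each extra occurrence adds less, and the weight stays below k1 + 1.
for tf in (1, 2, 5, 20):
    print("tf =", tf, "weight =", round(term_weight(tf, 100, 100, k1=1.5), 3))

# Length normalization: with b = 0 the document length has no effect at all.
print(term_weight(3, 50, 100, b=0.0) == term_weight(3, 500, 100, b=0.0))
```

A higher k1 lets repeated terms keep contributing for longer, while a higher b penalizes long documents more strongly.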

Step 7: Best Practices for Using BM25 with Selected Properties

  1. Limit the Number of Properties: Always return only what is necessary for your application. This maintains efficiency and increases performance.
  2. Experiment with Parameters: Utilize the tuning capabilities of BM25 to determine the best settings for your unique dataset and retrieval needs.
  3. Monitor Query Performance: Measure the performance of different queries, especially as your dataset scales, and adjust your schema and queries accordingly.
  4. Use Filters: Combine BM25 with other filtering criteria in your queries to narrow down results based on other properties like tags or dates.
  5. Leverage Pagination: For extensive datasets, consider implementing pagination in your queries to manage the volume of data returned.
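Several of these practices can be combined in a single query. The sketch below builds a BM25 query that returns only selected properties, filters on the tags property, and pages through results with limit and offset. It assumes tags is a text[] property on a recent Weaviate release (hence valueText), and the helper name build_filtered_bm25_query is hypothetical.

```python
import json


def build_filtered_bm25_query(class_name, search_text, return_props, tag,
                              page=0, page_size=10):
    """Build a BM25 query with a tag filter, a property selection, and pagination."""
    selection = " ".join(return_props)
    args = ", ".join([
        f"bm25: {{ query: {json.dumps(search_text)} }}",
        # Keep only objects whose tags contain the given value.
        f'where: {{ path: ["tags"], operator: Equal, valueText: {json.dumps(tag)} }}',
        f"limit: {page_size}",
        # offset skips the results that belong to earlier pages.
        f"offset: {page * page_size}",
    ])
    return f"{{ Get {{ {class_name}({args}) {{ {selection} }} }} }}"


# Third page (page index 2) of five results each.
paged_query = build_filtered_bm25_query(
    "Article", "search", ["title", "url"], tag="AI", page=2, page_size=5
)
print(paged_query)
```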

Conclusion

By combining Weaviate’s BM25 search with selective property return, you can create efficient, responsive search applications that return only the data they need. Understanding the structure of your schema, how to send queries effectively, and the best practices for returning properties helps your application meet its goals while staying scalable and performant. As you grow familiar with these concepts, you can build on them to create more capable search systems.
