A peek inside how Snowflake’s new Universal Search feature was built

Now that Neeva and Snowflake have joined forces, we are leveraging Neeva’s powerful search technology to search over anything in Snowflake. Built on Snowflake Cortex, Universal Search (private preview) allows Snowflake customers to search for feature documentation, discover valuable datasets in our marketplace, and even locate tables and views stored in their Snowflake instance! Let’s see what happens in that last use case when we apply web search technology to Snowflake artifacts.

Figure: the Universal Search interface.

Our search algorithm combines various signals to produce a ranking. For the purposes of this article, we’ll focus on just one of these signals: the embedding, generated with a bi-encoder model. This model converts table names into a numerical representation called an embedding (or a vector), which we can use to semantically rank tables against a text query.
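To make the idea concrete, here is a minimal sketch using the open-source sentence-transformers library; the model and table names are illustrative placeholders, not our production setup:

```python
# Minimal bi-encoder ranking sketch, assuming the open-source
# sentence-transformers library. The model and table names are
# placeholders, not the production Universal Search model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a generic bi-encoder

table_names = ["CUSTOMER_CHURN_FEATURES", "SALES_DAILY", "WEB_EVENTS_RAW"]
query = "tables with customer churn data"

# Encode the query and every table name into dense vectors (embeddings).
table_embeddings = model.encode(table_names, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank tables by cosine similarity between the query and each table name.
scores = util.cos_sim(query_embedding, table_embeddings)[0]
for name, score in sorted(zip(table_names, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {name}")
```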

Our goal for the ranking task is to display the results most relevant to an issued query. To do so, we create a golden set and formalize metrics relevant to our search problem. The golden set consists of a uniform distribution of head (one-word, navigational-type) and tail (longer, natural-language-type) queries and their corresponding expected results. For example, a golden set might look like this:
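| Query | Query type | Expected results (ordered by relevance) |
| --- | --- | --- |
| sales | head | SALES_DAILY, SALES_BY_REGION |
| which tables track customer churn | tail | CUSTOMER_CHURN_FEATURES, CHURN_PREDICTIONS |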

These synthetic examples illustrate the shape of a golden dataset used to evaluate rankings. In the real world, the dataset includes multiple head and tail queries with their respective results ordered by relevance. A query can have one or multiple relevant results depending on what is in the index.

We iterate on two main metrics, a retrieval metric and a ranking metric; a code sketch of both follows the list.

  1. The retrieval metric tells us how many of the golden results are actually retrieved, out of the full golden set. A golden result must be in the index to be retrievable, so if only 80% of the golden results are in our index, the highest the retrieval metric can be is 80%. We have two retrieval sources: keyword (based on term hits between the result and the query) and vector (based on the similarity between the query and result embeddings).
  2. The ranking metric quantifies how many of the retrieved golden results appear in the top k results returned.
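To make the two metrics concrete, here is a rough Python sketch; the function names and data shapes are invented for illustration and are not our internal evaluation code:

```python
# Hypothetical metric computations over a golden set. Function names and
# data shapes are invented for illustration.

def retrieval_metric(golden_results, retrieved_results):
    """Share of the golden set that was retrieved at all.

    A golden result absent from the index can never be retrieved, so if
    only 80% of the golden results are indexed, this tops out at 0.8.
    """
    retrieved = set(retrieved_results)
    return sum(r in retrieved for r in golden_results) / len(golden_results)

def ranking_metric(golden_results, ranked_results, k=10):
    """Share of the retrieved golden results that land in the top k."""
    retrieved_golden = [r for r in golden_results if r in ranked_results]
    if not retrieved_golden:
        return 0.0
    top_k = set(ranked_results[:k])
    return sum(r in top_k for r in retrieved_golden) / len(retrieved_golden)

golden = ["SALES_DAILY", "CUSTOMER_CHURN_FEATURES"]
ranked = ["SALES_DAILY", "WEB_EVENTS_RAW", "CUSTOMER_CHURN_FEATURES"]
print(retrieval_metric(golden, ranked))     # 1.0: both golden results retrieved
print(ranking_metric(golden, ranked, k=2))  # 0.5: one of two golden results in top 2
```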

Our initial approach, evaluating our search algorithm against a golden dataset representing Snowflake artifacts, did not perform well on our vector retrieval metric. This meant that our embedding logic didn’t retrieve relevant results, and therefore our customers would have been given low-quality results. How could we solve this?

I suspected that the issue stemmed from using a fine-tuned bi-encoder not well suited for table and view results: web search data doesn’t resemble tables! We therefore embarked on a series of experiments, altering the base model for fine-tuning and adjusting our training set’s data distribution. Base models we experimented with fine-tuning included our current web search bi-encoder model, the MiniLM L6 model, and the E5 base model. Ideally, Universal Search returns results for any type of query, from full sentences to keywords to queries with specific table names. Therefore, during fine-tuning, we varied our data distribution to represent the different types of queries we expected Universal Search to encounter.
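For illustration, a fine-tuning run along these lines might look like the sketch below, again using the open-source sentence-transformers library; the base model, training pairs, and hyperparameters are placeholders, not our actual recipe:

```python
# Hypothetical bi-encoder fine-tuning sketch with sentence-transformers.
# The base model, training pairs, and hyperparameters are placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

base = SentenceTransformer("all-MiniLM-L6-v2")  # one candidate base model

# Training pairs mix query styles: head (keyword) and tail (natural
# language) queries, each paired with a relevant table name.
train_examples = [
    InputExample(texts=["sales", "SALES_DAILY"]),  # head query
    InputExample(texts=["which tables track customer churn",
                        "CUSTOMER_CHURN_FEATURES"]),  # tail query
    # ... many more pairs, balanced across the expected query types ...
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# In-batch negatives: other tables in a batch serve as negative examples.
train_loss = losses.MultipleNegativesRankingLoss(base)

base.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
base.save("finetuned-table-bi-encoder")
```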

After running all of our experiments, we chose the two models that performed best on our test set and ran our final evaluation to calculate our vector retrieval metric. Both models were trained with an even distribution of query types; one used the web search bi-encoder model as its base and the other used the MiniLM model. We found that our fine-tuned MiniLM model performed better! Specifically, we saw a gain of about 20% on our vector retrieval metric 🚀

Figure: the vector retrieval scores of our original model and the two best-performing training experiments. Because the vector retrieval score was highest for the MiniLM model, we decided to replace the original bi-encoder model with it, greatly improving our search performance.

Iterating on our embedding signals is just one of the many search efforts here at Snowflake. We are constantly refining our search stack so our users can have the best experience possible.
