Article 1: Introduction to Vector-Based Search

Shubham Barthwal
7 min read · Jun 30, 2023

This is the first article in my series on vector-based search techniques. I got to explore this field while working on a project that compared products to find out whether a product had variants in the product catalogue. It seems like a clustering problem, but what clustering really does is find the closest vectors, and that is what led me to explore this field.

This series will cover: Introduction to Vector-Based Search, Vector Representation Techniques, Similarity Metrics, Algorithms and Models, Applications, and Optimization Techniques. No need to feel overwhelmed by this list: I will break each topic down to the basics, as I know how hard these technical concepts can be to understand. Let’s dive into the topic.

What is vector-based search?

To explain vector search, let’s think of an era when we had no “AI”. How did we even find articles in a large collection of data before AI🤔? Simply by matching keywords, metadata, tags, or lexical similarity. But are these techniques fast enough, and even if they are fast, are they accurate?

As a matter of fact, even though these methods are effective to some extent, they often produce limited and imprecise results.

Here comes the saviour (not IronMan): vector-based search, which uses machine learning techniques. I just dropped ML 💣 and now I have solved all the problems. I wish it was that simple. But never mind, that is what this article is all about, and it will cover the topic as broadly as possible.

The idea of vector search is rooted in the semantic connections between objects. These connections are based on the meaning of the text and involve the study of the relationship between words, phrases, sentences, and their respective meanings. When used for search purposes, the approach seeks to comprehend the user's intention and locate the products or documents that most closely match their query.

To summarize, vector-based search uses the semantic relationships between vectors derived from text or images. These vectors are numeric representations of the data. The methods and models around these vectors, along with basic NLP concepts, will be discussed in future articles.

What is the need for vector-based search?

The previous section suggested that vector-based search is preferred over traditional search methods. To understand why, it is important to understand the limitations of traditional search that a vector-based method can overcome.

The traditional method generally involves keyword matching or exact matching, which relies on explicitly matching the user's query against the indexed data. This means there is no semantic understanding. Consider an e-commerce site: would you like to use trial and error to find the right product? I would rather use the terms that best describe the product and find it on the first page. Here comes the “Hero⚔️”: vector-based search can understand the semantics of the query and find information with similar semantics. So even if the keywords do not match, the search can still surface information whose meaning is close to the query.
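To make the keyword-versus-semantics point concrete, here is a minimal sketch. The catalogue, the query, and the 3-dimensional vectors are all made up by hand for illustration; in a real system the vectors would come from an embedding model, which is assumed here rather than shown.

```python
import math

# Hypothetical catalogue. The vectors are hand-made stand-ins for
# what an embedding model would produce for each product title.
CATALOGUE = {
    "grey three-seater sofa": [0.9, 0.1, 0.0],
    "wooden dining table": [0.2, 0.9, 0.1],
    "running shoes": [0.0, 0.1, 0.9],
}

def keyword_match(query, titles):
    """Return the titles that share at least one exact word with the query."""
    query_words = set(query.lower().split())
    return [t for t in titles if query_words & set(t.lower().split())]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_match(query_vector, catalogue):
    """Return the title whose vector is most similar to the query vector."""
    return max(catalogue, key=lambda title: cosine(query_vector, catalogue[title]))

query = "comfortable couch"
query_vector = [0.8, 0.2, 0.1]  # assumed embedding of the query

keyword_hits = keyword_match(query, CATALOGUE)  # no title contains "couch"
best = vector_match(query_vector, CATALOGUE)    # closest vector wins

print(keyword_hits)
print(best)
```

The keyword search finds nothing because no title contains the word "couch", while the vector comparison still picks the sofa because its vector points in almost the same direction as the query vector.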

Image-based search: Traditional search methods face challenges when the search is based on images, i.e. high-dimensional data. As the number of dimensions grows, traditional search becomes computationally expensive and less effective. Vector-based search, on the other hand, can handle such high-dimensional data efficiently and can be used to find similar images in a large data store. You might have used Google Lens, which works on a similar concept. At least, it has made my life simpler when I want to find the name of that dish🍝 I see in an Instagram photo.

Understanding vector representations

Figure: Visualization to understand vectorization

Vector representation is key for semantic analysis and efficient data processing. You might have wondered why we even need these vector representations and what is actually in them. The simple answer is that our systems understand nothing but numbers, but the real reason lies in how vector-based search works. As I mentioned earlier, this technique works on the concept of semantics. Vector representations are mathematical representations of the data points (image or text), and they capture the semantic meaning of those data points.

Vector representation has revolutionized semantic analysis in data processing by enabling systems to capture the meaning of words and sentences. One popular type of vector representation in NLP is the word embedding, where each word is mapped to a dense vector learned from the contexts in which the word appears. This places words with similar meanings close to each other in the vector space.

There are various techniques, such as word2vec, GloVe, and deep learning models, that leverage large data sets and optimization algorithms to fine-tune vector values. These techniques aim to capture meaningful patterns and relationships in the data using statistical methods or neural networks. The resulting vectors encode information about the semantic meaning of the data. While these terms may seem complex, don't worry - the upcoming article will provide a detailed explanation.
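To give a feel for the statistical idea before the next article, here is a toy sketch that builds word vectors by simply counting which words appear in the same sentence. This is a deliberate simplification and not how word2vec or GloVe are actually implemented; the corpus and the function names are made up for illustration.

```python
import math
from collections import Counter

# Tiny made-up corpus; real models train on far larger data sets.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words sharing a sentence with it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

vectors = cooccurrence_vectors(CORPUS)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_car = cosine(vectors["cat"], vectors["car"])

print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

Because "cat" and "dog" appear next to the same kinds of words ("drinks", "chases"), their vectors end up closer to each other than "cat" and "car". Learned embeddings rest on the same intuition but produce much more compact and useful vectors.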

Benefits and limitations of vector-based search

Throughout this article, I have attempted to highlight the advantages of this method. However, it is important to acknowledge that there are also limitations. Before implementing this technique, it is crucial to carefully evaluate both its benefits and drawbacks.

Here are some of the benefits of this approach:

  • Semantic Understanding: This method captures the semantic relationship between the data points, enabling more accurate and context-aware search results.
  • Scalability: Vector search algorithms, such as approximate nearest neighbour search, can handle large-scale datasets with millions or billions of vectors. Since e-commerce sites have millions or even billions of data points, this method can be adapted to improve the overall user experience.
  • Efficient Approach: Vector representations allow for efficient similarity calculations, enabling fast retrieval of similar items or documents.
  • Handle Different Data Types: Vector-based search is adaptable to different types of data, including text, images, audio, or numerical data, providing a unified approach to search and retrieval.
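
As a rough illustration of the similarity calculation behind the benefits above, here is a minimal sketch of an exact (brute-force) nearest-neighbour search. The item names and 2-dimensional vectors are hypothetical, and Euclidean distance is just one possible choice of metric.

```python
import heapq
import math

# Hypothetical item vectors; in practice these would come from an embedding model.
ITEMS = {
    "red sneakers": [0.9, 0.1],
    "blue sneakers": [0.8, 0.2],
    "leather boots": [0.6, 0.5],
    "coffee mug": [0.1, 0.9],
}

def top_k(query, items, k=2):
    """Exact nearest-neighbour search: compare the query with every item."""
    return heapq.nsmallest(k, items, key=lambda name: math.dist(query, items[name]))

nearest = top_k([0.95, 0.05], ITEMS)
print(nearest)
```

This version compares the query with every stored vector, so its cost grows with the size of the catalogue. Approximate nearest neighbour methods, mentioned under scalability, trade a little exactness for speed by avoiding that full scan.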

Let’s talk about the limitations, as they need to be considered before using this approach:

  • Loss of fine-grained details: Vector representations often simplify complex data, potentially leading to the loss of fine-grained details or nuances in the search results.
  • Interpretability: While this method provides efficient results, interpreting the factors that lead to the search results can be challenging, especially with deep learning models.
  • Curse of Dimensionality: Even though this method can work on high-dimensional data, it is well known in ML that the efficiency and accuracy of vector-based search algorithms may degrade as the number of dimensions grows, because distances between points become less discriminative.
  • Data quality and biases: These vector representations are learnt from training data that might be biased or unrepresentative, which can lead to bias or inaccuracies in the model/approach.
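
The curse of dimensionality in the list above can be seen in a small experiment. The sketch below uses uniformly random points, which is an assumption made purely for illustration (real embeddings are not uniformly random), and measures how much farther the farthest point is than the nearest one from a random query.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

contrast_low = relative_contrast(dim=2)
contrast_high = relative_contrast(dim=1000)

print(f"2 dimensions:    {contrast_low:.2f}")
print(f"1000 dimensions: {contrast_high:.2f}")
```

In 2 dimensions the nearest point is far closer than the farthest one, while in 1000 dimensions all points sit at nearly the same distance from the query. That is why "nearest" becomes a weaker signal as the dimensions grow.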

Some of these limitations can be handled using appropriate methods. So, before dropping the idea of using this approach, I would highly suggest looking into the methods that can overcome these limitations, especially if the benefits are likely to produce more favourable results.

Applications of vector-based search

So far I have given a basic understanding of vector-based search and why it is needed. I have also covered vector representations along with the merits and demerits of this approach. But even if it has merits and can improve search, where can we really use it? In fact, it is not just for search: it has more applications and can help businesses solve their problems. Here are some of the applications of vector-based search:

  • Information retrieval: It enables efficient retrieval of documents or webpages based on similarity, which can improve search engines’ accuracy and speed.
  • Recommendation systems: Vector representations facilitate personalized recommendations by finding similar items or user preferences, enhancing the effectiveness of recommendation algorithms.
  • Image and video search: Since this approach considers vectors as the basis of search, it can handle content-based image and video retrieval. This allows users to find visually similar images or videos. You might have used this in Google Lens.
  • E-commerce: Vector search helps in product search, matching customer preferences, and providing personalized recommendations, improving the shopping experience and conversion rates. This matters most for e-commerce businesses, which want customers to land on the desired product with the least effort.

To add a few more examples, it can be used in fraud detection, anomaly detection, and NLP tasks like question-answering. The versatility and effectiveness of vector representations make them valuable in various domains where similarity-based retrieval, pattern recognition, and content understanding are crucial.

Thank you all for reading my article on vector-based search. As a new blogger, I appreciate your support and understanding as I strive to explain this complex topic. If you have any suggestions or spot any mistakes, please leave a comment to help me improve. Your feedback is invaluable as it motivates me to write more articles and explore new subjects. Let’s continue to foster a community of learning and sharing knowledge. Thank you once again for being a part of my journey.


Shubham Barthwal

Machine Learning Engineer passionate about problem-solving and optimization. Sharing insights on AI, ML, and cutting-edge technologies.