Keyword search is DEAD; Semantic Search is Smart

Published in Analytics Vidhya

3 min read, Mar 19, 2020

Suppose you have a 100-page document and you want to find every statement about “restaurant”. Traditionally, people run a keyword search on the document, but that can miss many important statements that mention a café, bar, eatery, eating place or diner instead. As a result, keyword search gives poor results.
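To make the problem concrete, here is a minimal sketch of a plain keyword search. The sample sentences and the `keyword_search` helper are made up for illustration and do not come from any particular search tool.

```python
import re

def keyword_search(sentences, keyword):
    """Return the sentences that contain the keyword as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sentences standing in for a long document.
sentences = [
    "The restaurant on Main Street closed last year.",
    "A new café opened near the station.",
    "The diner serves breakfast all day.",
    "The bridge was repaired in spring.",
]

hits = keyword_search(sentences, "restaurant")
for sentence in hits:
    print(sentence)
```

Only the first sentence is returned. The café and diner sentences are about the same topic, but they never use the literal word, so the keyword search skips them.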

With semantic search, you can retrieve the information that matches your intent, not just your exact words.

Semantic search attempts to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms as they appear in the data, in order to generate more relevant results.

Semantic search draws on many signals beyond the keywords themselves, such as context, synonyms and the relationships between concepts, to perform a search.

Semantic search describes a search engine’s attempt to generate the most accurate results possible by understanding:

· Searcher intent

· Query context

· The relationships between words

In layman’s terms, semantic search seeks to understand natural language the way a human would. We can search content for its meaning in addition to keywords and maximize the chances the user will find the information they are looking for.

The purpose of semantic search is to go beyond the static dictionary meaning of a word or phrase to understand the intent of a searcher’s query within a specific context. By learning from past results and creating links between entities, a search engine might then be able to deduce the answer to a searcher’s query, rather than provide ten blue links that may or may not provide the correct answer.

To identify the best results, web search relies on characteristics of web pages (features) defined by engineers, for example the number of keyword matches between the query and the web page’s title, URL, body text and so on. At run time, these features are fed into classical machine learning models such as gradient boosted trees to rank web pages. Each query-web page pair becomes the fundamental unit of ranking.
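As a rough sketch of what such hand-defined features might look like, the snippet below counts query-term matches in a page’s title, URL and body. The function names, the feature names and the sample page are assumptions made for illustration. A real ranking system would use many more features and pass them to a trained ranker, which is not shown here.

```python
import re

def count_matches(query_terms, text):
    """Count how often any query term appears among the words of the text."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(tokens.count(term) for term in query_terms)

def extract_features(query, page):
    """Build one feature dictionary for a single query-web page pair."""
    terms = query.lower().split()
    return {
        "title_matches": count_matches(terms, page["title"]),
        "url_matches": count_matches(terms, page["url"]),
        "body_matches": count_matches(terms, page["body"]),
    }

# Hypothetical page used only for this example.
page = {
    "title": "Best Restaurant Guide",
    "url": "example.com/restaurant-guide",
    "body": "Find a restaurant or a café near you. Every restaurant is reviewed.",
}

features = extract_features("restaurant guide", page)
print(features)
```

Note that the body mentions a café, but that word contributes nothing to any feature because it does not match a query term. This is the gap that learned features aim to close.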

Deep learning improves this process by allowing us to automatically generate additional features that more comprehensively capture the intent of the query and the characteristics of a webpage.

Specifically, unlike human-defined and term-based matching features, these new features learned by deep learning models can better capture the meaning of phrases missed by traditional keyword matching. This improved understanding of natural language (that is, semantic understanding) is inferred from end-user clicks on webpages for a search query. The deep learning features represent each text-based query and webpage as a string of numbers known as the query vector and document vector, respectively.

These query and web page vectors are fed into a deep model such as a convolutional neural network (CNN). This model improves semantic understanding and generalization between the query and web pages, and calculates the output score used for ranking. The score is obtained by measuring the similarity between the query and a web page’s title, URL and so on, using a similarity measure such as cosine similarity.
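For reference, cosine similarity between a query vector and a document vector can be written as follows. The symbols are notation chosen here, not taken from the article: q is the query vector, d is the document vector, and n is the number of dimensions.

```latex
\[
\operatorname{sim}(\mathbf{q}, \mathbf{d})
  = \cos\theta
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}
         {\sqrt{\sum_{i=1}^{n} q_i^{2}} \, \sqrt{\sum_{i=1}^{n} d_i^{2}}}
\]
```

The value lies between -1 and 1. A score close to 1 means the two vectors point in nearly the same direction, so the query and the document are treated as close in meaning.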

Given a large corpus of documents, news articles or novels, a model such as word2vec, GloVe or BERT creates a vector for each word. Pre-trained models are already available, so vectors are already defined for a large vocabulary, commonly in 300 dimensions for word2vec and GloVe. Because every word becomes a vector, words with similar vectors, as measured by cosine similarity, are treated as similar words. So “restaurant” is found to be similar to “café” and “eatery” based on vector similarity.
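The idea can be sketched with a few toy vectors. The 3-dimensional values below are invented for illustration and are not output from word2vec, GloVe or BERT. With a real pre-trained model, the vectors would be looked up rather than typed in by hand.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical word vectors, made up for this example.
word_vectors = {
    "restaurant": [0.9, 0.8, 0.1],
    "café": [0.8, 0.9, 0.2],
    "eatery": [0.85, 0.75, 0.15],
    "bridge": [0.1, 0.2, 0.9],
}

def most_similar(word, vectors, top_n=2):
    """Return the top_n other words ranked by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

neighbours = most_similar("restaurant", word_vectors)
for word, score in neighbours:
    print(f"{word}: {score:.3f}")
```

With these toy values, “café” and “eatery” come out as the nearest neighbours of “restaurant”, while “bridge” scores far lower.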

Phrases can be represented as vectors too, for example by taking the element-wise sum of their word vectors or by any other approach. Even sentences can be represented as vectors.
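A minimal sketch of the element-wise sum is shown below. The word vectors are again invented for illustration, and the averaging variant is just one example of another approach, chosen here rather than named in the article.

```python
def sum_vectors(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]

def mean_vector(vectors):
    """Element-wise average, which keeps the scale independent of phrase length."""
    total = sum_vectors(vectors)
    return [component / len(vectors) for component in total]

# Hypothetical word vectors, made up for this example.
word_vectors = {
    "italian": [0.5, 0.25, 0.0],
    "restaurant": [1.0, 0.75, 0.25],
}

def phrase_vectors(phrase, vectors):
    """Look up the vector of every known word in the phrase."""
    found = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not found:
        raise ValueError("no word in the phrase has a known vector")
    return found

parts = phrase_vectors("Italian restaurant", word_vectors)
summed = sum_vectors(parts)
averaged = mean_vector(parts)
print(summed)
print(averaged)
```

The resulting phrase vector can be compared with other word, phrase or sentence vectors in the same way as single-word vectors.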

I hope this gives you a clear picture of semantic search.

The above article is taken from the book: GAN with Industrial Use cases

GAN with Industrial Use cases (US market)

Written by Navin Manaswi