Semantic Search with NLP

Ritesh Sinha
Published in The Startup · Feb 27, 2020

Introduction

Natural Language Processing (NLP) and Natural Language Understanding have evolved a lot in the past few years. Modern NLP techniques arguably started with word embeddings (Mikolov et al.), followed by GloVe (Global Vectors).

Embeddings, Transformers, and attention mechanisms are some of the recent developments in NLP that have become hugely successful. We have also seen BERT (Bidirectional Encoder Representations from Transformers), whose adoption has been increasing since its release.

Hugging Face (https://huggingface.co/) has been making notable contributions to NLP by providing various models ready for use (BERT, RoBERTa, ALBERT, etc.).

Objective

In this article, we will experiment with a couple of approaches for applying modern NLP techniques (Transformers) to context-sensitive search. The first approach uses BERT embeddings directly and examines how well they capture the context of a sentence. The second approach enhances the BERT embeddings by using Sentence-BERT (SBERT).

Why Context is required for Sentences

The architecture of transformer models is such that they produce word-level embeddings; however, many NLP tasks in industry require paragraph-level or sentence-level embeddings. Some of these tasks are:

  • Clustering
  • Similarity Comparison
  • Information Retrieval

For example, suppose I am searching for an article related to the stock market with the query 'stocks which have gained', but the relevant article talks about 'shares which have moved up'. A literal search will have a hard time linking the query to that article, and my results will not be useful even though the information is right there.

This is why we are interested in a mechanism that can find a related paragraph or sentence given a phrase of interest. This is also known as semantic search. According to Wikipedia: “Semantic search denotes search with meaning, as distinguished from lexical search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query.”

Sentence Embeddings

One way to address this is to use Sentence Embeddings.

“Sentence embedding is the collective name for a set of techniques in natural language processing (NLP) where sentences are mapped to vectors of real numbers.” (Source:Wikipedia)

In this article, we will talk mainly about how to get sentence embeddings and compare a couple of these methods.

Work has been done on this in the past few years, and some of the existing methods for obtaining sentence embeddings are:

  • InferSent
  • Universal Sentence Encoder
  • Average GloVe Embeddings
  • Average BERT Embeddings
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Here, we will look at the implementation of the last two techniques listed above, Average BERT Embeddings and Sentence-BERT (SBERT), and try to get an idea of how they perform and can be utilized.

Details of Experiment

The experiment conducted here is as follows. I have chosen twelve sentences as my corpus; this can be seen as the repository against which I run my searches. I then select three test sentences, ensuring that the meaning of each test sentence is present in the corpus but the wording is not shared. For example, “Cricket is my favourite game” is a test sentence, and the closest match available in the corpus is “Sachin Tendulkar is a great player.”

Corpus

  • A man is eating food.
  • A man is eating a piece of bread.
  • The girl is carrying a baby.
  • A man is riding a horse.
  • A woman is playing violin.
  • Two men pushed carts through the woods.
  • A man is riding a white horse on an enclosed ground.
  • A monkey is playing drums.
  • A cheetah is running behind its prey.
  • Sachin Tendulkar is a great player.
  • Sholay is an Indian classic film.
  • Dog is hunting for food.

Test Sentences

  • Cricket is my favourite game.
  • I like hindi movies.
  • Cat is looking to eat.
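For reference, the corpus and test sentences can be written down as plain Python lists, mirroring how they are fed to the models in the notebooks (the variable names are mine; the notebooks may organize them differently):

```python
# Corpus of twelve sentences used as the search repository.
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
    "Sachin Tendulkar is a great player.",
    "Sholay is an Indian classic film.",
    "Dog is hunting for food.",
]

# Test sentences: related in meaning to corpus entries, but sharing few words.
queries = [
    "Cricket is my favourite game.",
    "I like hindi movies.",
    "Cat is looking to eat.",
]
```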

BERT Embedding (Averaging)

Though BERT is not designed to generate meaningful sentence vectors, approaches like average pooling over various layers seem to work well for some tasks such as sentence classification. In the experiment I have conducted, however, this method does not appear to work well for semantic search. Nevertheless, the method is quite popular; a well-known implementation is available at https://github.com/hanxiao/bert-as-service.

Experiment 1 (with Averaged BERT embeddings):

Experiment 1 was conducted with the corpus of text statements and the test statements listed above. The intent was to find the best matches available in the supplied corpus. The flow of information from a sentence to its sentence embedding is: sentence → tokenization → model input → model output → sentence embedding.

At the fourth step, the model output, BERT produces a number of things: the hidden states of its encoder layers (12 or 24, depending on the architecture) and an output corresponding to the [CLS] token, often called the pooled output. Different people prefer different strategies for going from the model output to sentence embeddings; in this experiment, two approaches have been tried.
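The notebook linked below contains the exact strategies used; as a minimal sketch under the current Hugging Face transformers API (which may differ from the version the notebook was written against), averaging the last-layer token embeddings and ranking the corpus by cosine similarity looks roughly like this:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_sentence_embedding(sentence: str) -> torch.Tensor:
    """Average the last-layer token embeddings into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.last_hidden_state has shape (1, num_tokens, hidden_size)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# Rank the corpus for one query by cosine similarity of the averaged vectors.
query_vec = bert_sentence_embedding("Cricket is my favourite game.")
scores = [
    torch.nn.functional.cosine_similarity(
        query_vec, bert_sentence_embedding(s), dim=0
    ).item()
    for s in corpus
]
for sentence, score in sorted(zip(corpus, scores), key=lambda x: -x[1])[:3]:
    print(f"{score:.4f}  {sentence}")
```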

The code for this experiment is available at the following location.

Google Colab notebook for BERT Embeddings

As you can see, this scheme does not perform well. For example, when 'Cricket is my favourite game.' is searched, two results come up with scores very close to each other (0.6063 and 0.6003), but the best match is incorrect ('A monkey is playing drums.').

On the other hand, BERT does a fine job of finding contextual vectors; this is illustrated in the notebook shared above as Experiment 1A. You will find that the embeddings for the word 'bank' in the sentence “After stealing money from the bank vault, the bank robber was seen fishing on the Mississippi river bank.” differ across its three occurrences, and that the embeddings at positions 1 and 2 are closer to each other. Since BERT is designed to produce individual word vectors and their associated embeddings, a direct approach that sums or averages those vectors to get a sentence vector may not be the right way, and other approaches should be looked at.
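For completeness, here is a simplified sketch of Experiment 1A, reusing the tokenizer and model from the sketch above. It compares only last-layer vectors, whereas the notebook may combine several hidden layers, so the exact numbers will not match:

```python
# Contextual embeddings for the three occurrences of "bank" (cf. Experiment 1A).
text = ("After stealing money from the bank vault, the bank robber "
        "was seen fishing on the Mississippi river bank.")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state.squeeze(0)  # (num_tokens, hidden_size)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
bank_vectors = [hidden[i] for i, tok in enumerate(tokens) if tok == "bank"]

cos = torch.nn.functional.cosine_similarity
print("bank (vault) vs bank (robber):", cos(bank_vectors[0], bank_vectors[1], dim=0).item())
print("bank (vault) vs bank (river): ", cos(bank_vectors[0], bank_vectors[2], dim=0).item())
```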

Sentence-BERT or SBERT

Sentence-BERT, or SBERT, takes the embeddings from a BERT model and then trains the model further on another corpus (AllNLI) using a Siamese network.

Experiments performed with this approach appear to give good results, which is also claimed in the SBERT paper. The method takes a pretrained BERT model and modifies it with Siamese and triplet network structures to generate meaningful sentence vectors.

Some tasks that the plain BERT architecture is not suited for become tractable with this scheme, such as similarity comparison, clustering, and information retrieval via semantic search. SBERT is fine-tuned on the AllNLI dataset. SBERT can also be viewed as a form of transfer learning that uses variants of BERT as a pretrained model.

Note: Results on various machine learning tasks are detailed in the SBERT paper, available at https://arxiv.org/pdf/1908.10084.pdf, which provides a comparative study of various networks.

Experiment 2 (with Sentence-BERT):

Experiment 2 was conducted with the same corpus and the same test sentences as Experiment 1. The results are encouraging.
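For reference, a minimal sketch of this kind of semantic search with the sentence-transformers library is shown below. The model name is illustrative, and the notebook may use a different pretrained SBERT model or an older API (util.cos_sim was previously named util.pytorch_cos_sim):

```python
from sentence_transformers import SentenceTransformer, util

# Pretrained SBERT model; the exact checkpoint used in the notebook may differ.
sbert = SentenceTransformer("bert-base-nli-mean-tokens")

corpus_embeddings = sbert.encode(corpus, convert_to_tensor=True)
query_embedding = sbert.encode("Cricket is my favourite game.", convert_to_tensor=True)

# Cosine similarity between the query and every corpus sentence.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(f"Closest match: {corpus[best]}  (score {scores[best]:.4f})")
```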

As you can see, this scheme performs well. For example, when 'Cricket is my favourite game.' is searched, 'Sachin Tendulkar is a great player.' is reported as the closest match.

Google Colab notebook for Sentence Embeddings

Conclusion

We started with the goal of seeing whether modern NLP techniques like Transformers, and implementations like BERT, can be used for tasks such as information retrieval and semantic search. It appears that a form of transfer learning, where these pretrained models are taken as a base and modified, can work well. What was done here is that a BERT model was taken and its embeddings were fed into another model that produces sentence embeddings. As the field of NLP matures, we are likely to see the adoption of more such hybrid approaches (a variant of transfer learning).

APPENDIX

Transformers

Transformers are based on an encoder-decoder architecture and attention. These models are state of the art for NLP-related tasks. More details are available in the paper Attention Is All You Need.
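As a quick reference, the scaled dot-product attention at the core of the Transformer, as given in that paper, is:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

where Q, K, and V are the query, key, and value matrices and d_k is the dimensionality of the keys.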

BERT

BERT (Devlin et al., 2018) is a pre-trained transformer network (Vaswani et al., 2017) that set new state-of-the-art results for various NLP tasks, including question answering, sentence classification, and sentence-pair regression.

BERT builds on top of a number of clever ideas, especially Transformers. It can understand the context of a word by looking at the words that surround it. It is also built on the idea of semi-supervised learning, which is at the heart of transfer learning in NLP. Google has also deployed BERT for its search engine queries.

Sentence-BERT

Sentence Embeddings using Siamese BERT-Networks (SBERT) https://arxiv.org/pdf/1908.10084.pdf

Siamese Network

A Siamese network consists of two identical neural networks, each taking one of two input vectors. The last layers of the two networks are then fed to a loss function, which calculates the similarity between the two vectors. These networks have been very popular in image recognition tasks, but here we see that they are useful in NLP tasks as well.
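As a toy sketch of the general idea in PyTorch (dimensions and loss are illustrative; SBERT's actual training objectives differ and are described in the paper):

```python
import torch
import torch.nn as nn

# Toy Siamese setup: ONE shared encoder applied to both inputs, so the
# "two identical networks" share weights. Dimensions are illustrative.
encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))

def siamese_forward(x1: torch.Tensor, x2: torch.Tensor):
    # The same weights encode both inputs.
    return encoder(x1), encoder(x2)

# Cosine-embedding loss: pull similar pairs together, push dissimilar ones apart.
loss_fn = nn.CosineEmbeddingLoss()

x1, x2 = torch.randn(4, 768), torch.randn(4, 768)  # dummy sentence vectors
target = torch.tensor([1.0, 1.0, -1.0, -1.0])      # 1 = similar pair, -1 = dissimilar
e1, e2 = siamese_forward(x1, x2)
loss = loss_fn(e1, e2, target)
print(loss.item())
```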

AllNLI Dataset

The AllNLI dataset is the concatenation of the SNLI and MultiNLI datasets.

A good resource on understanding BERT: BERT Word Embeddings Tutorial

Word Embeddings: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

Hugging Face Transformers: https://transformer.huggingface.co/

Global Vectors: https://nlp.stanford.edu/projects/glove/

Author Bio:

Ritesh Sinha is a Senior Data Scientist with the Technology Office, HCL Technologies. He has experience delivering data science projects in the manufacturing domain. He has a keen interest in deep learning and has built and hosted models related to healthcare. He is a regular contributor on Kaggle and has earned the “Expert” rating on the platform. He is currently researching NLP and related problems.
