ElasticBERT: Information Retrieval using BERT and ElasticSearch

Kelvin Jose
Published in Analytics Vidhya
2 min read · Feb 9, 2020


What I built is a simple information retrieval system using a pretrained BERT model and Elasticsearch. Elasticsearch recently announced text similarity search with vectors in a blog post. We convert each piece of text into a fixed-length vector, which is saved into an Elasticsearch index. We then use the cosine similarity metric to find the most similar content in the index. This is the overall workflow of the system.
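To make the ranking step concrete, here is a minimal sketch of cosine-similarity retrieval in plain Python. It is not the code from this project: the names `cosine_similarity`, `rank_by_similarity`, and `toy_index` are made up for illustration, and the 3-dimensional vectors stand in for the 768-dimensional vectors a BERT base model would typically produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, indexed_vectors, top_k=3):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: document id -> embedding vector.
toy_index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

# The query vector points mostly along the first axis, so doc_a ranks first.
results = rank_by_similarity([0.9, 0.1, 0.0], toy_index, top_k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In the real system Elasticsearch does this scoring inside the index, so nothing like the loop above has to run on the client.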

I just had to connect a couple of things together to make it work. I will explain how I managed to do it.

I created a minimal corpus manually by copying some titles and abstracts from researchgate.net. You can see a file named example.csv inside the elastic/ folder. I run a search term against the corpus to retrieve the most similar abstracts from the index. Basically, it does the job of a search engine.
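As a rough illustration of what the index and the search request could look like, the sketch below builds an Elasticsearch mapping and a `script_score` query as plain Python dictionaries. The field names `title`, `abstract`, and `text_vector` and both helper functions are assumptions for this example, not taken from the project. The script uses the `doc['field']` form from the Elasticsearch 7.x releases of that time; the exact syntax differs between versions, so check the documentation for yours.

```python
def build_index_mapping(dims=768):
    """Index body with two text fields and one dense_vector field (names are hypothetical)."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "abstract": {"type": "text"},
                "text_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_similarity_query(query_vector, vector_field="text_vector", size=5):
    """Search body that scores every document by cosine similarity to query_vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps scores non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, doc['{vector_field}']) + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
        "_source": ["title", "abstract"],
    }


mapping = build_index_mapping()
# A short placeholder vector; a real query vector would have `dims` entries.
search_body = build_similarity_query([0.1, 0.2, 0.3], size=3)
```

These dictionaries would be passed to an Elasticsearch client when creating the index and when searching. The query vector itself would come from encoding the search term with the same BERT model used for the abstracts.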

1. Download the pre-trained BERT model.

wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
unzip cased_L-12_H-768_A-12.zip
cp -r cased_L-12_H-768_A-12 bert/model

2. Set up BERT with Docker.
