What I built is a simple information retrieval system using a pretrained BERT model and Elasticsearch. Elasticsearch recently announced text similarity search with vector fields in a blog post. The idea is to convert each text into a fixed-length vector and save it in an Elasticsearch index; at query time, the cosine similarity metric is used to find the most similar content in the index. That is the overall workflow of the system.
I only had to connect a couple of things to make it work, and I will explain how I did it.
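To make the metric concrete, here is a minimal NumPy sketch of cosine similarity between two dense vectors; Elasticsearch computes the same quantity server-side over the stored vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors
    # divided by the product of their L2 norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so higher scores mean more similar texts.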
- Clone the repo https://github.com/kelvin-jose/elasticbert
I created a minimal corpus manually by copying some titles and abstracts from researchgate.net; you can find it in a file named example.csv inside the elastic/ folder. A search term is run against this corpus to retrieve the most similar abstracts from the index. Basically, it does the job of a search engine.
1. Download the pre-trained BERT model.
wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
unzip cased_L-12_H-768_A-12.zip
cp -r cased_L-12_H-768_A-12 bert/model
2. Set up the BERT docker image.
docker build -t bert-server .
3. Set up the Elasticsearch docker image.
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.5.2
4. Start docker containers.
docker run -d --net="host" bert-server
docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.5.2
- Make sure both containers are up and running by checking docker ps.
5. Install dependencies.
pip install argparse
- Note: argparse is part of the Python standard library since 3.2, so this install is usually a no-op.
pip install elasticsearch
pip install bert-serving-client
6. Create the Elasticsearch index.
python3 elastic/create_index.py --index researchgate --config elastic/index_config.json
- The create_index.py script creates an index in Elasticsearch.
- The --index and --config arguments specify the name of the Elasticsearch index and the schema of the target index, respectively.
- You can verify the index by checking http://127.0.0.1:9200/researchgate
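For reference, the mapping in index_config.json needs something like the following so that Elasticsearch stores the 768-dimensional BERT vectors as a dense_vector field. The field names here are my own assumptions; check the file in the repo for the exact schema:

```json
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},
      "abstract": {"type": "text"},
      "text_vector": {"type": "dense_vector", "dims": 768}
    }
  }
}
```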
7. Create the documents.
python3 elastic/create_document.py --index researchgate --csv elastic/example.csv --output example.json1
- This script writes an example.json1 file in the format Elasticsearch prescribes, which will then be indexed in the next step.
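To illustrate that format, here is a minimal sketch of how CSV rows plus their BERT vectors could be turned into bulk-style JSON lines. The function and field names are my own for illustration, not necessarily those used by create_document.py:

```python
import json

def to_bulk_lines(rows, index_name):
    # Convert (title, abstract, vector) rows into Elasticsearch
    # bulk format: an action line followed by a source line.
    lines = []
    for title, abstract, vector in rows:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({
            "title": title,
            "abstract": abstract,
            "text_vector": vector,
        }))
    return lines
```

Each document thus occupies two lines in the output file, ready to be fed to the bulk indexing step.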
8. Index the documents.
python3 elastic/index_documents.py --data example.json1
- This script indexes the documents into Elasticsearch.
- Verify it by checking http://127.0.0.1:9200/researchgate/_search
9. Test the engine.
- It returns the top match for the target query. The query is hard-coded as “machine learning” on line number 13, and you are free to change it.
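Under the hood, searching against dense vectors in Elasticsearch 7.x uses a script_score query with the built-in cosineSimilarity function. A sketch of how such a request body can be built (the text_vector field name is an assumption matching the mapping above, not necessarily the repo's):

```python
def build_query(query_vector, size=1):
    # script_score query ranking documents by cosine similarity
    # between the stored vector and the query vector. The "+ 1.0"
    # shifts scores to be non-negative, which script_score requires.
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, "
                              "doc['text_vector']) + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }
```

The returned dict can be passed as the body of an elasticsearch-py search call against the researchgate index, with the query vector produced by the BERT server.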
I hope this can be scaled up to serve millions of records at blazing speed. All we need to do is build a larger corpus and add more configuration to the Elasticsearch schema, such as clusters and shards.
Have fun guys.