ElasticBERT: Information Retrieval using BERT and ElasticSearch

Kelvin Jose
Feb 9, 2020 · 2 min read

What I built is a simple information retrieval system that uses a pretrained BERT model and Elasticsearch. Elasticsearch recently announced text similarity search with vectors in this post. We convert each text into a fixed-length vector and store it in an Elasticsearch index. We then use the cosine similarity metric to find the content in the index that is most similar to a query. That is the overall workflow of the system.
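To make the ranking step concrete, here is a minimal sketch of cosine similarity over fixed-length vectors in plain Python. The three-dimensional vectors and the names `toy_index` and `rank` are made up for illustration; in the real system the vectors come from BERT and the scoring happens inside Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank(query_vector, indexed):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real BERT embeddings (illustrative only).
toy_index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
ranking = rank([1.0, 0.1, 0.0], toy_index)
print(ranking[0][0])
```

The query vector points almost in the same direction as `doc_a`, so that document ranks first.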

I just had to connect a couple of things to make it work. I will explain how I managed to do it.

I created a minimal corpus manually by copying some titles and abstracts from researchgate.net. You can find it in a file named example.csv inside the elastic/ folder. I run a search term against the corpus to retrieve the most similar abstracts from the index. Basically, it does the job of a search engine.

1. Download the pre-trained BERT model.

2. Set up the BERT Docker container.

3. Set up the Elasticsearch Docker container.

4. Start the Docker containers.

  • Make sure the containers are up and running.

5. Install dependencies.

6. Create the Elasticsearch index.

  • The create_index.py script creates an index in Elasticsearch.
  • The --index and --config arguments specify the name of the Elasticsearch index and the schema of the target index, respectively.
  • You can verify the index by checking http://127.0.0.1:9200/researchgate
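As a rough idea of what the schema passed through --config could contain, here is a sketch that builds an index definition with a `dense_vector` field. The field names `title`, `abstract` and `text_vector` are assumptions, not taken from the original config file, and 768 is the output size of a BERT-base model, so it has to match whichever model you downloaded.

```python
import json


def build_index_config(vector_dims=768):
    """Build an index definition with two text fields and one vector field.

    Field names are hypothetical; `dense_vector` with `dims` is the
    Elasticsearch field type used for fixed-length embeddings.
    """
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "abstract": {"type": "text"},
                "text_vector": {"type": "dense_vector", "dims": vector_dims},
            }
        }
    }


config = build_index_config()
print(json.dumps(config, indent=2))
```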

7. Create the documents.

  • This script creates an example.json1 file in the format Elasticsearch prescribes, which will be indexed later.
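The conversion from CSV rows to indexable documents could look roughly like the sketch below. It assumes the CSV has `title` and `abstract` columns, that each document is written as one JSON object per line, and that the `_op_type` and `_index` keys follow the action format used by the bulk helper of the Python Elasticsearch client. `fake_embed` is a hypothetical stand-in for the BERT encoder so the example runs without a model server.

```python
import csv
import io
import json


def fake_embed(text):
    """Hypothetical stand-in for BERT: returns a tiny fixed-length vector."""
    return [float(len(text)), float(text.count(" ")), 0.0]


def rows_to_actions(csv_file, index_name, embed):
    """Yield one bulk-style action per CSV row, with its embedding attached."""
    for row in csv.DictReader(csv_file):
        yield {
            "_op_type": "index",
            "_index": index_name,
            "title": row["title"],
            "abstract": row["abstract"],
            "text_vector": embed(row["abstract"]),
        }


def to_json_lines(actions):
    """Serialize the actions as one JSON object per line."""
    return "\n".join(json.dumps(action) for action in actions)


# A one-row in-memory CSV standing in for example.csv (contents made up).
sample_csv = io.StringIO("title,abstract\nPaper A,deep learning for search\n")
lines = to_json_lines(rows_to_actions(sample_csv, "researchgate", fake_embed))
print(lines)
```

In the real pipeline, `embed` would call the BERT container and return a vector whose length matches the `dims` declared in the index schema.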

8. Index the documents.

9. Test the engine.

  • It returns the top match for the target query. The query is hard-coded as “machine learning” on line 13, and you are free to change it.
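For reference, a search request that ranks documents by cosine similarity can be sketched as a `script_score` query. The field names are assumptions carried over for illustration, and the script below uses the `cosineSimilarity(params.query_vector, 'field')` form from Elasticsearch 7.6 and later; earlier 7.x releases expected `doc['field']` as the second argument. Adding 1.0 keeps the score non-negative, which Elasticsearch requires.

```python
import json


def build_search_body(query_vector, size=1, vector_field="text_vector"):
    """Build a script_score request body that ranks by cosine similarity.

    `vector_field` and the `_source` field names are hypothetical.
    """
    script_source = (
        f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0"
    )
    return {
        "size": size,  # 1 returns only the top match
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": script_source,
                    "params": {"query_vector": query_vector},
                },
            }
        },
        "_source": {"includes": ["title", "abstract"]},
    }


# A short toy vector; a real one would be the BERT embedding of the query.
body = build_search_body([0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

The query vector would be the embedding of the search term (for example “machine learning”), produced by the same BERT model that encoded the documents.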

I hope this can be scaled up to serve millions of records at blazing speed. All we would need to do is build a larger corpus and add more configuration to the Elasticsearch setup, such as clusters and shards.
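As a small illustration of the shard configuration mentioned above, index settings could be generated like this. The shard and replica counts are arbitrary placeholders, not tuned recommendations, and the helper name is made up.

```python
def build_index_settings(shards=3, replicas=1):
    """Build an index settings block with shard and replica counts."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


settings = build_index_settings()
print(settings)
```

These settings would sit next to the mappings in the index definition; the number of primary shards is fixed when the index is created, so it is worth choosing before indexing a large corpus.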

Have fun guys.

Peace Power Pleasure

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…
