Image by Author

Export an Entire Pandas DataFrame as a Document to Elasticsearch

using Espandas library

Chetana Didugu
Dec 12, 2020 · 3 min read

I have forever been in search of a method to export an entire Pandas DataFrame to Elasticsearch Client. I had tried my hand at it using the helper function in this post. However, a problem with this method is that, it is difficult to get back the entire DataFrame as an intact DataFrame. I mean, we can only extract one row at a time. This doesn’t bode well for tasks where you need to scan the entire database. So, I started looking for methods to get back the entire DataFrame, instead of tediously pulling each row individually.

I recently stumbled upon a library called espandas, which was a complete game-changer for me. It totally blew my mind! A huge shout-out to David Shuab for creating this library.

The Setup

First of all, for any of your Elasticsearch based solutions to work, you need to have it running in the background.

Make sure you have Elasticsearch downloaded on your system, and run it from your terminal by navigating to the folder where you have it installed. To run it type the following command:

\elasticsearch-6.8.0\bin> elasticsearch

That will get it up and running, and as a confirmation, you can see the following on your terminal:

Elasticsearch session

Now you are all set to go.

The Code

Once you have your elasticsearch running in the backend, import elasticsearch client and helpers into your notebook like so:

import numpy as np
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch import helpers

Begin an elasticsearch session as follows:

es = Elasticsearch(http_compress=True)

The argument ‘http_compression’ enables users to compress HTTP traffic. I have enabled mine by setting it to TRUE. Now your session is active.

For this demo I have used the FAQs from the Uber website. I have done some web-scraping, and converted the text into questions and answers, and saved them in a DataFrame with two columns: ‘Question’ and ‘Answer’.

Uber FAQ

In order for Espandas to be able to do its job, you need to add a unique identifier column with the name indexId. Let us cretae this column:

data[‘indexId’]= 'ind’ + data.index.astype(str)

Now you are good to go. Now create an Espandas instance and push your data into Elasticsearch with just 2 lines of code:

INDEX = ‘uber_faq’
TYPE = ‘_doc’
esp = Espandas()
esp.es_write(data, INDEX, TYPE)

Keep in mind that in order to save a document in Elasticsearch, you need to indentify it with a unique INDEX and also specify the DOCUMENT TYPE. I have indexed my document with the name UBER_FAQ and given it a _DOC datatype (actually, I wasn’t really sure what document type I should mention, so I went with the default).

Congratulations!! Your DataFrame has been successfully ingested into Elasticsearch. If you want to verify/retrieve your saved data, use the es_read method. Here I am extracting the first 10 records from the dataframe:

k = [‘ind’+str(i) for i in 10]
res = esp.es_read(k, INDEX, TYPE)

This should give you:

Retrieved data

And that’s it! Now you can save Pandas DataFrames entirely as a single document and retrieve them whole as well!

The Startup

Get smarter at building your thing. Join The Startup’s +725K followers.

Chetana Didugu

Written by

Data Scientist, Polyglot and a Tabibito | |

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +725K followers.

Chetana Didugu

Written by

Data Scientist, Polyglot and a Tabibito | |

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +725K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store