Exporting Elasticsearch results as CSV

Nandana Mihindukulasooriya
Published in Technical Notes
Jul 15, 2015

Scenario: I have a lot of data indexed in an Elasticsearch instance and I want to export some of it in CSV format. There could be many reasons for this, but in my case I wanted to import the data into R easily (there is an “elastic” package for R, but I will talk about it in another post).

Solution: There are several ways one can do this. I will use the Elasticsearch Data Format Plugin.

Steps:

1. Install the version of the Elasticsearch Data Format Plugin that matches the version of your Elasticsearch instance.

nandana@nandana-oeg:~/tools/elasticsearch-1.5.1/bin$ ./plugin --install org.codelibs/elasticsearch-dataformat/1.5.0
-> Installing org.codelibs/elasticsearch-dataformat/1.5.0...
Trying http://download.elasticsearch.org/org.codelibs/elasticsearch-dataformat/elasticsearch-dataformat-1.5.0.zip...
Trying http://search.maven.org/remotecontent?filepath=org/codelibs/elasticsearch-dataformat/1.5.0/elasticsearch-dataformat-1.5.0.zip...
Downloading .........DONE
Installed org.codelibs/elasticsearch-dataformat/1.5.0 into ~/tools/elasticsearch-1.5.1/plugins/dataformat

2. Restart the Elasticsearch server (there is probably a way to load the plugin without restarting the server, but I have not looked for it). You can check whether the plugin is available using the following command.

nandana@nandana-oeg:~/tools/elasticsearch-1.5.1/bin$ curl -XGET http://localhost:9200/_cat/plugins
Nandana marvel 1.3.1 j/s /_plugin/marvel/
Nandana DataFormatPlugin 1.5.0 j
Nandana head NA s /_plugin/head/
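
To script this check instead of reading the listing by eye, something like the following should work. It is only a minimal sketch: has_plugin is a helper name made up for illustration, and it assumes the plugin name is the second whitespace-separated column of the _cat/plugins output, as in the listing above (a node name containing spaces would break that assumption).

```shell
# has_plugin NAME: reads `_cat/plugins` output on stdin and succeeds
# if some row lists NAME in its second column (hypothetical helper).
has_plugin() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage against a running node (assumes the default localhost:9200):
#   curl -s -XGET http://localhost:9200/_cat/plugins | has_plugin DataFormatPlugin \
#     && echo "Data Format Plugin is loaded"
```

The function only inspects text piped into it, so it can also be tried on a saved copy of the listing without a running server.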

3. Make a query to the _data endpoint and store the CSV output in a file.

curl -o /path/to/file.csv -XGET "localhost:9200/{index}/{type}/_data?format=csv&source={source}"

{source} is the URL-encoded query, written in the Elasticsearch query DSL. For example, if I have a simple query such as { "query": { "match_all":{} } }, the concrete command will look like this:

curl -o /tmp/data.csv -XGET "localhost:9200/rindex/property/_data?format=csv&source=%7B+%22query%22%3A+%7B++%22match_all%22%3A%7B%7D+%7D+%7D"
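
Encoding the query by hand gets tedious, so here is a rough sketch of doing it in the shell. The urlencode helper is my own illustration, not part of the plugin or of curl. It assumes an ASCII-only query and percent-encodes every character outside the unreserved set, so a space becomes %20 rather than the + used in the example above. The index and type names (rindex, property) are the ones from my example.

```shell
# urlencode STRING: percent-encode everything except unreserved
# characters (letters, digits, '.', '_', '~', '-').
# Hypothetical helper; assumes ASCII input. The body runs in a
# subshell so its variables do not leak into the caller.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

query='{"query":{"match_all":{}}}'
url="localhost:9200/rindex/property/_data?format=csv&source=$(urlencode "$query")"
printf '%s\n' "$url"

# Then fetch the CSV as in step 3:
#   curl -o /tmp/data.csv -XGET "$url"
```

curl can also do the encoding itself through its -G and --data-urlencode options, though I have not tried that combination against this plugin.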

Unsuccessful attempts:

My first attempt to export CSV from Elasticsearch was to use Logstash. It seemed like a good fit for the task: I could define a simple pipeline that takes input from Elasticsearch and provides output as CSV. It had both an elasticsearch input plugin and a csv output plugin.

However, it didn’t work. I created a very simple Logstash configuration, but the elasticsearch input plugin failed with a strange error message. Not knowing how to dig into the Ruby source and understand what was going wrong, I had to give up.
