Exporting Elasticsearch results as CSV
Scenario: I have a lot of data indexed in an Elasticsearch instance and I want to export some of it in CSV format. There could be many reasons for this; in my case, I wanted to import the data easily into R (there is an “elastic” package for R, but I will talk about that in another post).
Solution: There are several ways one can do this. I will use the Elasticsearch Data Format Plugin.
Steps:
1. Install the version of the Elasticsearch Data Format Plugin that matches your Elasticsearch version (here, plugin 1.5.0 for Elasticsearch 1.5.1).
nandana@nandana-oeg:~/tools/elasticsearch-1.5.1/bin$ ./plugin --install org.codelibs/elasticsearch-dataformat/1.5.0
-> Installing org.codelibs/elasticsearch-dataformat/1.5.0...
Trying http://download.elasticsearch.org/org.codelibs/elasticsearch-dataformat/elasticsearch-dataformat-1.5.0.zip...
Trying http://search.maven.org/remotecontent?filepath=org/codelibs/elasticsearch-dataformat/1.5.0/elasticsearch-dataformat-1.5.0.zip...
Downloading ......... DONE
Installed org.codelibs/elasticsearch-dataformat/1.5.0 into ~/tools/elasticsearch-1.5.1/plugins/dataformat
2. Restart the Elasticsearch server (there may be an easier way to activate the plugin without restarting it). You can check whether the plugin is available with the following command:
nandana@nandana-oeg:~/tools/elasticsearch-1.5.1/bin$ curl -XGET http://localhost:9200/_cat/plugins
Nandana marvel 1.3.1 j/s /_plugin/marvel/
Nandana DataFormatPlugin 1.5.0 j
Nandana head NA s /_plugin/head/
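Since the node needs a moment to come back up after the restart, it can help to poll `_cat/plugins` until the plugin appears instead of re-running the check by hand. The following is a minimal bash sketch; the function name `wait_for_plugin`, the one-second polling interval and the default of 30 attempts are my own illustrative choices, not part of Elasticsearch or the plugin. It only assumes that `_cat/plugins` lists the plugin by name, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the _cat/plugins endpoint until a plugin name shows up.
# Usage: wait_for_plugin <base-url> <plugin-name> [max-attempts]
wait_for_plugin() {
  local url="$1" name="$2" tries="${3:-30}" i
  for (( i = 0; i < tries; i++ )); do
    # -s silences curl's progress output; a refused connection just yields no output
    if curl -s "$url/_cat/plugins" | grep -qw -- "$name"; then
      return 0   # plugin is listed: the node is up and the plugin is loaded
    fi
    sleep 1      # the node may still be starting; wait and try again
  done
  return 1       # gave up after $tries attempts
}

# Example (assumes the node listens on localhost:9200):
# wait_for_plugin http://localhost:9200 DataFormatPlugin && echo "plugin loaded"
```

Matching with `grep -w` on the whole line, rather than on a fixed column, is deliberate: node names can contain spaces, which would shift the columns of the `_cat` output.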
3. Query the _data endpoint and store the CSV output in a file:
curl -o /path/to/file.csv -XGET "localhost:9200/{index}/{type}/_data?format=csv&source={source}"
Here, {source} is the URL-encoded query written in the Elasticsearch query DSL. For example, for a simple query such as { "query": { "match_all": {} } }, the concrete command looks like this:
curl -o /tmp/data.csv -XGET "localhost:9200/rindex/property/_data?format=csv&source=%7B+%22query%22%3A+%7B++%22match_all%22%3A%7B%7D+%7D+%7D"
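Encoding the query by hand is error-prone, so the {source} value can be generated with a small helper. The sketch below is a pure-bash percent-encoder plus a function that assembles the _data URL; both function names (`urlencode`, `data_url`) are hypothetical, and the encoder assumes the query is plain ASCII. Unlike the hand-encoded command, it encodes spaces as `%20` (the standard percent-encoding of a space) rather than `+`.

```shell
#!/usr/bin/env bash
# Percent-encode a string for use as a URL query parameter value.
# Assumption: the input is ASCII (multi-byte UTF-8 characters are not handled).
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;                     # unreserved characters pass through
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;   # everything else becomes %XX
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build the _data URL from host, index, type and a query DSL string.
data_url() {
  local host="$1" index="$2" type="$3" query="$4"
  printf '%s/%s/%s/_data?format=csv&source=%s' \
    "$host" "$index" "$type" "$(urlencode "$query")"
}

# Example (index and type names taken from the command above):
# curl -o /tmp/data.csv -XGET "$(data_url localhost:9200 rindex property '{ "query": { "match_all": {} } }')"
```

For instance, `urlencode '{"query":{"match_all":{}}}'` prints `%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%7D`. Keeping the query as a readable string in the script and encoding it at the last moment makes it easy to swap in a different query without re-encoding anything by hand.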
Unsuccessful attempts:
My first attempt to export CSV from Elasticsearch was with Logstash. It seemed a good fit for the task: I could define a simple pipeline that takes input from Elasticsearch and writes output as CSV, since Logstash has both an elasticsearch input plugin and a csv output plugin.
However, it didn’t work. I created a very simple Logstash configuration, but the elasticsearch input plugin failed with a strange error message. Since I did not know how to dig into the Ruby source to understand what was going wrong, I had to give up.