Loading wiki dumps into Elasticsearch

Rohan Singh
Dec 7, 2018

I recently started getting my hands dirty with Elasticsearch. The first step was to load some data, so I tried following the instructions provided here to load a Wikipedia search index for testing. Unfortunately, those instructions do not work as written. I made a few changes to them and decided to share the result with a wider audience, to help others explore this amazing feat of engineering!

Step 1: Download

You can download any wiki dump from here. The following command downloads the English Wikiquote dump.
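
A minimal sketch of the download step, assuming the dumps in question are the CirrusSearch dumps published under dumps.wikimedia.org/other/cirrussearch/. The date below is a placeholder, not a value from this post; old dumps are removed regularly, so pick a date that actually appears in the listing. The function name `download_dump` is hypothetical.

```shell
# Assumption: placeholder dump date; replace with one from the dump listing.
dump_date="20181203"
wiki="enwikiquote"

# File name and URL follow the naming pattern used by the CirrusSearch dumps.
dump="${wiki}-${dump_date}-cirrussearch-content.json.gz"
dump_url="https://dumps.wikimedia.org/other/cirrussearch/${dump_date}/${dump}"

download_dump() {
  # -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name
  curl -fL -O "$dump_url"
}

# Run it with: download_dump
```

The later steps reuse the `dump` file name, so it helps to keep these variables in the same shell session.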

Step 2: Get the index ready

You’ll need the analysis-icu plugin and jq for this step.

To install the analysis-icu plugin, run the following command from the Elasticsearch installation directory:

sudo bin/elasticsearch-plugin install analysis-icu

On macOS, you can install jq with Homebrew:

brew install jq

To create the index, you can run the following script.
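
The script itself did not survive in this copy of the post, so what follows is only a sketch of what index creation can look like, not the author's exact script. It assumes Elasticsearch 6.x on localhost:9200, the CirrusSearch API modules `cirrus-settings-dump` and `cirrus-mapping-dump`, and the `page` mapping type. The jq paths (`.content.page.index.analysis` and `.content.page.index.similarity`) are assumptions about the shape of the API output and may need adjusting, as may the mapping itself depending on your Elasticsearch version. The helper names `cirrus_dump_url` and `create_index` are hypothetical.

```shell
es="http://localhost:9200"   # assumption: local single-node cluster
site="en.wikiquote.org"
index="enwikiquote"

cirrus_dump_url() {
  # $1 is "settings" or "mapping"
  echo "https://${site}/w/api.php?action=cirrus-$1-dump&format=json&formatversion=2"
}

create_index() {
  # Create the index with the wiki's analysis and similarity settings.
  # Assumption: these jq paths match the API output.
  curl -s "$(cirrus_dump_url settings)" |
    jq '{ settings: { index: {
          analysis: .content.page.index.analysis,
          similarity: .content.page.index.similarity } } }' |
    curl -s -H 'Content-Type: application/json' -XPUT "$es/$index?pretty" -d @-

  # Apply the wiki's mapping to the "page" type.
  curl -s "$(cirrus_dump_url mapping)" |
    jq .content |
    curl -s -H 'Content-Type: application/json' -XPUT "$es/$index/_mapping/page?pretty" -d @-
}

# Run it with: create_index
```

Elasticsearch 6.x rejects request bodies sent without a Content-Type header, which is why both PUT requests set one explicitly.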

Step 3: Get the wiki ready for loading
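
One way to prepare the dump is to split it into small files that can each be sent as a single bulk request. The sketch below assumes the dump is already in bulk format (an action line followed by a document line), so the chunk size has to be an even number of lines; 500 is an arbitrary choice. The function name `split_dump` and the `chunks` directory are hypothetical. It uses `gunzip -c` rather than `zcat` because `zcat` on macOS expects `.Z` files.

```shell
split_dump() {
  # $1: path to the .json.gz dump, $2: output directory, $3: chunk file prefix
  dump_file=$1
  chunk_dir=$2
  prefix=$3

  mkdir -p "$chunk_dir"
  # 500 lines per chunk (250 documents); -a 10 leaves room for many chunk names
  gunzip -c "$dump_file" | split -a 10 -l 500 - "$chunk_dir/$prefix"
}

# Example (file name is a placeholder):
# split_dump enwikiquote-20181203-cirrussearch-content.json.gz chunks enwikiquote
```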

Step 4: Load the Wiki
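
Loading can then be a loop that posts each chunk to the bulk API. This is a sketch under the same assumptions as before (Elasticsearch 6.x on localhost:9200, an index named `enwikiquote`, chunks in a directory of their own); the function name `load_chunks` is hypothetical. A chunk is deleted only when the response reports no errors, so the loop can be rerun to retry whatever is left.

```shell
es="http://localhost:9200"   # assumption: local single-node cluster
index="enwikiquote"

load_chunks() {
  # $1: directory holding the chunk files
  chunk_dir=$1
  for file in "$chunk_dir"/*; do
    [ -f "$file" ] || continue
    # The bulk API expects newline-delimited JSON; --data-binary keeps newlines intact
    response=$(curl -s -H 'Content-Type: application/x-ndjson' \
      -XPOST "$es/$index/_bulk" --data-binary "@$file")
    if printf '%s' "$response" | grep -q '"errors":false'; then
      rm "$file"
      echo "loaded $file"
    else
      echo "failed $file" >&2
    fi
  done
}

# Run it with: load_chunks chunks
```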

Voilà! The wiki should now be loaded. I strongly suggest reading the technical details in the original blog post.

To verify, you can run the following command to search for Einstein quotes.
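
A quick check could look like the sketch below, which runs a URI search against the index. It assumes the same host and index name as the earlier steps; the function name `search_quotes` is hypothetical.

```shell
es="http://localhost:9200"   # assumption: local single-node cluster
index="enwikiquote"

search_quotes() {
  # $1: a single search term; the URL is quoted so the shell does not interpret "&"
  curl -s "$es/$index/_search?q=$1&pretty"
}

# Run it with: search_quotes einstein
```

If the load worked, the response should contain a non-empty `hits` section.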

I hope this helps all the Elasticsearch enthusiasts.
