Loading wiki dumps into Elasticsearch
I recently started getting my hands dirty with Elasticsearch. The first step was to load some data. I tried following the instructions provided here to load a Wikipedia search index for testing. Unfortunately, the instructions did not work as written. I made a few changes to them and decided to share the result with a wider audience, to help others explore this amazing feat of engineering!
Step 1: Download
You can download any wiki dump from here. The following command downloads the English Wikiquote dump.
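A minimal sketch of the download, assuming the dump is a CirrusSearch content dump hosted under dumps.wikimedia.org/other/cirrussearch/. The snapshot date, the variable names and the `download_dump` helper are illustrative assumptions; check the dump listing for a date that actually exists before running it.

```shell
#!/bin/sh
# Assumed values: adjust the wiki name and snapshot date to a dump that exists.
wiki="enwikiquote"
dump_date="20240101"   # hypothetical snapshot date
dump="${wiki}-${dump_date}-cirrussearch-content.json.gz"
dump_url="https://dumps.wikimedia.org/other/cirrussearch/${dump_date}/${dump}"

# Download the dump into the current directory; -c resumes a partial download.
download_dump() {
  wget -c "$dump_url"
}

# Print the URL so it can be checked before downloading.
echo "$dump_url"
```

Once the printed URL looks right, calling `download_dump` starts the download.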
Step 2: Get the index ready
You’ll need the analysis-icu plugin and jq for this step.
To install analysis-icu, run the following command from the Elasticsearch home directory:
sudo bin/elasticsearch-plugin install analysis-icu
To install jq on macOS, you can use Homebrew:
brew install jq
To create the index, you can run the following script.
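One way such a script might look is sketched below. It assumes Elasticsearch is listening on localhost:9200, that the wiki exposes the CirrusSearch `cirrus-settings-dump` and `cirrus-mapping-dump` API actions, and that the analysis settings and mapping sit under `.content.page` in the responses. The jq paths, the function names and the request bodies are assumptions that may need adjusting for your Elasticsearch and CirrusSearch versions.

```shell
#!/bin/sh
# Assumed values: local Elasticsearch node and the English Wikiquote site.
es="localhost:9200"
site="en.wikiquote.org"
index="enwikiquote"
settings_url="https://${site}/w/api.php?action=cirrus-settings-dump&format=json&formatversion=2"
mapping_url="https://${site}/w/api.php?action=cirrus-mapping-dump&format=json&formatversion=2"

# Keep only the analysis config from the settings dump (read on stdin) and
# request a single shard with no replicas, which is enough for a test node.
# The .content.page.index.analysis path is an assumption.
build_settings() {
  jq '{settings: {analysis: .content.page.index.analysis,
                  number_of_shards: 1,
                  number_of_replicas: 0}}'
}

# Create the index using the wiki's own analysis settings.
create_index() {
  curl -s "$settings_url" |
    build_settings |
    curl -s -H 'Content-Type: application/json' -XPUT "${es}/${index}?pretty" -d @-
}

# Apply the wiki's field mapping to the new index.
# The .content.page path is an assumption about the mapping dump layout.
put_mapping() {
  curl -s "$mapping_url" |
    jq '.content.page' |
    curl -s -H 'Content-Type: application/json' -XPUT "${es}/${index}/_mapping?pretty" -d @-
}
```

With Elasticsearch running, call `create_index` and then `put_mapping`. Each call should print an acknowledgement from Elasticsearch; an error response usually points at a version-specific difference in the request body.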
Step 3: Get the wiki ready for loading
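CirrusSearch dumps are already in the Elasticsearch bulk format, where every document takes two lines: an action line followed by the document source. The dump is too large to send in a single request, so it helps to split it into smaller files with an even number of lines each. The sketch below assumes the dump file name from Step 1; the file name, the chunk size and the `split_dump` helper are illustrative assumptions.

```shell
#!/bin/sh
# Assumed values: the dump downloaded earlier and the target index name.
dump="enwikiquote-20240101-cirrussearch-content.json.gz"   # hypothetical file name
index="enwikiquote"
chunk_lines=500   # keep this even so an action line is never separated from its document

# Decompress the dump given as $1 and split it into files under chunks/.
# gunzip -c is used instead of zcat because zcat on macOS expects .Z files.
split_dump() {
  mkdir -p chunks
  gunzip -c "$1" | split -a 10 -l "$chunk_lines" - "chunks/${index}"
}
```

Running `split_dump "$dump"` leaves files such as chunks/enwikiquoteaaaaaaaaaa in the current directory, each holding 250 documents.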
Step 4: Load the Wiki
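Loading comes down to posting each chunk to the bulk endpoint of the index. The loop below is a sketch under the same assumptions as before (Elasticsearch on localhost:9200, chunk files in a chunks/ directory); `parse_took` and `load_chunks` are hypothetical helper names.

```shell
#!/bin/sh
# Assumed values: local Elasticsearch node and the index created in Step 2.
es="localhost:9200"
index="enwikiquote"

# Extract the "took" value (milliseconds) from a pretty-printed bulk response on stdin.
parse_took() {
  grep took | cut -d':' -f 2 | cut -d',' -f 1 | tr -d ' '
}

# Post every chunk to the bulk API and print how long each request took.
# A chunk is deleted once a response containing "took" comes back. This only
# shows that Elasticsearch answered; it does not check per-document errors.
load_chunks() {
  for file in chunks/*; do
    printf '%s: ' "$file"
    took=$(curl -s -H 'Content-Type: application/x-ndjson' \
      -XPOST "${es}/${index}/_bulk?pretty" --data-binary "@${file}" | parse_took)
    printf '%7s\n' "$took"
    [ -z "$took" ] || rm "$file"
  done
}
```

Call `load_chunks` from the directory that contains chunks/. Files left behind after the run are the ones that did not get a response, so the function can simply be called again to retry them.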
Voilà! The wiki should now be loaded! I strongly suggest reading the technical details in the original blog post.
To verify, you can run the following command to search for Einstein quotes.
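A quick check is a URI search against the new index. The sketch assumes the same host and index name as in the earlier steps; `search_url` and `search_quotes` are hypothetical helpers.

```shell
#!/bin/sh
# Assumed values: local Elasticsearch node and the index loaded above.
es="localhost:9200"
index="enwikiquote"

# Build the URI-search URL for a single search term.
search_url() {
  printf '%s/%s/_search?q=%s&pretty' "$es" "$index" "$1"
}

# Run the search and print the JSON response.
search_quotes() {
  curl -s "$(search_url "$1")"
}
```

Running `search_quotes einstein` should return a JSON response whose hits section lists matching pages if the load succeeded.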
I hope this helps all the Elasticsearch enthusiasts.