Elasticsearch for the curious, or what I learned processing Reddit data.

Kyle-James Condon
Aug 16, 2018

As I write this, I have 300 Python threads requesting Reddit user comments from a (small) Elasticsearch cluster hosted on Google Cloud Platform. Quite how I ended up in this situation is a story for another article; for now, I figured I'd write this piece because there aren't many resources to help beginners with Elasticsearch.

What even is Elasticsearch?

“An optical instrument on board a boat on Fairmont Chateau Lake Louise” by Shane Hauser on Unsplash

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases.
-Elasticsearch BV

I find the above quote a bit cryptic. The gist is that Elasticsearch lets you store lots of data and run fast searches and aggregations on that data. It's like a supercharged CSV file that can hold more than 100,000,000 rows, comes with a kickass search engine, and can give you the data to make pretty graphs to boot.
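To make "searches and aggregations" concrete, here is a minimal sketch of the JSON body that Elasticsearch's `_search` endpoint accepts, written as a Python dict. It assumes a hypothetical index with one document per Reddit comment and fields named `body` and `subreddit`; those names are my assumptions, not a fixed schema. The `match` clause does the full-text search, and the `terms` aggregation counts the matching comments per subreddit, which is the raw material for those pretty graphs.

```python
import json


def build_comment_search(text, top_n=5):
    """Build an Elasticsearch Query DSL body that full-text searches
    comment bodies and counts the matching comments per subreddit.

    The field names "body" and "subreddit" are hypothetical.
    """
    return {
        "size": 0,  # return no individual hits, only the aggregation
        "query": {"match": {"body": text}},  # analysed full-text search
        "aggs": {
            "by_subreddit": {
                # Assumes "subreddit" is mapped as a keyword field; with the
                # default dynamic mapping you would use "subreddit.keyword".
                "terms": {"field": "subreddit", "size": top_n}
            }
        },
    }


body = build_comment_search("elasticsearch")
print(json.dumps(body, indent=2))
```

Setting `size` to 0 is a common trick when you only care about the counts: Elasticsearch still runs the query, but it skips sending back the matching documents themselves.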

Kyle, if Elasticsearch is so awesome, why isn't it everywhere?

Okay, so all this speed and awesomeness comes at a cost. Elasticsearch is a little industrial, and by a little industrial I mean a lot industrial. Interacting with the service is pretty difficult, searching through data is hard, and the documentation is cryptic at best; get something wrong and…
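To give a flavour of what interacting with the service involves, here is a minimal sketch of a raw search request built with only the Python standard library. Everything travels over HTTP as JSON: you POST a Query DSL body to an index's `_search` endpoint. The URL `http://localhost:9200`, the index name `reddit-comments`, the `author` field and the user name `example_user` are all hypothetical placeholders. The function only builds the request; actually sending it needs a running cluster.

```python
import json
import urllib.request

# Hypothetical placeholders: point these at your own cluster and index.
ES_URL = "http://localhost:9200"
INDEX = "reddit-comments"


def search_request(author, size=10):
    """Build (but do not send) a POST to the index's _search endpoint
    that fetches up to `size` comments written by one Reddit user."""
    body = {
        "size": size,
        # Exact-value match; assumes "author" is mapped as a keyword field.
        "query": {"term": {"author": author}},
    }
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        # Elasticsearch 6.0+ rejects request bodies without a Content-Type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request("example_user")
# Sending it against a live cluster would look like:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

The official `elasticsearch` Python client wraps these same HTTP calls if you'd rather not hand-roll them.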