A reading of Elasticsearch source code
What is Elasticsearch?
I would say it is a database that is easy to scale and offers many “search” functions, such as “autocomplete”, “suggestions”, or “image search”. Since it is written in Java, you can often customize queries or change internals quite easily, for example with plugins, if you want.
Source code of Elasticsearch
You need to understand the “Dependency Injection” style of the source code to see how Elasticsearch decouples its dependencies, and “event-driven threading” to see how it interacts with other nodes. I think Elasticsearch is actually a sort of map/reduce framework built specifically around “lucene”.
It uses Google's Java library “guice” quite heavily, which makes me wonder why it is not built on Scala.
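To make the “Dependency Injection” idea concrete without pulling in Guice, here is a minimal sketch, using only the JDK, of the core thing Guice automates: classes declare their dependencies as constructor parameters, and an injector builds the object graph from bindings. TinyInjector, Transport, LocalTransport and SearchService are hypothetical names for illustration, not Elasticsearch or Guice classes.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical service types; not Elasticsearch classes.
interface Transport { String name(); }

class LocalTransport implements Transport {
    public String name() { return "local"; }
}

class SearchService {
    final Transport transport;
    // The dependency arrives through the constructor; SearchService never calls "new LocalTransport()".
    SearchService(Transport transport) { this.transport = transport; }
}

// A tiny reflection-based injector: a rough sketch of what Guice automates, not Guice's API.
class TinyInjector {
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();

    <T> void bind(Class<T> iface, Class<? extends T> impl) { bindings.put(iface, impl); }

    <T> T getInstance(Class<T> type) {
        Class<?> impl = bindings.getOrDefault(type, type);
        // Take the first declared constructor and resolve each parameter recursively.
        Constructor<?> ctor = impl.getDeclaredConstructors()[0];
        Class<?>[] paramTypes = ctor.getParameterTypes();
        Object[] args = new Object[paramTypes.length];
        for (int i = 0; i < paramTypes.length; i++) {
            args[i] = getInstance(paramTypes[i]);
        }
        try {
            ctor.setAccessible(true);
            return type.cast(ctor.newInstance(args));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + impl.getName(), e);
        }
    }
}

class InjectorDemo {
    public static void main(String[] args) {
        TinyInjector injector = new TinyInjector();
        injector.bind(Transport.class, LocalTransport.class);
        SearchService service = injector.getInstance(SearchService.class);
        System.out.println(service.transport.name()); // prints "local"
    }
}
```

Unlike Guice, this toy has no singletons, no cycle detection and no annotations; it simply uses the first constructor it finds.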
How it starts
Everything begins in the org.elasticsearch.bootstrap.Bootstrap class.
All it does is initialize the modules (or plugins) and start the node.
You might want to check out the org.elasticsearch.node.internal.InternalNode class to see the “Dependency Injection” style in action, and also how you could slim Elasticsearch down for your own needs by not adding some of the modules to be injected.
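The “slim it down by leaving modules out” idea can be sketched like this, assuming Guice-style modules that each contribute bindings to the node. SketchModule, RestSketchModule, SearchSketchModule and SketchNode are hypothetical stand-ins; real Elasticsearch modules bind classes through Guice rather than filling a string map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Guice-style module: each module contributes bindings to the node.
interface SketchModule {
    void configure(Map<String, String> bindings);
}

class RestSketchModule implements SketchModule {
    public void configure(Map<String, String> bindings) {
        bindings.put("GET /_count", "RestCountAction");
    }
}

class SearchSketchModule implements SketchModule {
    public void configure(Map<String, String> bindings) {
        bindings.put("search", "SearchService");
    }
}

// The node is assembled from whatever modules it is handed, and nothing else.
class SketchNode {
    final Map<String, String> bindings = new LinkedHashMap<>();

    SketchNode(List<SketchModule> modules) {
        for (SketchModule m : modules) {
            m.configure(bindings);
        }
    }
}

class ModuleDemo {
    public static void main(String[] args) {
        List<SketchModule> modules = new ArrayList<>();
        modules.add(new SearchSketchModule());
        // RestSketchModule is left out, so this node gets no REST handlers at all.
        SketchNode node = new SketchNode(modules);
        System.out.println(node.bindings.keySet()); // prints "[search]"
    }
}
```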
So let's see what “_count” does
/_count is an endpoint that counts the documents in a given index, a given type, or the whole cluster, like this:
curl -XGET localhost:9200/_count
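As a small illustration of the three scopes, this hypothetical helper builds the request path for the whole cluster, one index, or one index and type. The index name “twitter” and type “tweet” are just example values.

```java
// Hypothetical helper: builds the _count path for the whole cluster, one index, or one index and type.
class CountPaths {
    static String countPath(String index, String type) {
        StringBuilder path = new StringBuilder();
        if (index != null) {
            path.append('/').append(index);
            if (type != null) {
                path.append('/').append(type); // in this sketch a type is only used under an index
            }
        }
        return path.append("/_count").toString();
    }

    public static void main(String[] args) {
        System.out.println(countPath(null, null));         // /_count
        System.out.println(countPath("twitter", null));    // /twitter/_count
        System.out.println(countPath("twitter", "tweet")); // /twitter/tweet/_count
    }
}
```

Prefixing localhost:9200 to any of these paths gives a request of the same shape as the curl command above.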
So let's dig into what happens with this request.
Simplified: five steps
You can simplify it to five steps.
1. Receive request
2. Dispatch request
3. Analyze request
4. Execute related modules in Elasticsearch internally with Nodes
5. Response
1 Receive request
2 Dispatch request
Then the dispatchRequest() method of the RestController class dispatches the request to the handler that should process it.
The “_count” request uses the GET method, so RestController looks up the matching handler among those registered in the GET trie. Remember that the handlers are registered when the Bootstrap class initializes, provided you added RestModule. The initialization eventually loads the modules, and the handler for the “_count” endpoint is registered during this step.
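The per-method trie lookup can be sketched as follows. This is a simplified, hypothetical router loosely modeled on the idea behind RestController, not the API of Elasticsearch's own trie class: one trie per HTTP method, literal path segments as children, and “{param}” segments as a wildcard child whose matched value is captured into a parameter map. Handlers are plain strings here to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical path trie: literal segments are children, "{name}" segments form one wildcard child.
class RouteTrie<T> {
    private final Map<String, RouteTrie<T>> children = new HashMap<>();
    private RouteTrie<T> wildcard;   // child for a "{param}" segment
    private String wildcardName;     // first parameter name registered at this position wins
    private T value;

    void insert(String path, T handler) {
        RouteTrie<T> node = this;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            if (seg.startsWith("{") && seg.endsWith("}")) {
                if (node.wildcard == null) {
                    node.wildcard = new RouteTrie<>();
                    node.wildcardName = seg.substring(1, seg.length() - 1);
                }
                node = node.wildcard;
            } else {
                node = node.children.computeIfAbsent(seg, k -> new RouteTrie<>());
            }
        }
        node.value = handler;
    }

    // Returns the handler (filling params) or null when nothing matches.
    T retrieve(String path, Map<String, String> params) {
        RouteTrie<T> node = this;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            RouteTrie<T> next = node.children.get(seg);   // a literal match wins
            if (next == null && node.wildcard != null) {
                params.put(node.wildcardName, seg);
                next = node.wildcard;
            }
            if (next == null) return null;
            node = next;
        }
        return node.value;
    }
}

// One trie per HTTP method, in the spirit of the "GET trie" described above.
class MiniRestController {
    private final Map<String, RouteTrie<String>> tries = new HashMap<>();

    void registerHandler(String method, String path, String handler) {
        tries.computeIfAbsent(method, m -> new RouteTrie<>()).insert(path, handler);
    }

    String dispatch(String method, String path, Map<String, String> params) {
        RouteTrie<String> trie = tries.get(method);
        return trie == null ? null : trie.retrieve(path, params);
    }
}

class RouteDemo {
    public static void main(String[] args) {
        MiniRestController c = new MiniRestController();
        c.registerHandler("GET", "/_count", "RestCountAction");
        c.registerHandler("GET", "/{index}/_count", "RestCountAction");
        Map<String, String> params = new HashMap<>();
        System.out.println(c.dispatch("GET", "/twitter/_count", params)); // RestCountAction
        System.out.println(params);                                       // {index=twitter}
    }
}
```

The lookup is greedy: a literal segment always wins over a wildcard and there is no backtracking, which is enough for simple routes like these.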
By the way, RestController also has an endpoint for the OPTIONS method; I’m not sure what it is used for, and it does not seem to be well documented.
3 Analyze request
The RestController then runs handleRequest() of the matched handler, in this case RestCountAction.
This method analyzes (parses) the request, but for “_count” there seems to be nothing special.
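A minimal sketch of what this “analyze” step amounts to, assuming the router has already extracted the path and query parameters: comma-separated “index” and “type” values become arrays, and without a body or a “q” parameter the request counts everything. CountRequestSketch, its fields and the “match_all” placeholder string are hypothetical; RestCountAction's real code differs.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical request object produced by the parsing step.
class CountRequestSketch {
    final String[] indices;  // empty means "all indices"
    final String[] types;    // empty means "all types"
    final String query;      // placeholder description of the query to run

    CountRequestSketch(String[] indices, String[] types, String query) {
        this.indices = indices;
        this.types = types;
        this.query = query;
    }

    // Split "a,b, c" into ["a", "b", "c"], dropping blanks.
    static String[] splitByComma(String value) {
        if (value == null || value.isEmpty()) return new String[0];
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    static CountRequestSketch parse(Map<String, String> params, String body) {
        String[] indices = splitByComma(params.get("index"));
        String[] types = splitByComma(params.get("type"));
        // Prefer a request body, then a "q" parameter, else count every document.
        String query = (body != null && !body.isEmpty()) ? body
                : params.containsKey("q") ? "q=" + params.get("q")
                : "match_all";
        return new CountRequestSketch(indices, types, query);
    }

    public static void main(String[] args) {
        CountRequestSketch r = parse(Map.of("index", "twitter,blog"), null);
        System.out.println(Arrays.toString(r.indices) + " " + r.query); // [twitter, blog] match_all
    }
}
```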
Did you know that you can actually start from right here, skipping the REST layer, if you use Elasticsearch from any JVM language, for example Scala?
4 Execute related modules in Elasticsearch internally with Nodes
This is the trickiest part. You might want to read the “5 Response” part first.
You can see that the last part of RestCountAction calls client.count() with “request” and “listener” as parameters. This eventually leads to execute() of NodeClient, which calls execute() of the given “action module”, in this case TransportCountAction.
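The hop from client.count() to the transport action can be sketched as a registry of actions keyed by name, each taking a request and a listener. Only the onResponse/onFailure shape of the listener mirrors Elasticsearch's ActionListener; NodeClientSketch, TransportActionSketch and the “count” key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Callback shape similar to ActionListener: exactly one of these methods is invoked.
interface Listener<R> {
    void onResponse(R response);
    void onFailure(Throwable e);
}

interface TransportActionSketch<Q, R> {
    void execute(Q request, Listener<R> listener);
}

// Hypothetical client: looks up the action by name and hands over request and listener.
class NodeClientSketch {
    private final Map<String, TransportActionSketch<?, ?>> actions = new HashMap<>();

    void register(String name, TransportActionSketch<?, ?> action) {
        actions.put(name, action);
    }

    @SuppressWarnings("unchecked")
    <Q, R> void execute(String name, Q request, Listener<R> listener) {
        TransportActionSketch<Q, R> action = (TransportActionSketch<Q, R>) actions.get(name);
        if (action == null) {
            listener.onFailure(new IllegalArgumentException("no action " + name));
            return;
        }
        action.execute(request, listener);
    }

    // Convenience method in the spirit of client.count(request, listener).
    void count(String index, Listener<Long> listener) {
        execute("count", index, listener);
    }
}

class ClientDemo {
    public static void main(String[] args) {
        NodeClientSketch client = new NodeClientSketch();
        // A stand-in for TransportCountAction that always reports 3 documents.
        client.register("count", (TransportActionSketch<String, Long>) (index, l) -> l.onResponse(3L));
        client.count("twitter", new Listener<Long>() {
            public void onResponse(Long n) { System.out.println("count = " + n); } // prints "count = 3"
            public void onFailure(Throwable e) { System.out.println("failed: " + e); }
        });
    }
}
```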
You might have noticed that there is nothing like “interacting with nodes” or “lucene” yet. Soon, you will see both.
execute() of TransportCountAction is actually defined in an abstract superclass, and it invokes doExecute() of org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction. Note that it still holds the “request” and “listener” parameters. These parameters are finally used to create an AsyncBroadcastAction object, whose start() is then called. You will also find a “nodes” variable there.
start() eventually calls performOperation() for every shard related to the request. Inside performOperation(), the work ends up in shardOperation(), and its result is handed to onOperation(). This is where you finally (and only) see “lucene” for “_count”.
The results of shardOperation() from every shard, on whichever node holds it, are collected in the shardsResponses array, which is reduced to build the response.
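The broadcast pattern described above (a per-shard operation, a result slot per shard, and one reduce at the end) can be sketched with plain JDK concurrency, assuming shards are in-memory lists and the per-shard “operation” just returns a size. BroadcastCountSketch and its methods are hypothetical; the field name shardsResponses is borrowed from the text, and there is no networking or Lucene involved.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.LongConsumer;

// Hypothetical sketch: a "map" step per shard, then one "reduce" when the last shard answers.
class BroadcastCountSketch {
    private final List<List<String>> shards;                   // each inner list stands in for one shard
    private final AtomicReferenceArray<Long> shardsResponses;  // one result slot per shard
    private final AtomicInteger pending;                       // shards that have not answered yet

    BroadcastCountSketch(List<List<String>> shards) {
        this.shards = shards;
        this.shardsResponses = new AtomicReferenceArray<>(shards.size());
        this.pending = new AtomicInteger(shards.size());
    }

    // Stand-in for shardOperation(): in Elasticsearch, this is where Lucene would count documents.
    private long shardOperation(int shard) {
        return shards.get(shard).size();
    }

    void start(ExecutorService executor, LongConsumer listener) {
        if (shards.isEmpty()) {   // nothing to wait for
            listener.accept(0);
            return;
        }
        for (int i = 0; i < shards.size(); i++) {
            final int shard = i;
            executor.execute(() -> onOperation(shard, shardOperation(shard), listener));
        }
    }

    // Stand-in for onOperation(): store the shard result; whoever arrives last does the reduce.
    private void onOperation(int shard, long count, LongConsumer listener) {
        shardsResponses.set(shard, count);
        if (pending.decrementAndGet() == 0) {
            long total = 0;
            for (int i = 0; i < shardsResponses.length(); i++) {
                total += shardsResponses.get(i);
            }
            listener.accept(total);
        }
    }
}

class BroadcastDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        BroadcastCountSketch action = new BroadcastCountSketch(List.of(
                List.of("doc1", "doc2"), List.of("doc3"), List.of()));
        action.start(pool, total -> System.out.println("count = " + total)); // prints "count = 3"
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Only one thread sees the counter reach zero, so the reduce runs exactly once and the per-shard results can be combined without a lock.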
Given how little of “lucene” is actually used here and how effectively Elasticsearch interacts with other nodes, I can’t help thinking of Elasticsearch as a sort of map/reduce framework. I was thinking of using Hazelcast for some map/reduce work on my data, but I might as well do it on Elasticsearch, and perhaps that is what the “aggregation module” is doing.
The “listener”, which is a sort of event-driven callback, finally returns the result, with some customization specific to the module, once it is available. However, if you are using the Java client, you get the response directly from the node.
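The difference between the two ways of receiving the result can be sketched with two hypothetical listeners: one renders JSON for the HTTP channel, as the REST layer would, and one completes a future so a Java caller gets the response value itself, similar in spirit to blocking on actionGet(). CountListener, RestJsonListener and FutureListener are illustrative names, not Elasticsearch classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical callback for a count result.
interface CountListener {
    void onResponse(long count);
    void onFailure(Exception e);
}

// REST flavor: turn the result into a JSON body for the HTTP channel.
class RestJsonListener implements CountListener {
    private final Consumer<String> channel; // stands in for the HTTP response channel
    RestJsonListener(Consumer<String> channel) { this.channel = channel; }
    public void onResponse(long count) { channel.accept("{\"count\":" + count + "}"); }
    public void onFailure(Exception e) { channel.accept("{\"error\":\"" + e.getMessage() + "\"}"); }
}

// Java-client flavor: complete a future so the caller receives the value itself.
class FutureListener implements CountListener {
    final CompletableFuture<Long> future = new CompletableFuture<>();
    public void onResponse(long count) { future.complete(count); }
    public void onFailure(Exception e) { future.completeExceptionally(e); }
}

class ListenerDemo {
    // Stand-in for the asynchronous count action; here it answers immediately.
    static void countAsync(CountListener listener) { listener.onResponse(3); }

    public static void main(String[] args) {
        countAsync(new RestJsonListener(System.out::println)); // prints {"count":3}
        FutureListener f = new FutureListener();
        countAsync(f);
        long n = f.future.join(); // waits until the listener has fired
        System.out.println(n);    // prints 3
    }
}
```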
I hope this helps you.