Three different layers in Distributed Data Computing
Nothing new in the end, but this gives the layers names, and that's always useful.
- “The offline (layer) which is addressed by either Hadoop or any other sort of offline Batch Computation system”.
- “The online computation” layer, for when you want to compute something quickly (well under a second). I think we can imagine solutions like Storm, IMDG, Spark, or CEP components here.
- A layer between the two previous ones. Can we call it the “interactive layer”, even though it is not necessarily interactive and a computation can last several minutes? Once again, Storm, Spark, IMDG & functionService should be helpful.
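
To make the split concrete, here is a minimal sketch that picks a layer from the latency a computation can tolerate. The function name `pick_layer` and both cut-off values are assumptions for illustration only; the interview gives just "well under a second" and "several minutes", not exact numbers.

```python
from enum import Enum


class Layer(Enum):
    ONLINE = "online"
    INTERACTIVE = "interactive"
    OFFLINE = "offline"


# Hypothetical cut-offs, in seconds. The exact numbers are assumptions:
# the source only distinguishes sub-second, minutes, and batch.
ONLINE_MAX_SECONDS = 1.0
INTERACTIVE_MAX_SECONDS = 10 * 60.0


def pick_layer(latency_budget_seconds: float) -> Layer:
    """Return the layer whose latency range contains the given budget."""
    if latency_budget_seconds < 0:
        raise ValueError("latency budget must be non-negative")
    if latency_budget_seconds < ONLINE_MAX_SECONDS:
        return Layer.ONLINE
    if latency_budget_seconds <= INTERACTIVE_MAX_SECONDS:
        return Layer.INTERACTIVE
    return Layer.OFFLINE


if __name__ == "__main__":
    for budget in (0.05, 120.0, 3600.0):
        print(budget, pick_layer(budget).value)
```

In practice the boundaries are fuzzy, and the same tool (Storm, Spark, an IMDG) can serve more than one layer, so this only illustrates the naming and not a real routing rule.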
This is basically an extract from http://www.infoq.com/interviews/data-science-roles-netflix-xavier-amatriain