Three different layers in Distributed Data Computing


Nothing new in the end, but this gives the layers names, which is always useful.

  • The “offline” layer, which is addressed by Hadoop or any other offline batch computation system.
  • The “online computation” layer, for when you want to compute something quickly (well under a second). I think we can imagine solutions like Storm, IMDG, Spark, or CEP components.
  • A layer between the two previous ones (can we call it the “interactive layer”, even if it is not necessarily “interactive” and a computation can last several minutes?). Once again, Storm, Spark, IMDG & functionService should be helpful.
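One way to read the three layers is as a routing decision based on how long a computation is allowed to take. Here is a minimal sketch in Python of that idea. The names `Layer` and `pick_layer` and both cut-off values are assumptions made for illustration; the interview only gives orders of magnitude (well under a second, several minutes), not exact thresholds.

```python
from enum import Enum


class Layer(Enum):
    ONLINE = "online"            # answers expected in well under a second
    INTERACTIVE = "interactive"  # can last up to several minutes
    OFFLINE = "offline"          # batch jobs, e.g. Hadoop


# Hypothetical cut-offs, chosen only to make the example concrete.
ONLINE_MAX_SECONDS = 1.0
INTERACTIVE_MAX_SECONDS = 10 * 60


def pick_layer(latency_budget_seconds: float) -> Layer:
    """Return the layer matching the latency a computation can tolerate."""
    if latency_budget_seconds < 0:
        raise ValueError("latency budget must be non-negative")
    if latency_budget_seconds < ONLINE_MAX_SECONDS:
        return Layer.ONLINE
    if latency_budget_seconds <= INTERACTIVE_MAX_SECONDS:
        return Layer.INTERACTIVE
    return Layer.OFFLINE


if __name__ == "__main__":
    for budget in (0.05, 120, 6 * 3600):
        print(f"{budget} s -> {pick_layer(budget).value}")
```

The point is only that the latency budget, rather than the tool, defines the layer: Storm, Spark or an IMDG can show up in more than one of them.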

This is basically an extract from http://www.infoq.com/interviews/data-science-roles-netflix-xavier-amatriain

source: https://www.flickr.com/photos/pimthida/9438755028/in/photostream/
