Serendipity

Serendipitous discovery in distributed real-time query engine development. CERN team deserves attention and credit.

Michael Hausenblas
Large-scale Data Processing
2 min read · May 22, 2013

By sheer coincidence I got in touch with Fons Rademakers, a very interesting chap working over at CERN who runs a team providing the software used for all Large Hadron Collider data analysis.

He discovered that I’m contributing to Apache Drill, a distributed query engine that enables interactive, ad-hoc queries at scale (think: Hadoop dimensions), and shared the following with me:

“Also interesting is to see that you work on Apache Drill, which is based on Google Dremel. I am one of the two original authors of the ROOT system, which provides since 1995 a columnar nested data storage system covering the full C++ object model and parallel distributed real-time query engine (in addition to many math, fitting, minimization, scientific plotting, graphics, etc. features). ROOT is used by all High Energy Physics experiments world-wide and especially by all LHC experiments that now store a combined amount of about 200PB in ROOT data format.”

I was flabbergasted and went like: OMG, there is a group of people who have been doing this for almost 20 years now. While I think the Google engineers deserve credit for the engineering innovations they introduced in their 2010 paper on Dremel, I also believe Fons and his team deserve at least the same attention and credit.

From the Apache Drill community POV, I should say that we will certainly take a good, deep look at ROOT, which is available under the LGPL, and try to learn as much as possible from its architecture, deployment and operations experience, despite the language differences (Drill is Java, ROOT is C++). I sincerely hope Fons et al. also find some time to drop by on our mailing list and share some of their valuable lessons learned.

Again, Fons & team, I take my hat off and bow to your achievements—KUTGW!

PS: Fons, I hope you don’t mind me sharing your thoughts above as a literal citation, but it’s so great I can’t really keep it to myself ;)
