Google Omega

Just recently, I stumbled upon a new Google paper, "Omega: flexible, scalable schedulers for large compute clusters", published at EuroSys 2013.

After GFS/MapReduce, Dremel, Percolator, Pregel and, only last year, Spanner, this is perhaps one of the most exciting and potentially influential papers I’ve come across.

As I’ve pointed out recently, utilising a cluster well, especially across different workloads, is challenging, and Google and the like certainly have a head start, both in terms of experience and in knowing the pain points.

In the paper, they introduce Omega, a new cluster scheduler that uses parallelism, shared state, and lock-free optimistic concurrency control to squeeze the last bit out of the cluster, addressing a variety of workloads from infrastructure services (such as BigTable) to short-lived batch jobs.
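To make the shared-state idea a bit more concrete, here is a minimal Python sketch of optimistic concurrency control as I understand it from the description above. It is not Omega's actual implementation: the names `CellState`, `Machine`, `commit` and `schedule`, as well as the per-machine version counter used for conflict detection, are my own assumptions for illustration. Each scheduler plans against a private snapshot without holding any locks, then tries to commit; if another scheduler got there first, the commit is rejected and the loser simply retries.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    free_cpus: int
    version: int = 0  # bumped on every successful commit touching this machine


@dataclass
class CellState:
    """Hypothetical shared cluster state that all schedulers read and write."""
    machines: dict = field(default_factory=dict)

    def snapshot(self):
        # Each scheduler plans on a private copy; nothing is locked while it thinks.
        return {n: Machine(m.free_cpus, m.version) for n, m in self.machines.items()}

    def commit(self, claims, seen_versions):
        """Apply all claims, or reject them if any touched machine has changed."""
        for name, cpus in claims.items():
            m = self.machines[name]
            if m.version != seen_versions[name] or m.free_cpus < cpus:
                return False  # conflict: the snapshot was stale
        for name, cpus in claims.items():
            m = self.machines[name]
            m.free_cpus -= cpus
            m.version += 1
        return True


def schedule(cell, cpus_needed, max_attempts=3):
    """Plan against a snapshot, try to commit, and retry on conflict."""
    for _ in range(max_attempts):
        view = cell.snapshot()
        target = next((n for n, m in view.items() if m.free_cpus >= cpus_needed), None)
        if target is None:
            return None  # nothing fits right now
        if cell.commit({target: cpus_needed}, {target: view[target].version}):
            return target
    return None


# Two schedulers take a snapshot at the same time and both pick machine m1.
cell = CellState({"m1": Machine(free_cpus=4), "m2": Machine(free_cpus=4)})
view_a = cell.snapshot()
view_b = cell.snapshot()
ok_a = cell.commit({"m1": 3}, {"m1": view_a["m1"].version})  # first writer wins
ok_b = cell.commit({"m1": 3}, {"m1": view_b["m1"].version})  # stale snapshot, rejected
retry_b = schedule(cell, 3)  # the loser re-plans on fresh state
print(ok_a, ok_b, retry_b)
```

In this toy version the second scheduler's claim on `m1` is rejected because the machine's version moved on, and its retry lands on `m2` instead. A real scheduler would need the commit step to be atomic across concurrent writers; the sketch sidesteps that by running everything in a single thread.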

What I find so interesting is where this leaves monolithic and two-level architectures such as Hadoop’s YARN or the equally promising Apache Mesos.

Given the potential of Hadoop-as-a-Service, and with more and more sites running different frameworks, from Hadoop to Storm, ideally on the same cluster, the question arises whether we’re sufficiently equipped with what we have at hand.

Maybe we should step back and rethink the strategy, skipping one generation of cluster schedulers and going directly for the next?