SQL-on-Everything Adoption

Sandeep Uttamchandani
Published in Wrong AI
Feb 19, 2018

The early days of Big Data popularity were driven by programmers using MapReduce SDKs for data analytics. As NoSQL stores started emerging, they too were built with specialized query interfaces such as Cassandra's CQL, MongoDB's JSON-like query language, and the HBase API.

As with most new technologies, the early days of Big Data adoption were driven by innovators and early adopters who had to learn technology-specific programming models to address Web 2.0 data-management use cases that could not be solved otherwise.

Every new technology follows an adoption curve. In his book “Crossing the Chasm,” Geoffrey Moore described the adoption process as having a significant gap, or “chasm,” between the small population of early adopters and mainstream adoption.

Given the exponential growth of data over the last few years, Big Data today is a problem that impacts every industry vertical. The ability to extract insights (retrospective, streaming, and predictive) is becoming a business differentiator that disrupts traditional incumbents. While there was a clear business motivation to adopt Big Data technologies, the biggest hurdle was that the traditional Business Analysts and Data Scientists within these enterprises were not programmers. Over the years they had become experts in SQL, but they could not apply those skills to Big Data products that implemented their own APIs and SDKs.

Over the last few years, Big Data solutions have decoupled storage from query engines. Most of these query engines support SQL, which made them instantly popular within the data community. The ecosystem is now flooded with options that support SQL on Hadoop as well as on other NoSQL and object storage frameworks. The following are some of the most popular open-source SQL query engines for Hadoop and beyond:

  • Hive (supported by every Hadoop distribution; support from Hortonworks)
  • Presto (started by Facebook)
  • Impala (support from Cloudera)
  • Spark SQL (support from Databricks)
  • Drill (support from MapR)
  • Flink (support from data Artisans)
  • Kylin (started by eBay)
  • HAWQ (started by Pivotal)
  • Phoenix (started by Salesforce)
  • Tajo (started by Gruter)
  • Trafodion (started by HP)
  • Apache Beam (started by Google; uses Apache Calcite for SQL support)
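
To make the split between storage and query engine concrete, here is a minimal Python sketch. It is not how any of the engines listed above works internally: a small CSV string stands in for raw files sitting in HDFS or object storage, Python's built-in sqlite3 module stands in for a SQL query engine, and the data, table, column, and function names are all hypothetical. It computes the same per-region total twice, once with hand-written map and reduce steps (the programmer's path) and once with a single SQL GROUP BY query (the analyst's path).

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical raw data, standing in for files in HDFS or object storage.
RAW_CSV = """region,product,amount
east,widget,10
west,widget,5
east,gadget,7
west,gadget,3
east,widget,4
"""


def load_rows(text):
    """Storage side: parse the raw file into plain dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by_region_mapreduce(rows):
    """Programmer path: hand-written map and reduce phases."""
    # Map: emit one (key, value) pair per record.
    pairs = [(row["region"], int(row["amount"])) for row in rows]
    # Shuffle + reduce: group the pairs by key and sum the values.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(sorted(totals.items()))


def total_by_region_sql(rows):
    """Analyst path: hand the same records to a SQL engine and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (:region, :product, :amount)",
        ({**row, "amount": int(row["amount"])} for row in rows),
    )
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    rows = load_rows(RAW_CSV)
    print(total_by_region_mapreduce(rows))  # {'east': 21, 'west': 8}
    print(total_by_region_sql(rows))        # {'east': 21, 'west': 8}
```

Both paths return the same answer, but only the first requires learning a programming model; the second needs nothing beyond the SQL an analyst already knows, which is the appeal of putting a SQL engine in front of whatever storage holds the data.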

In summary, SQL-on-Everything continues to unlock mainstream Big Data adoption. While specialized APIs and SDKs remain popular among Data Engineers, the vast majority of data users can successfully apply SQL, the hammer they already know, to any of the Big Data nails!

Sharing 20+ years of real-world exec experience leading Data, Analytics, AI & SW Products. O’Reilly book author. Founder AIForEveryone.org. #Mentor #Advise