The Evolution of the Databricks Lakehouse Paradigm

Matt Collins
Published in Slalom Data & AI
5 min read, Nov 12, 2020
Figure: Databricks Lakehouse Paradigm

In January 2020, Databricks made a promising announcement: the introduction of a new paradigm in data management called the Lakehouse, a blend of the best features of data warehouses and data lakes.

Databricks was founded by the creators of Apache Spark and has developed some of the most innovative and revolutionary technologies in data and analytics: the Unified Data Analytics Platform, Delta Lake, and MLflow.

As a thought and innovation leader in the field, Databricks was uniquely positioned to introduce the concept of the Lakehouse, as well as to lead the charge towards it. With the release of SQL Analytics, Databricks is quickly bringing the Lakehouse paradigm from concept to reality.

Introducing SQL Analytics

Today, Databricks has taken the Lakehouse paradigm another leap forward with the introduction of SQL Analytics. SQL Analytics is a new SQL-first experience in Databricks, featuring a new, dedicated workspace. The interface provides the Databricks platform with a first-class environment that SQL users will feel right at home using.

SQL Analytics Workspace

Traditionally, Databricks has been a notebook-only environment, with SQL capabilities contained within notebook cells. That all changes today. Databricks now has the ability to provide SQL-heavy data engineers and data analysts with a distinct workspace tailored specifically to meet their needs and ways of working.

Dedicated SQL Analytics Workspace

Databricks has developed a separate workspace specifically for SQL users. The UI features a rich development environment, allowing users to explore their data and even build dashboards and visualizations right below the query editor. The familiarity of this environment will improve usability for SQL-first users, especially those who may not prefer a notebook environment.

SQL Analytics Query Pane

The SQL Analytics Workspace also features:

  • Simplified cluster management pane: Easily provision a new cluster or restart a previously stopped one right from the SQL Analytics workspace. Users can specify the cluster size and enable auto-stopping after a user-provided timeout period. Support for multi-cluster load balancing and Photon is planned for a future release
  • Job scheduling: Just like in the notebook environment, Databricks has provided a simple-to-use scheduling feature, enabling users to set up a refresh schedule right from the Query pane
  • Query history: View and search the history of executed queries
  • Event alerts: Set alerts and receive messages for specified events
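
The auto-stopping behavior mentioned in the list above can be pictured as a simple idle-timeout check. The following is a minimal sketch of that idea only, not Databricks' implementation or API: the `ClusterState` class, the `should_auto_stop` function, and all field names are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a cluster (not a Databricks type)."""
    running: bool
    last_query_at: float  # timestamp of the most recent query, in seconds


def should_auto_stop(state: ClusterState, now: float, timeout_minutes: int) -> bool:
    """Return True when a running cluster has been idle for at least the timeout."""
    if not state.running:
        # A stopped cluster has nothing left to stop.
        return False
    idle_seconds = now - state.last_query_at
    return idle_seconds >= timeout_minutes * 60
```

Under these assumptions, a cluster whose last query ran 15 minutes ago would be stopped with a 10-minute timeout and left running with a 30-minute one.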

Performance

SQL Analytics leverages the performance of Delta Engine with Photon — a new execution engine built in C++ from the ground up — to provide all of the speed and snappiness you would expect from a SQL service, on the data lake.

When Databricks announced Delta Engine in June 2020, the potential impact of the performance gains on top of Spark 3.0 was immediately apparent, as noted in the announcement blog:

"The improved query optimizer extends the functionality already in Spark 3.0 (cost-based optimizer, adaptive query execution, and dynamic runtime filters) with more advanced statistics to deliver up to 18x increased performance in star schema workloads."
Databricks
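
For readers unfamiliar with the term, a star schema workload typically joins a central fact table to several small dimension tables and aggregates the result. The sketch below builds a toy star schema in an in-memory SQLite database purely to show the shape of such a query. The table names, column names, and data are hypothetical, and the example says nothing about Delta Engine's actual behavior or performance.

```python
import sqlite3

# Hypothetical toy schema: one fact table surrounded by two dimension tables.
SCHEMA_AND_DATA = """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_store VALUES (10, 'east'), (20, 'west');
    INSERT INTO fact_sales VALUES
        (1, 10, 5.0), (1, 20, 7.0), (2, 10, 11.0), (2, 10, 2.0);
"""

# A typical star schema query: join the fact table to each dimension,
# then group and aggregate by dimension attributes.
STAR_QUERY = """
    SELECT s.region, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_store AS s ON f.store_id = s.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
"""


def run_star_query():
    """Build the toy schema in memory and return the aggregated rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(STAR_QUERY).fetchall()
    finally:
        conn.close()
```

Optimizer features such as those named in the quote matter for this query shape because join order and filter placement across the dimension tables largely determine how much of the fact table must be scanned.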

Now, with the introduction of SQL Analytics, Databricks brings these massive performance improvements to the spotlight for SQL users.

Great BI Experience on the Data Lake

SQL Analytics easily connects to leading BI tools and has been optimized for business intelligence on data lakes. Databricks has developed new ODBC and JDBC drivers for BI tools with lower latency and higher data transfer speed, as well as improved metadata performance for increased read speeds for cold queries on Delta tables. Support for OAuth and Single Sign-On provides a seamless authentication experience.

At launch, SQL Analytics will connect to Tableau and Power BI, with planned integration for Qlik, ThoughtSpot, and Looker.

The introduction of SQL Analytics marks an important milestone for Databricks, in both the evolution of the Lakehouse paradigm and the maturation of the Unified Data Analytics Platform. Databricks is a platform for all personas, and that has never been truer than it is now.

Leveraging an integrated platform for data and AI, organizations are free to focus on creating an environment for innovation.

Modern Culture of Data

At Slalom, one way we help organizations achieve their boldest ambitions is by helping them build a Modern Culture of Data — an environment of experimentation, empowerment, curiosity, critical thinking, and collaboration.

A Modern Culture of Data is enabled by five key elements:

  • Bold Vision: To build a successful culture of data, you need to know where you’re going, how you’ll get there, and why it’s important
  • Access & Transparency: A true data-driven culture is built on a modern technology foundation that provides easy access to data and tools
  • Guardianship: Data guardianship ensures that the use, ownership, and maintenance of data are safe, secure, compliant, and ethical
  • Data Literacy: Data literacy ensures that people have a fundamental understanding of data, how to analyze it, and how to use it to make decisions and take action — at all levels and in all processes and parts of their business
  • Ways of Working: An operating model and organizational structure with processes, roles and responsibilities that support the organization’s bold vision will ensure that data and analytics are embedded into day-to-day operations, planning, and decision-making

Figure: Slalom’s Modern Culture of Data Framework

According to a Forrester Research report from 2018, insights-driven businesses are growing at an average of more than 30% each year, and by 2021, they are predicted to earn $1.8 trillion annually.

Despite this substantial opportunity, Gartner’s 3rd Annual CDO Survey reported that Chief Data Officers say culture change is the #1 inhibitor to progress.

A Holistic Approach to Data

Realizing the full potential of data and analytics requires shifting the way your organization works and adopting a culture where people have the power to accelerate business outcomes with rapid insights.

With the introduction of SQL Analytics and the realization of the Lakehouse paradigm, Databricks is well suited to help organizations transform: a unified platform that works for all of their users lets them focus on innovation.

Data engineers, data scientists, ML engineers, and data analysts can actively collaborate in a single platform, shifting time from processing data and managing infrastructure to extracting value.

A Modern Culture of Data, powered by Databricks, gives organizations a platform to act as a catalyst for the massive cultural transformation necessary to achieve the vision of analytics at scale.

Join Databricks, Microsoft, and Slalom on December 15th for a webinar on Building Your AI Skyscraper to learn how to break ground on AI in your organization.
