Building a modern and open data platform with Databricks on Google Cloud

Robbie Clews
Aug 23 · 5 min read

The comparison of data to oil is often made to illustrate that data is the most important commodity that will drive growth and spur innovation over the coming years.

While much of the obsession related to data has been restricted to historically data-driven industries like tech and finance, there has been a notable shift across all industries in viewing data as a core competitive business differentiator and not ‘just another capability’.

At Slalom, we have partnered with clients across multiple sectors to accelerate their journeys towards integrating data into their organizational culture. We’ve found that those clients who invest in digital tools, advanced analytics solutions, and data literacy will not only navigate our current reality more successfully, they will empower flexible, resilient cultures moving forward.

For this reason, we put the same enthusiasm and dedication we have for our clients into building and maintaining relationships with partners such as Databricks, which is now generally available on Google Cloud Platform (GCP). This partnership enables our customers to innovate faster and with more flexibility.

In this blog, we’ll explore the reasons why Databricks on Google Cloud is a perfect combination for our customers. Together with our own Data Platform Accelerator, we can enable them to innovate faster, with flexibility and ease.

Avoid Vendor Lock-In: Google’s Commitment to Open Source

Even in our day-to-day lives, the impact that Google has had on the way we access and consume data is undeniable. With the products and services available from Google Cloud, organizations can leverage industry-leading innovations and expertise to accelerate their data journeys and make smarter business decisions without compromising on trust and security.

Google Cloud provides a wide host of services, from foundational ones such as Google Kubernetes Engine, Google Cloud Storage, Google BigQuery, and Looker — for effectively limitless storage, data warehousing, and visualization — to the suite of tools within the Google AI Platform that provides end-to-end support for the AI/ML (artificial intelligence/machine learning) life cycle.

Beyond the services available on Google Cloud, Google has a long and storied history of creating and maintaining open-source projects such as Android, Kubernetes, TensorFlow, and Angular, that have changed the landscape of their respective fields.

We often hear concerns from our customers about vendor lock-in and their inability to pivot quickly and easily to new vendors or tooling. Google’s partnership with Databricks reaffirms its commitment to open-source technology, giving customers flexibility and choice and, ultimately, the agility to navigate the vendor landscape of the future, whatever that future may look like.

An Integrated Platform from Databricks

In a similar vein to Google Cloud, Databricks’ foundation is steeped in seminal contributions to the open-source ecosystem. Apache Spark™ — now an almost ubiquitous presence in the toolset of any organization working with large volumes of data — was created by the founders of Databricks and continues, 6 years and counting since its initial release, to be actively maintained by the company. Since then, Databricks has expanded its offerings to support end-to-end data processing and analytics in a managed, optimized and integrated platform.

Among the more recent offerings are Delta Lake, which combines the best features of data lakes and data warehouses into a unified data lakehouse vision; MLflow, which enables simple management of the machine learning life cycle; and SQL Analytics, which supports business intelligence (BI) initiatives and powers BI tools such as Looker.

With Delta Sharing, customers can also share their data regardless of which computing platforms they leverage.

Together, Databricks on Google Cloud Provides More Choice on an Open Platform

Organizations can now use Google Cloud’s global, scalable, and elastic platform with Databricks to create a lakehouse and power data engineering, data science, machine learning and business intelligence workflows on a single platform. On Google Cloud, Databricks is deployed on Google Kubernetes Engine (GKE), marking the first time that Databricks can be deployed in a fully containerized cloud environment, an approach now widely considered best practice for deploying, scaling and managing enterprise applications.

From a technical end-user perspective, Databricks is tightly integrated with Google Cloud’s analytics ecosystem services such as Google BigQuery, Google Cloud Storage, Looker, Google AI Platform, and Pub/Sub. Data scientists can leverage services from the Google AI Platform for model training and serving, while BI teams looking to create dashboards and reports can natively connect Looker to Databricks. For infrastructure and IT teams, identity management and billing for Databricks services are managed through existing services, which means that adding Databricks to an existing infrastructure adds minimal administrative overhead.

Figure 1: Seamless integration between Google Cloud Platform and Databricks, from both the management and infrastructure perspective and the end-user perspective.

Simple, Open, Collaborative

For organizations motivated to take the next step in accelerating business transformation with data, this partnership brings together two market-leading data, AI, and analytics platforms.

Technical teams have, at the click of a few buttons, the ability to:

· Choose from a suite of tools that make sense for them so they can rapidly and securely fulfill their project needs;

· Ensure data quality and data integrity;

· Democratize the use of data, making it more easily accessible and usable while also reducing the overall cost of storing data;

· Gain support for diverse data formats (unstructured, semi-structured and structured);

· Support their business intelligence needs “from BI to AI” and leverage GCP’s advanced machine learning capabilities.

Business teams can also rest easy in the knowledge that any capital investments are future-proofed and can scale with their needs, while corporate concerns such as data security, privacy and compliance are managed through a centralized command center.

How Slalom Can Help

At Slalom we have a wealth of experience delivering transformational data and analytics solutions to clients in partnership with both Google Cloud and Databricks. As a Partner of the Year with both Google Cloud and Databricks, our delivery teams are well qualified to help your organization embed a modern and open approach to data.

We’ve helped state agencies deploy healthcare information exchanges, life science companies apply advanced analytics to drug development, and a media conglomerate break down data silos and focus on improving its viewer experience. Our technical expertise, coupled with a deep understanding of people, process and the importance of organizational culture, makes Slalom an ideal partner for your data journey.

We have compiled the best practices from our experience into a Data Platform Accelerator, which shortens the time to value for our clients. This solution enables a data lakehouse deployment with Databricks that not only reduces the time to value from a few months to just three weeks, but also breaks down silos between teams and ensures a single source of truth for data.

To find out more about how we can enable your organization to accelerate your adoption of Databricks with our Data Platform Accelerator, get in touch.

Authors: Krishanu Nandy, Matt Collins, and Robbie Clews.

Slalom Technology
