Why I am bullish on the Data Intelligence Platform next year
Over the last couple of years since I joined Databricks, I have seen the company go from offering a very nice managed Spark solution (orchestrating Spark jobs from notebooks was absolutely killer at the time) to a true end-to-end data platform for all data personas. The rate of innovation has been insane, especially given how complex the problem space is, and after just a few (and I really mean fewer than 2) short years, Databricks can now govern, execute, and monitor ALL data assets: tables, raw data, models, clusters, jobs, warehouses, and more!
Now looking forward to next year, I am even more bullish on where the Databricks platform is going. Here’s why:
Embedded AI and Scale
AI will change every aspect of how we work, especially as it relates to data platforms. The next race for the best data platform will not be won by whoever can most quickly code up API wrappers to ChatGPT in SQL. The winners of the next-generation Data Intelligence Platform will be determined by who best integrates these advanced models directly into the platform, allowing the platform to grow along with user needs automatically and skyrocketing productivity. AI should be everywhere in your platform, from helping you find the right data to monitoring changes, solving hard problems, optimizing and maintaining your data, and more. Databricks knows how to embed AI into the right places to really add value in users’ day-to-day workflows, because it has been doing it since the company started. It is also the best platform to build your own AI and manage it right alongside your data.
Databricks is already ready for scale, with both performance and unification. Data will only grow as it becomes even more central to our lives, and scaling classical data warehouses cost-effectively will not work. Databricks started in the extremely large data space, 10x-ing the performance of legacy big-data technology in the move from Hadoop to Spark, and then doing it again against Spark itself. That history sets Databricks up perfectly to scale for the future, because it has been doing this at scale for years, whereas classical data warehouses will struggle to justify their exploding costs as they buckle under the massive influx of data and the increasing complexity of use cases. The bar for value-additive data use cases just got higher, and classical data warehousing is back on its heels.
Not only is Databricks built to scale from a data volume perspective, it is also built to scale the number and diversity of teams that want to develop any data-centric use case. Databricks’ Unity Catalog will allow users to build a truly unified Data Intelligence Platform for any number of teams without needing to separately govern and monitor point solutions and tech stacks across teams. Organizational scale is just as important as data scale, and Unity Catalog is the foundation for scaling data teams across the enterprise. Databricks has focused on unifying every data requirement into a single platform: governance, model development, advanced data engineering, BI, real-time streaming, and more. Other platforms require 9+ tools to create the same “Lakehouse” experience. Databricks eliminates the line between “big data” and “data warehousing” systems. That is true organizational scale.
Engineering Paradigm Shift
Engineering is different now. Stitching together tons of “janitor code” (code that doesn’t have business logic, but is just there to connect and maintain integrations) is not a moat anymore.
Endless engineering hours spent connecting disparate point systems no longer set a team apart, especially now that GenAI is poised to skyrocket developer speed and efficacy by giving users custom code at the snap of their fingers. Engineers need to prepare for this paradigm shift, and Databricks solves for that. Databricks takes care of the “janitor code” engineering tasks through automation of integrations and simple governance, and lets engineers focus on the really innovative parts of their solutions. The actual result and output of a tech stack are more important than ever, while the mechanics of a solution matter less now that a bot can tackle most coding problems. Databricks gives engineers more power (which technically is work over time) by letting them focus on the value-additive application of their engineering systems: code becomes easier to write across use cases and seamless to deploy and debug, and information becomes simple to find.
Dedicated to being Open and giving to the Open Source Community
Databricks has driven, and continues to drive, innovation in the open source community with Spark, Delta Lake, MLflow, and Structured Streaming. These are all open source projects.
It’s not just about what is open sourced directly. It is also about how easy it is to use and integrate other open source projects into the platform, which Databricks has been doing for years. You can build anything you want on Databricks, all in a one-stop-shop platform.
Yes, Databricks is a company that needs to make money to exist and continue to innovate, so it’s a tough line to walk. However, Databricks is one of the very few companies that both contributes immensely to open source and provides an end-to-end, enterprise-grade platform to the world at this scale. The majority of companies that center around supporting an “enterprise grade” open source solution end up being point solutions and one-trick ponies, and stitching these point solutions together becomes insanely expensive.
Most enterprises don’t want to be in the business of managing and reacting to the unstable and unaccountable edges of open source tech. They want it managed for them, and they expect a certain standard of innovation and support that is just not possible in open source alone. Databricks straddles that line beautifully.
As all data warehouses continue to adopt support for Lakehouse architectures, this strategy is becoming the de facto way to manage data. But then customers will begin to wonder why they would pay 2, 3, or 9 times more for a “Lakehouse” when they could just migrate to Databricks for the real thing. They will also naturally wonder why they need to support 2 table formats at all (an open source lake-based format plus a proprietary database format), and realize that they do not need to, because doing so only adds extra cost and maintenance overhead. The most common disjointed “lakehouse” architecture I see as a Solutions Architect is the EMR + Snowflake tech stack, which is a lose-lose combination. You have all the complexity and maintenance overhead of EMR (which is never just EMR by itself), with all the eye-watering costs of Snowflake, and you’re paying engineering teams on both ends! This is what “saving money” looks like for these half-baked Lakehouse concepts.
AI-Driven Observability
Enterprise AI-driven observability is vital in this next wave. With the world’s first real data intelligence platform, AI is the best way to bring the right monitoring and operational insights to the right users at the right time. Databricks’ Unity Catalog has set the platform up for a huge amount of innovation in this space next year. The breadth and depth of metrics to monitor in a truly end-to-end data platform are too massive to handle manually, and AI will be the catalyst for making Databricks an unstoppable platform for observability.
Moving forward, embedded AI for developer productivity and AI-driven observability and management will be the things to watch next year in the data warehousing space, and in data management in general. I am excited to see what the Data Intelligence Platform brings, and I am excited to show users how to maximize the benefits of what we build!