Databricks VS Snowflake

Miguel Bayan Soares
Published in Mindera · 3 min read · Oct 10, 2023

With so many technologies surfacing all the time in the data landscape, it’s becoming very hard to know where we should invest our time. Although I am a huge believer that soft skills and the ability to bridge technology and business outcomes are irreplaceable, we also need to keep up with emerging technology. The volume and speed at which data is generated and exchanged in today’s world is unparalleled in any previous era. While there are other relevant players that address these data needs, today we will look at Snowflake (https://www.snowflake.com/en/) and Databricks (https://www.databricks.com/).

Databricks:

What is it?

Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. In other words, it’s a one-stop shop for all the services you need to extract value from data.

Scalability: Databricks can scale as much as you can invest in the infrastructure.

Storage: Databricks can store data of any type and format, since its storage layer is independent of the processing layer. Databricks can also work as the ETL tool that adds structure to unstructured data.

Service Model: PaaS

Cost: Pay-as-you-go model

Cloud Platform Support: AWS, GCP, Azure

Vendor Lock-in: No

Adoption: Medium/Hard (learning curve)
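
To make the Storage point above a bit more concrete, here is a minimal sketch of what "adding structure to unstructured data" means in an ETL step. It is plain Python rather than Databricks code, and the log format, the field names, and the helper names (parse_line, to_structured) are all assumptions invented for this example.

```python
import re
from typing import Optional

# Hypothetical raw log lines; the format is invented for this example.
RAW_LINES = [
    "2023-10-10 12:00:01 INFO user=alice action=login",
    "2023-10-10 12:00:05 ERROR user=bob action=upload",
    "not a log line at all",
]

# One named group per column we want in the structured output.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) user=(?P<user>\w+) action=(?P<action>\w+)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw line into a record, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


def to_structured(lines):
    """Keep only the lines that could be parsed into records."""
    records = (parse_line(line) for line in lines)
    return [record for record in records if record is not None]


structured_rows = to_structured(RAW_LINES)
```

On a real platform the same idea would typically run as a distributed job over files in cloud storage; the sketch only shows the shape of the transformation (raw text in, rows with named columns out).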

My take:

“The best data warehouse is a lakehouse”: that’s the big selling point of Databricks. It aims to combine the data warehouse and the data lake under a single platform (PaaS). The architecture of data lakes separates them from conventional data warehouses because of the decoupling of storage and computing. Databricks has a separate layer for storage and computation. This is a huge feature, not only for scalability but also for traceability of costs (you can have different machines for different use cases / departments).
Another great feature of Databricks is that it supports multiple languages (Python, SQL, Scala, and R), which makes it really powerful because you can bring libraries from those ecosystems into your workflows (perfect for ML and more advanced use cases). It’s a one-stop shop for your orchestration, ETL, storage, version control, and ML models.
It’s a very powerful engineering solution that demands an experienced and technically advanced engineering team.
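
The cost traceability point is easier to see with numbers. Below is a minimal sketch, assuming each department runs its workloads on its own compute and that usage is available as simple records. The departments, cluster names, hours, and hourly rates are made up for illustration, and this is not a Databricks billing API.

```python
from collections import defaultdict

# Hypothetical usage records: (department, cluster, hours, hourly_rate_usd).
USAGE = [
    ("marketing", "bi-small", 10.0, 2.0),
    ("data-science", "ml-gpu", 4.0, 12.5),
    ("marketing", "bi-small", 5.0, 2.0),
]


def cost_by_department(usage):
    """Sum compute cost per department from per-cluster usage records."""
    totals = defaultdict(float)
    for department, _cluster, hours, rate in usage:
        totals[department] += hours * rate
    return dict(totals)


costs = cost_by_department(USAGE)
```

Because compute is separate from storage, each team's spend can be attributed to the machines it actually used, while everyone still reads the same underlying data.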

Snowflake:

What is it?

Snowflake is a cloud-based data warehouse that seamlessly provides all the data warehouse functions with a single tool without different system integrations. It’s relatively easy to get started, fairly cost effective, and quick to scale compared to a legacy data warehouse. Decoupled storage and computing enable data sharing and scaling, and Snowflake abstracts cloud complexities and lets customers load, integrate, process, analyze, and share their data.

Scalability: up to 128 Nodes

Storage: Semi-structured or Structured data

Service Model: SaaS

Cost: Pay-as-you-go model

Cloud Platform Support: AWS, GCP, Azure

Vendor Lock-in: Yes

Adoption: Easy

My take:

It’s one of the best technologies that I have worked with. It is a fully managed service, which makes the developer’s life a lot easier. There is no maintenance work, and operations like scaling up or down are very simple. Snowflake has a unique architecture that stores data in a hybrid columnar format, which makes querying super fast. It’s ideal for Business Intelligence use cases (SQL based). Two other great features are the admin console and the Marketplace. The admin console offers a user-friendly interface, providing access to all metadata and security information. This makes managing the platform incredibly easy and intuitive. Finally, the Marketplace simplifies the process of sharing data (data as a service). Users can effortlessly discover and access third-party data and services while also promoting their own data products within the Snowflake Data Cloud.
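
For readers wondering why a columnar layout helps with BI-style queries, here is a toy illustration in plain Python. It is only a sketch of the general idea and says nothing about Snowflake's actual internals; the sample orders and the function names are assumptions made up for this example.

```python
# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    {"order_id": 1, "country": "PT", "amount": 120},
    {"order_id": 2, "country": "UK", "amount": 80},
    {"order_id": 3, "country": "PT", "amount": 50},
]


def to_columns(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


COLUMNS = to_columns(ROWS)


def total_amount_row_store(rows):
    # Walks every full record even though only one field is needed.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Reads only the "amount" column; the other columns are never touched.
    return sum(columns["amount"])
```

Both functions return the same total, but the columnar version only looks at the values it needs. Aggregations over a few columns of a very wide table are typical in BI, which is where this layout tends to pay off.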

Conclusion:

Choosing tools and technology depends on a multitude of factors, such as budget, use cases, people skills, and compliance requirements. The two technologies have things in common, but I would say Snowflake is best suited for Business Intelligence use cases and has better SQL performance. Databricks, on the other hand, offers multiple programming languages, which is great for more advanced use cases (AI/ML), and lets you control all your orchestration and data discovery in one place.
