SAP BW vs Snowflake
Snowflake made it possible for my company to move to the cloud in less than a year. We’ve never looked back.
Can a company, having already invested in and matured an on-premise data warehouse, abandon it entirely in favor of a cloud-based solution, reengineered from scratch, leveraging none of their existing software stack, and manage to do so within a year? Well, yes! Here’s what we learned along the way.
I currently work for the world’s leading B2B bedbank. We provide wholesale accommodation, transfers, activities, and car rental products through a global network of travel distributors. With thousands of daily transactions, millions of searches and comparisons, teams across our business rely on a robust and multi-purpose Data Warehouse (DW) to achieve their functional needs. As the Business Intelligence architect, I oversaw our company’s transition from an on-premise solution running on SAP BW on HANA (BW) to a cloud-based offering through Snowflake (SFLK).
In this article, I will recount our journey to cloud-based BI from an enterprise perspective, where success comes down to objective measures like cost, performance, and meeting business needs.
Using practical examples I will illustrate where each solution excels and falls short in regard to these measures while trying to avoid subjective topics like UI and usability.
For context, it’s worth noting that our transition to SFLK occurred entirely organically and was in no way mandated by any top-down pressure. My company had been running SAP BW since 2014 and at the time of the transition, the data warehouse had reached a respectable level of maturity and stability. However, in 2018, we undertook a series of corporate mergers, buying two of our major competitors along with their intellectual property and IT systems. With this came our first exposure to SFLK.
At first, we considered SFLK just another in a list of legacy systems requiring migration and subsequent decommissioning. However, we quickly realized its utility as a staging layer with its open API and scalable pricing. After that, it wasn’t long before we understood and began to rely on the agility that cloud-based architecture brought to our design process. Within a year of laying eyes on it, SFLK became the single source of truth for company-wide reporting and the natural choice as the enterprise-level warehouse for handling the volume and complexity of our growing company.
So with that, let’s begin to break down the key factors which influenced how and why this integration unfolded the way it did.
Let’s start with the bottom line. As with most cloud-based databases, SFLK requires no initial investment in infrastructure. You pay for what you compute using credits, billed at roughly $3.00 per credit (depending on AWS region), per hour. Warehouses are rated by processing power starting with extra-small at 1 credit, doubling in power and credit consumption until 4XL. Storage is trivially cheap and is billed separately (below we’ll see why this is a game-changer).
The biggest advantage here is the ability to spin up warehouses on demand and size them according to usage needs. As workloads vary throughout the day, it makes sense to pay for a small warehouse with high user concurrency for reporting during business hours and a more powerful one during overnight processing, then have both auto-suspend afterward. SFLK has a very good explanation on their website, with sample pricing scenarios, so I won’t go into further detail. Their pricing guide is straightforward and our attempts at estimating yearly costs were very much on point. For those who are curious, SFLK encourages companies to try their product for free, through a generous credit allotment. With this, the barrier to entry (historically a major hurdle for a company) no longer exists.
VIRTUAL WAREHOUSE SIZE
Explaining SAP pricing could easily be an article on its own because the company is notoriously secretive about the true costs of their services. (Here is a semi-official estimate for hardware pricing from 2014 explaining that the site with official pricing is down for maintenance and will be up shortly [it’s not].) What’s clear is that an up-front investment in hardware would be required (licensing is separate and we’ll get to that in a second). This forces an organization to contemplate both current sizing and future growth and to pay up-front for storage and compute which may not be needed for many years. Once bought, there is a hard limit on the scale of the DW and any future projects which might spin off their own data-marts with their own sizing needs. As this limit approaches, any future projects requiring more resources would incur an inordinate additional investment, making them all but unfeasible in the eyes of any IT project review board.
Does your organization have a good legal team? If so, let’s talk about SAP licensing. BW on HANA or its latest iteration, S/4HANA, is licensed by peak memory consumption. No matter how many servers you invest in, your queries have to peak out (or fail) at the memory limit negotiated at the time of purchase (or face steep fines). Do you want to export your data to a third party system or non-SAP reporting solution (e.g. Tableau or Power BI)? You’ll need to pay for those licenses plus the infamous SAP Open HUB license which allows you to export your own data (seriously). If you decide to bite the bullet and settle for the SAP software suite you’ll still need to pay for yearly user licenses. These depend on the type of user and what user-category they fall into (separate article yet again). If the total starts to seem a bit overwhelming, don’t worry, there is a mystery SAP-sales-rep-good-guy discount which, depending on how their yearly quotas are looking, will drive the final price down (usually) quite significantly. OK, that’s your upfront purchase price. Now just factor in yearly maintenance from now until eternity at 20% of the purchase price and you’re done. Considering all this, most SAP BW implementations end up costing several million dollars up front, plus the aforementioned recurring costs.
If we’re going to talk about the cost of a corporate DW, we need to consider the human capital required to build and maintain it. SAP BW is marketed as a complete end-to-end BI solution, from ETL to modeling to reporting. This is largely true because auxiliary tools such as SAP Data Services (ETL) and Analysis for Office or BI Portal/HANA cockpit were most likely bundled with the original license. It’s also true that a SAP BW developer is a one-man army trained to pilot all of these tools despite all their unique challenges. However, the downside is that a SAP Developer commands a much higher salary than a traditional SQL developer as it is a niche skill with a very steep learning curve (which will also hinder up-skilling existing team members without prior SAP experience). Next, we have to factor in the DBA team (“BASIS” in SAP jargon) which is responsible for tuning the hardware and applying patches and upgrades and who, for reasons previously mentioned, also command a salary premium over traditional DBAs. If you choose to expand your SAP ecosystem with specialized solutions for reporting (SAP Analytics Cloud), machine learning, IoT, etc., SAP will be happy to sell you the required licenses and products, but the same indirect costs will need to be considered.
Snowflake operates on ANSI standard SQL and requires absolutely no maintenance, tuning, or patching. This means that the core development team will consist of database developers — a tech profile which is abundant and reasonably priced in today’s job market. Since no ETL capacity is provided through SFLK, you’ll need to either pay for a 3rd party solution (e.g. Informatica, Alteryx) or hire a few full-stack developers to implement an open-source alternative via Airflow or similar. The same is true for reporting: you’ll need to pay for a third-party tool like Tableau or Power BI. Snowflake operates on the principle of focusing on their strength: providing the best possible cloud-based DW and making sure it integrates with industry-leading third-party tools to meet specific customer needs. In contrast, SAP tries to sell a one-size-fits-all integrated solution to which customers are contractually bound.
Physical on-premise hardware versus unlimited virtual on-demand computing power? That’s not really a fair fight. Instead, let me focus on why SFLK excels even among cloud-based offerings in this regard.
Unlike its rival, Amazon Redshift, where powering up and managing compute clusters is slow and laborious, SFLK warehouses are available within seconds of being triggered by your queries and power down on their own. Sure, other cloud DBs (e.g. Google Big-Query) excel in this regard but SFLK boasts one additional feature which none of its rivals have thus far achieved: zero-copy cloning.
As mentioned earlier, SFLK separates storage and compute, but this isn’t limited simply to being able to bill separately for each resource. The genius of SFLK lies not in separate warehouses but in their ability to share the underlying databases and their data. In SFLK, data is stored centrally in Amazon S3. When a warehouse is spun up, compute resources are used to query and manipulate this data. However, as the storage is completely detached from the virtual compute resource, this allows multiple warehouses to simultaneously reference the same tables. Unlike traditional DBs where the hard disks come bundled with the processor, SFLK allows a company to allocate processing directly where and when it’s needed and not have to worry about keeping data in sync. All data is instantly available to any user, in any database, in any warehouse in the company.
Zero-copy cloning takes advantage of this centralized storage approach and allows tables (or entire databases) to be cloned instantly by copying only the metadata associated with the underlying table. This means your developers no longer have to depend on a months-old system copy to test productive changes. They can reference productive tables right from the development environment and clone and manipulate them instantly to unit test safely and thoroughly.
Again, not really a fair comparison but I can’t keep covering for you, BW. As stated on their website: “Snowflake automatically handles infrastructure, optimization, availability, data protection, and more, so you can focus on using your data, not managing it”. Unlike with BW, there are no patches to apply, no hardware to fix, and no hitting the reset button on a production server when all else fails.
In practical terms, this means that your developers become your central resource. The BASIS/DBA role becomes obsolete and IT costs are also reduced because of SFLK’s lean footprint on the network. Instead of multiple servers communicating through multiple ports and protocols like with BW, SFLK simply needs an internet connection.
Since storage and processing have always been bundled as part of the same hardware package, it made sense to keep reporting separate. Combine that with the complexity of DW solutions like BW and even SAP’s front end tools like Business Object and Analytics Cloud, and the BI team quickly becomes the bottleneck for users being able to access the data they need. In the best-case scenario, an endless backlog of feature requests and projects falls on the BI team, or worse, “shadow BI” starts creeping in from areas in the business that are too impatient to wait.
SFLK has given us leverage to get around this problem by allowing us to provide cost-effective compute clusters to our business areas which point at a single unique data set managed in the data warehouse. Instead of having to build a custom report for every trivial business demand, users in SFLK can access any of our central fact tables or navigate master data and do so using the tool of their choice. Business users can get at the data through a front-end like Tableau and analysts can query the data directly through standard SQL.
This liberates the BI team and allows them to focus on their central task: building the DW and leading the more complicated projects. Having access to the information, the processing power to run analytics, and the tools to allow them to do so, business teams are empowered to get the most out of the data we provide. Some business areas have chosen to hire analysts with database and data science backgrounds and to start building analytics specific to their needs, while others are perfectly happy to use the existing dashboards available to the entire company. Whether your warehouse relies on a Kimball central architecture or Inmon data-mart approach, SFLK’s unique handling of storage means that sharing data while maintaining a single source of truth has never been easier.
Governance and Operation
Despite being different entities, it’s no secret that in a DW good governance makes for good operation. SAP has made this a particularly strong focus in their BW product. Data integrity checks are very tightly coupled to the BW loading process, and database constraints make it so that master data is consistent and reliable. Naming conventions are also enforced and BW makes efficient use of metadata and standardized object types to make it easy to trace the flow of data through the warehouse. A benefit of these standards is that any (BW) developer can step in and understand the system landscape within minutes of being given access. When you pay the premium for BW, this is effectively what you’re paying for.
In SFLK, governance and operation will depend entirely on you and your team. There is great freedom this, but also great responsibility (for great danger lies where responsibility has been abdicated). If this is not taken seriously from the first moment, complexity will escalate and maintenance will quickly get out of hand. Unlike SAP, which provides a bundle of standard tools for practically everything (for better or worse), your SFLK WH will need its own ETL and reporting tools. This means that understanding end-to-end data flows will become impossible without a proper diagram. Documentation becomes essential to understanding and communicating how the DW is built. Even primary key constraints are not enforced in SFLK (the gift and curse of cloud architecture), so even metadata-based auto-generated documentation will be impossible to obtain unless adequate attention has been paid to governance and best-practices from the beginning.
As we have seen, the two WH solutions deliver entirely different products organized around entirely different architectures. Comparing the two objectively is always going to be a challenge as the needs of each organization will vary. Here, I’ve tried to use a generic example in order to showcase the strengths and weaknesses of each solution.
Putting any biases aside, we have to be conscious of the fact that we are comparing a traditional BI solution to its modern successor, specifically engineered to improve upon the shortcomings of traditional options. With this in mind, he are some closing thoughts.
SAP BW was always intended to be more than just a database. It aims to be a holistic solution to enterprise BI and commands a market premium on both licensing as well as on experts qualified to work with it. However, this solution will not fit all molds and comes with myriad limitations even when it does suit the organization. Unless your company runs an SAP ERP and relies on pre-built standard models to report on it, then it might be time to consider other options.
For us, it made sense to make the jump to SFLK. In our experience, the flexibility that the SFLK platform brought to our design process and the agility with which it scales made loading and integrating our data possible in ways that were previously unthinkable with BW. Naturally, there were risks and unknowns involved with the migration to a new platform; however, having seen SFLK’s potential we were confident that they would inevitably be overcome. The journey of changing WH providers and transitioning to non-SAP tools has sometimes been challenging, other times rewarding, but it never set us back and it paid off in the end. If your organization is considering making the switch, I think you’ll find it worth your while. After all, the barrier to entry no longer exists.
A year after the platform migration from SAP BW to Snowflake, do the business users share BI’s excitement? What we learned and what we could have done better, in my follow-up article.