Divide and Conquer — SAP to Snowflake Migration Guide

Lift and shift, divide and conquer. An object-by-object migration guide from SAP to Snowflake

Data warehouse migrations are colossal undertakings, not to be taken lightly. But, if you are reading this article, it means that your organization has weighed its options and decided to abandon legacy on-prem architecture in favor of a future-proof, scalable cloud solution. I commend you.

However, years of relying on SAP architecture and the chaos of proprietary object types can dishearten even the most ambitious leaders, and make the noble task of cloud migration seem more daunting than it is.

This article will cover the translation of key components from SAP BW and HANA to equivalent Snowflake objects. We will also dive into specific tools and strategies that can help accelerate the migration while ensuring end-user buy-in and acceptance.

The users must stand with us, or our push will falter

Divide and Conquer

Breaking a large problem into manageable chunks is an age-old battle (and project) strategy. The phrase can be traced as far back as Philip II of Macedonia (divide et impera) and probably further.

Dividing a migration into manageable pieces allows for better estimates, task dependencies, and parallel workstreams. But what constitutes a “manageable” piece? Before we can determine units of work, we need to understand the building blocks.

If I am able to determine the enemy’s dispositions while at the same time I conceal my own, then I can concentrate and he must divide.

-Sun Tzu

Divide

SAP designs are notoriously complex. Similar objects can be set up in drastically different configurations, and different object types have overlapping similarities. Plus, there are more places to hide code than a coin in cargo pants.

Fortunately, an equivalent Snowflake feature exists for every SAP object. The degree of similarity will vary across features, but the migration can move forward without worrying about incompatibility roadblocks.

Here is an overview of SAP to Snowflake object equivalence. We will look at each of these in greater detail below.

  • InfoObjects → columns (or tables, when they carry master data)
  • DSO / ODS → tables
  • HANA views and composite providers → views
  • Routines and ABAP programs → UDFs and stored procedures
  • Recordmode (CDC) → Streams
  • Process chains → Tasks
  • InfoCubes → star schemas built from tables, views, and materialized views
  • PSA → Stages

InfoObjects

Unless an InfoObject contains master data, it’s simply a column and does not need to be maintained independently in Snowflake. InfoObjects with master data are tables. Master data attributes are, likewise, columns — no need to maintain or declare them independently either.
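For illustration, here is a minimal sketch of how a master-data InfoObject might land in Snowflake; the object and column names are hypothetical.

```sql
-- A master-data InfoObject (e.g., material) becomes a plain table;
-- its master data attributes are simply columns on that table.
CREATE OR REPLACE TABLE dim_material (
    material_id   VARCHAR NOT NULL,  -- the InfoObject key
    material_type VARCHAR,           -- attribute
    base_uom      VARCHAR,           -- attribute
    created_on    DATE               -- attribute
);
```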

DSO / ODS

Ralph Kimball gave us the concept of fact and dimension tables. SAP took it a step further and made them into system objects. The fact that an InfoObject commonly holds dimension data while a DSO serves as a fact table does not change the bottom line: both are database tables.

To keep facts and dims separate, use naming conventions in your table names.
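A minimal sketch of such a convention, assuming simple dim_/fact_ prefixes (adapt to your own standards):

```sql
-- Dimension table (formerly InfoObject master data)
CREATE OR REPLACE TABLE dim_customer (
    customer_id   VARCHAR NOT NULL,
    customer_name VARCHAR
);

-- Fact table (formerly a DSO)
CREATE OR REPLACE TABLE fact_sales (
    sales_doc   VARCHAR,
    customer_id VARCHAR,        -- joins to dim_customer
    amount      NUMBER(18, 2)
);
```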

HANA Views and Composite Providers

Calculation views, attribute views, analytic views — all views. Composite providers? You guessed it, views (using UNION). Writing views by hand instead of fiddling with menu options across four different object types is infinitely faster and affords more control over the output.

This cannot be overstated.
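As a rough sketch, a composite-provider-style union rebuilt as a plain SQL view might look like this; the table and column names are assumptions for illustration.

```sql
-- One view replaces the composite provider; UNION ALL combines the part providers
CREATE OR REPLACE VIEW v_sales_all_channels AS
SELECT sales_doc, customer_id, amount, 'ONLINE' AS channel
FROM   fact_sales_online
UNION ALL
SELECT sales_doc, customer_id, amount, 'RETAIL' AS channel
FROM   fact_sales_retail;
```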


Routines and ABAP programs

They live among us, hiding where you least expect — finding them will be hell. Converting them to Snowflake objects doesn’t have to be.

Routines are typically used to perform minor adjustments to column values, the kind of logic that maps to CASE or CONCAT in SQL. You can handle such operations directly in the INSERT or SELECT statement by rolling your own SQL. If you want reusable code snippets, use Snowflake UDFs (user-defined functions). UDFs can be written in SQL (thank you!), JavaScript, and Java.
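For example, a field-level routine could become a SQL UDF along these lines; the function name and cleanup logic are illustrative assumptions.

```sql
-- Reusable cleanup logic, callable from any SELECT or INSERT
CREATE OR REPLACE FUNCTION clean_material_id(raw_id VARCHAR)
RETURNS VARCHAR
AS
$$
    LTRIM(UPPER(raw_id), '0')  -- uppercase and strip leading zeros
$$;

-- Used inline, right where the routine logic used to run
SELECT clean_material_id(material_id) AS material_id
FROM   stg_material;
```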

Instead of ABAP programs, which allow for procedural logic and database manipulation, Snowflake uses stored procedures. Procedures can be defined in JavaScript, SQL (Snowflake Scripting), Scala (Snowpark), and Java (Snowpark).
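Here is a minimal Snowflake Scripting sketch of a load procedure standing in for an ABAP program; the table names and logic are assumptions for illustration.

```sql
CREATE OR REPLACE PROCEDURE load_fact_sales()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    -- procedural logic and DML packaged in one securable object
    INSERT INTO fact_sales (sales_doc, customer_id, amount)
    SELECT sales_doc, customer_id, amount
    FROM   stg_sales;
    RETURN 'fact_sales load complete';
END;
$$;

CALL load_fact_sales();
```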

Unlike routines and programs, which can be scattered throughout the data pipeline — within objects and behind menus — Snowflake functions and procedures are securable database objects like tables and views. They are organized accordingly in a schema.

Here is a brief comparison: UDFs return a value and can be called inline within a SQL expression, while stored procedures perform actions (including DDL and DML) and are invoked with CALL. Detailed information is available in the Snowflake documentation.

Recordmode and CDC

Recordmode values for changed records, combined with multiple tables associated with individual objects, are SAP’s answer to change data capture (CDC). Snowflake has Streams.

Streams are created on top of database objects and track changes to the underlying data. They provide a complete reference for what records were inserted, updated, or deleted.

Streams can be queried like tables and used to load or update tables downstream. Multiple Streams can exist on a given object to feed different targets, and they can even be created on top of views!
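A hedged sketch of the pattern, with the stream and table names as assumptions:

```sql
-- Track changes on the staging table
CREATE OR REPLACE STREAM stg_sales_stream ON TABLE stg_sales;

-- Consume only the delta downstream; METADATA$ACTION marks inserts and deletes.
-- Reading the stream inside DML advances its offset.
INSERT INTO fact_sales (sales_doc, customer_id, amount)
SELECT sales_doc, customer_id, amount
FROM   stg_sales_stream
WHERE  METADATA$ACTION = 'INSERT';
```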

Further details are available in the Snowflake documentation.

Process Chains

Process chains are the DAGs of an SAP data warehouse, controlling the data flow with schedules and dependencies. The Snowflake equivalent is Tasks. Tasks can be programmed to run on a schedule or as children of parent tasks to execute a SQL command or call a Stored Procedure.

Tasks can be chained together, forming dependencies and parallel executions, similar to a process chain. Combined with streams and stored procedures, tasks are a powerful tool for running continuous ELT workflows without investing in a 3rd party orchestration tool.
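An illustrative chain of two tasks, assuming a warehouse named etl_wh and hypothetical load procedures such as the load_fact_sales sketch above:

```sql
-- Root task runs on a schedule
CREATE OR REPLACE TASK t_load_staging
    WAREHOUSE = etl_wh
    SCHEDULE  = 'USING CRON 0 2 * * * UTC'
AS
    CALL load_staging();

-- Child task runs only after the root task completes
CREATE OR REPLACE TASK t_load_fact_sales
    WAREHOUSE = etl_wh
    AFTER t_load_staging
AS
    CALL load_fact_sales();

-- Tasks are created suspended; resume children before the root
ALTER TASK t_load_fact_sales RESUME;
ALTER TASK t_load_staging RESUME;
```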

InfoCubes

Converting an InfoCube to a Snowflake object will be the least straightforward out of all the examples given. An InfoCube is a strictly-defined multidimensional object with many configurable parameters and no direct equivalent in Snowflake.

Don’t let that deter you.

Under the hood, an InfoCube can be summarized as an extended star schema — a type of snowflake schema, ironically. A cube has a central fact table linked to (between 4 and 16) dimensions and “navigational attributes,” which are just dimensional attributes from said dimensions (and their extensions). On top of that, there can be aggregates.

Since no such thing exists in Snowflake, you will have to work with existing objects to achieve the star architecture. To achieve the desired performance and data freshness, you will need to experiment with a mix of tables, views, and denormalized data materializations.

For aggregates, Snowflake (Enterprise Edition) does provide a similar feature: Materialized Views. A Materialized View is a pre-computed data set derived from a query specification — like you would get with a CTAS. But unlike a CTAS, Snowflake monitors the base table for changes and automatically (and serverlessly) updates the materialized result.

Snowflake’s services layer will even re-direct queries on the base table to a Materialized View.
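As a rough equivalent to an InfoCube aggregate, a materialized view might look like this (Enterprise Edition required; names are illustrative):

```sql
-- Pre-computed aggregate over a single base table, kept current by Snowflake
CREATE OR REPLACE MATERIALIZED VIEW mv_sales_by_customer AS
SELECT customer_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS sales_count
FROM   fact_sales
GROUP BY customer_id;
```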

PSA tables

The Persistent Staging Area (PSA) is the inbound storage area for data from source systems in SAP BW. In Snowflake, this is called a Stage and is created on top of existing cloud storage providers (currently, AWS S3, Azure Blob Storage containers, and Google Cloud Storage buckets are supported). Internal “named” and “table” stages are also available and are maintained by Snowflake.
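A hedged sketch of an external stage and load, with the URL, storage integration, and file layout as placeholders:

```sql
-- External stage over existing cloud storage (placeholder bucket and integration)
CREATE OR REPLACE STAGE sap_inbound
    URL = 's3://my-bucket/sap/extracts/'
    STORAGE_INTEGRATION = my_s3_integration
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');

-- Load staged files into a staging table, much like PSA feeding a DSO
COPY INTO stg_sales
FROM @sap_inbound/sales/;
```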

Conquer

Now that we’ve seen what we are up against, let’s form a plan of attack. Having analyzed the components piece by piece, we can plan a strategy that considers the indispensable features of the current solution while focusing on the strengths of the target platform.

Previous migration articles covered concepts such as agile methodology, MVP deliverables, and the importance of accessible design documents (links below). Now, let’s explore finding a “champion” for our data crusade.

Crowning a champion

The data warehouse team will spearhead migration. But sending them in without reinforcements can be risky. This is where the “champion” comes into play.

A champion is typically the head of a high-value department with a vested interest in data-driven decision-making — a power-user. Engaging them will yield a critical ally in helping build and promote the new solution, as your success becomes their success.


Understanding the data needs of your champion and keeping them engaged throughout — but especially in the initial stages of — development will provide meaningful guidance on the functionality that end-users want.

Avoid getting overwhelmed by the diverse needs of an entire organization by focusing on a single, emblematic use case and tailoring the platform's features and capabilities to its exact specifications.

This approach guarantees you’ll have a devoted fan and high-profile advocate as soon as the new platform launches. The champion can vouch for the new solution in front of the executive board and set a blueprint for the rest of the organization to leverage the data platform for maximal effect.

The features built in collaboration with your champion already satisfy a real-world use case and will likely scale to meet the needs of the rest of the organization.

But do not do as Oberyn Martell did and celebrate before the battle is over. You can find new champions and repeat the exercise to evolve the solution beyond the initial MVP and into a mature data platform.

Boot camp

Moving to a new and fundamentally different data platform means needing to educate and upskill your developers. Snowflake is famously easy to adopt thanks to its completeness of features, ANSI compatibility, and “near-zero maintenance.”

This guide outlines the Snowflake alternatives for the most common SAP objects but does not include Snowflake-exclusive objects such as pipes, tags, and masking policies. Keeping such features in mind during the planning phase of the migration will improve design efficiency and save costly rework down the line. An investment in education and upskilling is sure to pay dividends.

Fortunately, Snowflake offers a variety of training programs to help your organization meet its goals. These include free self-paced courses and instructor-led training geared to the unique needs of each organization.

For those completely new to Snowflake, I highly recommend the 4-day Fundamentals course, which touches on all the essential aspects of the platform.

Review the complete Snowflake course catalog here.

Arm yourself

Leading a database migration with in-house resources does not mean doing so empty-handed. Whether specific to SAP migrations or generally applicable to any enterprise-scale project, the following resources exist to help automate auxiliary tasks and support your operation:

  • Tools to document the divide-and-conquer exercise and create functional blueprints for the development phase.
  • Tools for landing source data from SAP ECC and BW systems in the cloud.
  • Tools for automating the mass conversion of HANA objects.

Documentation, planning, and modeling

As discussed, an SAP to Snowflake migration will involve breaking down an existing model into its base components, translating those components into equivalent Snowflake features, and piecing them together into organized, time-bound deliverables. This requires good documentation.

With the right tool, documentation can also serve as the blueprint for subsequent development — becoming an evolving platform for collaboration instead of a stale, throw-away artifact.

When it comes to Snowflake-specific modeling tools that also allow real-time collaboration and documentation, SqlDBM is the only modern, cloud-based game in town. SqlDBM allows the planning and conceptual designs to evolve with the project and generate fully qualified, deployable Snowflake DDL as the project moves to the physical modeling stage.

Furthermore, the diagrams and documentation will continue to stay up-to-date, synchronized, and searchable once the platform goes live. There are even extended data governance features like object flagging and glossaries.

Review all the features that SqlDBM provides for your database before, during, and after migration on their website.

Data Integration

Organizations running SAP BW or HANA warehouses likely also have SAP ECC data sources — notoriously difficult to extract from. HVR (now part of Fivetran) is one of the few data integration tools on the market specializing in extracting data from SAP ECC and HANA; it even supports log-based CDC deltas.

HVR captures data from SAP cluster or pool tables and transparently loads it into regular rows and columns in Snowflake. The tool will even validate and compare the extracted records. For more information on features and the complete list of supported connectors, check the HVR website.

Automated HANA Migration

BladeBridge Converter is a code migration tool designed to batch refactor code from/to various data platforms, including SAP HANA. Organizations with an extensive HANA footprint may benefit from the automated template-based conversion that BladeBridge provides.

Although SAP BW is not yet officially supported as a source, select BW components may also be configured through BladeBridge templates for Snowflake refactoring. Find out more on the BladeBridge website.

Sortie

As we conclude this briefing, the operating orders for conducting a full-scale, lift-and-shift SAP to Snowflake migration should now be clear. By dividing a seemingly monumental effort into manageable chunks, your developers can conquer this behemoth of a task with concentrated attacks.

The in-body links in this article provide ample auxiliary support for training and equipping your developers. The ones below offer extended reading on tactics and lessons learned in prior campaigns.

Remember your training, and you’ll do fine.

Now get after it!


Serge Gershkovich

I am the author of "Data Modeling with Snowflake" and Product Success Lead at SqlDBM. I write about data modeling and cloud cost optimization.