Your Mega Guide to the 3 Stages of Netezza to Snowflake Migrations
IBM Netezza was an advanced on-prem appliance data warehouse. It has been described as being like a toaster — you wheel it in, plug it in, and it just works. It focused on highly performant SQL processing at scale.
Netezza has been the SQL data warehouse engine for many enterprises that are now starting to feel the constraints of an inelastic system designed for a pre-cloud world.
Recent pushes from IBM to end on-premises support for the Netezza appliance in favor of more expansive (and complex) public and private cloud offerings have been the final straw for enterprises that have watched other cloud data warehouses disrupt the legacy approach to SQL data warehousing.
We need a small moment of silence for the end of the “it just works” era of IBM Netezza.
This leaves clients in a tough spot.
How can I securely and quickly migrate to a modern and sustainable SQL data warehouse without massive risks and costs?
I often think of forced data platform migrations like what I see with Netezza as being similar to emergency airplane landings. Passengers weren't expecting to land this plane, but they have no real choice and time is running out. Wouldn't it be nice to have an experienced pilot at your side, someone who has done this many times before, to help you navigate the risky tasks ahead?
This is why clients of all sizes, including some of the largest organizations in the world, are working with Hashmap to safely chart their course from Netezza to next-generation data platforms.
This post will explain our battle-hardened, 3-pronged approach to successful migrations. This blueprint works reliably for Netezza, but it is also a recipe for success with other aging data warehouses such as Teradata, Greenplum, Vertica, Oracle Exadata, SAP BW, and many more.
Those stages are:
- Analysis and Tooling: Understanding the Past
- Migration and Validation: Migrating for the Present
- Modernization and Innovation: Building for the Future
While it is tempting to think of these stages as sequences in a simple migration cookbook, the best migrations work iteratively across all 3 stages. An agile approach with some deliberate understanding of end-users can accelerate value delivery, remove weeks of unnecessary migration of outdated workflows, and improve client adoption of their new data platform.
A Word of Warning
One candid disclaimer before we continue:
Successful Netezza migrations are achievable, but they are not easy. Anyone who claims they can migrate your entire infrastructure automatically, with no hard work, is at best profoundly misguided and at worst intentionally trying to mislead you.
The thunderous stampede away from Netezza has attracted more than a few snake-oil vendors looking to make an easy dollar.
At Hashmap, we direct our obsession with building customer value into solving these challenges every day. We’ve been neck-deep in the most complex data environments in the world for nearly a decade, we know the approaches that work, and we have seen the failures caused by amateur enterprise re-platforming efforts.
We’re here to help.
On a recent Hashmap On Tap podcast episode, we discuss the different perspectives, approaches, and risks involved in Netezza to Snowflake migrations if you’re looking for a deeper dive into the subject. Listen here:
#35 Demystifying Netezza to Snowflake Migrations | Hashmap Podcast
Hashmap hosts, Kelly, Preetpal, and Randy, discuss IBM Netezza and the nuts and bolts of the wave of Netezza cloud migrations taking place today. www.hashmapinc.com
Stage 1 — Analysis and Tooling: Understanding the Past
This stage involves the least upfront commitment and sets the foundation for the success of all following stages.
We focus heavily on the following during this stage:
- Static analysis
- Automation estimates
- Effort sizing
- Modular migration piloting
By leveraging static analysis tooling, we are able to apply X-ray vision to existing systems. This gives us the ability to confidently determine which parts of your platform are good candidates for automated migration and which parts are likely to require veteran data engineering effort.
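As a rough illustration of what this kind of static analysis can look like under the hood, here is a minimal Python sketch that scans SQL scripts for constructs that often resist fully automated conversion. The pattern names and regexes are hypothetical placeholders; a real analyzer would use a proper SQL parser and a much richer rule set.

```python
import re
from pathlib import Path

# Hypothetical patterns flagging Netezza-specific constructs that tend to
# need veteran data engineering effort rather than automated conversion.
MANUAL_EFFORT_PATTERNS = {
    "distribution_clause": re.compile(r"\bDISTRIBUTE\s+ON\b", re.IGNORECASE),
    "organize_on_clause": re.compile(r"\bORGANIZE\s+ON\b", re.IGNORECASE),
    "external_table": re.compile(r"\bCREATE\s+EXTERNAL\s+TABLE\b", re.IGNORECASE),
    "stored_procedure": re.compile(
        r"\bCREATE\s+(OR\s+REPLACE\s+)?PROCEDURE\b", re.IGNORECASE
    ),
}

def classify_script(sql_text: str) -> list:
    """Return the names of constructs likely needing manual attention."""
    return [
        name
        for name, pattern in MANUAL_EFFORT_PATTERNS.items()
        if pattern.search(sql_text)
    ]

def survey(directory: str) -> dict:
    """Scan every .sql file under a directory and report flagged constructs."""
    return {
        str(path): classify_script(path.read_text(errors="ignore"))
        for path in Path(directory).rglob("*.sql")
    }
```

Aggregating the `survey` output gives a first-cut estimate of how much of the codebase is a good candidate for automation versus hands-on rework.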
With the analysis results and automation estimates in hand, we are able to craft a custom Discovery Analysis and Effort Report for each migration. This represents the foundational findings from our analysis and serves as the roadmap for the entire migration. It is often a living document, updated as assumptions are validated and new requirements are uncovered.
To test the validity of the analysis so far, we recommend taking a representative subsample of the overall migration and getting hands-on with the target architecture. This is the best way to uncover the hidden hiccups lying in wait for all migrations, and it can further de-risk the migration effort.
Stage 2 — Migration and Validation: Migrating for the Present
This second stage is often the portion of migration that is most intuitive to clients. It is where the rubber really hits the road and where production use cases make their final departure from Netezza.
The heart of this process is described below:
Leveraging our trusty analysis results from stage one, we begin to iteratively migrate workloads to the new environment using a multifunction approach.
- Replicate: create parallel pipelines for a discrete portion of the migration effort.
- Validate: confirm the correctness of the migration against the existing platform.
- Monitor: implement audit, alerting, and data quality checks that ensure the replicated pipeline maintains validity going forward.
- Document: generate documentation covering this process for handoff and ongoing team reference.
- Retire: bring replaced workloads offline.
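To make the Validate step concrete, here is a minimal sketch of fingerprint-based validation using generic DB-API-style cursors for the two systems. The table and column names are placeholders, and real validations typically layer per-column checksums and row-level sampling on top of these cheap summary statistics.

```python
def table_fingerprint(cursor, table: str, numeric_col: str):
    """Collect cheap, order-independent summary stats for a table."""
    cursor.execute(
        f"SELECT COUNT(*), SUM({numeric_col}), MIN({numeric_col}), MAX({numeric_col}) "
        f"FROM {table}"
    )
    return cursor.fetchone()

def validate_migration(source_cur, target_cur, table: str, numeric_col: str):
    """Compare source and target fingerprints; an empty result means they agree."""
    source = table_fingerprint(source_cur, table, numeric_col)
    target = table_fingerprint(target_cur, table, numeric_col)
    return [
        (label, s, t)
        for label, s, t in zip(("row_count", "sum", "min", "max"), source, target)
        if s != t
    ]
```

Because the comparison is a list of named mismatches rather than a simple pass/fail, the same routine can feed both the Validate gate and the ongoing Monitor alerts.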
The nuts and bolts of the replication process are below. This is the key to how we’re able to combine the efficiency of automation with the quality of custom-built translations:
- Group all workflows to migrate. This includes SQL scripts and the XML outputs common to legacy ETL tools such as IBM DataStage, a popular choice among IBM Netezza users.
- Attempt automated conversion.
- Passing conversions are then validated.
- Failed conversions are isolated, and data engineers find pattern-based approaches to solving the conversion issue. Often, a group of failed conversions shares a similar issue. For example, date extraction functions can have different syntax from system to system. By solving the date extraction issue once in a general way, the fix can be reapplied to every affected scenario automatically instead of having a data engineer correct each case by hand.
- The next round of workflows, including previously failing ones, is grouped, and the process continues until all conversions are successful.
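The date-extraction example above can be sketched as a small rule-driven rewriter. The rules below are illustrative stand-ins, not an actual Netezza-to-Snowflake dialect map; the point is that each rule is written once and then applied automatically across every failing script.

```python
import re

# Illustrative, hand-maintained rewrite rules discovered by examining
# groups of failed conversions. Both rules are hypothetical examples.
CONVERSION_RULES = [
    # hypothetical: source dialect's now() -> target's CURRENT_TIMESTAMP()
    (re.compile(r"\bnow\(\)", re.IGNORECASE), "CURRENT_TIMESTAMP()"),
    # hypothetical: source-style month extraction -> target-style
    (re.compile(r"\bextract_month\(([^)]+)\)", re.IGNORECASE),
     r"DATE_PART(month, \1)"),
]

def convert_script(sql: str) -> str:
    """Apply every known rewrite rule to a single SQL script."""
    for pattern, replacement in CONVERSION_RULES:
        sql = pattern.sub(replacement, sql)
    return sql
```

In practice the rule set grows with each iteration of the loop: every batch of failed conversions yields a new general-purpose rule, and the whole corpus is re-run until everything passes validation.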
An important note on this phase: with some upfront focus on end users, it is common for a significant portion of workflows to be retired outright without requiring conversion. Even 1 to 2 weeks of upfront design effort can, and regularly does, save months of work.
As workload migration progresses, it is common and expected that new efficiencies and advanced approaches will become available in the target architecture, leading to our third and final stage of Netezza migration.
Stage 3 — Modernization and Innovation: Building for the Future
This is the ultimate destination for all migrations. Once the immediate threat of losing support for a Netezza appliance is gone, the team can focus on finding advanced approaches in the new environment that improve existing workflows or represent net-new capabilities (Predictive Maintenance, Natural Language Processing, Process Automation, Real-time Decision Making, etc.).
Common modernization examples include:
- Removal of redundant and unnecessary operations due to updated business needs or constraints imposed by Netezza or legacy ETL tooling.
- Simplification of complex stored procedures, custom functions, and other advanced transformations using new services and approaches.
- Acceleration of transformation workloads by taking advantage of the elasticity and scale of the cloud.
Net new innovation efforts include:
- Single-pane-of-glass monitoring of data governance, cybersecurity, compliance, and cloud costs.
- Increased usage of Continuous Integration and Continuous Deployment practices to enable greater automation and more reliable production deployments.
- Development of advanced analytics infrastructure for artificial intelligence, machine learning, industrial IoT, and time series analysis applications.
The following reference architecture is a general-purpose data platform approach that we find useful when discussing these next-generation enterprise data capabilities.
This represents a comfortable abstraction that sets specifics aside long enough to provide a gut-level feel for the major areas of focus when designing new business solutions. It is not intended as a strict architecture design and is intentionally simplified for use with audiences of diverse technical comfort levels.
Ready to accelerate your digital transformation?
At Hashmap, we work with our clients to build better, together.
If you want to go beyond the blogs and get to the heart of how you can pour jet fuel into your data and analytics tank, please reach out and let us take it from here.
Also, you can catch Randy as a host on Hashmap on Tap, a podcast focused on all things data engineering and the cloud — available on Spotify, Apple Podcasts, Google Podcasts, and other popular audio apps.
Hashmap on Tap | Hashmap Podcast
A rotating cast of Hashmap hosts and special guests explore technologies from diverse perspectives while enjoying a drink of choice.
Other Tools and Content You Might Like
Snowflake Inspector | Interactive Visual Exploration of Snowflake | Hashmap
Snowflake Inspector Interactive, Visual Exploration of Your Snowflake Environment We've got something for you…
Snowflake Data Profiler | Statistical DQ Analysis for Snowflake | Hashmap
What if you could proactively discover data quality issues and get the answers to questions such as: "What records are…
5 Things I Wish I Knew Before Learning Snowflake
Hashmap has been focusing on delivering the simplest, most cost-efficient data solutions for the better part of a…
Hashmap Megabytes | Bite-Size Video Series
Hashmap Megabytes is a weekly video series in which mega cloud ideas are explained in bite-size portions.
Randy Pitcher is a Cloud and Data Engineer (and OKC-based Regional Technical Expert) with Hashmap, providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries alongside a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.
Be sure to connect with Randy on LinkedIn and reach out for more perspectives and insight into accelerating your data-driven business outcomes, or to schedule a hands-on workshop to help you go from Zero to Snowflake.