Data Migration from Netezza to Snowflake
Overview of Netezza
Netezza started its life as the world's first data warehouse appliance in 2003, and it held on to that pioneering position for years, delivering the world's first 100 TB data warehouse appliance in 2006 and the world's first petabyte data warehouse appliance in 2009.
In its heyday, Netezza's performance was unmatched thanks to its patented hardware acceleration in the form of field-programmable gate arrays (FPGAs), which were fine-tuned to process analytic queries at blistering speed and at scale. These FPGAs handled data compression, row-column conversion and data pruning.
Different models of Netezza were produced during its lifetime, including Skimmer, TwinFin, Striper and Mako. Its value proposition includes the following:
- No indexing and partitioning of data
- Simplified management
- Data pruning
- Purpose-built for data warehouse and analytics
However, IBM withdrew support for Netezza, largely due to the cloud revolution. Most models are no longer supported by IBM and no new Netezza appliances will be made; in fact, no new hardware has been released since 2014. Effectively, IBM is forcing Netezza users to abandon the appliance by removing crucial support for the product.
Cloud Computing
Cloud computing is the on-demand delivery of computing services (servers, databases, storage, networking and so on) over the Internet, offering faster innovation, flexible resources and economies of scale.
Why is Cloud Attractive?
- Lower cost — the pay-as-you-go model is an attractive pricing model for both start-ups and enterprises. Users with variable computing needs can realise significant savings in their IT costs.
- Increased reliability and uptime — Most cloud providers offer server availability in excess of 99.9%.
- Scalability — computing power can be ramped up or down at the push of a button.
- Speed of deployment — companies can develop and deploy applications by acquiring almost instant access to virtually unlimited computing power and storage.
- Cheap and virtually unlimited storage.
- Economies of scale — the more companies share cloud resources the more the costs can be amortised for each company.
- Increased security — Most cloud providers comply with industry standards such as HIPAA, PCI, ISO and Sarbanes-Oxley.
- Disaster recovery — cloud computing allows for easy and automatic backup of data and enables faster disaster recovery of critical IT systems.
Why Snowflake?
Snowflake is the only data warehouse built for the cloud. Snowflake offers far better performance, scalability, resiliency and workgroup/workload concurrency than any other cloud-based data warehouse in the market, and provides a safe haven for Netezza refugees.
By separating storage from compute, Snowflake can automatically and instantly scale in a way not possible with Netezza. This is achieved with Snowflake’s unique multi-cluster, shared data architecture. Snowflake’s value proposition includes the following:
- Zero management — no knobs to tweak and no tuning necessary.
- All of your data in one place — stores both structured and semi-structured data (JSON, AVRO, XML).
- Unlimited access for concurrent users and applications without performance degradation.
- Pay as you go and pay by the second.
- Seamless data sharing (Data Sharehouse).
- Complete SQL database.
Netezza to Snowflake Migration Strategy
There are two approaches to migrating data from Netezza to Snowflake:
- Lift and Shift
- Staged
Generally speaking, choosing between the two approaches will largely depend on factors such as timescale, number of data sources, types of data and future ambitions.
Factors that may lead to a lift-and-shift approach:
- Highly integrated data across the existing data warehouse.
- Timescale pressures to move off Netezza.
- Migrating a single standalone and independent data mart.
- Well-designed data and processes using standard ANSI SQL.
Factors that may lead to a staged approach:
- Current data warehouse is made up of many independent data marts that can be moved independently over time.
- Re-engineering is required for critical data and processes within the data warehouse due to performance issues.
- The focus is on new development rather than reworking legacy processes.
- Change to the data ecosystem is required e.g. new ELT, BI and data visualisation tools.
Migration steps:
In order to migrate your data warehouse from Netezza to Snowflake successfully, a logical plan must be developed and adhered to. The plan includes the following:
Data model migration
The first step in the data migration process is the migration of the data model. This model includes the databases, tables, views, sequences, user account names, roles and object grants. At a minimum, the Netezza database owner user must be created in Snowflake before migration can start.
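As a hedged illustration of that first step, the sketch below creates a hypothetical DW_OWNER user and role in Snowflake and hands the role ownership of a new database; the user, role and database names are assumptions for illustration rather than taken from any particular Netezza environment.

```sql
-- Illustrative only: role, user and database names are assumptions.
CREATE ROLE IF NOT EXISTS dw_owner_role;
CREATE USER IF NOT EXISTS dw_owner DEFAULT_ROLE = dw_owner_role;
GRANT ROLE dw_owner_role TO USER dw_owner;

-- Create the target database and make the migrated owner role own it.
CREATE DATABASE IF NOT EXISTS edw;
GRANT OWNERSHIP ON DATABASE edw TO ROLE dw_owner_role;
```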
The objects to be migrated will largely depend on the scope of the migration. There are three ways of migrating the data model from Netezza to Snowflake:
- Using a data model tool — if your data warehouse design is stored in a modelling tool, you can use the tool to generate ANSI SQL compliant DDL to rebuild database objects in Snowflake.
- Using existing DDL scripts — with this, you may need to change some data types.
- Creating new DDL scripts — by extracting metadata from Netezza with the nzdumpschema or NZ_DDL utilities. Again, you may need to change some data types (see the sketch below).
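To make the DDL changes concrete, here is a minimal sketch using a hypothetical CUSTOMER table: the Netezza-specific DISTRIBUTE ON clause is dropped, since Snowflake has no distribution keys, and NVARCHAR becomes VARCHAR, since Snowflake VARCHAR already stores Unicode.

```sql
-- Netezza DDL (hypothetical table for illustration)
CREATE TABLE customer (
    customer_id   BIGINT NOT NULL,
    customer_name NVARCHAR(100),
    signup_date   DATE
)
DISTRIBUTE ON (customer_id);

-- Equivalent Snowflake DDL: no DISTRIBUTE ON clause, NVARCHAR -> VARCHAR.
CREATE TABLE customer (
    customer_id   BIGINT NOT NULL,
    customer_name VARCHAR(100),
    signup_date   DATE
);
```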
Dataset migration
After migrating the data model to Snowflake, you can start to migrate the historical data held in Netezza. For small datasets, a third-party ETL tool can be used to move the data.
For tens or hundreds of terabytes of data, however, a manual process is more practical. It involves using a data transfer service such as AWS Snowball/Snowball Edge or Azure Data Box. Also, if you have a direct connection to AWS or Azure, software such as IBM Aspera can help transfer the data. To move petabytes or exabytes of data, AWS Snowmobile or an equivalent is the logical way to go.
To move data manually, the data for each Netezza table must be extracted to one or more delimited flat files (e.g. CSV), using external tables to create single files or the NZ_UNLOAD utility to create multiple files in parallel.
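As a sketch of the external-table route, the statement below unloads a hypothetical CUSTOMER table to a single pipe-delimited file; the file path, delimiter and options (for example REMOTESOURCE, used when running from a remote client) are illustrative and vary by environment.

```sql
-- Netezza transient external table: writes the query result to a flat file.
-- File path, delimiter and table name are illustrative assumptions.
CREATE EXTERNAL TABLE '/tmp/customer.csv'
USING (DELIMITER '|' REMOTESOURCE 'ODBC')
AS SELECT * FROM customer;
```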
The Snowflake PUT command is then used to upload the extracted files to a Snowflake internal stage; alternatively, the files can be placed in your own cloud storage bucket and exposed to Snowflake as an external stage. Snowflake recommends file sizes between 100 MB and 1 GB for faster database loading through parallel bulk loading.
Once the data has been moved to the stage or to the cloud storage service of your preferred cloud provider, loading it into the database can begin with the Snowflake COPY command.
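A minimal sketch of the stage-and-load sequence, assuming a named internal stage and the CUSTOMER table from the migrated data model (stage, file and table names are illustrative):

```sql
-- Create an internal stage to receive the extracted files.
CREATE STAGE IF NOT EXISTS netezza_migration_stage;

-- Run from the SnowSQL client: upload the local extracts; AUTO_COMPRESS
-- gzips each file on the way up.
PUT file:///tmp/customer*.csv @netezza_migration_stage AUTO_COMPRESS = TRUE;

-- Bulk load the staged files into the target table in parallel.
COPY INTO customer
FROM @netezza_migration_stage
PATTERN = '.*customer.*[.]csv[.]gz'
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|');
```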
Queries and Workloads migration
- Data queries — Most of your existing Netezza queries can be migrated without change because Snowflake is ANSI SQL compliant. The few Netezza constructs that are not ANSI-compliant can be changed manually (see the sketch after this list).
- BI tools — Snowflake supports both ODBC and JDBC so migrating BI tools should be straightforward.
- Workload management — Snowflake’s multi-cluster architecture makes this Netezza feature redundant.
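As one hedged example of such a manual change: Netezza, with its PostgreSQL heritage, accepts the non-ANSI age() function, which can be rewritten using Snowflake's DATEDIFF (table and column names are illustrative assumptions).

```sql
-- Netezza (non-ANSI, PostgreSQL-style):
--   SELECT customer_id, AGE(CURRENT_TIMESTAMP, signup_date) FROM customer;

-- Snowflake equivalent:
SELECT customer_id,
       DATEDIFF('day', signup_date, CURRENT_TIMESTAMP()) AS days_since_signup
FROM   customer;
```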
ETL and Data Pipeline processes migration
Snowflake supports both ELT (Extract, Load, Transform) and ETL (Extract, Transform, Load). However, it is highly optimised for ELT.
There are native connectors for popular ETL tools such as Ab Initio, Talend and Informatica. It is recommended that data pipelines be run in both Netezza and Snowflake during the initial migration so their outputs can be compared quickly.
For loading data continuously as it arrives in your cloud storage facility, the Snowpipe tool can be leveraged for its impressive performance.
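A minimal Snowpipe sketch, assuming an external stage over the cloud storage location where new files arrive (the pipe, stage and table names are illustrative):

```sql
-- Continuously load newly arriving files into the CUSTOMER table.
CREATE PIPE IF NOT EXISTS customer_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO customer
  FROM @customer_landing_stage
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|');
```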
Cut-Over
After migrating the data model, data, queries etc to Snowflake, plans must be made to switch from Netezza to Snowflake permanently using the following steps as a guide:
- Perform a one-time extract and load of historical data via Snowball/Snowball Edge/IBM Aspera.
- Set up delta (incremental) loads of new data (a sketch follows this list).
- Inform all Netezza users about the impending cut-over.
- Check in and back up all development code to a code repository, e.g. GitHub, GitLab or Bitbucket.
- Re-point all Business Intelligence (BI) reports to pull data from Snowflake.
- Run Netezza and Snowflake in parallel for a couple of days then compare and validate the output.
- Turn off and decommission Netezza.
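For the delta loads mentioned above, one common pattern is a MERGE from a staging table that receives only new or changed rows; the sketch below assumes a hypothetical CUSTOMER_DELTA staging table keyed on CUSTOMER_ID.

```sql
-- Upsert newly arrived rows into the target table.
MERGE INTO customer AS tgt
USING customer_delta AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET
  customer_name = src.customer_name,
  signup_date   = src.signup_date
WHEN NOT MATCHED THEN INSERT (customer_id, customer_name, signup_date)
  VALUES (src.customer_id, src.customer_name, src.signup_date);
```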
Netezza to Snowflake Data Migration Consultancy
Slalom is a modern consulting firm and Snowflake’s solution partner with extensive experience in Netezza to Snowflake migration. With over 200 certified Snowflake consultants, Slalom offers consulting services ranging from project management to implementation and delivery.
The expertise of Slalom in Netezza to Snowflake migration is unmatched. This expertise was recently demonstrated when Slalom helped Sony Interactive Entertainment (SIE) deliver the biggest data migration globally to date from Netezza to Snowflake on time and on budget.
Sony Interactive Entertainment (SIE) captures data from users around the world. Data dating back to 2006 was stored on an on-premise IBM Netezza platform. The data is used by customers across the SIE business to gain insight to drive business performance and decisions.
The legacy platform was reaching capacity, both in terms of storage and processing power. As a result, performance had become an issue and ongoing interventions were required to “keep the lights on”. Significant investment was required to upgrade the current solution to support long-term business needs.
Slalom led the team responsible for planning, managing and successfully executing the data migration. This resulted in all Sony user data being transferred from the legacy storage system to Snowflake, a cloud-based enterprise data warehouse. The 10-month migration project was completed with only two days of disruption to end users, having almost no impact on business as usual.
Project Highlights
- Started with a Discovery phase, followed by four further Delivery phases
- Delivered a successful migration, going live 10 hours ahead of schedule
- One of the largest full data migrations to Snowflake worldwide
Migration statistics:
- SIE went live on Snowflake with 322 TB of data on day 1
- 35 TB of data transfer was achieved over a 24-hour period
- Size of the largest table: 21 TB
- 10 trillion rows of data were migrated, across:
- 30 production databases
- 58,000 database tables
- 20,000 database views
David Oyegoke is a Data & Analytics consultant based in Slalom’s London, UK office.
Want to learn more about how Slalom can support your organisation in migrating from Netezza to Snowflake? We’re co-delivering a Webinar with Snowflake to dive deeper into the topic on 27 May 2020.
Register here: https://www.snowflake.com/webinar/a-move-to-the-cloud-a-data-warehouse-migration-from-netezza-to-snowflake/
You can also speak with David and other Data & Analytics leaders at Slalom by reaching out directly or learn more at slalom.com.