Migrating from Oracle Exadata by Modernizing with Snowflake’s Data Cloud

How to turn off Oracle Exadata to accelerate time to value, lower costs, and cut admin overhead

--

by Preetpal Singh

The days of simply lifting and shifting data warehousing and data analytics workloads to the cloud have passed. And if Snowflake’s record-setting software IPO is any indication, everyone is intensely focused on using Snowflake and the surrounding data integration and data automation ecosystem to modernize their overall approach.

Over the last three years, we’ve helped companies across industries adopt Snowflake’s multi-cloud (AWS, Azure, and GCP) SaaS solution by modernizing (and migrating) from traditional data warehousing appliances and platforms such as Teradata, Netezza, Hadoop, and Exadata.

When you consider traditional data warehousing appliances, Oracle Exadata presents a unique challenge in that it can run mixed workloads: both OLAP (traditional data warehousing) and OLTP (transaction processing) database processing.

The Snowflake Data Cloud is our recommendation for running data warehousing and data analytics workloads in the cloud; It Just Works! Note, though, that Snowflake is not a transactional database and isn't designed to run your OLTP workloads; for those we recommend something like Amazon Aurora if you are on an AWS stack.

With this blog post, we’ll help you think through options for retiring your Oracle Exadata OLAP workloads completely by leveraging the Snowflake Data Cloud for the data warehousing side to accelerate time to value, lower costs, and cut admin overhead. In future posts, we may explore using AWS services such as Aurora to migrate your transactional workloads (if that is a requirement and you are running both workloads on Exadata today).

Don't hesitate to bring in an outside team, like Hashmap, that has this as a core competency. We can not only stand up the warehouse (which is deceptively easy) but also configure it in ways that anticipate the future while keeping costs low.

Hashmap’s Data & Cloud Migration and Modernization Workshop is an interactive, two-hour experience for you and your team to help understand how to accelerate desired outcomes, reduce risk, and enable modern data readiness. We’ll talk through options and make sure that everyone has a good understanding of what should be prioritized, typical project phases, and how to mitigate risk. Sign up today for our complimentary workshop.

Background on Oracle Exadata

Oracle Exadata was first introduced at Oracle Open World in 2008 and ran Oracle’s 11g Release 1 version of the Oracle database on HP hardware with an intelligent storage subsystem designed to optimize data warehousing workloads and compete as an appliance against Netezza and Teradata.

Once Oracle bought Sun Microsystems and the platform infrastructure shifted from HP to Sun hardware, Smart Flash Cache was introduced in Exadata with 11gR2, and customers were able to run OLTP workloads as well. If you are interested in the history of Exadata, flashdba.com provides a well-written history of the platform and ends the story with an interesting question:

The question is… does this actually make sense? Is it realistic for Oracle to claim that Exadata is the best-fit for all of the different types of database workloads? If Oracle sees Exadata as the only platform for database customers, how can customers be sure that they are buying a solution which properly fits their requirements? And indeed, is it actually a linguistically correct statement to claim that a product is optimized for every possible workload?

The Case for Modernizing

For any current Exadata customers, an opportunity to modernize your data stack presents some unique questions to consider:

  1. Can I accelerate my move to a cloud-based “as a service” solution like Snowflake where I can pay by the second for the services that I use?
  2. How much will I decrease my overall time to market for delivery of data “products” to key stakeholders?
  3. Where will I be able to unlock opportunities to do more with the new data economy and an ability to enable data sharing, optimized data science, quicker data lake insights, easier data exchange, more streamlined AI/ML apps, etc.?

Considering both OLAP and OLTP Workloads for Exadata

Based on my experience in working with a wide variety of Oracle-based environments over the years, Oracle databases can be broadly classified as supporting either analytics applications or transactional applications. While the Oracle database can be the centerpiece for both types of applications, it is tuned differently depending on the use cases — the same logic applies to Exadata as it runs the Oracle database.

  1. Oracle Exadata with an analytics focus: tuned for high scan throughput, larger block sizes, larger SGA and PGA allocations, and heavy use of Parallel Query server processes.
  2. Oracle Exadata with an OLTP focus: tuned for fast, low-latency IOPS, smaller block sizes, and row-based operations with transactional semantics.

For a picture of the internal memory structures and processes of an Oracle database that these tuning choices affect, see the Oracle documentation: https://docs.oracle.com/cd/B19306_01/server.102/b14220/process.htm

When setting up your plan to migrate to cloud-based services, it would be a stretch to propose the same type or instance of database for both flavors of workload, and there's really no reason to do that anyway. The cloud has made it very easy to select a specific type of service for a specific workload and pay "by the drink" on a consumption-based model, rather than forcing one platform to serve multiple workloads while providing a "lowest common denominator" service for each.

So, when migrating to the cloud, selecting the right platform for analytics versus transaction processing is a luxury (you have a lot of potential services to choose from and you are not forced down a single path), but it’s also extremely important to think through.

Given the option for purpose-built distinct database services for OLTP and OLAP, the following questions and challenges present themselves:

  1. Can I also solve for all 3 major public clouds — AWS, Azure, and GCP? Hint: there is more than one approach for your “Exadata to Snowflake” migration.
  2. What services should I select based on my specific use case/s? Hint: there is no "one size fits all"; every on-prem environment is unique and the cloud-based services should have the flexibility to accommodate that uniqueness.
  3. While it might be technically feasible, how do I avoid a lift-and-shift approach for Oracle Exadata? Hint: Oracle Exadata can be lifted to the cloud, but this is a less desirable solution because Exadata is engineered as an appliance combining compute, storage, and operating system services. Snowflake, on the other hand, separates compute (virtual warehouses) from storage and from cloud services, providing a tremendous cost and performance advantage (a brief sketch of this separation follows the list).
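As a minimal sketch of that separation (warehouse, database, and table names are hypothetical), two independently sized virtual warehouses can query the same single copy of data without competing for compute:

    -- Two right-sized warehouses for different workloads; neither holds any data itself.
    CREATE WAREHOUSE IF NOT EXISTS etl_wh
      WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

    CREATE WAREHOUSE IF NOT EXISTS bi_wh
      WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

    -- Both read the same copy of the data stored in Snowflake's storage layer.
    USE WAREHOUSE bi_wh;
    SELECT COUNT(*) FROM analytics.sales.orders;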

Migrating from Exadata to Snowflake: OLAP Data Warehousing Workloads

The most common usage pattern we see for Exadata is traditional OLAP and data warehousing workloads. I’m going to cover OLTP workload migrations in a separate post, but I’ll start below with specific OLAP workload migration recommendations including the data platform, data migration, data acquisition, data transformation, and BI that will assist in thinking through moving your data warehouse off of Exadata to the cloud.

Migrating from Exadata to Snowflake for OLAP: The Data Platform: Snowflake

An OLAP architecture is primarily focused on data warehousing, with an additional aspect of cubes for multi-dimensional views, querying, and analysis. Cubes are commonly needed in traditional data warehousing environments because of compute limitations on dimensional metrics and measure analysis, which typically results in the need for two OLAP models: Relational Online Analytical Processing (ROLAP) and Multidimensional Online Analytical Processing (MOLAP).

Simply put, a ROLAP model accesses data directly from the data warehouse and a MOLAP model accesses data directly from multi-dimensional cubes.

To cut to the chase for the recommendation on the OLAP data platform, choose Snowflake to completely remove these computational challenges and drastically simplify your data warehousing and data lake strategy from both a development and support standpoint.

The “why” is really straightforward — as mentioned earlier, it just works, and you’ll get the following:

  • One Platform, One Copy of Data, Many Workloads (independent compute)
  • Secure & Governed Access to All Data
  • Virtually Unlimited Performance and Scale
  • Near-Zero Maintenance, as a Service
  • It’s SQL
  • Consumption-Based (pay only for what you use)
  • All Data & Data Sharing
  • Multi-Cloud (AWS, Azure, and GCP)

Migrating from Exadata to Snowflake for OLAP: Data Migration: Options Abound

We find that it doesn’t take clients a lot of time to decide on Snowflake for the OLAP/Data Warehouse platform for the reasons mentioned above, but a data warehouse migration from Exadata typically also involves transforming the data acquisition and data transformation pipelines with consideration for the potential impact on the consumption layer (think BI) as well. I’ll touch on all three aspects below starting with data acquisition.

Data acquisition in most enterprises is typically very mature, with established ETL frameworks (Informatica, Talend, DataStage, ODI, etc.) and/or a series of custom shell scripts ingesting data from a variety of source systems, including packaged ERP applications (SAP, Oracle EBS, etc.), mainframe applications, and a multitude of systems of record running relational databases such as SQL Server, Oracle, DB2, or similar. The following are some options for re-platforming data migration for the cloud:

Option 1:

At least initially, it's a simple, solid, and cost-efficient approach to pipe the output of these data acquisition workflows to an external stage. For example, you could land files in an AWS S3 bucket using AWS Direct Connect (a dedicated network connection) in conjunction with AWS DataSync, which eliminates or automatically handles many tedious tasks: scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization. AWS Database Migration Service (DMS) is also available, but it has some known limitations that are beyond the scope of this discussion.
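For example, once files have landed in S3, a minimal Snowflake external stage and bulk load might look like the sketch below; the bucket, storage integration, file format, and table names are all hypothetical placeholders:

    -- External stage pointing at the S3 landing area (names are hypothetical).
    CREATE STAGE IF NOT EXISTS exadata_landing
      URL = 's3://acme-dw-landing/exadata/orders/'
      STORAGE_INTEGRATION = s3_landing_int
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    -- Bulk load the staged files into a Snowflake landing table.
    COPY INTO analytics.landing.orders
      FROM @exadata_landing
      PATTERN = '.*orders_.*[.]csv'
      ON_ERROR = 'ABORT_STATEMENT';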

Option 2:

There are also data integration solutions like Qlik Attunity Replicate and HVR which do a good job of replicating data and continuing to sync changes from packaged applications such as SAP and Oracle EBS.

There are some stumbling blocks to using this type of data replication tooling, e.g. replicating data behind a view, distributed log shipping, or encrypted redo log files.

Option 3:

In the long run, in order to reduce the total cost of ownership of the overall solution and reduce data pipeline development time, it’s worthwhile to explore modern data stack solutions and services such as Fivetran, Matillion, and Talend Stitch which do really well with cloud-based data sources.

For instance, if you are using Salesforce, Twitter, LinkedIn, and Facebook as part of your digital marketing portfolio, chances are that out-of-the-box connectors exist for these apps and many more, making data sourcing and migration simple to set up and maintain.

Option 4:

Use open-source frameworks for data syncing. An example of this approach is Debezium, which is licensed under Apache 2.0 and will also help you set up a near real-time replication architecture with Kafka. Whether to use a framework like this depends on your corporate strategy, support requirements, etc. These frameworks are generally quite robust, but they do not provide a catalog of supported sources as broad as the services in Option 3.
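As a hedged sketch of what the Snowflake side of such a pipeline can look like, suppose Debezium/Kafka lands change events in a staging table; a MERGE statement then applies the latest change per key to the target (all table and column names here are hypothetical):

    -- Apply CDC events from a Kafka-fed staging table to the target table.
    -- The QUALIFY clause keeps only the latest event per customer_id.
    MERGE INTO analytics.core.customers AS tgt
    USING (
        SELECT customer_id, name, email, op_type
        FROM analytics.landing.customers_cdc
        QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_ts DESC) = 1
    ) AS src
      ON tgt.customer_id = src.customer_id
    WHEN MATCHED AND src.op_type = 'd' THEN DELETE
    WHEN MATCHED THEN UPDATE SET tgt.name = src.name, tgt.email = src.email
    WHEN NOT MATCHED AND src.op_type <> 'd' THEN
      INSERT (customer_id, name, email) VALUES (src.customer_id, src.name, src.email);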

Option 5:

A homegrown utility such as Hashmap Data Migrator (hdm) can handle a one-time data sync using whatever services the underlying platform makes available. This utility is a simple Python wrapper around those services and saves on coding, license, and support costs. Again, the choice among these options is dictated by what you are trying to accomplish.

I recommend looking at these types of solutions to reduce the maintenance costs of schema evolution and upstream codebase management, and to significantly limit the technical debt (non-functional requirements, data modeling, etc.) that data warehouse owners carry for source system data ingested into the data warehouse landing area.

Migrating from Exadata to Snowflake for OLAP: Data Transformation

For existing Exadata data warehouses, data transformation typically consists of:

  1. Building workflows and transformations with ETL tools such as ODI, Informatica, Talend, Datastage, Pentaho, etc.
  2. Writing PL/SQL code to be used in conjunction with cron jobs or legacy scheduling solutions such as Control-M.

Today, with virtually unlimited compute available on a pay-for-what-you-use model with Snowflake, it's common to use an ELT approach (versus traditional ETL) and let Snowflake do the transformation work. This is also important because, in a cloud environment, data crossing software boundaries (i.e. VPCs and zones) costs money; it is relatively cheaper to send the compute down to the warehouse where the data is stored.
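As a minimal illustration of ELT pushdown (schema, table, and column names are hypothetical), the raw data is loaded first and the transformation then runs entirely inside Snowflake:

    -- Transformation executed by a Snowflake virtual warehouse; no data leaves the platform.
    CREATE OR REPLACE TABLE analytics.marts.daily_revenue AS
    SELECT
        o.order_date,
        c.region,
        SUM(o.amount)                 AS total_revenue,
        COUNT(DISTINCT o.customer_id) AS active_customers
    FROM analytics.landing.orders    o
    JOIN analytics.landing.customers c
      ON c.customer_id = o.customer_id
    GROUP BY o.order_date, c.region;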

Option 1: dbt and Snowflake

Using an ELT approach does present a few challenges, the biggest being the need to take a more formal approach to DevOps/DataOps along with dependency management. These concerns can be addressed by using frameworks like dbt, which takes a SQL model-based approach to help analysts own the entire analytics engineering workflow, from writing data transformation code to deployment and documentation. dbt formalizes the descriptive elements of building a data pipeline so that lineage, provenance, reusability, and modularity of your code are maintained while running ELT workflows.

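As a small, hypothetical sketch, the same style of transformation expressed as a dbt model is just a SELECT statement plus configuration; dbt compiles it, orders it against its dependencies, and documents it:

    -- models/marts/daily_revenue.sql (hypothetical dbt model and source names)
    {{ config(materialized='table') }}

    SELECT
        o.order_date,
        c.region,
        SUM(o.amount) AS total_revenue
    FROM {{ ref('stg_orders') }}    o
    JOIN {{ ref('stg_customers') }} c
      ON c.customer_id = o.customer_id
    GROUP BY 1, 2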

Additionally, to address the workflow scheduling and management gap, Apache Airflow is a good orchestration framework for ELT pipelines written with dbt, providing advanced scheduling capabilities for end-to-end orchestration of a data pipeline from source to target and beyond (e.g., derivative machine learning pipelines). Legacy schedulers such as Control-M are also an option.

Option 2: Fivetran or Matillion with dbt & Snowflake

Some organizations are not as equipped to work in a code-heavy shell-and-SQL environment and would prefer a modern approach that also provides an elegant user experience, such as Fivetran or Matillion. While this is a great approach, there can be some gaps in expressing advanced data transformation logic; data acquisition, however, becomes extremely straightforward, especially for modern API-based sources such as Salesforce.

These concerns around robust data transformation pipelining can be addressed by using frameworks like dbt to handle the transformation activity while deferring scheduling, data acquisition, code and pipeline management, and failure handling to Fivetran or Matillion.

Option 3: Databricks & Snowflake

Another option to consider for data transformation is Databricks, which can be a good ELT framework for Snowflake but requires careful consideration of whether distributed processing is really needed, given that most or all transformation logic is being pushed down to Snowflake. Additionally, for data-intensive operations, knowing which VPC Databricks runs in vis-à-vis Snowflake is important to avoid excessive data movement.

And because it is a cloud-native service, Snowflake plays an elegant role as a distributed compute engine, with data never moving around or leaving the service.

Databricks is an excellent choice as well because a lot of existing workload can be quickly modeled into Spark-based pipelines, with Python doing the heavy lifting from a sequencing-of-steps perspective.

Option 4: Current ETL (Datastage, Talend, Informatica, ODI, etc.)

Existing workflows on Exadata are typically built with an ETL tool like DataStage, Informatica, Talend, or ODI. Some solutions are heavily based on Oracle PL/SQL and stored procedures with an orchestration layer on top. Whether to re-platform those workflows or keep the investment in them intact when you move to the cloud depends on time to market, appetite for risk and change, and overall cost.

One approach is to convert them to SQL-native pipelines with pushdown into Snowflake. Here are some considerations and questions to keep top of mind when migrating data pipelines and existing workflows built with ETL tools or PL/SQL code to Snowflake SQL with an orchestration framework on top:

  1. Can you remove legacy and possibly obsolete ETL code or collected technical debt that has accumulated during the lifetime of the existing solution?
  2. How will you convert Oracle PL/SQL to Snowflake SQL?
  3. What about the migration of cursor-based, sequencer, record by record processing logic?
  4. How about Oracle-specific functionality like the (+) join syntax, rowid, some statistical functions, use of dual, etc.? (A brief conversion sketch follows this list.)
  5. Have you taken into consideration data type formatting?
  6. Is there an option to convert ETL tool based mappings to Snowflake SQL capturing workflow dependencies while using a more modern framework?
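To make point 4 concrete, here are a few illustrative (and far from exhaustive) conversions of Oracle-specific constructs to Snowflake SQL; the tables and columns are hypothetical:

    -- Oracle (+) outer join syntax becomes an ANSI outer join:
    --   SELECT e.name, d.dept_name FROM emp e, dept d WHERE e.dept_id = d.dept_id(+);
    SELECT e.name, d.dept_name
    FROM emp e
    LEFT JOIN dept d ON e.dept_id = d.dept_id;

    -- A SELECT against dual can simply drop the FROM clause:
    --   SELECT SYSDATE FROM dual;
    SELECT CURRENT_TIMESTAMP();

    -- ROWNUM-based limiting becomes LIMIT:
    --   SELECT * FROM orders WHERE ROWNUM <= 10;
    SELECT * FROM orders LIMIT 10;

    -- ROWID has no direct equivalent; de-duplication logic is typically rewritten with
    -- window functions such as ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).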

Each of these could be a post in and of itself, with detailed elaboration on each topic. I do believe there are automation opportunities available to help address the 6 points above, but no code conversion tool is ever going to be 100% effective, for the reasons below:

  1. Auto-migration frameworks work on the principles of transpilers/compilers: they parse the grammar- or XML-based DSL pipelines of the source ETL tools and convert them to a different grammar- or XML-based target DSL. Essentially, they convert what is coded, albeit with some exceptions.
  2. ETL systems built over time tend to collect a lot of technical and business debt due to changes in the ER model. If the ER models are tweaked, then ELT/ETL frameworks will require re-coding. Automation tools cannot handle this.
  3. The outputs of an ELT code conversion still have to be tested and validated, which requires test data setup and validation.

Therefore, an auto-migration framework will accelerate some parts of the migration process, and we have those available in the Hashmap migration toolbag when needed and appropriate based on requirements. More thoughts will come in another post; for now, the migration to an ELT approach with SnowSQL can be accelerated with the proper use of frameworks, people, and technology.

Scenario A

Existing ETL tools such as Informatica, ODI, DataStage, Talend, or Pentaho are migrated to equivalent tools like Matillion, Informatica Cloud, etc.

Steps involved would be:

  1. Take an export of the mappings in XML format.
  2. Run them through an XML converter framework for ETL migration projects (such frameworks are commonly available in the industry) to:
  • Convert the SQL embedded within the XML-based mappings to SnowSQL.
  • Make changes to data types to adjust from Oracle Exadata to Snowflake (see the sketch after these steps).
  • Generate equivalent, importable XML mappings for the converted mappings.
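As a hedged illustration of the data type adjustments (table and column names are hypothetical), common Oracle types map to Snowflake roughly as follows:

    -- Oracle DDL:
    --   CREATE TABLE orders (
    --     order_id    NUMBER(12),
    --     customer_nm VARCHAR2(200),
    --     order_dt    DATE,
    --     notes       CLOB
    --   );

    -- Snowflake equivalent:
    CREATE TABLE IF NOT EXISTS orders (
        order_id    NUMBER(12,0),
        customer_nm VARCHAR(200),
        order_dt    TIMESTAMP_NTZ,  -- Oracle DATE also carries a time component
        notes       VARCHAR         -- CLOB maps to (large) VARCHAR in Snowflake
    );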

Scenario B

Existing tools migrated to SQL scripts with orchestration wrappers written in Python or DSLs.

Steps involved would be:

  1. Take an export of the mappings in XML format.
  2. Run the XML export through a converter to produce corresponding SQL scripts, which are then assembled into a dependency graph using a Python script or similar.
  3. The process is similar to migrating from ETL tool "a" to ETL tool "b"; however, the logical break in the conversion process is important to reflect on.

Simple ETL mappings can be run through this process without many hiccups, but some medium-complexity and most complex mappings will likely need to be hand-converted rather than assuming an automated approach will suffice.

Migrating from Exadata to Snowflake for OLAP: Business Intelligence

A third dimension to consider when migrating OLAP data warehousing workloads from Exadata and modernizing with Snowflake is the existing BI environment. The type(s) of BI tools in use and where they run are important. For example:

  1. The BI infrastructure may be kept on-premises in a data center (e.g. Phoenix) while accessing Snowflake in, for instance, the AWS US East Region in Virginia, which puts network latency between the two.
  2. Not all BI tools are built the same in terms of metadata management, caching, and how logical models translate to physical database models.

When performing migrations, these facets play an important role in configuring and performance tuning the cloud-based Snowflake environment. Generally, we recommend making as few changes as possible in the BI tool environment, because there is often embedded logic for data presentation, hierarchies, rollups, security, etc.

For your initial migration and modernization phase, BI migration is an 8-step process beginning with an assessment:

  1. Run a profiling tool against existing metadata repositories for the BI tools, e.g. Microstrategy, Tableau, OBIEE, SAP BO, Tibco Spotfire.
  • How many universes, objects, and reports exist? How many are actively used? How many can be archived?
  • How large are the reports?
  • Are there unpurged reports? If so, how many?
  • Are there duplicate reports with different names and/or locations?
  • When do session peaks occur?
  • Is it necessary to migrate scheduled reports, which are often saved in the BI tool's repository?
  • What is the security model?
  • Who is connecting and how often? Who is not connecting?
  • How many instances can be archived or deleted?
  • Are there document scheduling jobs? Check the destination folder locations; are they still valid?
  • How big are the queries?
  • How many data sources exist and are they all in use?

2. Clean up, optimize, and back up the current environment into (multiple) repository export files using the BI tool's backup and restore utilities

3. Restore specific repository backup files into the cloud environment

4. Perform automated SQL changes for SQL overrides to migrate from Oracle Exadata to Snowflake

5. Change the database connect string for universes/dashboards in the metadata repository

6. Adjust caching settings to tune for live queries vs cached queries

7. Run regression test reports

8. Perform environment & query performance optimization

When moving to Snowflake, you should be able to turn BI caching off and enable the BI tool to generate queries directly to Snowflake. This will use Snowflake’s data caching capabilities to reduce overall costs.
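As a small sketch of what that looks like on the Snowflake side (the warehouse name is hypothetical), the result cache is on by default and the BI warehouse can be allowed to suspend between report runs:

    -- Snowflake's query result cache is enabled by default; this just makes it explicit.
    ALTER SESSION SET USE_CACHED_RESULT = TRUE;

    -- Let the BI warehouse suspend when idle so you only pay while reports are running.
    ALTER WAREHOUSE bi_reporting_wh SET
      AUTO_SUSPEND = 120   -- seconds of inactivity before suspending
      AUTO_RESUME  = TRUE;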

Final Thoughts

Amazon published a story about turning off their final Oracle database and migrating their entire consumer business from predominantly an Oracle database environment to the AWS cloud. They realized significant benefits, not the least of which was getting to refocus on high-value work versus keeping the lights on operationally, constantly trying to scale and provision inefficient hardware, and always dealing with Oracle licensing.

While you may not be looking to migrate 1000s of Oracle databases like AWS did, I hope that if you are considering migrating one or multiple Oracle Exadata appliances running OLAP workloads that you now have some options to get started on your approach and plan. Hit the easy button and use Snowflake for your cloud data platform.

In the future, I plan to go over the steps to perform these migrations (implementing the above architectures) broadly covering the following points:

  • Types of migrations
  • Cutover migration: strategy and implementation plan for initial data migration
  • Incremental migration
  • Strategy and implementation plans for: schema migration and validation, data replication, migrating Oracle SQL constructs from Exadata to Snowflake, analytics infrastructure, data validation and quality checking, real-time infrastructure, ETL rewrite to ELT, and modernizing BI.

Lastly, if you’d like assistance in design, architecture, and engineering to make your move from an on-premise Oracle Exadata to the cloud and Snowflake, don’t hesitate to reach out.

Hashmap offers a range of enablement workshops and assessment services, cloud modernization and migration services, and consulting service packages as part of our Cloud service offerings. We would be glad to work through your specific requirements.


Preetpal Singh is an Enterprise Architect and Co-Founder of Hashmap, an NTT DATA Company, delivering innovative, outcome-based data and cloud consulting services across 20 industries. You can connect with Preetpal on LinkedIn and also catch him when making guest appearances on Hashmap on Tap.
