Migrating from Oracle to Amazon Redshift

Shafiqa Iqbal
May 2, 2020


In this tutorial, I will walk through Oracle schema objects and their alternatives in Amazon Redshift that can help you during a database migration.

Migrating your Oracle data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. But not every Oracle database feature is supported by Redshift, so you need to know the alternatives. Let me save you some time on this one.

Let's walk through the Oracle schema objects one by one.

Tables

Tables are the basic unit of data storage in a database. Data is stored in rows and columns. From a migration perspective, the structure of a table is the same in Redshift and Oracle. The only differences are the encoding, distkey, and sortkey attributes of a table, which Amazon Redshift uses for parallel processing. The functionality of the table is the same in both.
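
For illustration, here's a minimal sketch of a Redshift table definition. The table and column names are made up; the ENCODE, DISTKEY, and SORTKEY attributes are the Redshift-specific parts:

    -- Hypothetical table: encoding, distribution key, and sort key are
    -- Redshift-specific; the rest is standard DDL.
    CREATE TABLE sales (
        sale_id     BIGINT        ENCODE az64,
        customer_id INTEGER,
        sale_date   DATE,
        amount      DECIMAL(12,2) ENCODE az64
    )
    DISTKEY (customer_id)   -- rows distributed across slices by customer
    SORTKEY (sale_date);    -- rows stored on disk sorted by date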

Sequence

A sequence is an object in Oracle that is used to generate a number sequence. This can be useful when you need to create a unique number to act as a primary key. Amazon Redshift doesn't support sequences at the moment, but you can use these methods to auto-generate sequence values.

  • Number Tables: Create a single-column (type BIGINT) sequence-number table. Get the MAX(COL_NAME) value and assign it to a variable, increment the variable by 1, insert the updated value into the table, and finally return the updated value as a unique sequence number.
  • Identity Columns: An IDENTITY column in a Redshift table contains unique, auto-generated values. Redshift identity values are not guaranteed to increment exactly by the defined step value, but they are guaranteed never to collide (i.e. each value is unique). The gaps come from the distributed architecture of Redshift. Both approaches are sketched below.
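
Here's a minimal sketch of both approaches, with made-up table and column names:

    -- (1) Identity column: IDENTITY(seed, step) generates unique values,
    -- possibly with gaps, because of Redshift's distributed architecture.
    CREATE TABLE orders (
        order_id BIGINT IDENTITY(1, 1),
        item     VARCHAR(100)
    );

    -- order_id is filled in automatically, so don't list it in the INSERT.
    INSERT INTO orders (item) VALUES ('widget');

    -- (2) Number-table style: compute MAX + 1 yourself. Not safe under
    -- concurrent writers without serializing access to the table.
    CREATE TABLE sequence_numbers (seq_value BIGINT);
    INSERT INTO sequence_numbers
    SELECT COALESCE(MAX(seq_value), 0) + 1 FROM sequence_numbers;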

Index

In Oracle, an index is an optional structure associated with a table or table cluster. An index can:

  • Speed up SELECT queries and reports
  • Reduce I/O
  • Provide fast data access
  • Prevent duplicate primary and unique key values

But the downsides are:

  • Extra disk storage consumed
  • Slower DML operations
  • Extra load on the database due to index maintenance

Indexes are not supported in Redshift.

You can use sort keys and distribution styles to get the performance advantages of indexes. Redshift uses zone maps (the minimum and maximum values stored for each block) for efficient data handling. If data is distributed evenly and sorted, the query planner can rapidly skip over large blocks of data (Redshift uses columnar storage with a 1 MB block size) during table scans. This can boost your performance without the extra cost of index maintenance and storage.
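
As a sketch of the idea (hypothetical table), a sort key on the filter column lets the zone maps prune blocks for range predicates, which is where you would have reached for an index in Oracle:

    -- Sorting by event_time lets the zone maps (per-block min/max values)
    -- skip 1 MB blocks that can't match the date filter.
    CREATE TABLE web_events (
        event_id   BIGINT,
        event_time TIMESTAMP,
        page_url   VARCHAR(256)
    )
    SORTKEY (event_time);

    -- This scan only reads blocks whose min/max range overlaps March 2020.
    SELECT COUNT(*)
    FROM web_events
    WHERE event_time BETWEEN '2020-03-01' AND '2020-03-31';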

Partitions

Partitioning is a powerful Oracle feature that subdivides tables into smaller pieces, giving you increased performance and availability. You can't implement Oracle-style partitioning in Redshift, but choosing a suitable distribution style and distribution key will help with query performance, storage requirements, data loading, and maintenance. Redshift distributes the rows of a table to the compute nodes according to the table's distribution style. By choosing the best distribution style for each table, you can balance your data distribution and significantly improve overall system performance, giving you much the same benefit as partitioning.
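
A minimal sketch of the three distribution styles, on made-up tables:

    -- KEY: co-locate rows that share a join/filter column.
    CREATE TABLE orders_fact (
        order_id    BIGINT,
        customer_id INTEGER,
        amount      DECIMAL(12,2)
    )
    DISTSTYLE KEY DISTKEY (customer_id);

    -- ALL: replicate a small dimension table to every node.
    CREATE TABLE region_dim (
        region_id   INTEGER,
        region_name VARCHAR(50)
    )
    DISTSTYLE ALL;

    -- EVEN: round-robin the rows when there is no dominant join column.
    CREATE TABLE raw_logs (log_line VARCHAR(4000))
    DISTSTYLE EVEN;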

Views

Views are supported in Redshift, and the CREATE VIEW command is used to create a new view. A view is not physically materialized, which means that no actual table is created in the database, so the query planner can struggle to optimize queries over complex views. You can overcome this by using materialized views.
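
A quick sketch, reusing the hypothetical sales table from the Tables section:

    -- Plain view: just a stored query, re-planned on every reference.
    CREATE VIEW daily_sales AS
    SELECT sale_date, SUM(amount) AS total
    FROM sales
    GROUP BY sale_date;

    -- Materialized view: the result is precomputed and stored,
    -- and refreshed on demand.
    CREATE MATERIALIZED VIEW daily_sales_mv AS
    SELECT sale_date, SUM(amount) AS total
    FROM sales
    GROUP BY sale_date;

    REFRESH MATERIALIZED VIEW daily_sales_mv;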

Cursor

Cursors are supported in Redshift. You can use the DECLARE command to define a new cursor and retrieve a few rows at a time from the result set of a larger query. A cursor causes the entire result set to be materialized on the leader node, so because of the potential negative performance impact of cursors with large result sets, AWS recommends using alternative approaches whenever possible. For more information, see Performance considerations when using cursors.
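
A minimal sketch (cursor and table names are made up); note that in Redshift a cursor must be declared inside a transaction block:

    BEGIN;

    DECLARE sales_cur CURSOR FOR
        SELECT sale_id, amount FROM sales ORDER BY sale_id;

    FETCH FORWARD 100 FROM sales_cur;  -- first batch of 100 rows
    FETCH FORWARD 100 FROM sales_cur;  -- next batch

    CLOSE sales_cur;
    COMMIT;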

Packages

A package is a schema object in Oracle that contains definitions for a group of related functionality: variables, constants, cursors, exceptions, procedures, functions, and subprograms. Redshift has no direct package equivalent; the closest it offers is a library of Python code that your UDFs can import, with certain limitations:

  • You can only create custom scalar user-defined functions (UDFs), using either a SQL SELECT clause or Python.
  • You can package Python code as a module, upload it to S3, and add it to your Redshift cluster by executing the CREATE LIBRARY command (sketched after this list).
  • You cannot package different schema objects together with the Redshift library feature the way you do in an Oracle package.
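
Here's a sketch of the library workflow; the bucket, zip file, module, and IAM role are all placeholders:

    -- Register a Python module (packaged as a zip on S3) with the cluster.
    CREATE LIBRARY text_tools
    LANGUAGE plpythonu
    FROM 's3://my-bucket/text_tools.zip'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

    -- UDFs created afterwards can import the module.
    CREATE OR REPLACE FUNCTION f_normalize (s VARCHAR)
    RETURNS VARCHAR
    IMMUTABLE AS $$
        import text_tools                 # hypothetical module
        return text_tools.normalize(s)
    $$ LANGUAGE plpythonu;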

Function

Amazon Redshift supports a number of functions that are extensions to the SQL standard, as well as standard aggregate functions, scalar functions, and window functions. You can also write custom scalar user-defined functions (UDFs) using either a SQL SELECT clause or Python.
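
As an illustration, a minimal scalar SQL UDF (the name and logic are made up); SQL UDFs refer to their arguments positionally as $1, $2, and so on:

    CREATE OR REPLACE FUNCTION f_sales_tax (DECIMAL(12,2), FLOAT)
    RETURNS DECIMAL(12,2)
    IMMUTABLE AS $$
        SELECT ($1 * $2)::DECIMAL(12,2)
    $$ LANGUAGE sql;

    SELECT f_sales_tax(100.00, 0.08);  -- returns 8.00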

Procedure

Stored procedures are supported in Redshift.

Stored procedures are commonly used to encapsulate logic for data transformation, data validation, and business-specific logic. By combining multiple SQL steps into a stored procedure, you can reduce round trips between your applications and the database.
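
A minimal sketch, assuming hypothetical sales and sales_archive tables:

    -- Move old rows to an archive table in one callable unit of logic.
    CREATE OR REPLACE PROCEDURE archive_old_sales (cutoff DATE)
    AS $$
    BEGIN
        INSERT INTO sales_archive
        SELECT * FROM sales WHERE sale_date < cutoff;
        DELETE FROM sales WHERE sale_date < cutoff;
    END;
    $$ LANGUAGE plpgsql;

    CALL archive_old_sales (DATE '2019-01-01');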

Triggers

Redshift does not support triggers, because it's a data warehousing system designed to import large amounts of data in a limited time. If every row insert could fire a trigger, the performance of batch inserts would suffer. Trigger-type behavior should be part of the business application logic that runs in the OLTP environment, not the data warehousing logic. If you want to run some code in the warehouse after inserting or updating data, you have to do it as another step of your data pipeline.
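
For example, instead of an ON INSERT trigger you might run the follow-up logic as an explicit step after each load (the S3 path, IAM role, and the procedure from the previous section are all hypothetical):

    -- Step 1 of the pipeline: bulk-load the new data.
    COPY sales
    FROM 's3://my-bucket/sales/2020-05-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV;

    -- Step 2: the "trigger" logic, invoked explicitly by the orchestrator.
    CALL archive_old_sales (DATE '2019-01-01');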

Summary

In this tutorial, we compared the schema objects of Oracle and Redshift, looking at which objects are available in Redshift and which alternatives to reach for when they are not.
