Meet a new generation of Redshift Data Platform — RA3
At re:Invent 2019, AWS introduced a brand-new Redshift node type, RA3. Let’s see what it is and how we can benefit from it.
RA3 is the third generation of node types in the Redshift family. Previously, we had either DS (dense storage) or DC (dense compute) nodes.
RA3 has several new features:
- Managed storage
- High-speed cache
- High-bandwidth networking
To make a long story short, RA3 lets us scale compute and storage independently and pay only for what we really need. Sounds familiar? It should. Independent scaling is one of the key benefits of a modern data warehouse, or should we call it a data platform?
We don’t want to spend time on the pros and cons of different vendors. The goal of this article is to highlight the new key features of the modern Redshift data platform.
Based on Redshift documentation, we can check the spec for all available nodes:
The minimum number of RA3 nodes is 2, so this is the Redshift family for serious analytics. The smallest cluster can hold up to 128 TB of data. We played a bit with prices, and in our comparison the new RA3 node type cost less and delivered more performance. We believe AWS will add more node types with smaller capacity and eventually replace the existing DS and DC node types.
The price comparison makes the advantage clear:
You can play with the numbers and see what you will get for your case. Don’t forget about the reserved node option, which gives you a significant discount. You will definitely benefit if you are currently running a DS2.8XLARGE or DC2.8XLARGE cluster.
Let’s get back to RA3 and review its features.
RA3 Managed storage
The main idea behind RA3 is that Redshift now stores all permanent data in S3. As a result, the local disk is treated as a cache. Data is retrieved from S3 on demand, while Redshift tracks data “temperature” and keeps “hot” data local.
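To make the "local disk as a cache" idea concrete, here is a toy sketch in Python. It is not how Redshift is implemented: we assume a simple least-recently-used policy as a stand-in for the "temperature" tracking, a plain dictionary plays the role of S3, and the class and block names are invented for illustration.

```python
from collections import OrderedDict


class ManagedStorageCache:
    """Toy model: a small local cache in front of a durable object store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stands in for S3 (assumption)
        self.capacity = capacity            # max blocks kept on local disk
        self.local = OrderedDict()          # block_id -> data, coldest first
        self.remote_reads = 0               # how often we had to go to the store

    def read(self, block_id):
        if block_id in self.local:
            # Cache hit: mark the block as recently used ("hot").
            self.local.move_to_end(block_id)
            return self.local[block_id]
        # Cache miss: fetch the block from the durable store on demand.
        data = self.backing_store[block_id]
        self.remote_reads += 1
        self.local[block_id] = data
        if len(self.local) > self.capacity:
            # Evict the coldest block; the permanent copy stays in the store.
            self.local.popitem(last=False)
        return data


store = {f"block-{i}": f"data-{i}" for i in range(5)}
cache = ManagedStorageCache(store, capacity=2)
for block_id in ["block-0", "block-1", "block-0", "block-2", "block-0"]:
    cache.read(block_id)

print(sorted(cache.local), cache.remote_reads)
```

The frequently read block stays local while the block that was touched only once is evicted, and nothing is lost because the permanent copy never leaves the backing store.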
RA3 splits compute and storage, which means that we pay separately for compute (per node) and for S3-backed storage (per GB).
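A rough way to reason about the bill is to compute the two parts independently. The sketch below is only an illustration: the function name is ours, and both rates are placeholders rather than real AWS prices, so plug in the current numbers from the pricing page.

```python
HOURS_PER_MONTH = 730  # common approximation for one month of uptime


def monthly_cost(nodes, node_rate_per_hour, stored_gb, storage_rate_per_gb_month):
    """Return (compute, storage, total) monthly cost in dollars."""
    compute = nodes * node_rate_per_hour * HOURS_PER_MONTH
    storage = stored_gb * storage_rate_per_gb_month
    return compute, storage, compute + storage


# Placeholder rates, NOT real AWS prices.
NODE_RATE = 10.0      # dollars per node per hour (assumption)
STORAGE_RATE = 0.02   # dollars per GB per month (assumption)

small = monthly_cost(2, NODE_RATE, 10_000, STORAGE_RATE)
large = monthly_cost(2, NODE_RATE, 20_000, STORAGE_RATE)

print(small)
print(large)
```

Doubling the stored data changes only the storage part of the bill; the compute part moves only when we add or remove nodes.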
Migration Path from DS/DC to RA3
There are two ways of switching from legacy clusters to RA3. Each has pros and cons.
First, you can restore a snapshot of the existing cluster onto RA3 nodes. In other words, we create a new cluster.
Pros:
- Takes minutes
- No impact on reading
Cons:
- Endpoint will change
- The source cluster must stay read-only so that both clusters remain in sync
- You will get the same number of slices
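For the snapshot route, a minimal sketch with boto3 could look like the following. The snapshot and cluster identifiers are hypothetical, and we assume default credentials and region are already configured. The request is built as a plain dictionary so it can be inspected before anything is sent to AWS.

```python
def build_restore_request(snapshot_id, new_cluster_id,
                          node_type="ra3.16xlarge", number_of_nodes=2):
    """Build keyword arguments for restore_from_cluster_snapshot."""
    if number_of_nodes < 2:
        # The article notes that an RA3 cluster needs at least 2 nodes.
        raise ValueError("RA3 clusters need at least 2 nodes")
    return {
        "ClusterIdentifier": new_cluster_id,  # the new cluster gets a new endpoint
        "SnapshotIdentifier": snapshot_id,
        "NodeType": node_type,
        "NumberOfNodes": number_of_nodes,
    }


def restore(request):
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("redshift")
    return client.restore_from_cluster_snapshot(**request)


# Hypothetical identifiers, for illustration only.
request = build_restore_request("my-ds2-snapshot", "my-ra3-cluster")
print(request)
```

Calling `restore(request)` would start the restore. Once the new cluster is available, clients have to be pointed at its endpoint.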
Second, you can use the classic resize operation. In other words, we modernize the existing cluster.
Pros:
- Same endpoint
- Less work to do
Cons:
- The cluster is available only for read operations during the resize.
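The resize route can be sketched with boto3 in the same spirit. The cluster identifier is hypothetical and we assume configured credentials. Setting `Classic` to `True` asks for a classic resize rather than an elastic one.

```python
def build_classic_resize_request(cluster_id, node_type="ra3.16xlarge",
                                 number_of_nodes=2):
    """Build keyword arguments for resize_cluster (classic resize)."""
    return {
        "ClusterIdentifier": cluster_id,  # same cluster, so the endpoint is kept
        "ClusterType": "multi-node",
        "NodeType": node_type,
        "NumberOfNodes": number_of_nodes,
        "Classic": True,                  # classic resize, not elastic
    }


def resize(request):
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("redshift")
    return client.resize_cluster(**request)


# Hypothetical identifier, for illustration only.
resize_request = build_classic_resize_request("my-existing-cluster")
print(resize_request)
```

Because the cluster accepts only reads while the resize runs, it makes sense to schedule `resize(resize_request)` outside the ETL window.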
Summary
RA3 is a genuinely new node family that meets modern analytics standards and lets customers run analytics at scale in AWS. If you are currently running Redshift and experiencing performance issues, ETL/ELT delays, or a lack of resources for the BI team, you may want to try this new node type.