Redshift data migration to a third-party service provider

Dilip Kola
Published in Tensult Blogs · 2 min read · Oct 29, 2019

We are addressing a customer use case: the customer has a large amount of data in AWS Redshift and wants to migrate that data to a third-party service provider. Redshift receives new data through a daily ETL job, and the migration process needs to run after that job to push the newly updated data to the third party.

Redshift Data migration Architecture

Overall Migration Process

We use a Kubernetes CronJob to periodically check the Redshift table for new data. Once new data is available, the cronjob issues an UNLOAD command to the Redshift cluster to extract the data from the Redshift table to an S3 bucket. The cronjob then triggers the batch job on the SpringBoot app, which runs as a Kubernetes service.
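The cronjob's UNLOAD step can be sketched as follows, in Python for brevity. The table name, the daily-window filter, the S3 prefix, and the IAM role are all illustrative assumptions; adapt the inner SELECT to however new data is marked in your schema.

```python
def build_unload_sql(table: str, s3_prefix: str, iam_role: str) -> str:
    """Build an UNLOAD statement that exports today's rows to S3.

    Assumes new rows can be identified by a load_date column; in your
    schema the filter may look different.
    """
    inner_select = f"SELECT * FROM {table} WHERE load_date = CURRENT_DATE"
    return (
        f"UNLOAD ('{inner_select}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "DELIMITER ',' ADDQUOTES GZIP ALLOWOVERWRITE"
    )

# The cronjob would execute this against the cluster, e.g. with psycopg2:
# cursor.execute(build_unload_sql(
#     "events",
#     "s3://my-bucket/unload/events_",
#     "arn:aws:iam::123456789012:role/RedshiftUnload"))
```

UNLOAD writes multiple part files in parallel under the given prefix, which is why the SpringBoot app later reads a whole prefix rather than a single object.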

The SpringBoot app reads the data from the S3 bucket and calls the third-party service provider's APIs to ingest it. Both the SpringBoot app and the cronjob use Amazon SNS to send job-related notifications to users. Since new data arrives in the cluster only once per day, the UNLOAD command also runs only once per day.
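The ingest loop's shape looks roughly like this (sketched in Python; the real app is a SpringBoot service, and the batch size and provider endpoint below are assumptions):

```python
from typing import Iterable, Iterator


def chunk(records: Iterable[dict], size: int) -> Iterator[list]:
    """Group records into fixed-size batches for the third-party API."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Pseudo-usage: read the unloaded rows from S3 and push them in batches.
# for batch in chunk(read_rows_from_s3("s3://my-bucket/unload/"), 500):
#     requests.post("https://api.example-provider.com/ingest", json=batch)
```

Batching keeps each API call bounded in size, which matters because a daily UNLOAD can produce far more rows than a single request should carry.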

We run both the cronjob and the SpringBoot app on a Kubernetes cluster because it makes it easy to scale when load increases, to monitor what is running, and to deploy updates.

Components

Redshift: Primary data source for the migration.

Amazon S3: The intermediary data store in this architecture. We use S3 to hold the data extracted from Redshift with the UNLOAD command. We also store process-tracking information in S3 so that, in case of failure, we can resume the process from where it last stopped.
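The resume-tracking idea can be sketched as a small JSON checkpoint stored next to the data; a restarted job loads it and skips files it has already ingested. The bucket, key, and checkpoint shape here are assumptions:

```python
import json


def load_checkpoint(s3_client, bucket: str, key: str) -> dict:
    """Return the last saved progress, or an empty state on first run."""
    try:
        obj = s3_client.get_object(Bucket=bucket, Key=key)
        return json.loads(obj["Body"].read())
    except s3_client.exceptions.NoSuchKey:
        return {"processed_files": []}


def save_checkpoint(s3_client, bucket: str, key: str, state: dict) -> None:
    """Overwrite the checkpoint after each successfully ingested file."""
    s3_client.put_object(Bucket=bucket, Key=key,
                         Body=json.dumps(state).encode())
```

With boto3, `s3_client` would be `boto3.client("s3")`; after each unloaded part file is pushed to the provider, the job appends its key to `processed_files` and saves the checkpoint, so a crash loses at most one in-flight file.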

We could have exposed Redshift directly to the SpringBoot app, but we used S3 as an intermediary data store instead. The main reason is to reduce the load on the Redshift cluster; it also lets us build S3-based data lake architectures in the future.

Amazon SNS: Used to send job success and failure notifications to users.
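A minimal notification helper that both the cronjob and the SpringBoot app could share might look like this (the topic ARN, job names, and message wording are assumptions):

```python
def build_job_notification(job_name: str, status: str, detail: str = "") -> dict:
    """Format the SNS publish arguments for a job status update."""
    subject = f"[{status.upper()}] Redshift migration job: {job_name}"
    message = f"Job '{job_name}' finished with status {status}. {detail}".strip()
    return {"Subject": subject, "Message": message}

# boto3.client("sns").publish(
#     TopicArn="arn:aws:sns:us-east-1:123456789012:migration-alerts",
#     **build_job_notification("daily-unload", "success"))
```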

Cronjob: The controller of the migration process; it invokes the other components in order to complete the process.

SpringBoot App: Converts the data extracted from Redshift into a format the third party understands, and then ingests it using Spring Batch jobs.
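The conversion step amounts to a field mapping from the unloaded CSV rows onto the provider's payload shape. Both the Redshift column names and the provider's JSON schema below are invented for illustration:

```python
import csv
import io


def to_provider_record(row: dict) -> dict:
    """Map one unloaded Redshift row onto the (hypothetical) provider payload."""
    return {
        "externalId": row["user_id"],
        "occurredAt": row["event_time"],
        "attributes": {"type": row["event_type"]},
    }


def parse_unloaded_csv(data: str) -> list:
    """Parse a comma-delimited UNLOAD file into provider-ready records."""
    reader = csv.DictReader(io.StringIO(data),
                            fieldnames=["user_id", "event_time", "event_type"])
    return [to_provider_record(r) for r in reader]
```

In the actual app this mapping lives in a Spring Batch `ItemProcessor`, but the translation logic is the same.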

Conclusion

I hope you now know how to design an architecture for Redshift data migration. To get updates on my new blogs, please subscribe to our newsletter, and share your thoughts in the comments section.

Originally published at https://blogs.tensult.com on October 29, 2019.

