AWS ETL Tools in a nutshell

Overview of AWS ETL tools for different business scenarios.

Deeksha Kukreti
AWSLearning
2 min read · Jul 19, 2023

Data can be ingested and transformed from on-premises or legacy sources into the AWS Cloud in several ways. AWS provides different services for different business requirements and use cases.

AWS DMS

AWS DMS, short for AWS Database Migration Service, is a managed migration and replication service used to move data from heterogeneous and homogeneous systems to AWS quickly and securely.

It is most commonly used to migrate legacy or on-premises databases to AWS. Its main benefits are that it is easy to set up and low in cost.
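
As a rough illustration of what a migration task definition could look like, the sketch below assembles the arguments for a DMS replication task in plain Python. The helper names, the ARN placeholders, and the "sales" schema are hypothetical; the final boto3 call is shown only as a comment and assumes credentials and existing endpoints.

```python
import json


def build_table_mappings(schema: str, table: str = "%") -> str:
    """Return a DMS table-mapping document that includes the matching tables."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-" + schema,
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


def build_task_request(task_id, source_arn, target_arn, instance_arn, schema):
    """Assemble keyword arguments for a DMS replication task (hypothetical helper)."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # full-load-and-cdc copies existing rows, then replicates ongoing changes
        "MigrationType": "full-load-and-cdc",
        "TableMappings": build_table_mappings(schema),
    }


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    request = build_task_request(
        "legacy-to-aws",
        "arn:source-placeholder",
        "arn:target-placeholder",
        "arn:instance-placeholder",
        "sales",
    )
    print(json.dumps(request, indent=2))
    # With credentials configured, the request could be submitted like this:
    #   import boto3
    #   boto3.client("dms").create_replication_task(**request)
```

The "%" wildcard in the table name selects every table in the schema, which suits a whole-database migration; a narrower pattern would limit the task to specific tables.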

AWS EMR

This AWS service offers high throughput and is best reserved for big data analytics. It supports frameworks such as Apache Spark, Hive, and Presto. Although it handles huge amounts of data well, the business still needs skilled engineers to develop on it.
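
To give a feel for how a Spark job is handed to an EMR cluster, here is a minimal sketch that builds a step definition as a plain dictionary. The function name, the S3 paths, and the cluster id in the comment are hypothetical; submitting the step would additionally need boto3 and a running cluster.

```python
def build_spark_step(name: str, script_uri: str, *script_args: str) -> dict:
    """Describe an EMR step that runs a PySpark script through spark-submit."""
    return {
        "Name": name,
        # Keep the cluster running and continue with later steps if this one fails
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode",
                "cluster",
                script_uri,
                *script_args,
            ],
        },
    }


if __name__ == "__main__":
    # The bucket and script names are placeholders.
    step = build_spark_step(
        "daily-aggregation",
        "s3://example-bucket/jobs/aggregate.py",
        "--date",
        "2023-07-19",
    )
    print(step)
    # With a running cluster, the step could be submitted like this:
    #   import boto3
    #   boto3.client("emr").add_job_flow_steps(JobFlowId="j-PLACEHOLDER", Steps=[step])
```

The Spark script itself still has to be written and tuned, which is where the skilled engineering effort mentioned above comes in.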

AWS Kinesis

It is used to ingest real-time data from data sources. It is divided into the following components:

  1. AWS Kinesis Firehose: Data ingested by AWS Kinesis Data Streams is fed into AWS Kinesis Firehose, which delivers it to destinations such as AWS S3 and AWS Redshift that are used to build data lakes, data warehouses, or analytics services. Note that this is a data delivery service: it handles the load step of ETL and can apply light transformations on the way.
  2. AWS Kinesis Data Streams: This service connects to the source systems to capture data. It captures data from sources such as microservices, logs, and other AWS services and delivers it to AWS Kinesis Firehose or a Lambda function.
  3. AWS Kinesis Data Analytics: It provides an easy way to run real-time analytics on streaming data and gain insight. It offers transformation and analytics features through queries over the stream.
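
The capture step described above can be sketched as follows: an event is serialised and paired with a partition key before being written to a stream. The stream name, the event fields, and the helper name are hypothetical; the actual write is shown only as a comment and assumes boto3 and an existing stream.

```python
import json


def build_record(stream_name: str, event: dict, key_field: str) -> dict:
    """Turn an event into the arguments for a single Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        # Kinesis carries opaque bytes, so the event is serialised as UTF-8 JSON
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key land on the same shard, in order
        "PartitionKey": str(event[key_field]),
    }


if __name__ == "__main__":
    # The stream name and event fields are placeholders.
    record = build_record(
        "clickstream-events",
        {"user_id": 42, "action": "page_view"},
        "user_id",
    )
    print(record)
    # With an existing stream, the record could be sent like this:
    #   import boto3
    #   boto3.client("kinesis").put_record(**record)
```

Choosing a field such as a user id as the partition key keeps each user's events ordered, at the cost of uneven load if a few keys dominate the traffic.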

AWS Glue

AWS Glue is a managed ETL service provided by Amazon. Its components, such as Glue Crawlers and Glue jobs, can be used to build ETL pipelines that ingest data from on-premises systems or cloud environments into AWS. Although it is very efficient, it has its own drawbacks: if the business is concerned about cost, this service can be on the expensive side for ingesting huge amounts of data.

A common use case is converting .CSV or other row-based file formats to a column-based format such as Parquet, with a Glue Crawler discovering the schema and a Glue job performing the conversion. Another is the Glue Data Catalog, which is organised into databases and tables to provide a logical structure for storing and maintaining metadata.
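
To make the row-based versus column-based distinction concrete, the sketch below regroups CSV rows into per-column lists using only the Python standard library. It is a conceptual illustration of the layout change, not Parquet and not Glue code; the function name and sample data are made up for the example.

```python
import csv
import io


def rows_to_columns(csv_text: str) -> dict:
    """Regroup row-oriented CSV text into one list of values per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    sample = "id,city,amount\n1,Pune,10\n2,Delhi,25\n"
    print(rows_to_columns(sample))
    # → {'id': ['1', '2'], 'city': ['Pune', 'Delhi'], 'amount': ['10', '25']}
```

Storing each column contiguously is what lets analytical queries read only the columns they need, which is the main reason column-based formats are preferred for data lakes.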

Amazon AppFlow

Amazon AppFlow can be set up in a few clicks to build bi-directional data flows between SaaS platforms such as Salesforce, Google Analytics, and SAP and AWS services such as S3 and Redshift.

Thank you for taking the time to read this. If you would like to leave feedback, please use the comment box.

Deeksha Kukreti
AWSLearning

Technology Enthusiast | Data Architect | Scientist | 2 X AWS Certified | Microsoft | Data Wizard