AWS Glue: A Complete ETL Solution

Syeda Marium Faheem
Bazaar Engineering
Jun 9, 2021


Image from istockphoto

Have you ever wondered how the biggest tech companies manage petabytes of data? How do they build an effective pipeline that moves data from OLTP (transactional) systems to OLAP (analytical) systems? How do they minimize the time spent on data migration? What tools and skills are needed to build an effective, powerful data pipeline?

In this article, we cover some emerging big data tools on Amazon Web Services (AWS), one of the biggest cloud service providers in the market. For building our data pipeline, AWS offers a robust ETL (extract, transform, load) solution called “AWS Glue”.

What is AWS Glue?

Picture from AWS

AWS Glue is a powerful, fully managed, serverless ETL service.

Serverless ETL? What does that mean?

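Serverless is a cloud computing execution model in which the cloud provider provisions and manages the servers on demand, so you do not keep infrastructure running while it sits idle. With Glue, for example, you do not set up or maintain a Spark cluster yourself; you define a job and AWS allocates capacity when it runs. Customers pay only for the resources their applications actually consume.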
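To make “pay only for what you consume” concrete, here is a rough cost estimator for a single Glue job run. It is a sketch under stated assumptions: Glue bills Spark jobs in DPU-hours (Data Processing Units), Glue version 2.0 and later bill per second with a one-minute minimum, and the $0.44 per DPU-hour rate is only an example figure, so check the AWS Glue pricing page for your region and job type.

```python
import math

# Example rate only (assumption): check current AWS Glue pricing for your
# region and job type before relying on this number.
EXAMPLE_RATE_PER_DPU_HOUR = 0.44


def glue_job_cost(dpus: int, runtime_seconds: float,
                  rate_per_dpu_hour: float = EXAMPLE_RATE_PER_DPU_HOUR,
                  minimum_seconds: int = 60) -> float:
    """Estimate the cost of one Glue job run under per-second billing.

    Assumes cost = DPUs x billed hours x rate, where the runtime is rounded
    up to whole seconds and never billed below `minimum_seconds`.
    """
    if dpus <= 0 or runtime_seconds < 0:
        raise ValueError("dpus must be positive and runtime non-negative")
    # Round up to whole seconds, then apply the minimum billed duration.
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return dpus * (billed_seconds / 3600) * rate_per_dpu_hour
```

For example, `glue_job_cost(10, 900)` (10 DPUs for 15 minutes) comes to about $1.10 at the example rate, and a job that finishes in 20 seconds is still billed for the one-minute minimum. When no job is running, nothing is billed for compute, which is the serverless part.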

So if you are looking for a robust, cost-effective ETL solution, AWS Glue is a good choice.

Why AWS Glue?

Data will talk to you if you are willing to listen

— JIM BERGESON

All insights, analysis, and decision-making depend on proper, useful data. When data originates from multiple sources, it becomes harder to work with. Maintaining historical records, taking backups, and joining data across sources also become challenging.

Picture from codeburst.io

Glue provides a unified view of your data through the AWS Glue Data Catalog, and it runs its ETL jobs on Apache Spark, the flexible open-source processing engine.
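As a sketch of what the Data Catalog exposes, the function below lists the tables registered in one catalog database, with each table's storage location and column names. It assumes the boto3 SDK: `glue_client` is expected to be `boto3.client("glue")`, and `get_tables` is read through a paginator because a database can hold many tables. The database name `sales_db` used below is a hypothetical placeholder.

```python
from typing import Any, Dict, List


def list_catalog_tables(glue_client: Any, database_name: str) -> List[Dict[str, Any]]:
    """List tables in one Glue Data Catalog database.

    Returns each table's name, storage location, and column names.
    `glue_client` is assumed to be a boto3 Glue client.
    """
    tables: List[Dict[str, Any]] = []
    # GetTables is paginated; the paginator follows NextToken for us.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page.get("TableList", []):
            # Some table types (e.g. views) may lack a storage location.
            descriptor = table.get("StorageDescriptor", {})
            tables.append({
                "name": table["Name"],
                "location": descriptor.get("Location", ""),
                "columns": [col["Name"] for col in descriptor.get("Columns", [])],
            })
    return tables
```

Usage, assuming AWS credentials are configured: `list_catalog_tables(boto3.client("glue"), "sales_db")`. The client is passed in rather than created inside the function, so the logic can be exercised with a stub and no AWS access.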

Other useful features of AWS Glue include:

  1. Data discovery and searching across datasets
  2. Automatic schema detection
  3. Enforcing schema for streaming data
  4. Cleaning and transformations for streaming data
  5. Replication of data across different data stores
  6. Complex data pipelines
  7. Apache Hive support
  8. Workflows and job schedulers
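Automatic schema detection and scheduling come together in Glue crawlers: a crawler scans a data store, infers table schemas, and writes them to the Data Catalog, optionally on a cron schedule. The sketch below assumes boto3 and builds the arguments for the `CreateCrawler` API; the crawler name, IAM role ARN, database, S3 path, and schedule in the usage example are hypothetical placeholders.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database_name: str,
                          s3_path: str, schedule: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,               # IAM role the crawler assumes
        "DatabaseName": database_name,  # catalog database for discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": schedule,           # Glue cron expression (UTC)
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",  # apply schema changes
            "DeleteBehavior": "LOG",                 # only log deleted objects
        },
    }


def create_and_start_crawler(glue_client: Any, request: Dict[str, Any]) -> None:
    """Register the crawler, then run it once immediately.

    `glue_client` is assumed to be a boto3 Glue client.
    """
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
```

Usage with placeholder values: `create_and_start_crawler(boto3.client("glue"), build_crawler_request("sales-crawler", "arn:aws:iam::123456789012:role/GlueCrawlerRole", "sales_db", "s3://example-bucket/sales/", "cron(0 2 * * ? *)"))`. The schedule `cron(0 2 * * ? *)` would run the crawler daily at 02:00 UTC, so new partitions and schema changes reach the Data Catalog without manual work.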

This article is part of a series of posts on AWS Glue. The next part, AWS Glue: Hands-on, is a tutorial on how to set up AWS Glue.
