AWS Cloud Quest — Data Analytics Role

Filipe Pacheco
6 min read · Mar 26, 2024

Hello Medium Readers, how are you? I hope you are well. If you have been following my series of posts here on Medium, you already know that I’ve returned to delve deeper into Machine Learning after a 6-month journey in the Cloud and DevOps realm.

I’m excited to share that I’ve accelerated some of my development plans and have dived back into learning more about Reinforcement Learning, a particularly fascinating technique in Machine Learning that empowers self-driving cars and agile robots like those developed by Boston Dynamics. Perhaps this will become a topic for future posts here on the blog.

That is for the near future, though. Today I’d like to share my experience with AWS Cloud Quest, available on the AWS Skill Builder platform, focusing on the Data Analytics role. This isn’t my first post about Cloud Quest, so I encourage you to take a look at the earlier one as well.

As I explained in that previous post, the Data Analytics role comprises 23 assignments. Cloud Quest roles overlap, and some of these assignments were already covered in my earlier post on the Cloud Practitioner role, which leaves 16 new assignments to cover here.

Now, I’d like to share my thoughts on the Cloud Quest. It took me almost 15 hours to complete all assignments, and I believe AWS has executed the Cloud Quest concept admirably. It effectively teaches you how to navigate and use the services through repeated practice in varied contexts. That repetition is key.

At the end, I received the following badge :D

AWS Cloud Quest — Data Analytics Badge.

Without further ado, let’s delve into the Data Analytics Role.

Data Analytics Role — Services Used

In this section, I give a brief explanation of each of the 13 AWS services I used to complete the Data Analytics role, along with how I used it.

Lambda Function:

  • AWS Lambda lets you run code without provisioning or managing servers. It executes your code in response to triggers and automatically scales to handle the load.
  • Used for various tasks such as creating serverless applications, decoding data, event-driven processing, and automation tasks across different episodes.

S3 Bucket:

  • Amazon Simple Storage Service (S3) provides object storage through a web interface. It’s designed to store and retrieve any amount of data from anywhere on the web.
  • Used as a data lake storage layer, for event notifications, storing ingested data, and as a source for various AWS services like Glue and Redshift.
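To make the link between S3 event notifications and Lambda more concrete, here is a minimal sketch of a handler that reads the bucket name and object key from a notification. The bucket name, the object key, and the shape of the return value are hypothetical, and the sample event is trimmed to the fields the handler reads; this is not the code from the Cloud Quest labs.

```python
import urllib.parse


def lambda_handler(event, context):
    """Collect the bucket and key of every object named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = urllib.parse.unquote_plus(s3_info["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"processed": len(objects), "objects": objects}


# Hypothetical sample event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-data-lake"},
                "object": {"key": "raw/2024/sales+report.csv"},
            },
        }
    ]
}

result = lambda_handler(sample_event, None)
print(result)
```

Calling the handler locally with a hand-built event like this is a quick way to check the parsing logic before wiring up a real bucket.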

EventBridge:

  • Amazon EventBridge is a serverless event bus service that makes it easy to connect applications together using data from your own applications, integrated Software-as-a-Service (SaaS) applications, and AWS services.
  • Used for configuring event notifications to trigger Lambda functions.
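An EventBridge rule decides which events reach a target, such as a Lambda function, by comparing each event against an event pattern. The sketch below is a deliberately simplified matcher written only to illustrate the idea: it handles exact-value lists and nested objects, while the real service supports more operators (prefix, numeric ranges, and others). The bucket name and object key are hypothetical.

```python
def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event.

    A list in the pattern means "the event value must equal one of these";
    a nested dict is matched recursively. Other EventBridge operators
    are intentionally not handled in this sketch.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical pattern: react to objects created in one specific bucket.
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["example-data-lake"]}},
}

# Hypothetical event, trimmed to a few fields.
event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-data-lake"},
        "object": {"key": "raw/data.csv"},
    },
}

print(matches(pattern, event))
```

Fields that appear in the event but not in the pattern, such as the object key here, are ignored, which is why a short pattern can select a whole family of events.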

Amazon Athena:

  • Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage.
  • Used for querying data in the data lake, including identifying suspicious transactions and providing data to visualization tools like QuickSight.

Glue:

  • AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. It’s serverless and automatically discovers and catalogs metadata about data sources.
  • Used for data cataloging and ETL jobs, supporting the episodes on creating dashboards, populating the data catalog, and designing a NoSQL database.

QuickSight:

  • Amazon QuickSight is a fully managed business intelligence service that makes it easy to deliver insights to everyone in your organization.
  • Used for visualizing data and publishing dashboards.

Redshift:

  • Amazon Redshift is a fully managed data warehouse service in the cloud. It allows you to run complex queries on large datasets.
  • Used for cloud data warehousing, querying flattened data, and creating materialized views.

Kinesis:

  • Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.
  • Used for streaming ingestion, real-time data processing, and data analytics applications.
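When a Lambda function consumes a Kinesis data stream, each record's payload arrives base64-encoded inside the event. The following sketch shows that decoding step, assuming the producers send JSON; the sensor fields are hypothetical and the sample event is trimmed to the one field the handler reads.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed payloads."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers the record payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


# Hypothetical record: a JSON reading encoded the way Kinesis would deliver it.
reading = {"sensor_id": "s-1", "temperature": 21.5}
encoded = base64.b64encode(json.dumps(reading).encode("utf-8")).decode("ascii")
sample_event = {"Records": [{"kinesis": {"data": encoded}}]}

decoded = lambda_handler(sample_event, None)
print(decoded)
```

A real handler would go on to filter, aggregate, or store the decoded payloads; this sketch stops at the decoding step.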

DynamoDB:

  • Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
  • Used for real-time data processing and federated queries.

Lake Formation:

  • AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. It simplifies and automates many of the complex manual steps involved in creating a data lake.
  • Used for securing the data lake and creating restrictions for IAM users.

Step Functions:

  • AWS Step Functions is a fully managed service that makes it easy to coordinate the components of distributed applications and microservices using visual workflows.
  • Used for event-driven ETL automation.
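A Step Functions workflow is described as a state machine in Amazon States Language, a JSON document. As a rough sketch of what an ETL workflow could look like, the definition below runs a Glue job and then invokes a Lambda function. The state names, job name, and function name are placeholders, pairing Glue with Lambda is my assumption rather than the lab's design, and `check_transitions` is a small hypothetical helper, not part of any AWS SDK.

```python
import json

# Hypothetical two-step ETL workflow in Amazon States Language.
# The job name and Lambda function name are placeholders.
definition = {
    "Comment": "Sketch of an ETL workflow",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # The .sync suffix makes Step Functions wait for the job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-etl-job"},
            "Next": "NotifyDone",
        },
        "NotifyDone": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "example-notify-function"},
            "End": True,
        },
    },
}


def check_transitions(defn):
    """Return a list of problems: a missing start state or dangling 'Next' targets."""
    states = defn["States"]
    problems = []
    if defn["StartAt"] not in states:
        problems.append(f"StartAt refers to unknown state {defn['StartAt']!r}")
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            problems.append(f"{name} points to unknown state {target!r}")
    return problems


print(check_transitions(definition))
print(json.dumps(definition, indent=2))
```

The JSON printed at the end is the form you would paste into the Step Functions console or pass to an infrastructure template.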

Cloud9:

  • AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser.
  • Used for designing a NoSQL database.

OpenSearch:

  • Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) is a fully managed service that makes it easy to deploy, secure, and operate OpenSearch clusters at scale.
  • Used for document indexing and search.

Data Analytics Role — Assignments

In the following section, I’ll outline the 16 new assignments I undertook, accompanied by images representing the proposed solution architecture for each assignment. Each assignment also comes with step-by-step guidance on solving the problem, detailing the AWS services used to achieve the desired outcome.

It’s worth noting that the episodes are presented in the order in which I completed them. While some assignments may have dependencies on others, the sequence can be customized based on individual preferences and requirements.

Episode — Serverless Foundations

Serverless Foundations Episode — Solution Architecture.

Episode — Data Lakes

Data Lake Episode — Solution Architecture.

Episode — Business intelligence dashboard

Business Intelligence Dashboard Episode — Solution Architecture.

Episode — Populating the data catalog

Populating the data catalog Episode — Solution Architecture.

Episode — Daily batch extraction

Daily batch extraction Episode — Solution Architecture.

Episode — Cloud Data Warehouse

Cloud Data Warehouse Episode — Solution Architecture.

Episode — Streaming Ingestion

Streaming Ingestion Episode — Solution Architecture.

Episode — Real-Time Data Processing

Real-Time Data Processing Episode — Solution Architecture.

Episode — Data Ingestion Methods

Data Ingestion Methods Episode — Solution Architecture.

Episode — Securing the Data Lake

Securing the Data Lake Episode — Solution Architecture.

Episode — Event-Driven Serverless ETL

Event-Driven Serverless ETL Episode — Solution Architecture.

Episode — Document Index and Search

Document Index and Search Episode — Solution Architecture.

Episode — Federated Queries

Federated Queries Episode — Solution Architecture.

Episode — Event-Driven ETL automation

Event-Driven ETL automation Episode — Solution Architecture.

Episode — Design a NoSQL Database

Design a NoSQL Database Episode — Solution Architecture.

Filipe Pacheco

Senior Data Scientist | AI, ML & LLM Developer | MLOps | Databricks & AWS Practitioner