Preparing for the AWS Data Engineer — Associate Certification Exam

Leticia Massae
8 min read · Dec 29, 2023

For Starters

I took the DEA-C01 exam in December 2023. This certification is still in beta, so results can take up to 90 days (fingers crossed).

The AWS Data Engineer — Associate exam is a new AWS certification. It validates skills and knowledge of core data-related AWS services: implementing data pipelines, monitoring and troubleshooting issues, and optimizing cost and performance in accordance with best practices. Be prepared for questions along these lines during the exam.

Since this exam is in beta, it differs from the usual associate certifications in a few ways:

  • Exam duration is 170 minutes
  • Exam format is 85 questions, either multiple choice or multiple response
  • Cost is 75 USD (down from the usual 150 USD)
  • The test can be taken in person or online
  • The only language available is English
  • The exam window runs from November 27, 2023 to January 12, 2024

https://pt.pearsonvue.com/Clients/Amazon-Web-Services.aspx

I had never worked with several of the AWS services covered by this exam, and there are not many courses available yet since it is in beta, so I studied partly with the AWS Data Analytics — Specialty practice tests. My end-to-end study path took around 1 month (I had just finished studying for and achieving the AWS Solutions Architect — Professional, so I leveraged some of that content for this one): reading FAQs, trying the practice exams, and reviewing the services and features I had to learn.

My study routine was usually at the end of the day (if I had late meetings or work activities, I tried to study in the morning to compensate), with about 1–2 hours of study per weekday.
Your daily routine depends only on you: whether you are a morning person or more active at night, how your work agenda looks, gym hours, and so on. Only you know which time works best or which you are most comfortable with.

Best Courses and Practices Exams

For studying, I used this Hands-On course and the practice tests from Udemy. Do not forget the FAQs!
For practice exams, I used the Data Analytics — Specialty practice tests from Tutorials Dojo.

DO NOT RUSH THE PROCESS! Studying for certifications takes time, especially if you are just entering the cloud world, so it is natural to mature the content over time and gain experience through work.

AWS Certification Roadmap

  1. AWS Cloud Practitioner
  2. AWS Solutions Architect — Associate
    The focus of this certification is the design of cost- and performance-optimized solutions, demonstrating a strong understanding of the AWS Well-Architected Framework. This certification can enhance the career profile and earnings of certified individuals and increase your credibility and confidence in stakeholder and customer interactions. (from the AWS certification documentation)
  3. AWS Data Engineer — Associate
    This exam is designed for candidates with 2–3 years of experience in cloud data-related roles or in on-premises data-related roles, moving to the AWS Cloud. Candidates in cloud roles such as data engineer, data analyst, data architect, or business intelligence engineer can earn this certification and gain credibility and confidence.
    Those in adjacent roles, such as software engineer, cloud engineer, reporting analyst, data quality analyst, and on-premises data roles, can also prepare for and earn this certification. (from the AWS blog)

Solutions Architect Roadmap

Here are some of the skills needed to be a Solutions Architect

https://www.facebook.com/photo.php?fbid=601903091951519&id=100063954998986&set=a.577901341018361

Don’t be afraid of all of this. You will gain these skills over time while working: facing challenges, dealing with clients and customers, people management, and partnerships. You can always count on your partners in crime at work for advice on technologies and services you don’t have expertise in.

Exam Domains

  • Domain 1: Data Ingestion and Transformation (34% of scored content)
  • Domain 2: Data Store Management (26% of scored content)
  • Domain 3: Data Operations and Support (22% of scored content)
  • Domain 4: Data Security and Governance (18% of scored content)

Study Paths

There are some paths for you to study for the certifications:

  • Slow: Watch the whole course for the certification, do the demos, understand each service and how it integrates with other AWS services, read the whitepapers and FAQs, do the practice exams, go back to the course or read the papers again, take more practice exams, and finally go take the certification exam. This is the path I recommend; you will learn a lot from it and become a better IT professional with all this knowledge.
  • Fast: Start by doing a practice exam covering all domains; at the end it will give you a score and show which domains you do not meet expectations in. Focus on studying those domains and repeat the process. I recommend this only if you have a short deadline to take the certification. BE AWARE that this is only to pass the exam; you will NOT learn much from it.

Exam Topics:

Here are some of the AWS services and features that were on the exam. Remember that the questions are always about use cases, so expect questions combining at least two services and several of their features.

  1. Athena
    Athena notebooks: let you query data in S3 directly and leverage Apache Spark for advanced data transformations and analytics; the Jupyter notebook integration combined with Apache Spark provides a robust platform for querying S3 data directly
    Bad performance when reading many small objects in S3: optimize by combining files into larger objects
    Aggregate functions to gain a summarized view of data
    Query examples
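As a hedged illustration of the aggregate-function point above, here is the kind of summary query Athena can run over data in S3. The table and column names (sales, region, amount) are hypothetical, as is the commented-out boto3 submission:

```python
# Hypothetical aggregate query of the kind Athena runs over data in S3.
# Table and column names here are illustrative only.
query = """
SELECT region,
       COUNT(*)    AS orders,
       SUM(amount) AS total_amount
FROM sales
GROUP BY region
ORDER BY total_amount DESC
""".strip()

# In practice you would submit this with boto3, along the lines of:
# athena = boto3.client("athena")
# athena.start_query_execution(QueryString=query, ...)
```

Aggregation like this gives the summarized view mentioned above, and scanning fewer, larger objects keeps the query fast and cheap.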
  2. AWS Lake Formation
    Uses a Centralized permission model for Granular access to data
    Designed to manage permissions across different AWS analytics services.
    Tag-based access control. Tag sensitive data
    Data sharing feature simplifies and secures the process of sharing data across different AWS accounts or with external organizations.
  3. S3
    Infrequent Access
    S3 Archive
    Events
    Object Lock
    Cross-Account Replication
  4. Macie
    PII data
    Integration with S3
    Integration with AWS Lake Formation: Allows for robust management and governance of access to the PII data
  5. ElastiCache
    Lazy-Loading strategy: Ideal for read-heavy, infrequently updated data scenarios; Since the application’s primary workload involves complex, read-intensive queries, this approach minimizes cache maintenance overhead and ensures only the most requested data is cached.
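The lazy-loading (cache-aside) strategy can be sketched in plain Python; in this sketch the dict stands in for ElastiCache and the loader function for the database (all names and the 300-second TTL are illustrative):

```python
import time

class LazyCache:
    """Minimal sketch of lazy loading (cache-aside):
    data is fetched from the backing store only on a cache miss."""
    def __init__(self, loader, ttl_seconds=300):
        self.loader = loader   # called on a miss; stands in for the database
        self.ttl = ttl_seconds
        self.store = {}        # stands in for ElastiCache (Redis/Memcached)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and time.time() - entry[1] < self.ttl:
            return entry[0]    # cache hit: no database work at all
        value = self.loader(key)            # cache miss: load from the source
        self.store[key] = (value, time.time())
        return value

calls = []
def slow_query(key):
    calls.append(key)          # record each trip to the "database"
    return key.upper()

cache = LazyCache(slow_query)
cache.get("user-1")
cache.get("user-1")            # served from cache; no second query
```

Only requested keys ever enter the cache, which is why the strategy suits read-heavy, infrequently updated data: hot keys stay cached while cold keys cost nothing.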
  6. Amazon QuickSight
    SPICE engine: QuickSight’s in-memory engine for building fast, interactive dashboards over data sources such as Amazon RDS; scheduled SPICE refreshes keep dashboards up to date with the underlying RDS PostgreSQL database (a direct query connection, by contrast, always reflects live data)
  7. AWS SCT
  8. KMS
    SSE-KMS with customer managed keys
  9. AWS DMS
    AWS SCT (Schema Conversion Tool)
    Schema Copy
  10. AWS CloudTrail
    Logs sent to S3
    Query trail logs
    AWS CloudTrail Lake: Provides an optimized and centralized solution for storing, managing, and analyzing CloudTrail logs; It allows the retention and querying of logs for up to seven years, which aligns well with the company’s need for year-long data analysis
  11. AWS CloudWatch
    Logs
    Logs Insights
  12. AWS Glue
    AWS Glue Crawler
    AWS Glue Jobs
    AWS Glue Jobs Bookmark
    AWS Glue DataBrew: handles missing data, inconsistent data, and duplicate data
    AWS Glue Schema Registry: stores the schemas of your streaming data and manages their versions; this keeps your data format consistent, which is essential to prevent issues from changes in data structure over time and to avoid processing failures or corruption, keeping the streaming data’s integrity intact
    ETL
    Apache Spark on AWS Glue
    From and to S3
    JDBC/ODBC connections
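The data-quality fixes that Glue DataBrew recipes automate, such as dropping duplicates, filling missing values, and normalizing inconsistent values, can be sketched locally with the standard library (the field names and the "unknown" fill value are illustrative):

```python
# Local sketch of the cleanup a Glue DataBrew recipe would automate:
# drop duplicate records, fill missing values, normalize inconsistent casing.
rows = [
    {"id": 1, "city": "Sao Paulo"},
    {"id": 1, "city": "Sao Paulo"},   # duplicate record
    {"id": 2, "city": "RIO"},         # inconsistent casing
    {"id": 3, "city": None},          # missing value
]

seen, cleaned = set(), []
for row in rows:
    key = row["id"]
    if key in seen:
        continue                       # drop duplicates by id
    seen.add(key)
    city = row["city"] or "unknown"    # fill missing data
    cleaned.append({"id": key, "city": city.title()})  # normalize casing
```

In DataBrew these three steps would be recipe transformations applied visually; the point is the same either way: clean, consistent records before they reach the warehouse.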
  13. Amazon Redshift
    Amazon Redshift Advisor: provides automated recommendations to optimize query performance on Redshift clusters, such as distribution style changes, sort key additions, and more
    Amazon Redshift Query Performance
    Amazon Redshift Query Performance Insights: To monitor query performance; Provides a comprehensive view of query performance, allowing data engineers to quickly identify long-running or problematic queries. This helps in understanding the performance characteristics of both individual queries and the overall workload.
    Redshift Serverless: Optimizes data warehouse capacity, charging solely for the compute resources used, and incurs no charges when idle; Data sharing in Redshift allows the seamless sharing of live data between Redshift clusters and Redshift Serverless endpoints without incurring additional costs; Minimize compute costs
    Amazon Redshift Row-Level Security (RLS): controls access to rows of data based on user attributes (such as team or role), giving fine-grained control within shared tables; the database administrator sets up security policies that restrict access to rows in a table based on attributes like role or team, making it ideal when data-sharing needs are intricate and closely tied to user identities or roles
    Amazon Redshift Data Sharing; enables sharing of live data across Redshift clusters
    Query examples: matching strings that start with a given prefix (e.g., LIKE 'string%')
    CSV import issues caused by a missing IGNOREHEADER option on the COPY command
    VACUUM operation
    COPY command
    Amazon Redshift Spectrum
    Workload Management(WLM) queue in Amazon Redshift
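To make the IGNOREHEADER point above concrete, here is a hedged sketch: a COPY command with IGNOREHEADER 1 (the table name, S3 path, and IAM role are placeholders), plus the same header-skipping pitfall reproduced locally with the csv module:

```python
import csv
import io

# Redshift's COPY treats the header row as data unless IGNOREHEADER is set.
# Table name, S3 path, and IAM role below are illustrative placeholders.
copy_sql = """
COPY sales
FROM 's3://my-bucket/sales.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1
""".strip()

# The same pitfall locally: without skipping the header, "amount" would be
# parsed as if it were a data value and the load would fail or corrupt rows.
raw = "region,amount\nus-east-1,100\neu-west-1,200\n"
reader = csv.reader(io.StringIO(raw))
next(reader)                           # skip the header row (IGNOREHEADER 1)
rows = [(region, int(amount)) for region, amount in reader]
```

On the exam this shows up as a failed or corrupted COPY load from a CSV file; the fix is the one-line IGNOREHEADER option rather than preprocessing the files.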
  14. AWS SAM
  15. API Gateway
  16. Data Pipeline
  17. AWS Lambda
    Provisioned concurrency to keep a pool of warm instances
    Amazon EFS for additional storage when processing large files
  18. AWS Step Functions
  19. AWS Sagemaker
    Amazon SageMaker ML Lineage Tracking
    Amazon SageMaker Data Wrangler: its built-in date functions simplify standardizing date formats, and its string functions are an efficient way to clean categorical fields, making it the most suitable option for that use case
  20. EMR
    EMR with Apache Spark: Can efficiently process and anonymize large datasets, and Amazon Redshift allows for robust analytics capabilities post-anonymization
  21. CodeCommit
  22. CodeBuild
  23. CodeDeploy
  24. CodePipeline
  25. AWS Neptune
    For graph-structured data
  26. AWS Kinesis Data Firehose
    Near Real-Time use cases
  27. AWS Kinesis Data Streams
    Real-Time use cases
  28. AWS Kinesis Data Analytics
  29. Amazon MSK
  30. DynamoDB
    GSI
    TTL
    Streams
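DynamoDB's TTL feature expects an item attribute holding the absolute expiry time as a Unix epoch timestamp in seconds; a minimal sketch (the attribute name expires_at and key values are illustrative):

```python
import time

# DynamoDB TTL deletes an item after the epoch-seconds timestamp stored in
# the designated attribute has passed. Attribute and key names are examples.
def ttl_attribute(lifetime_seconds, now=None):
    """Return the epoch timestamp DynamoDB's TTL process compares against."""
    now = time.time() if now is None else now
    return int(now + lifetime_seconds)

item = {
    "pk": "session#123",
    "expires_at": ttl_attribute(3600, now=1_700_000_000),  # expire in 1 hour
}
```

Note that TTL deletion is a background process, so expired items can linger briefly; queries should still filter on the attribute if exactness matters.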
  31. RDS
    RDS Read Replica
    RDS Multi-AZ
    Supported Engines
  32. AWS Aurora
    Aurora Read Replica
  33. AWS EKS
    HPA
  34. AWS Lambda
    Integration with Services
    Deployment
    Versions
    Concurrency
  35. Others Technologies
    Apache Spark
    Apache Flink: Advanced Stream processing
    Hive
    Parquet: Better performance than JSON
  36. Use Cases
    Increase performance
    HIPAA and PII information
    Real-time/near-real time
    Cost-effective
    Stateless and stateful transactions
    Statistically significant insights while ensuring minimal computation and storage usage

Some Useful Links

AWS Certified Data Engineer — Associate official page
AWS Certified Data Engineer — Associate Exam Guide
Exam Prep: AWS Certified Data Engineer — Associate
Hands-On Course on Udemy
Practice tests on Udemy
AWS Data Analytics — Specialty Practice Tests on Tutorial Dojo

Considerations

The certification exam questions normally involved 2 to 4 services and their integrations and features; there were real-life scenarios, trick questions, and so on. Always work through the practice exams, because they really help you be better prepared for the certification.

This exam is very long: a total of 85 questions in 170 minutes (+30 minutes of accommodation if English is not your native language), plus 5–10 minutes for the surveys, so try to take it in the morning while you are well rested.

Feel free to comment in case you got anything different from your certification exam.

And finally, Good luck with your next AWS certification, and hope this preview and documentation can come in handy!


Leticia Massae

Technology enthusiast working as a DevOps engineer with experience in security automations. https://www.linkedin.com/in/leticiamassae/