How I Managed To Pass The AWS Data Analytics Specialty Exam On The First Attempt Despite Having No Hands-On Experience

Affan Mehmood · Published in DiveDeepAI · 7 min read · Sep 2, 2022

Greetings everyone. I’m going to explain how I managed to pass the AWS Certified Data Analytics — Specialty Exam in 50 days without any prior experience in Amazon Web Services.

Yes, this was my first AWS exam. I did not take the AWS Cloud Practitioner or any AWS Associate exams before taking this Specialty exam. There are no prerequisites for the Specialty exams; anyone can register, even without ever having used AWS before.

Although AWS recommends at least 5 years of experience with data analytics technologies and at least 2 years of hands-on experience with AWS, neither is required. As long as you have studied the services in enough depth, you can just go for it. I have a computer science degree, and this exam was my first exposure to AWS. I passed it on August 11, 2022, with a score of 815, comfortably above the required 750. Below I'll go over the resources I used to prepare and the things you should know before taking it.

This exam tests your in-depth knowledge of various AWS services: how they interact with each other, which ones are compatible with which, their security use cases, and so on.

The main services that this Specialty exam focuses on are:

  1. The Kinesis Family
  2. AWS Glue
  3. Amazon Athena
  4. AWS Redshift
  5. Amazon EMR

There are 65 questions in total, a mix of multiple-choice and multiple-response, of which 50 are scored. AWS uses the 15 unscored questions to gauge their difficulty; once enough data has been gathered, they appear as scored questions in future exams. There are good videos online explaining scaled scoring if you want a comprehensive idea of it. Either way, you need a scaled score of 750 out of 1000 to pass the Specialty exam.

The materials that I went through in preparation for the exam:

1. A Cloud Guru

This course is really good for getting started with AWS. The instructors explain the services and their purpose clearly and elegantly. This course will give you a good overview and a somewhat deep understanding of almost all the services in the exam. This course also includes a mock test at the end to get you familiar with the exam questions.

2. AWS Exam Readiness

This course has all the material needed to pass the exam, especially the links given at the end of each section. It took most of my preparation time because each link leads to an entire page covering multiple concepts. It can be a lot to digest, but give it time and go through it at least a couple of times so you don't miss or forget any important details.

3. ExamTopics dump

This dump helps you understand the nature of the questions, how to approach them, and how to choose the best available answer. Note that many of the marked answers are incorrect; you have to read the discussions to find the right ones.

4. AWS whitepapers

Big Data Analytics Options on AWS

Amazon EMR Migration Guide

Streaming Data Solutions on AWS

5. FAQs

There are FAQs about every AWS service that give you answers to your basic questions about those services.

I've seen people use Udemy courses to prepare for this exam, but I didn't use any of them, and I don't believe you need to spend money on such courses to pass.

Now I'm going to discuss some of the services in light of the actual exam. I won't discuss the actual questions, since this blog is meant to guide your overall preparation.

Amazon Kinesis

The Kinesis family has four services: Video Streams, Data Streams, Data Firehose, and Data Analytics. Video Streams rarely appears on the exam (I didn't get a single question about it), but it's still nice to know about.

Most of the questions on the collection domain revolve around Kinesis Data Streams and Kinesis Data Firehose; you must know how to distinguish between the two, since many answer choices use one or the other. You also need to know the KCL, the KPL, the Kinesis Agent, and so on, along with their properties, and when and where to use Kinesis Data Analytics.

There are also plenty of troubleshooting questions involving different exceptions. For example, for ProvisionedThroughputExceededException, they will ask what causes it and how to fix it.
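The standard fix for that exception, when you cannot (or don't want to) add shards, is retrying with exponential backoff and jitter. Here is a minimal pure-Python sketch of that pattern; the throttling client, stream name, and partition key are all made up for illustration, standing in for a real boto3 Kinesis client.

```python
import random
import time

class ProvisionedThroughputExceededException(Exception):
    """Stand-in for the error Kinesis raises when a shard's write limit
    (1 MB/s or 1,000 records/s per shard) is exceeded."""

class FlakyKinesisClient:
    """Toy client that throttles the first few put_record calls,
    mimicking a hot shard."""
    def __init__(self, throttled_calls=2):
        self.calls = 0
        self.throttled_calls = throttled_calls

    def put_record(self, StreamName, Data, PartitionKey):
        self.calls += 1
        if self.calls <= self.throttled_calls:
            raise ProvisionedThroughputExceededException()
        return {"SequenceNumber": str(self.calls),
                "ShardId": "shardId-000000000000"}

def put_with_backoff(client, record, max_retries=5):
    """Retry with exponential backoff plus jitter on throttling."""
    for attempt in range(max_retries):
        try:
            return client.put_record(StreamName="demo-stream",
                                     Data=record,
                                     PartitionKey="user-42")
        except ProvisionedThroughputExceededException:
            # 2^attempt base delay, scaled down so the demo runs fast
            time.sleep((2 ** attempt) * 0.01 + random.random() * 0.01)
    raise RuntimeError("still throttled after retries")

client = FlakyKinesisClient()
resp = put_with_backoff(client, b"hello")
print(resp["ShardId"])  # succeeds on the third attempt
```

With a real boto3 client the same loop would catch `botocore` client errors instead; the exam cares that you recognize backoff (or more shards, or better partition-key spread) as the remedy.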

AWS Glue

Many questions also involve AWS Glue. Glue is a serverless ETL service. In general, look for a Glue answer when the question asks for a cost-effective solution with minimal operational overhead.

There are also questions about using the AWS Glue Data Catalog as the metastore for Hive, Glue jobs, Glue DataBrew, Glue Studio, job bookmarks, and the Glue Schema Registry. They may also ask you to choose between Glue's built-in transformations and other services like EMR or Lambda.
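To give a feel for those built-in transformations: Glue's ApplyMapping selects, renames, and casts fields of semi-structured records. The real transform runs on Spark DynamicFrames inside a Glue job; this is only a plain-Python sketch of the mapping shape, with made-up field names.

```python
# Toy version of Glue's ApplyMapping transform: for each record, keep
# only the mapped source fields, rename them, and cast their values.

def apply_mapping(records, mappings):
    """mappings: list of (source_field, target_field, cast) tuples."""
    out = []
    for rec in records:
        row = {}
        for src, dst, cast in mappings:
            if src in rec:
                row[dst] = cast(rec[src])
        out.append(row)
    return out

raw = [{"user_id": "7", "amount": "19.99", "debug": "ignore-me"}]
clean = apply_mapping(raw, [
    ("user_id", "id", int),          # rename and cast to int
    ("amount", "amount_usd", float), # rename and cast to float
])
print(clean)  # [{'id': 7, 'amount_usd': 19.99}]
```

If a scenario needs only this kind of field-level reshaping, Glue's built-ins win on operational overhead; custom logic beyond them is where EMR or Lambda answers start to make sense.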

Amazon Athena

Athena also pops up a lot in the exam. Athena is a serverless service that provides a SQL interface for querying your data in S3, as well as other databases that support JDBC or ODBC connections. Athena also integrates very nicely with Amazon QuickSight. I got some security questions about the connections between S3, Athena, and QuickSight.

You also need to know when to use Athena versus S3 Select in a given scenario, and how to optimize query performance: store the data in a compressed columnar format like Apache Parquet or ORC, and partition it. There's a lot of material to cover for this service, so give it time.
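Partitioning works because the data sits under Hive-style `key=value` prefixes in S3, so a query that filters on the partition columns only scans the matching prefixes. A small sketch of that layout and of the pruning idea, with a made-up bucket name:

```python
# Hive-style partitioned layout that Athena can prune. When a query
# filters on year/month, only the matching S3 prefixes are scanned
# (and columnar formats like Parquet cut the scanned bytes further).

def partitioned_key(table_prefix, year, month, filename):
    return f"{table_prefix}/year={year}/month={month:02d}/{filename}"

keys = [
    partitioned_key("s3://my-data-lake/events", 2022, 8, "part-0000.parquet"),
    partitioned_key("s3://my-data-lake/events", 2022, 9, "part-0000.parquet"),
]

def prune(keys, year, month):
    """Crude stand-in for partition pruning: keep matching prefixes only."""
    needle = f"year={year}/month={month:02d}/"
    return [k for k in keys if needle in k]

print(prune(keys, 2022, 8))  # only the August partition is "scanned"
```

Since Athena bills per byte scanned, this layout plus Parquet/ORC is the canonical "reduce cost and improve performance" answer on the exam.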

AWS Redshift

Redshift, the AWS data warehousing solution, is an important part of the data analytics lifecycle, so you need to study it in depth. This service could have an exam of its own, given the massive amount of study material available on it. There's a lot to cover, from the COPY command to the system tables.

You need to know the data distribution styles (KEY, EVEN, ALL), when to choose which, how to improve performance and reduce costs, how to use WLM queues, and so on. You also need to know when and how to use Redshift Spectrum to minimize storage costs. Finally, remember that Athena is for ad-hoc analytics while Redshift is for complex joins and queries, and that Athena is slower than Redshift.
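The intuition behind KEY distribution: Redshift hashes the distribution key, so rows with the same key land on the same slice, and joins on that key need no data movement between nodes. A toy sketch of that placement (the slice count and tables are made up; EVEN would round-robin rows, ALL would copy the whole table to every node):

```python
import hashlib
from collections import defaultdict

NUM_SLICES = 4  # made-up slice count for illustration

def slice_for(dist_key):
    """Deterministic hash placement, like DISTSTYLE KEY."""
    digest = hashlib.md5(str(dist_key).encode()).hexdigest()
    return int(digest, 16) % NUM_SLICES

# Two tables distributed on customer_id:
orders = [("order", cid) for cid in (1, 2, 1, 3, 2)]
customers = [("customer", cid) for cid in (1, 2, 3)]

by_slice = defaultdict(list)
for table, cid in orders + customers:
    by_slice[slice_for(cid)].append((table, cid))

# Every row with a given customer_id sits on one slice, so a join on
# customer_id is slice-local:
for slice_id, rows in by_slice.items():
    for table, cid in rows:
        assert slice_for(cid) == slice_id
print(dict(by_slice))
```

That's why the exam's "large fact table joined on one column" scenarios point at KEY, small lookup tables at ALL, and tables with no dominant join key at EVEN.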

Amazon EMR

Amazon Elastic MapReduce is a massively parallel processing service used for processing large amounts of data. You can run whichever applications on EMR fit your needs; from ETL to machine learning, there are plenty of applications available to get the job done.

The difference between AWS Glue and EMR comes down to the level of customization. Glue is serverless and uses Apache Spark behind the scenes with few custom configuration options; if you need more customization, or need to run other applications, you need EMR.

There are also quite a few questions about the different types of encryption available for EMR. You need to study all the encryption types that include:

  1. SSE-S3
  2. SSE-KMS or CSE-KMS
  3. CSE-Custom
  4. Local disk encryption
  5. Open-source HDFS encryption
  6. EBS encryption (can encrypt the root device volume as well) and LUKS
  7. In-transit encryption: PEM and custom certificates

AWS Services Compatibilities

Below are some of the highlights that might help you in the exam.

  1. QuickSight is compatible with CloudTrail (CloudWatch is not compatible).
  2. EMR Spark Streaming can take data from Kinesis Data Streams.
  3. Redshift can get data using the COPY command from EMR, DynamoDB, and S3.
  4. Glue crawlers can get data from S3, DynamoDB, RDS, Aurora, Redshift, and publicly accessible databases.
  5. QuickSight data sources include Amazon RDS, Amazon Aurora, Amazon Redshift (same AZ), Amazon Athena, and Amazon S3. You can also upload spreadsheets and flat files, or connect to an on-premises database.
  6. SNS has a publish/subscribe system; SQS has to be polled.
  7. EMR can't subscribe to SNS but can be a publisher (also, SNS can filter notifications).
  8. The Glue Data Catalog can be used by Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
  9. Firehose can have these destinations: S3, Redshift (COPY command), OpenSearch, MongoDB Cloud (S3 buffer), and Splunk.
  10. Kinesis Data Analytics supports MSK and Flink.
  11. Firehose can't store on DynamoDB.
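Items 6 and 7 above contrast the two messaging models: SNS pushes each message to every matching subscriber immediately (optionally applying filter policies), while SQS holds messages until a consumer polls. A toy sketch of both models, with made-up message types; the real services are of course network APIs, not in-process classes:

```python
from collections import deque

class ToySNS:
    """Push-based pub/sub with SNS-style attribute filtering."""
    def __init__(self):
        self.subscribers = []  # (callback, wanted_event_type) pairs

    def subscribe(self, callback, event_type=None):
        self.subscribers.append((callback, event_type))

    def publish(self, message, event_type):
        for callback, wanted in self.subscribers:
            if wanted is None or wanted == event_type:
                callback(message)  # push: delivered immediately

class ToySQS:
    """Pull-based queue: messages wait until someone polls."""
    def __init__(self):
        self.queue = deque()

    def send(self, message):
        self.queue.append(message)

    def receive(self):
        return self.queue.popleft() if self.queue else None

received = []
sns = ToySNS()
sns.subscribe(received.append, event_type="cluster-ready")
sns.publish("ignored", event_type="cluster-failed")   # filtered out
sns.publish("EMR step done", event_type="cluster-ready")

sqs = ToySQS()
sqs.send("row batch 1")
print(received)       # filtered push delivery
print(sqs.receive())  # delivered only when polled
```

On the exam, "fan out one event to many consumers" points at SNS (often SNS in front of multiple SQS queues), while "decouple and buffer work for one consumer" points at SQS.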

Other AWS Services on the exam

  • Lake Formation
  • DynamoDB
  • Database Migration Service
  • RDS
  • Aurora
  • Lambda
  • Step Functions
  • SNS
  • SQS
  • MSK
  • CloudWatch Logs
  • CloudTrail
  • Macie
  • Direct Connect
  • Snowball Family
  • IAM
  • KMS
  • Secrets Manager

I’m hoping the knowledge I’ve shared with you will help you in your study for the AWS Data Analytics Specialty Exam. I hope I didn’t leave out any crucial points here. I passed the exam, but I’m obviously still not an expert. There are a lot of things for me to learn. I wish you luck with your test! Thank you.
