5 steps to AWS Machine Learning Specialty Certification, Made Easy

Published in xplore.ai, Apr 30, 2020

How to prepare for the exam in 5 steps and what not to do.

The AWS Machine Learning certification differs a bit from the other AWS certifications: you need a good grasp of all things Amazon Web Services, but you also need to understand the basics of Machine Learning and Deep Learning. Depending on your background, this may not be an easy task.

People with many different profiles attempt this exam: engineers and cloud experts, but also data scientists and analysts. The ideal candidate would be an experienced jack-of-all-trades who feels comfortable in all those topics. This is rarely the case, as people usually have strong knowledge of their own field but weaker knowledge of things they haven’t studied or worked with. We all know unicorns don’t exist.

To help with this situation, this article makes five points:

Even without experience in all the topics, you can catch up.
Resources exist, but you’ll also need to play around with AWS services on your own.
Nothing beats first-hand experience when it comes to understanding AWS.
Time management in the exam is extremely important.
Total ignorance or insufficient knowledge of Machine Learning and Deep Learning is a ticket to failure.

Strategy

Summarized, here are the 5 points that helped me pass. We’ll go through them below:

1- Read the whitepapers, especially SageMaker’s, but ideally those for all the main AWS services.

2- Try as many AWS services as you can if you haven’t worked with them in the past. Play with them, feel comfortable.

3- Forget the official AWS content outline percentages; in my experience the exam is closer to 50% ML/DL, 25% SageMaker and 25% other AWS services.

4- Don’t get stuck on a question or fixate on how much time you have left; 180 minutes is enough if you use that time wisely.

5- Don’t be fooled by the exam’s constant tricks that try to confuse you. Once you read a difficult question properly, you’ll notice it wasn’t that hard.

1- Read the Whitepapers

Although you can find online resources and paid courses, the best way to understand AWS is to read the whitepapers and documentation of the services. Yes, it is a lot. But as you go through them, you’ll be able to connect the dots and acquire a general and much-needed understanding. Amazon’s free training courses can help you too, but some of them are just simple tutorials that won’t give you deep knowledge of the subject (they’re usually internal presentations and training sessions made public). The most important service is SageMaker, and that’s where your effort should go. Pivot from there.

2- Go play with AWS

If you don’t have one already, open an AWS account and start using its Machine Learning services. The Free Tier includes compute hours and lets you use more than 60 services for free for 12 months. There are great demos in there, along with free tools that will help you understand usage and use cases and let you try pre-trained algorithms yourself. This is priceless.

3- Content Outline Percentages

Amazon provides a list of domains you’ll need to know: Data Engineering and ML Implementation and Operations (each representing 20% of the exam), Exploratory Data Analysis (24%) and Modeling (36%). My view after passing the exam was that these percentages didn’t make much sense. A more realistic breakdown would be this:

Machine Learning and Deep Learning (50% of the exam)

You should be comfortable with the whole ML cycle, from data collection and data preparation to exploratory data analysis and modelling. A good grasp of how to formulate problems and measure success is paramount, and knowing which metrics to use plays an important role in the exam. You should also know what types of data exist (structured, unstructured) and what makes good data in a machine learning project.

Data preparation is a key step in data science, and you should expect a few questions on missing value handling, categorical encoding, imputation and other feature engineering steps. Pay attention to scaling (normalization and standardization) for numerical data, and to n-grams, bag-of-words and Term Frequency-Inverse Document Frequency (TF-IDF) for text data, as you may have to solve simple corpus problems. Also, keep in mind how and why the RecordIO-protobuf format works better in AWS training jobs.
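To make the scaling and TF-IDF ideas above concrete, here is a minimal sketch in plain Python. The numbers and the three-sentence corpus are made up for illustration, and the TF-IDF variant shown (term count divided by document length, times the natural log of N over document frequency) is only one common definition; libraries often use smoothed versions that give different values.

```python
import math
from collections import Counter


def min_max_normalize(values):
    """Normalization: rescale values to the [0, 1] range.

    Assumes the values are not all equal (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: rescale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    docs = [doc.lower().split() for doc in corpus]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


ages = [10, 20, 30, 40]  # made-up numerical feature
print(min_max_normalize(ages))
print(standardize(ages))

corpus = ["the cat sat", "the dog sat", "the cat ran"]  # made-up corpus
scores = tf_idf(corpus)
print(scores[1])  # "the" appears in every document, so its weight is zero
```

The point to take away for the exam is the behaviour rather than the code: a term that appears in every document carries no weight, while a term unique to one document scores highest.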

Some questions may present you visualizations of data and ask you something about that data. This is clearly an attempt to test your skills in data analysis. You should be comfortable knowing the different types of graphs and their aim (comparison, composition, relationships and distributions). Be aware of how they are used and how they can help in exploratory data analysis.

Model design is obviously present in the exam too: how to select a good model, which ML approach fits a given situation (regression, classification) and which metrics and strategies should be used. As for algorithms, you should be comfortable with all the classics (K-Means and how it differs from K-Nearest Neighbours, Random Forests and Decision Trees) as well as with Convolutional Neural Networks.
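Since the difference between K-Means and K-Nearest Neighbours is a classic source of confusion, here is a small sketch that contrasts them on made-up one-dimensional data. The function names are hypothetical helpers written for this illustration, not part of any library: K-Nearest Neighbours is supervised and needs labels, while K-Means is unsupervised and only groups points.

```python
from collections import Counter


def knn_predict(labelled_points, query, k=3):
    """Supervised: label a query by majority vote of its k nearest labelled points."""
    nearest = sorted(labelled_points, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: move k centroids towards the groups of unlabelled points."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its closest centroid.
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


labelled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
            (8.0, "high"), (9.0, "high"), (9.5, "high")]
prediction = knn_predict(labelled, query=8.4)
print(prediction)

unlabelled = [1.0, 1.5, 2.0, 8.0, 9.0, 9.5]
final_centroids = kmeans_1d(unlabelled, centroids=[0.0, 5.0])
print(final_centroids)
```

Note that `knn_predict` cannot run without the labels, whereas `kmeans_1d` never sees any; that is the distinction exam questions tend to probe.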

You’ll need to understand the concepts of training, test and validation data, identify potential biases introduced by poor splits, and know additional measures that could be used to increase data value. What kind of generalization we are looking for in an ML process, how the model will be consumed (real time, batch, API applications), and how to tell whether the generalization is working (accuracy tests) are ever-present topics throughout the exam. You’ll need to be able to explain how Confusion Matrices, Recall, Precision and False Positive Rate work.
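As a quick refresher on how those metrics relate to the confusion matrix, the sketch below computes them for a binary classifier. The labels and predictions are invented for the example, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_rates(tp, fp, fn, tn):
    """Precision, recall and false positive rate from confusion matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0       # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # of actual positives, how many were found
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0  # of actual negatives, how many were flagged
    return precision, recall, false_positive_rate


# Made-up ground truth and predictions for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, fpr = classification_rates(*counts)
print(counts)  # (3, 1, 1, 5)
print(precision, recall, fpr)
```

Here precision and recall are both 0.75 and the false positive rate is 1/6. Exam questions typically describe a business cost (missed fraud versus false alarms, for example) and expect you to pick which of these rates to optimize.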

When it comes to evaluating models, you’ll need to understand offline and online validation, and at least conceptually know about canary deployments. Underfitting and overfitting and how to overcome them, regression error (RMSE), histograms and their skewness, AUC metrics in classification, and all the trade-offs in evaluation that might call for different optimizations are very important in the exam. Model tuning specifics and at least some grasp of Bayesian optimization will also help a lot.
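Two of the metrics mentioned above are easy to compute by hand, which is worth practising in case a question asks for it. The sketch below uses invented values; `auc` uses the pairwise definition (the share of positive/negative pairs in which the positive sample gets the higher score, with ties counted as half), which is fine for tiny examples but not how libraries compute it at scale.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error for a regression model."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up regression targets and predictions: errors are 1, 0 and -2.
regression_error = rmse([3, 5, 7], [2, 5, 9])
print(regression_error)  # sqrt(5 / 3)

# Made-up classifier scores: one positive is ranked below one negative.
ranking_quality = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
print(ranking_quality)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, while RMSE is expressed in the same units as the target, so lower is better.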

SageMaker (25% of the exam)

While knowledge of Machine Learning is expected from you in the exam, the star of the show is SageMaker, AWS’ central service for all things AI. SageMaker helps you through the whole ML cycle from start to finish, and you’ll probably have to explain how other AWS services interact with it. Keep in mind that it will appear in questions quite often, and you need to know its intricacies when it comes to training models (how to create a training job through the API, how to set up an Elastic Container Registry repository, how inference works).
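To get a feel for what creating a training job through the API involves, here is a hedged sketch that only builds the request for the `create_training_job` call of the boto3 SageMaker client. Every concrete value (job name, role ARN, image URI, bucket paths, instance type, hyperparameters) is a placeholder assumed for illustration; the actual call is left commented out because it needs an AWS account, credentials and real resources.

```python
def build_training_job_request(job_name, role_arn, image_uri, train_s3_uri, output_s3_uri):
    """Assemble the keyword arguments for SageMaker's create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,  # IAM role SageMaker assumes to read and write S3
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # training container image stored in ECR
            "TrainingInputMode": "File",
        },
        # The API expects hyperparameter values as strings.
        "HyperParameters": {"num_round": "100", "objective": "binary:logistic"},
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "ContentType": "text/csv",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are hypothetical placeholders.
request = build_training_job_request(
    job_name="example-training-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
print(sorted(request))

# With credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

Reading the request top to bottom is a decent checklist for exam questions: a role, a container image in ECR, input channels in S3, an output location, compute resources and a stopping condition.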

Another important topic is SageMaker’s built-in algorithms: Linear Learner, Factorization Machines, the image analysis and anomaly detection algorithms, and the differences among the text analysis algorithms. Knowing when to use LDA, Neural Topic Model (NTM), Seq2Seq and BlazingText can and probably will come up often. As a widely used algorithm for both regression and classification problems, XGBoost shows up too, often in comparisons with other algorithms. Keep in mind that knowing the hyperparameters and metrics of the main built-in algorithms is a must.

Last but not least, you need to know how to take SageMaker models to production and how Jupyter notebooks work. SageMaker hosting services (how to create endpoint configurations), inference pipelines that chain algorithms together, Docker containers and Elastic Inference can all pop up in the questions. You should also know how automatic scaling works.
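As an illustration of what an endpoint configuration looks like, the sketch below builds the request for the boto3 SageMaker client’s `create_endpoint_config` call with two production variants. The model names, instance type and the 90/10 traffic split are assumptions made up for this example; weighting variants like this is one way to send a small share of traffic to a new model, in the spirit of a canary deployment. The call itself is commented out because it requires AWS credentials and existing models.

```python
def build_endpoint_config_request(config_name, current_model, candidate_model, candidate_share=0.1):
    """Assemble keyword arguments for create_endpoint_config with two weighted variants."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "current",
                "ModelName": current_model,
                "InstanceType": "ml.m5.large",
                "InitialInstanceCount": 1,
                "InitialVariantWeight": 1.0 - candidate_share,
            },
            {
                "VariantName": "candidate",
                "ModelName": candidate_model,
                "InstanceType": "ml.m5.large",
                "InitialInstanceCount": 1,
                # Traffic is split in proportion to the variant weights.
                "InitialVariantWeight": candidate_share,
            },
        ],
    }


# Hypothetical names; the models must already exist in SageMaker.
endpoint_config = build_endpoint_config_request(
    config_name="example-endpoint-config",
    current_model="example-model-v1",
    candidate_model="example-model-v2",
)
for variant in endpoint_config["ProductionVariants"]:
    print(variant["VariantName"], variant["InitialVariantWeight"])

# With credentials configured, the configuration could be created like this:
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
```

An endpoint is then created from the configuration, and automatic scaling is configured per variant rather than per endpoint.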

Other AWS services (25% of the exam)

Here’s where the exam gets similar to other AWS certifications. If you already have a solutions architect or big data certification (I didn’t), all this will surely be familiar. The most important services here are AWS Glue (for ETL jobs) and Athena (for querying data in S3), the Kinesis family (for streaming data), S3, RDS, DynamoDB and Redshift (as data stores) and Elastic MapReduce (EMR), the managed Hadoop cluster ecosystem. You should understand in depth how they all work, what their strengths are and why they should be used in specific situations. The whitepapers for these services are quite good, but ideally you should have some experience working with them. If you do, you are covered.

Some questions may ask you to choose the correct order of services to use in a given business situation. You will need to know each service, its purpose and the use cases that will be presented to you in the questions. Keep in mind all the monitoring and evaluation tools for deployments that AWS has to offer. I’d recommend making an effort to understand CloudWatch and CloudTrail in detail if you’ve never used them before.

There are a few AI developer services that you must know, but the good news is that they’re quite simple: Forecast, Lex, Personalize, Polly, Rekognition, Transcribe and Translate are all easy-to-use services available in the console, and you just need to try them yourself to understand them.

For deployments outside SageMaker, AWS offers some options you should check out: Elastic Container Service, EC2 AMIs, Elastic MapReduce and on-premises options (with the MXNet and TensorFlow frameworks).

4- Time

While you’ll have a bit less than three minutes per question (which is more than enough), I find the best strategy is to first answer all the questions you know, or have little doubt about, and leave the harder ones for later.

This means going through the questions once and marking the ones you are unsure about. Then, dedicate the bulk of your time to the hard questions. If you do this right, you may even have the last 20–30 minutes of the exam for a final check and to get an intuition of how you did.
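The time budget above is simple arithmetic, sketched here so you can plug in your own numbers. The 180 minutes come from this article; the question count of 65 is an assumption (it is consistent with "a bit less than three minutes per question" but is not stated here), and the 25-minute reserve is just the midpoint of the 20–30 minute final check.

```python
EXAM_MINUTES = 180    # stated in the article
QUESTION_COUNT = 65   # assumption: not stated in the article
REVIEW_MINUTES = 25   # assumption: midpoint of the 20-30 minute final check


def minutes_per_question(total_minutes, questions, reserved=0):
    """Average time available per question after setting aside reserved minutes."""
    return (total_minutes - reserved) / questions


raw_pace = minutes_per_question(EXAM_MINUTES, QUESTION_COUNT)
pace_with_review = minutes_per_question(EXAM_MINUTES, QUESTION_COUNT, REVIEW_MINUTES)

print(round(raw_pace, 2))          # 2.77
print(round(pace_with_review, 2))  # 2.38
```

In other words, keeping a review buffer still leaves well over two minutes per question on average, which is why quickly clearing the easy questions pays off.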

5- Exam Logic

You’ll notice some verbose questions in the exam, but after reading them, you’ll see they are just trying to confuse you. Read the answers carefully, even before the questions themselves: they usually point to the service or solution more clearly than the question does.

This is an important point: the exam will try to trick you, almost constantly. It’ll probe where your knowledge is thinnest by offering very similar answers that look the same on the surface. If a question seems complicated, leave it for later. The worst thing you can do is get stuck on a question and spend 10 minutes there, because you’ll probably need those minutes at the end of the test.

There’s also a chance that you’ll need to do some calculations. Ask for paper and a pen, or whatever option is allowed, in your examination center. This may seem silly, but having the opportunity to write down doubts and scribble simple math could be a game changer in the exam, and who doesn’t want all the odds in her/his favour?

Best of luck!

Original article here.