Study Guide for Microsoft Azure DP-100: Data Scientist Associate Certification

Shivam Sharma
Applied Deep Learning
5 min read · Apr 3, 2019

Update 22/4/2020:

  • The total number of questions is 51, including 10 questions that cannot be skipped.
  • Also, it’s a tough certification and the questions change, so prepare well before attempting it.

Updated 17/5/2019:

  • Certification is out of beta
  • The total is now 35 questions in 180 minutes.
  • Only one case study appears, on regression analysis with Azure ML Studio.
  • Azure ML Studio is the topic from which the majority of questions are asked: around 50% of questions come from ML Studio, 40% from ML service, and 10% are generic data science questions.
  • Overall certification difficulty is drastically reduced compared to what I have described below.
  • Focus on data transformation processes in Azure ML Studio.

Microsoft recently released a certification named DP-100: Designing and Implementing a Data Science Solution on Azure for the title of Microsoft Certified: Azure Data Scientist Associate. This certification lets data scientists check their knowledge and solutioning skills on Azure, and it tests both the breadth and depth of data science knowledge. Despite being the first one in the series, this certification is a tough one to crack, but that's where I can help.

Read about the certification on Artificial Intelligence here: Study Guide for Microsoft Azure AI-100: Designing and Implementing an Azure AI Solution (Beta)

Certification Pattern

Total time is 220 minutes, of which 180 minutes are for answering the 60 questions.

My Experience

The pointers below are based entirely on my own certification experience.

  • The examination started with two case studies that cannot be skipped, followed by 46 questions.
  • One question on Weka came up, which felt out of place to me.
  • The case studies needed a lot of cognitive effort to understand and solve. Advice: take the exam at a Pearson center, where you get a sketch pad and pen to note or draw things. While scheduling, I selected the option to take it from home/office, and I found it much harder to absorb the long case studies without being able to note anything down.
  • DP-100 is at least as difficult as other certifications such as 70-776, 70-773 and AZ-301. It is worth having this certification.

Type of Questions

Below are the types of questions:

  • Single choice based on the scenario
  • Multiple choice questions
  • Arrange in right sequence type questions
  • Complete the code (fill in the blanks)
  • Case studies with multiple questions.
  • I got questions on scikit-learn, RNN, DNN, ANN, statistics, BrainScript, normalization, sentiment analysis, language translation, model optimization, Weka, Docker, Azure Machine Learning Studio, Azure Machine Learning service, DevOps in machine learning, the bandit problem, early-stopping strategies, evaluation metrics, cross-validation, SMOTE, sequence models, hyperparameter tuning, automated machine learning in Azure, TensorFlow, Keras, data types (do read this to get some questions right), pandas DataFrame melt(), Spark, Azure Databricks, notebooks, Zeppelin notebooks on Spark, feature selection, sampling, algorithm selection (boosted decision tree, regression, etc.) and others.
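
Since pandas melt() is one of the topics listed above, here is a minimal sketch of what it does. The column names and values are made up for illustration and are not taken from the exam; the point is only to show how a wide table is unpivoted into a long one.

```python
import pandas as pd

# Hypothetical wide-format data: one row per student, one column per subject.
wide = pd.DataFrame({
    "student": ["A", "B"],
    "math": [90, 75],
    "physics": [85, 60],
})

# melt() keeps the id_vars columns as they are and unpivots the value_vars
# columns into (variable, value) pairs: one row per student/subject pair.
long = pd.melt(
    wide,
    id_vars=["student"],
    value_vars=["math", "physics"],
    var_name="subject",
    value_name="score",
)

print(long)
```

The result has four rows (two students times two subjects) and three columns: student, subject and score.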

Only 30–40% of the questions were specific to machine learning on Azure.

Study Guide:

Here is a comprehensive list of study material covering the DP-100 scope & questions. You can thank me later.

Tip: You need to deeply understand why, when and how to use a technique.

* Select development environment (multiple questions)
* Assess the deployment environment constraints
* Analyze and recommend tools that meet system requirements
* Select the development environment
* Set up development environment
* Create an Azure data science environment
* Configure data science work environments
* Define and prepare the development environment
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/service/how-to-configure-environment.md
https://docs.docker.com/toolbox/toolbox_install_windows/ (imp)
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview (imp)
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/deep-learning-dsvm-overview
* Quantify the business problem
Note: General understanding needed.
* Define technical success metrics (expect multiple reasoning-based questions)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model (5-6 questions, understand why/when) (imp)
* Quantify risks
General understanding needed
* Prepare data for modeling
* Transform data into usable datasets
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-arff (imp)
Understand why, when and how to use the modules below (imp):
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-metadata
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample (understand when, why & how in different problem statements)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters (multiple questions)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/score-model
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote (understand techniques to handle unbalanced datasets)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-selection-modules (v-imp)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-based-feature-selection (multiple questions)
https://www.geeksforgeeks.org/python-pandas-melt/
https://medium.com/deep-ai/all-you-need-to-know-about-data-for-machine-learning-a80bc8555d58
https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/sweep-clustering
https://github.com/dotnet-architecture/eShopOnContainersAI/wiki/05.-Setting-up-Product-Recommendations-based-on-Azure-ML-Studio
https://blogs.msdn.microsoft.com/uk_faculty_connection/2016/05/05/recommendation-engines-in-azure-machine-learning/
https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-parameters-optimize
* Automated machine learning
https://docs.microsoft.com/en-in/azure/machine-learning/service/tutorial-auto-train-models
* Cross validation
https://datascience.stackexchange.com/questions/28158/how-to-calculate-the-fold-number-k-fold-in-cross-validation
https://machinelearningmastery.com/k-fold-cross-validation/ (learn how to calculate k-folds)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/cross-validate-model
https://scikit-learn.org/stable/tutorial/basic/tutorial.html (code based questions will come)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/normalize-data (types & when to use which)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/permutation-feature-importance (imp)
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/fisher-linear-discriminant-analysis
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/data-transformation-sample-and-split
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/data-transformation-learning-with-counts
https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.truncationselectionpolicy?view=azure-ml-py (imp)
imp
https://stats.stackexchange.com/questions/262794/why-does-a-decision-tree-have-low-bias-high-variance
https://towardsdatascience.com/random-forests-and-the-bias-variance-tradeoff-3b77fee339b4
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
https://visualstudiomagazine.com/articles/2015/05/01/train-validate-test-stopping.aspx
https://social.technet.microsoft.com/wiki/contents/articles/39979.azure-ml-studio-creating-neural-networks.aspx
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins (imp)
* Brain script
https://docs.microsoft.com/en-us/cognitive-toolkit/brainscript-basic-concepts
https://docs.microsoft.com/en-us/azure/machine-learning/studio/text-analytics-module-tutorial
https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa
https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-choice

Updated 12–14–2019:
Publish a Machine Learning Experiment with Microsoft Azure Machine Learning Studio
https://docs.microsoft.com/en-us/learn/paths/publish-experiment-with-ml-studio/
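
For the cross-validation items in the guide above ("learn how to calculate k-folds"), here is a rough sketch of how k-fold splitting works, written in plain Python so the arithmetic is visible. The helper names k_fold_indices and train_test_splits are my own for illustration, not from any library. It assumes contiguous, unshuffled folds where the first n % k folds get one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Every fold gets `base` samples; the first `extra` folds get one more.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, k):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = k_fold_indices(n_samples, k)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 10 samples with k=3 gives test folds of size 4, 3 and 3.
splits = list(train_test_splits(10, 3))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so the model is trained k times and every sample is used for evaluation exactly once. In practice you would normally shuffle the data first or use a library implementation rather than this hand-rolled version.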

If you need further help or have a question, write in the comments below or find me on LinkedIn. Also, do let me know about any changes in the questions: the examination is still in beta, so the questions or pattern might change. Thanks.

If you have any comment or question, then do write them below.

To see similar posts, follow me on Medium & LinkedIn.

If you enjoyed then Clap it! Share it!! Thanks!!!

Shivam Sharma
Applied Deep Learning

MCT | MCSE: Azure | MCSA: Machine Learning | Blockchain| R, Architect/Consultant/Trainer. I love working with cutting-edge technologies.