Amazon Data Science Interview

Amazon has gone from being “Earth’s Biggest Bookstore” to “Earth’s Most Customer-Centric Company”. The CEO Jeff Bezos has time and again defined the path of the company in his shareholder letters.

Amazon deploys deep learning language capabilities with Alexa and provides cloud infrastructure for AI via AWS. It also built and deployed some of the world’s first recommendation systems at scale on Amazon.com. Amazon also provides cloud credits for research.

Interview Process

The Amazon interview process and experience are described in detail by Aaron Krauss in his blog. Even though the post is old, the fundamental process remains the same today. The onsite round includes a bar raiser interview. The bar raiser is the most experienced person on the interview panel, and their job is to decide whether you would be in the top 50% of Amazon employees in that role. The bar raiser has the power to veto a candidate, regardless of whether the other interviewers like the candidate.

Important Reading

  1. AWS SageMaker (Video): Build, train and deploy ML models at scale.
  2. Deep Learning AMIs: Tutorials to use AMIs on AWS.
  3. Amazon AWS Blog: ML Blog (great examples of solutions to problems across the data science stack).

AI/Data Science Related Questions

  • How does a logistic regression model know what the coefficients are?
  • Difference between convex and non-convex cost function; what does it mean when a cost function is non-convex?
  • Is random weight assignment better than assigning the same weights to the units in the hidden layer?
  • Given a bar plot, imagine you are pouring water from the top. How would you quantify how much water the bar chart can hold?
  • What is Overfitting?
  • How would a change in the Prime membership fee affect the market?
  • Why is gradient checking important?
  • Describe decision trees, SVM, random forests and boosting. Talk about their advantages and disadvantages.
  • How do you weigh 9 marbles three times on a balance scale to select the heaviest one?
  • Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
  • Describe the criterion for a particular model selection. Why is dimension reduction important?
  • What are the assumptions for logistic and linear regression?
  • If you can build a perfect (100% accuracy) classification model to predict some customer behaviour, what will be the problem in application?
  • The probability that an item is at location A is 0.6, and 0.8 at location B. What is the probability that the item would be found on the Amazon website?
  • Given a CSV file with ID and Quantity columns, 50 million records and 2 GB of data, write a program in any language of your choice to aggregate the Quantity column.
  • Implement circular queue using an array.
  • When you have monthly time series data with a large number of records, how will you find out whether there is a significant difference between this month’s values and those of previous months?
  • Compare Lasso and Ridge Regression.
  • What’s the difference between MLE and MAP inference?
  • Write a function that takes an array of N numbers in random order and an int K, and returns an array with the K largest numbers.
  • When users are navigating through the Amazon website, they are performing several actions. What is the best way to model whether their next action will be a purchase?
  • Estimate the disease probability in one city given that the probability is very low nationwide. 1000 randomly chosen people in this city were asked, and all gave a negative response (no disease). What is the probability of disease in this city?
  • Describe SVM.
  • How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic range?
  • What is boosting?
  • How many topic modeling techniques do you know of?
  • Formulate LSI and LDA techniques.
  • What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
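
To give a feel for the coding questions in this list, here is a minimal sketch of one possible answer to the bar plot and water question. It assumes the bars are given as a list of non-negative heights with unit width, and it uses the common two-pointer approach. The function name trapped_water is made up for this illustration, and an interviewer may well expect a different formulation.

```python
def trapped_water(heights):
    """Return the units of water held between bars of unit width.

    Assumption: `heights` is a list of non-negative numbers, one per bar.
    """
    left, right = 0, len(heights) - 1
    left_max = right_max = 0  # tallest bar seen so far from each side
    total = 0
    while left < right:
        if heights[left] < heights[right]:
            # The water level on the left is bounded by left_max,
            # because a bar at least as tall exists on the right.
            left_max = max(left_max, heights[left])
            total += left_max - heights[left]
            left += 1
        else:
            # Symmetric case: the right side is the lower boundary.
            right_max = max(right_max, heights[right])
            total += right_max - heights[right]
            right -= 1
    return total


print(trapped_water([3, 0, 2]))  # → 2
```

The sketch runs in a single pass with constant extra memory, which is usually the point an interviewer wants to reach after a simpler first attempt.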

Reflecting on the Questions:

The questions involve a lot of coding and computing. Some are practical and some are fundamental in nature, and they really require the candidate to have worked through data models and data sets to be able to get to the solution. Some of the questions above are also from the bar raiser interview. One can certainly refer to Part 1 and Part 2 on preparing for these interviews.
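
As an example of the practical kind, the CSV aggregation question from the list can be approached by streaming the file row by row instead of loading all 2 GB into memory. The sketch below is only one reading of the question: it assumes "aggregate" means summing Quantity per ID, that the header row is exactly ID,Quantity, and that quantities are integers. The function name aggregate_quantity and the sample data are invented for the illustration.

```python
import csv
import io
from collections import defaultdict


def aggregate_quantity(lines):
    """Sum the Quantity column per ID, reading one row at a time.

    Assumptions: `lines` is a file-like object or iterable of CSV lines
    with a header row "ID,Quantity", and Quantity holds integers.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row["ID"]] += int(row["Quantity"])
    return dict(totals)


# Small in-memory sample standing in for the real file.
# For a file on disk, pass open(path, newline="") instead.
sample = io.StringIO("ID,Quantity\na,2\nb,5\na,3\n")
result = aggregate_quantity(sample)
print(result)  # → {'a': 5, 'b': 5}
```

Only the running totals are kept in memory, so the cost depends on the number of distinct IDs and not on the number of records. If nearly every ID is unique, the dictionary itself becomes large, and an external sort or a database would be worth discussing.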

Even though Amazon is hiring a lot of data science folks, it is certainly doing so with a high bar of excellence. The company is going for something a whole lot bigger than e-commerce and is winning in most of those areas. A good deal of focus and hard work can land you a job in the world’s most customer-centric company.
