Acing the Data Science Interview — Part 1

Published in Acing AI · Mar 14, 2018

What software engineering did to the 2000s, AI and data science will do to the 2020s.

I have previously written articles about AI interview questions at some top companies in this domain. Based on unanimous reader feedback on those articles, the natural follow-up was how to prepare for these interviews. To cover this large topic comprehensively, I have divided it into parts.

Software is eating the world and it’s replacing it with data. — Pete Skomoroch

Source: https://medium.com/@anandr42/the-data-science-delusion-7759f4eaac8e

To understand what this interview tests, it helps to understand the AI/ML and data science field itself. The field sits at the intersection of computing skills, math and statistics knowledge, and deep expertise in a particular domain. It is highly interdisciplinary: people with expertise in two of the three areas and partial ability in the third can still make it into the field.

I also reviewed and distilled these questions along with what I have learned so far in my Udacity Deep Learning and AI courses, and came up with a step-by-step approach to acing the AI interview. There are three main pillars to acing this interview: computing, statistics, and data visualization and presentation.

Depth matters more than breadth.

Regardless of someone’s background, anyone who builds a solid foundation can get into this field. There are nine steps to building that foundation on the way to becoming an expert. They are ordered so that we cover the basics before moving on to the more complex areas. For each step I have included the approximate number of days of preparation it needs, although this will vary widely from person to person based on experience and expertise. Most of the resources shared in this article are free. This article covers steps 1 through 4; steps 5 onward will be covered in the next article, coming early next week.

1. Start with Basic Python (Computing: 2 days)

If there is one programming language for AI/ML and data science, it is Python. Python might be easy to pick up for people in computer science who have been coding for a while, but challenging for people coming from research or business analytics. Data consumption, data manipulation, and model building are all typically done in Python. The usual environment for working in Python is Jupyter Notebook; a newer alternative is Google Colaboratory, which runs notebooks in the browser. For managing packages and environments across projects, I suggest using Anaconda.
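As a rough checklist of "basic Python", here is a minimal sketch of the everyday constructs worth being fluent in before moving on: functions with default arguments, list comprehensions, building a dict from tuples, sorting with a key, and counting with `collections.Counter`. The records and sentence are made-up sample data, and `passing` is a hypothetical helper written only for illustration.

```python
from collections import Counter

# Hypothetical records: (name, score) pairs
scores = [("alice", 82), ("bob", 67), ("carol", 91), ("dave", 74)]

def passing(records, threshold=70):
    """Return the names whose score meets the threshold, using a list comprehension."""
    return [name for name, score in records if score >= threshold]

# Build a dict from the list of tuples, then rank names by score (highest first)
score_by_name = dict(scores)
ranked = sorted(score_by_name, key=score_by_name.get, reverse=True)

# Count word frequencies with collections.Counter
words = "the cat sat on the mat the end".split()
counts = Counter(words)

print(passing(scores))        # ['alice', 'carol', 'dave']
print(ranked)                 # ['carol', 'alice', 'dave', 'bob']
print(counts.most_common(1))  # [('the', 3)]
```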

Sources I would use for Basic Python learning:

Introduction to Python: UD-1110
Google’s Python Class: Python Course
Codecademy’s Python: Python (I used this a few years ago)

2. Descriptive and Inferential Statistics (Statistics: 1 day)

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation and organization of data.

Descriptive Statistics

Descriptive statistics summarizes the patterns that emerge from data so that we can interpret the data in a meaningful way. It covers important attributes of a dataset: measures of central tendency such as the mean, median, and mode, and measures of spread such as the range and standard deviation. These attributes also guide model evaluation, for example in judging how well a model’s results converge.
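To make these attributes concrete, the sketch below computes them with Python's built-in `statistics` module on a small, made-up list of eight numbers. It is only an illustration; in practice the same summaries usually come from Pandas or NumPy.

```python
import statistics

# Hypothetical sample of eight measurements
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    # Central tendency
    "mean": statistics.mean(data),
    "median": statistics.median(data),  # average of the two middle values here
    "mode": statistics.mode(data),
    # Spread
    "range": max(data) - min(data),
    "population_stdev": statistics.pstdev(data),  # divides by n
    "sample_stdev": statistics.stdev(data),       # divides by n - 1
}

for name, value in summary.items():
    print(f"{name:>16}: {value:.3f}")
```

For this data the mean is 5, the median 4.5, the mode 4, and the population standard deviation 2. The sample standard deviation is slightly larger because it divides by n - 1 instead of n, which is worth being able to explain in an interview.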

Inferential Statistics

Sometimes it is not feasible to consume all of the data. This is where sampling comes in. Sampling is central to inferential statistics, and it is also the basis for splitting data into training, validation, and test sets for your AI models. Estimation from samples and hypothesis testing are the two main aspects of inferential statistics.
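The two aspects can be sketched with the standard library alone. The example below is a minimal illustration under stated assumptions: the "population" and the control/treatment groups are synthetic, the confidence interval uses a normal approximation, and the hypothesis test is a permutation test, chosen here as one simple option rather than the only method (a t-test from statsmodels or SciPy is a common alternative). `permutation_test` is a hypothetical helper written for this sketch.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population, standing in for data too large to process in full
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Estimation: a 95% confidence interval for the mean from a random sample
sample = rng.sample(population, 400)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean
z = NormalDist().inv_cdf(0.975)                      # about 1.96 (normal approximation)
ci = (mean - z * sem, mean + z * sem)

def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    shuffler = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        shuffler.shuffle(pooled)  # relabel observations at random under the null hypothesis
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(sum(left) / len(left) - sum(right) / len(right)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)  # add-one keeps the p-value above 0

# Hypothesis test: hypothetical control and treatment groups
control = [rng.gauss(50, 10) for _ in range(100)]
treatment = [rng.gauss(55, 10) for _ in range(100)]
p_value = permutation_test(control, treatment)

print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"permutation test p-value: {p_value:.4f}")
```

The permutation test avoids distributional assumptions at the cost of extra computation, which makes it a convenient way to reason about what a p-value means.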

Sources I would prefer for learning Statistics:

Introduction to Inferential Statistics: UD-201
YouTube video series: Brandon Foltz
Statistics in Python: statsmodels

3. Using Pandas and Other Libraries (Computing: 1 day)

Pandas is a Python data analysis library that provides the tools you need to load data and process it. Combined with the statistics knowledge from step 2, it also helps you divide your data into samples. NumPy, the numerical array package on which Pandas is built, serves as the base of the Python data science ecosystem. SciPy is the other Python library you will need; it builds on NumPy with routines for tasks such as statistics, optimization, and linear algebra.
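Here is a minimal sketch of the Pandas workflow described above: descriptive statistics in one call, a group-level summary, and a shuffled 70/15/15 split into training, validation, and test samples. The customer columns are synthetic data generated with NumPy, and the split ratios are an illustrative assumption, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 1,000 rows of synthetic customer data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=1000),
    "plan": rng.choice(["basic", "premium"], size=1000),
    "monthly_spend": rng.normal(40, 12, size=1000).round(2),
})

# Descriptive statistics (step 2) for every numeric column in one call
print(df.describe())

# Group-level summary: average spend per plan
print(df.groupby("plan")["monthly_spend"].mean())

# Shuffle once, then slice into 70/15/15 train/validation/test samples
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
n = len(shuffled)
train_end, val_end = n * 70 // 100, n * 85 // 100  # integer math avoids float rounding
train = shuffled.iloc[:train_end]
validation = shuffled.iloc[train_end:val_end]
test = shuffled.iloc[val_end:]
print(len(train), len(validation), len(test))  # 700 150 150
```

Shuffling before slicing matters: if the rows were stored in some order (by date or by customer type), slicing without shuffling would give samples that are not representative of the whole dataset.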

Sources I would prefer for learning Pandas:

Kaggle Data Science Learning: Learning Pandas and other libraries
Learn Pandas: How to learn Pandas
Learn on Python.org: Pandas Basics

4. Understanding Data Presentation and Visualization (Storytelling, Visualization and Presentation: 2 days)

As part of working in this field, you will also present your projects, facts, ideas, or inferences to business and product teams. Data can provide guidance on what to build next and also on what is working and what is not. But not everyone can understand this just by looking at charts and graphs. The right data needs to be presented with the right facts so that decision makers can easily consume it. Some of the interview questions I reviewed were hypothetical questions about specific product features. These questions test what hypotheses the candidate forms and whether he or she can articulate them clearly to the team. It is an extremely important and often neglected skill. This skill cannot simply be taught; it has to be learned.
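One practical habit for presenting to decision makers is to make the chart title state the takeaway instead of just naming the axes. The sketch below shows this with Matplotlib on hypothetical weekly conversion numbers around a hypothetical product launch; the data, the launch week, and the wording are all made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical weekly conversion rates (percent) around a product change
weeks = ["W1", "W2", "W3", "W4", "W5", "W6"]
conversion = [2.1, 2.0, 2.2, 2.9, 3.1, 3.0]
launch_index = 3  # hypothetical: the change shipped at the start of W4

# Compute the headline number instead of typing it by hand
before = sum(conversion[:launch_index]) / launch_index
after = sum(conversion[launch_index:]) / (len(conversion) - launch_index)
lift = after - before

x = list(range(len(weeks)))
fig, ax = plt.subplots(figsize=(6, 3.5))
ax.plot(x, conversion, marker="o")
ax.set_xticks(x)
ax.set_xticklabels(weeks)
ax.axvline(launch_index - 0.5, linestyle="--", color="gray")  # marks the launch boundary
ax.annotate("New checkout flow launched",
            xy=(launch_index, conversion[launch_index]),
            xytext=(0.2, 2.8),
            arrowprops={"arrowstyle": "->"})
ax.set_title(f"Conversion rose {lift:.1f} points after the checkout change")  # the takeaway
ax.set_ylabel("Conversion rate (%)")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)  # remove clutter that carries no information
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

The design choice is that a reader who only glances at the title still gets the conclusion, while the annotation and dashed line show where the change happened.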

Some good sources for reference examples:

Stitch Fix Algorithms Tour: http://algorithms-tour.stitchfix.com/

As mentioned previously, steps 5 onward will be covered in the next article.

Preparing for an AI Interview — Part 2: Steps to Ace the AI Interview — Part 2

Subscribe to our Acing AI newsletter. I promise not to spam, and it’s FREE!

www.acingdatascienceinterviews.com

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.