Five Interview Questions to Predict a Good Data Scientist
#ODSC - The Data Science Community

I’m sorry, but if someone asked me these questions, I would look askance at them.

“What is the significance of the normal distribution to data science?” I think it is more important to understand that different distributions appear in the world, and to use distribution-independent tools, than to make the dangerous assumption that everything is Normal. The failure of Black-Scholes is a perfect example of this. Besides, using linear regression on anything other than very small feature spaces is incredibly abusive.
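To make "distribution-independent tools" concrete, here is a minimal sketch of one such tool, a percentile bootstrap confidence interval for the median. The function name `bootstrap_ci`, its parameters, and the Pareto sample are my own illustrative assumptions, not anything from the original article; treat it as one possible approach rather than a recommended recipe.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`.

    Resamples the data with replacement and reads the interval off the
    empirical distribution of the statistic, so no Normal assumption is made.
    """
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on each resample, then sort the estimates.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical heavy-tailed sample (Pareto draws), purely for illustration.
sample_rng = random.Random(42)
sample = [sample_rng.paretovariate(1.5) for _ in range(500)]

low, high = bootstrap_ci(sample)
print(f"median = {statistics.median(sample):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The median is used instead of the mean because a heavy-tailed sample can drag the mean around badly, which is exactly the situation where a Normal-based interval tends to mislead.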

“Tell me about your passion for data science.” I would find this question extremely manipulative and invasive, similar to “How has Jesus affected your life?” or “How are you a racist?” I like math, I can code, I am honest about results, I am reliable in getting them to you, and I make a lot of money for my employers. How I feel about it is none of your business. (And yes, by the way, I have written a book.)

“If the powers that be asked you to change one of your data sources, and thus use different predictors, how would you alter your solution?” The major problem with Data Science is that “the powers that be” often don’t want to be told the truth, and will pressure Data Scientists to come up with the answers they want to hear. This question makes the disturbing assumption that a Data Scientist should go along with that. When Data Scientists bend to political pressure, they get themselves out of trouble in the short term, but ruin their credibility in the long term, even among the people who asked them to fudge answers in the first place. It’s the old idea of never trusting a traitor, even one who comes over to your side. Without unflinching loyalty to the truth, a Data Scientist is worthless.

“Research has stated that 2.3 billion people have been affected by floods in the last two decades. Describe how you’d approach a data science project to predict upcoming floods in the next 100–500 years.” Weather, along with financial data, is the ultimate example of data dominated by non-linearities. An average Data Science project lasts no more than a month. Weather prediction has been an unsolved problem for decades.