Scientific Integrity vs Team Player Mentality

Michael Elashoff
Published in Project Ronin
Jan 13, 2020

What to do when your principles are challenged as a data scientist.

A recent survey found that 80% of biostatisticians had been asked to conduct or report scientifically inappropriate analyses in the past 5 years. These included requests to remove or alter data records (25%) and to interpret findings on the basis of expectation, not actual results (35%). This may seem surprising to people who are just starting their data science careers. To me, the surprising thing is that the survey numbers weren’t higher. In my experience, the numbers are closer to 100% across the board.

Requests to do inappropriate analyses or inflate your results often come in a somewhat disguised form. “You should remove these outliers” sounds more reasonable than “Delete the data that is making the results look worse.” “We can impute the missing data with the mean” sounds better than “Ignore why the data is missing and hope the results are right.” These are hard because sometimes incorrect data should be removed, and sometimes imputation is the best approach. The “tell” is when, at every decision point in a project, people suggest the path that makes the results look better, not worse.
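To make the imputation point concrete, here is a minimal sketch of what can go wrong when you ignore why data is missing. The data is entirely hypothetical: ten true values, of which the three largest were never recorded. The helper name `impute_with_mean` is my own, not a library function.

```python
from statistics import mean, pstdev

# Hypothetical data: the true values are 1..10, but the three largest
# were never recorded, so the missingness depends on the value itself.
true_values = list(range(1, 11))
observed = [v if v <= 7 else None for v in true_values]


def impute_with_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]


imputed = impute_with_mean(observed)

# Compare the imputed data set with the truth we would normally never see.
print("true mean:     ", mean(true_values))
print("imputed mean:  ", mean(imputed))
print("true spread:   ", round(pstdev(true_values), 2))
print("imputed spread:", round(pstdev(imputed), 2))
```

In this toy case the imputed data set looks complete, but its mean is pulled toward the values that happened to be recorded and its spread is narrower than the truth. Whether something similar happens in a real project depends on why the data is missing, which is exactly the question the euphemism skips.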

Data scientists are warned about p-value hacking, but this is often presented as a methodology problem and not a people problem. In the abstract, it is easy to do the right thing. But that underestimates just how hard it is if your manager suggests you don’t need to worry about the missing data, or your colleague explains why the data points that don’t fit the desired pattern are errors, or the CEO wants to show model training results to the board instead of model validation results. The survey results bear out the human aspect: these requests are more often directed at younger, minority data scientists who might feel their job security is at risk if they don’t comply.
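For readers newer to the field, a small sketch may show why training results versus validation results is not a cosmetic choice. Everything here is an assumption for illustration: the labels follow an arbitrary pattern standing in for noise, and `MemorizingModel` is a deliberately silly model that simply remembers the labels it was trained on.

```python
def label(i):
    """Arbitrary deterministic pattern standing in for noise (hypothetical data)."""
    return 1 if (i * 37) % 100 < 50 else 0


# Hypothetical records: ids 0..99 for training, ids 100..199 held out for validation.
train = [(i, label(i)) for i in range(100)]
validation = [(i, label(i)) for i in range(100, 200)]


class MemorizingModel:
    """Memorizes training labels; predicts the majority class for unseen ids."""

    def fit(self, rows):
        self.table = dict(rows)
        ones = sum(y for _, y in rows)
        self.default = 1 if ones * 2 >= len(rows) else 0
        return self

    def predict(self, x):
        return self.table.get(x, self.default)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


model = MemorizingModel().fit(train)
train_accuracy = accuracy(model, train)
validation_accuracy = accuracy(model, validation)

print(f"training accuracy:   {train_accuracy:.2f}")
print(f"validation accuracy: {validation_accuracy:.2f}")
```

The model is perfect on the data it has already seen and no better than a coin flip on data it has not. Only the second number tells the board anything about how the model would perform in use.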

When a situation like this does arise, it is important to know that you’re not alone. Requests like these have come up repeatedly in my career, and I’ve found that discussing the details with trusted mentors can be a big help. Because this is such a common occurrence, they will likely have gone through similar situations. If you don’t currently have someone who can play this role, feel free to connect and maybe I can help think things through or introduce you to someone who can. I feel strongly that this is the most important way that experienced data scientists and statisticians can make our field better.

The longer-term solution is to find a manager and a company that won’t put you in this situation. Ask about this topic during the interview process. If you hear stories of experiments that failed, models that didn’t work, and data that was sometimes too messy to draw strong conclusions, you’ll know scientific integrity is practiced there. If you hear a phrase like “team player” too many times, you’ll also have your answer.
