Technical analysis of the Harvard Business Review article “Better Ways to Predict Who’s Going to Quit” published in August 2019 — DeCypher DataLabs
DeCypher DataLabs Introduction: This is a technical analysis of a research experiment designed to leverage Big Data and algorithms to predict the likelihood that an employee will quit their job.
I. The article “Better Ways to Predict Who’s Going to Quit” published by the Harvard Business Review in August 2019 covers research about a real-time index to predict whether an individual is likely to quit their job. The index is called Turnover Propensity. The index classifies individuals as Unlikely, Less Likely, More Likely, and Most Likely to quit their job.
Review of the Objective: It is best practice in data science to account for the dimension of time when predicting whether an individual is likely to quit their job. Two edge cases illustrate the point: what if an individual was hired yesterday and the model predicted “Most Likely” to quit, and what if an individual has been with a company for ten years and, one month before retirement, the model predicted “Unlikely” to quit? In both cases the prediction is hard to interpret without a stated time horizon.
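One way to bring time into the target variable is to define the label as "quit within a fixed horizon after a snapshot date". The sketch below is illustrative only: the function name `quit_within_horizon`, the snapshot date, and the 90-day horizon are assumptions made for this review, not details taken from the research.

```python
from datetime import date, timedelta
from typing import Optional


def quit_within_horizon(
    snapshot: date, quit_date: Optional[date], horizon_days: int
) -> bool:
    """Return True if the person quit after the snapshot and within the horizon.

    Hypothetical helper: the research does not describe how its labels were built.
    """
    if quit_date is None:
        # No recorded departure: the person had not quit when the data was collected.
        return False
    return snapshot < quit_date <= snapshot + timedelta(days=horizon_days)


# Assumed example: a snapshot on 1 January 2019 and a 90-day horizon.
snapshot = date(2019, 1, 1)
print(quit_within_horizon(snapshot, date(2019, 3, 1), 90))   # quit inside the window
print(quit_within_horizon(snapshot, date(2019, 12, 1), 90))  # quit after the window
print(quit_within_horizon(snapshot, None, 90))               # never quit
```

With a label defined this way, a class such as “Most Likely” can be read as "most likely to quit within the next 90 days" instead of an open-ended statement.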
II. The research applies a multi-class classification model.
Review of the Model: The functional output of a classification model is a probability score between zero and one. To translate the numeric probability score into a categorical variable, the data scientist needs to define intervals. These can be uniform, such that probability values between 0 and 0.25 are labelled Unlikely, values between 0.26 and 0.5 are labelled Less Likely, values between 0.51 and 0.75 are labelled More Likely, and values between 0.76 and 1.0 are labelled Most Likely. Neither the probability scores nor the categorical classes include the variable of time: 0–0.25 cannot be interpreted as unlikely to quit within the next five years, nor can 0.76–1.0 be interpreted as likely to quit within the next three months.
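The uniform intervals described above can be sketched in a few lines. The cut points (0.25, 0.50, 0.75) are the hypothetical uniform thresholds used in this review; the research does not publish the thresholds it actually used, and the names `CUT_POINTS` and `label_for` are illustrative.

```python
from bisect import bisect_left

# Assumed uniform cut points; not taken from the research.
CUT_POINTS = [0.25, 0.50, 0.75]
LABELS = ["Unlikely", "Less Likely", "More Likely", "Most Likely"]


def label_for(probability: float) -> str:
    """Map a probability in [0, 1] to one of four ordered classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_left counts the cut points strictly below the score,
    # so a score exactly on a cut point stays in the lower class.
    return LABELS[bisect_left(CUT_POINTS, probability)]


for score in (0.10, 0.25, 0.40, 0.60, 0.90):
    print(score, label_for(score))
```

Note that nothing in this mapping carries a time horizon, which is the limitation raised in this review.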
III. The model is developed using two sets of variables: Company Indicators and Individual Indicators. The company indicators include analyst ratings, stock price variation, news articles, and regulatory or legal actions against the firm. The individual indicators include number of past jobs, employment anniversary and tenure, skills, education, gender, and geography. The research is based on a Public Big Dataset of 500,000 individuals in the U.S. across various organizations and industries.
Review of the Data and Variables: The variables include company data and individual data, which are combined into one dataset to train the model. It is an interesting experiment to integrate concepts from macroeconomics, microeconomics, and behavioral economics into one predictive model. Even though the art of the possible permits merging structured and unstructured datasets to train an algorithm that generates predictions, this does not always translate into explainable indicators, a proven hypothesis, and repeatable results in the long term.
IV. Two experiments were conducted: an email recruitment campaign sent to 2,000 individuals and a three-month observational study of the remaining 498,000 individuals. In the email recruitment campaign, 1,473 emails were confirmed as received, 161 were opened, and 40 were clicked through; those who were rated as “most likely” to be receptive opened the email invitation at more than twice the rate of those rated as least likely (5.0% versus 2.4%). The results of the observational study of 498,000 individuals show that those classified as “More Likely” were 40% more likely to change jobs, and those classified as “Most Likely” were 63% more likely to change jobs, than those classified as “Unlikely”.
Review of the Email Campaign Results: Of the 2,000 emails sent, 40 were opened and clicked through, a rate of 2.0%. This is in line with industry benchmarks for click-through rates on marketing emails. Ideally, the research would also compare the results against a control group, or against the propensity of the same forty individuals to open any other email on any other topic.
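The campaign funnel can be recomputed from the counts reported in the article. The snippet below is a simple arithmetic check; the function name `funnel_rates` and the rate names are illustrative, and the choice of denominator (emails sent versus emails received) is a judgment call that changes the click-through figure.

```python
def funnel_rates(sent: int, received: int, opened: int, clicked: int) -> dict:
    """Compute basic email funnel rates from raw counts."""
    return {
        "delivery_rate": received / sent,
        "open_rate_of_received": opened / received,
        "click_through_of_sent": clicked / sent,
        "click_through_of_received": clicked / received,
        "click_to_open": clicked / opened,
    }


# Counts as reported in the article.
rates = funnel_rates(sent=2000, received=1473, opened=161, clicked=40)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")

# Open rates reported for the "most likely" and "least likely" groups.
open_rate_ratio = 5.0 / 2.4
print(f"open rate ratio: {open_rate_ratio:.2f}")
```

Measured against emails sent, the click-through rate is 2.0%; measured against emails received, it is about 2.7%. The reported open rates differ by a factor of roughly 2.1, consistent with "more than twice the rate".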
Review of the Observational Study Results: Accuracy is reported by comparing the “Most Likely” and “More Likely” predicted classes against the “Unlikely” predicted class as a benchmark. The preferable standard measure for the accuracy of a classification model, however, is a confusion matrix.
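A confusion matrix compares each prediction with the observed outcome. The sketch below uses a small set of made-up records, since the research does not publish individual-level results. It also assumes that “More Likely” and “Most Likely” count as a prediction of quitting; that cut-off is a choice made for this illustration, not something stated in the article.

```python
from collections import Counter

# Hypothetical toy records: (predicted class, actually quit during the study window).
RECORDS = [
    ("Most Likely", True),
    ("Most Likely", False),
    ("More Likely", True),
    ("More Likely", False),
    ("Less Likely", False),
    ("Less Likely", True),
    ("Unlikely", False),
    ("Unlikely", False),
]

# Assumption: the two upper classes are treated as a "will quit" prediction.
PREDICTED_QUIT = {"More Likely", "Most Likely"}


def confusion_matrix(records):
    """Count true/false positives and negatives for a binarized prediction."""
    counts = Counter()
    for predicted_class, actually_quit in records:
        predicted_quit = predicted_class in PREDICTED_QUIT
        counts[(predicted_quit, actually_quit)] += 1
    return {
        "tp": counts[(True, True)],    # predicted quit, did quit
        "fp": counts[(True, False)],   # predicted quit, stayed
        "fn": counts[(False, True)],   # predicted stay, did quit
        "tn": counts[(False, False)],  # predicted stay, stayed
    }


matrix = confusion_matrix(RECORDS)
precision = matrix["tp"] / (matrix["tp"] + matrix["fp"])
recall = matrix["tp"] / (matrix["tp"] + matrix["fn"])
print(matrix, f"precision={precision:.2f}", f"recall={recall:.2f}")
```

Unlike a comparison of relative rates between classes, this layout shows how many people were flagged but stayed and how many quit without being flagged.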
V. The research concludes that firms have an advantage over outside researchers in predicting the likelihood that an employee will quit their job. This is possible due to better access to data and company information.
Review of the Conclusion: This is an accurate evaluation. A model built on a firm's own data is on firmer ground than one that aggregates data about several companies and extrapolates patterns from one company to another.
DeCypher DataLabs Conclusion: HR models are prone to several types of risk, including but not limited to data quality, historical bias in the data, algorithm selection, and the choice of accuracy measures. To comply with HR policies and regulations, companies implementing HR predictive models would probably need to ensure that their predictions are auditable. Furthermore, all automated decisions generated by the algorithm need to be reviewed by an HR committee to mitigate the risk of incorrect algorithmic predictions.
Originally published at https://www.decypherdatalabs.com on August 23, 2019.