Deep Learning + Survival Analysis: Our Approach to Multi-Task Frameworks

At Square, we’ve experimented with survival analysis models to predict a variety of outcomes, from seller churn and product conversions to loan defaults on Capital. Predicting these key events in a seller’s lifetime allows us to provide a better user experience to our sellers, as well as mitigate our own risk with credit and default losses. We’re excited to share some of our current work in survival analysis models and deep learning.

Survival analysis is a field in statistics that’s used to predict when a particular event of interest will happen. The field emerged from medical research as a way to model a patient’s survival, hence the term “survival analysis”. Traditional survival analysis models such as the Kaplan-Meier (KM) estimator and the Cox Proportional Hazards (CoxPH) model have some very rigid assumptions and limitations. For example, CoxPH models assume that the log of the hazard function is a strictly linear combination of an individual’s features (on top of a shared baseline hazard), which might not be true given your feature space.
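To make the CoxPH linearity assumption concrete, here is a minimal sketch in Python with NumPy. It uses the standard CoxPH form h(t | x) = h0(t) * exp(beta . x); the baseline hazard values, coefficients, and feature vectors are hypothetical numbers chosen only for illustration, not anything fitted on real data.

```python
import numpy as np

def coxph_hazard(baseline_hazard, beta, x):
    """CoxPH hazard h(t | x) = h0(t) * exp(beta . x) on a grid of time points."""
    return baseline_hazard * np.exp(np.dot(beta, x))

# Hypothetical values, for illustration only.
baseline = np.array([0.01, 0.02, 0.04, 0.08])  # h0(t) at four time points
beta = np.array([0.5, -1.0])                   # one coefficient per feature
x_a = np.array([1.0, 0.0])                     # features of individual A
x_b = np.array([0.0, 1.0])                     # features of individual B

h_a = coxph_hazard(baseline, beta, x_a)
h_b = coxph_hazard(baseline, beta, x_b)

# The features only enter through the linear term beta . x, so the ratio of
# the two hazards is the same at every time point: exp(beta . (x_a - x_b)).
ratio = h_a / h_b
```

Because the features only shift the log-hazard by a linear term, a model of this form cannot represent an effect that depends on a feature nonlinearly unless that transformation is engineered by hand.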

We propose a new model that is robust against nonlinear dependencies in feature spaces and their underlying datasets: the Neural Multi-Task Logistic Regression (N-MTLR). What differentiates the N-MTLR from traditional Multi-Task Logistic Regression (MTLR) models is the addition of a multi-layer perceptron, which allows for further flexibility without the strict linear assumptions that CoxPH and MTLR models have. We’ve found that the N-MTLR model performs about the same as CoxPH and traditional MTLR models when the data set does not seem to contain any obvious nonlinear dependencies.

Representation of a 2-hidden layer transformation within the N-MTLR
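As a rough illustration of the idea, the sketch below passes features through a two-hidden-layer perceptron and then applies an MTLR-style output over discrete time intervals. It is a simplified sketch under stated assumptions, not our production implementation: the tanh activations, the layer sizes, the random untrained weights, and the function names mlp_scores and nmtlr_forward are all hypothetical, and details such as the treatment of the final interval vary between formulations.

```python
import numpy as np

def mlp_scores(x, weights, biases):
    """Perceptron with tanh hidden layers; returns one score per time interval."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)           # hidden layers (tanh is an assumption)
    return h @ weights[-1] + biases[-1]  # linear output layer, shape (n, J)

def nmtlr_forward(x, weights, biases):
    """Return (density, survival), each of shape (n, J), over J time intervals."""
    scores = mlp_scores(x, weights, biases)
    num_intervals = scores.shape[1]
    # Lower-triangular matrix of ones: logits[:, k] = sum of scores[:, j] for j >= k.
    tri = np.tril(np.ones((num_intervals, num_intervals)))
    logits = scores @ tri
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    density = np.exp(logits)
    density /= density.sum(axis=1, keepdims=True)  # P(event falls in interval k)
    # Survival at the start of interval k = P(event falls in interval k or later).
    survival = np.cumsum(density[:, ::-1], axis=1)[:, ::-1]
    return density, survival

# Hypothetical setup: 3 features, two hidden layers of 8 units, 4 time intervals.
rng = np.random.default_rng(0)
layer_sizes = [3, 8, 8, 4]
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

x = rng.normal(size=(5, 3))  # five made-up individuals
density, survival = nmtlr_forward(x, weights, biases)
```

Replacing mlp_scores with a single linear layer recovers a plain MTLR-style model, which is why the two behave similarly when the data has no strong nonlinear structure.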

Where the N-MTLR model really shines, though, is in outperforming the MTLR and CoxPH models when nonlinear dependencies are present in the feature space.

In short, the N-MTLR model provides a great solution to survival modeling when you’re all but guaranteed that your feature space exhibits nonlinear behaviour (e.g. some of your features follow a power-law distribution). Nonlinear feature distributions are quite common in seller behaviour, so the N-MTLR model has allowed us to better model churn and conversion events with our sellers.

Lastly, we’re very happy to open-source this research to the greater data science community, as we believe that more research collaboration results in better models, better practices, and ultimately better products for our users. A more detailed research paper is linked below, and feedback is always welcome!


From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.