Sales Pipeline Management with Machine Learning: A Lightweight Two-Layer Ensemble Classifier Framework

Haipeng Gao
The PayPal Technology Blog
5 min read · Jan 4, 2022

PayPal leverages big data to empower business decisions and deliver data-driven solutions to better serve our customers’ financial needs and drive business growth. In this article, we are going to introduce a lightweight two-layer ensemble classifier framework that can be used to make predictions progressively.

Introduction

For many complex business problems, such as sales pipeline management, a classic classification or regression technique alone may be inadequate.

For example, opportunities in a sales pipeline usually exist for days before being closed (i.e., the opportunity is won) or lost. Every day, sales representatives need to create a list of opportunities in the pipeline to work on. The sales team has limited reach-out counts and resources, so how one prioritizes opportunities and creates a work-on list plays a crucial role in efficient sales planning. One approach to this problem is to train a binary classifier to generate a propensity score (i.e., the likelihood of winning the opportunity) for each opportunity.

New information about the opportunities in the sales pipeline is added or updated daily, or sometimes more frequently, so reach-out decisions need to be made on a daily basis as well. The new information about a given opportunity includes the number of days in the sales pipeline, the stage in the pipeline, the number of (unsuccessful) contacts, and other relevant attributes. To generate predictions progressively, our team wanted to develop a data solution that reflects the latest added or updated information.

Here is a simple example that illustrates why progressive prediction is important in decision making. Suppose we have two opportunities in the pipeline, A and B, and a well-trained classic binary classifier that predicts propensity scores of 90% for A and 60% for B, both generated when the opportunities first appeared in the pipeline. Now imagine that, as of today, opportunity A has been in the sales pipeline for 2 weeks (but is not closed yet), while opportunity B has been there for only 2 days. Should we prioritize A over B because of its higher propensity score, or should we de-prioritize A because it has been in the pipeline so long that it is harder to close?

At this point, it is difficult to give a definite answer, because a single (static) propensity score over the entire lifespan of an opportunity in the sales pipeline is inadequate. Moreover, the classic solution cannot account for dynamic factors (i.e., factors that change day to day) such as the duration in the pipeline. This motivated us to find a better solution.

Two-Layer Ensemble Classifier

In this article, we describe a two-layer ensemble classifier framework our team used to tackle the problem in a progressive way. The first layer identifies whether an opportunity will be won or lost, and the second layer identifies whether an opportunity can be closed on the model run day, by incorporating the latest information.

[Figure: architecture of the two-layer ensemble classifier]

To be more specific, the first layer is a binary classifier that is triggered to run (once) when an opportunity first appears in the sales pipeline (i.e., when it is created), and it consumes the (static) attributes available at creation time. The first layer then outputs an initial propensity score. Many binary classification algorithms could be used here; we chose a Gradient Boosting Machine (GBM) as the implementation for the first layer.
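As an illustrative sketch of the first layer, the snippet below trains a GBM on static, creation-time features only. The feature values and labels are synthetic, and we use scikit-learn's `GradientBoostingClassifier` as a stand-in for whatever GBM implementation a team might actually deploy:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical static attributes known at opportunity creation time,
# e.g. deal size, industry code, lead source (all synthetic here).
n = 500
X_static = rng.normal(size=(n, 3))
# Synthetic labels: 1 = opportunity eventually won, 0 = lost.
y = (X_static[:, 0] + 0.5 * X_static[:, 1]
     + rng.normal(scale=0.5, size=n) > 0).astype(int)

# First layer: a GBM trained once, on creation-time features only.
first_layer = GradientBoostingClassifier(random_state=0)
first_layer.fit(X_static, y)

# Initial propensity score for each opportunity (probability of winning).
initial_scores = first_layer.predict_proba(X_static)[:, 1]
```

In production this score would be computed once per opportunity, at creation, and stored for later consumption by the second layer.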

The output score from the first layer is then consumed by the second layer, along with a number of other dynamic features, such as duration, which change or get updated over time.

[Figure: sample trajectory of predictions]

The output of the second layer can be interpreted as the propensity score on the run day, assuming this model is scheduled to run on a daily basis. For the second classifier, we intentionally chose logistic regression, which helps us understand the feature contributions both qualitatively and quantitatively. For example, features with negative coefficients represent negative contributions: a longer duration signals a lower propensity score. Features whose coefficients are larger in magnitude have a stronger impact on the final output.
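A minimal sketch of the second layer follows, again with synthetic data. The dynamic feature names (days in pipeline, unsuccessful contacts) are illustrative assumptions, and scikit-learn's `LogisticRegression` stands in for the actual implementation; the point is that the fitted coefficients are directly interpretable:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500

# Inputs to the second layer: the first-layer score plus dynamic
# features refreshed every run day (synthetic values for illustration).
initial_score = rng.uniform(size=n)            # output of the first layer
days_in_pipeline = rng.integers(1, 60, size=n)
unsuccessful_contacts = rng.integers(0, 10, size=n)

X_dynamic = np.column_stack(
    [initial_score, days_in_pipeline, unsuccessful_contacts])
# Synthetic target: 1 = closed on the run day; longer duration and more
# failed contacts lower the odds in this toy data-generating process.
logits = 3 * initial_score - 0.05 * days_in_pipeline - 0.3 * unsuccessful_contacts
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)

second_layer = LogisticRegression(max_iter=1000)
second_layer.fit(X_dynamic, y)

# Coefficients are directly interpretable: a negative coefficient on
# days_in_pipeline means longer-lived opportunities score lower.
coef = dict(zip(["initial_score", "days_in_pipeline", "contacts"],
                second_layer.coef_[0]))
run_day_scores = second_layer.predict_proba(X_dynamic)[:, 1]
```

Reading the signs and magnitudes of `coef` gives exactly the kind of qualitative and quantitative feature attribution described above.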

[Figure: feature importance]

Evaluation

This two-layer ensemble classifier framework has been back-tested against the classic binary classification approach (blue line) on the test set. As the comparison chart below shows, as the duration increases (and more is known about the opportunities), the two-layer approach performs uniformly better in terms of the cumulative gains curves.
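For reference, a cumulative gains curve can be computed by ranking opportunities by predicted score and tracking what fraction of all wins is captured within each top slice of the ranking. This is a generic sketch of the metric, not PayPal's evaluation code:

```python
import numpy as np

def cumulative_gains(y_true, scores):
    """Fraction of all positives captured within each top-k ranked slice."""
    order = np.argsort(scores)[::-1]       # rank by score, descending
    hits = np.cumsum(np.asarray(y_true)[order])
    return hits / hits[-1]                 # normalize by total positives

# Toy example: 6 opportunities, 3 of which are eventual wins.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
gains = cumulative_gains(y_true, scores)
# Here the top 3 ranked opportunities capture 2 of the 3 wins.
```

A model whose curve sits uniformly above another's captures more wins at every reach-out budget, which is the comparison made in the chart below.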

Structure-wise, this is a very efficient solution. It separates the dynamic features from the static ones and trains a static binary classifier on the features that do not change over an opportunity's time in the pipeline. This can also be viewed as a form of dimension reduction that saves a significant amount of computation by using a single number (the output of the first-layer binary classifier) in place of all the static features.

[Figure: cumulative gains curves]

Conclusion

In this article, we introduced a lightweight two-layer ensemble classifier framework as a solution for progressive prediction problems, and showed how it can be used to generate propensity scores in the sales pipeline on a daily basis. The framework can also be generalized to other problems involving progressive prediction.

Please feel free to share your thoughts in the comments below.
