Scoring sales leads

What is lead scoring, and why does it matter?

Finding your target market is an essential part of any business. One of the most common types of marketing problems is having too many potential customers (aka leads), and not knowing which ones to go after. In those scenarios, it’s inevitable that your product is just going to be a better fit for some leads than for others. That fact can and should be used to your advantage.

Say I run a small coffee roasting business in San Francisco, and I’m looking to expand my sales to other parts of the Bay Area. To lure businesses, I start offering steep discounts for a limited time. Lo and behold, I get emails and phone calls from dozens of coffee shops. But I know that closing a deal is a long and costly process: owners want to meet to taste test, negotiate pricing, etc. And I expect that most of those shops will decide to pull out for various reasons, especially once the discount expires. If I blindly pursue the first 15 shops I hear from, I can expect substantial direct losses and opportunity costs from those “bad” leads. Wouldn’t it be great if I knew ahead of time which shops were going to stick around?

This is the idea behind lead scoring. Weed out the bad leads, and target the most likely buyers (the “hot” leads). It’s a simple way to improve efficiency and growth. You can think of it as a 3-step funnel — lead identification, then scoring, then finally, conversion.

The goal: to build an optimized and automated lead scoring platform

I consulted with a new startup called Scribe Technologies. My goal was to help build out a lead scoring platform for one of their client companies, whose identity was masked. What I do know is that leads are identified via inbound email inquiries. Lead companies email the client company about its services, and the client wants to know which leads to follow up with and which leads to pass over.

So how would this happen? Basically, my idea was to look at features of past leads and compare the leads that converted against those that didn’t. Once I have a model for what worked in the past, I can make predictions about which new leads should be targeted.

Getting and cleaning the data

First, I needed to get data on the lead companies. One approach would have been to get the client to send them surveys. However, with only two weeks to work on this project, there was little chance of getting this back in time. Plus, the answers would need to be cross-checked somehow for accuracy and consistency. The other approach was to try to immediately obtain the data in a standardized way through a third-party service. Luckily, there are business intelligence APIs developed for exactly these types of needs.

These APIs work by taking an email address and scraping the web for information about the person who owns the address and about the company they are associated with. So the client gave Scribe the addresses, Scribe gave them to me, I ran them through the APIs, and voilà: a whole bunch of data about each lead (more than a hundred variables, in fact).

After a brief glance, it was obvious that many variables were not particularly relevant to the task at hand (e.g. various URLs and handles). What’s more, some of the variables that might have mattered were categorical and had too many levels, which is problematic for predictive modeling. As an example, the “company sector” variable had 19 levels, several of which had fewer than 10 samples. I performed some simple feature engineering on these variables to lump related levels together. Finally, many entries were missing among the variables I had not excluded outright. I kept a feature column only if no more than 50% of its values were missing, both overall and within each group of converted and unconverted leads, to limit any systematic artifacts in model predictions that might result from missing data. I also dropped rows with more than 50% missing data across the remaining features, for the same reason.
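The two missing-data filters can be sketched in pandas roughly as follows. This is an illustration rather than the project code: the helper name `filter_missing`, the `converted` label column, and the toy columns `a` through `d` are all made up, and only the 50% cutoffs come from the description above.

```python
import numpy as np
import pandas as pd

def filter_missing(df, label_col="converted", max_missing=0.5):
    """Drop sparse feature columns, then sparse rows (hypothetical helper)."""
    features = [c for c in df.columns if c != label_col]
    # Fraction of missing values per feature, within each label group.
    # If every group is at or below the cutoff, the column as a whole is too.
    missing_by_group = (
        df[features].isna().astype(float).groupby(df[label_col]).mean()
    )
    keep = [c for c in features if (missing_by_group[c] <= max_missing).all()]
    kept = df[keep + [label_col]]
    # Fraction of missing values per row across the surviving features.
    row_missing = kept[keep].isna().mean(axis=1)
    return kept.loc[row_missing <= max_missing]

nan = np.nan
toy = pd.DataFrame({
    "a": [1.0, nan, 3.0, 4.0, 5.0, nan],
    "b": [nan, nan, 1.0, 2.0, 3.0, 4.0],  # 2 of 3 converted leads are missing this
    "c": [1.0, 2.0, 3.0, 4.0, 5.0, nan],
    "d": [1.0, 2.0, 3.0, 4.0, 5.0, nan],
    "converted": [1, 1, 1, 0, 0, 0],
})
cleaned = filter_missing(toy)
print(cleaned)
```

In this toy table, column `b` is dropped because it is mostly missing among converted leads, and the last row is dropped because every remaining feature is missing.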

Variables have been renamed here to maintain confidentiality agreements. *Denotes variables treated as continuous. **Denotes multi-level categorical variables.

The final dataframe included roughly 600 leads with only 7 features (listed in the accompanying table), winnowed down from roughly 1400 leads. That’s a large number of discarded leads. When a new lead comes back with this much missing data, the client will need to survey the lead to get at least the most important missing pieces. So, I’ll want to use a model that lets me interpret which features are most important.

Modeling lead conversion

Finally, time to plug into a model. But what kind of model? For instance, linear or nonlinear? Plotting the 3 continuous features illustrates quite clearly that the likelihood of conversion is nonlinearly related to each of them.

So a nonlinear ML algorithm is the way to go. And I know that I want to be able to inspect feature importances, which rules out SVMs. This looks like a good case for decision trees.

I used random forests because they require relatively little hyperparameter tuning and they give estimates of feature importances along with some uncertainty around those estimates. I used a 50–50 split of the data for training and testing. Within the training dataset, I performed a grid search with cross-validation to find the optimal number of trees, maximum features per split, minimum samples per split, and minimum samples per leaf. The default values were returned for all hyperparameters except for number of trees, which I set to 200. I then retrained a final model on the training set. Other steps were taken, such as k-nearest neighbors imputation of missing values, and synthetic oversampling of the minority class (in this case, the converted leads) in order to balance the classes. These steps were done independently for training and testing datasets.
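For readers who want to try this kind of workflow, here is a minimal scikit-learn sketch of the split, grid search, and feature-importance steps. It is not the project code: the data are synthetic (`make_classification` stands in for the confidential lead table), the candidate values in `param_grid` are hypothetical, and the imputation and oversampling steps are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the lead table: 7 features, binary outcome.
X, y = make_classification(
    n_samples=400, n_features=7, n_informative=4, random_state=0
)

# 50-50 split into training and hold-out sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Hypothetical candidate values for the four tuned hyperparameters.
param_grid = {
    "n_estimators": [50, 200],
    "max_features": ["sqrt", None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ has already been
# retrained on the full training set using the best parameters.
model = search.best_estimator_
test_accuracy = model.score(X_test, y_test)

# Importance per feature, with the spread across individual trees
# as a rough indication of uncertainty.
importances = model.feature_importances_
spread = np.std([tree.feature_importances_ for tree in model.estimators_], axis=0)
for i in np.argsort(importances)[::-1]:
    print(f"feature {i}: {importances[i]:.3f} +/- {spread[i]:.3f}")
print("best parameters:", search.best_params_)
print("hold-out accuracy:", round(test_accuracy, 3))
```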

The result? Lead conversion was predicted with ~80% accuracy in the hold-out samples. Precision and recall were 80.6% and 80.3%, respectively.

Upper and lower whiskers represent 95th and 5th percentiles, respectively

90% confidence intervals were established by retraining and retesting the model by taking random training/testing splits from the original dataset.
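The resampling behind such an interval can be sketched in plain Python. This is an assumption-laden illustration: `resampled_interval`, `percentile`, and the toy `midpoint_rule` classifier are hypothetical names, the data are randomly generated, and a real run would refit the random forest inside `fit_and_score` instead.

```python
import random

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def resampled_interval(rows, fit_and_score, n_repeats=100, test_frac=0.5, seed=0):
    """Refit and rescore on random splits; return the 5th and 95th percentiles."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_frac))
        scores.append(fit_and_score(shuffled[:cut], shuffled[cut:]))
    return percentile(scores, 5), percentile(scores, 95)

def midpoint_rule(train, test):
    """Toy stand-in for a model: cut halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return sum((x > cut) == (y == 1) for x, y in test) / len(test)

# Synthetic one-feature dataset: 100 leads per class.
data_rng = random.Random(1)
rows = [(data_rng.gauss(label, 1.0), label) for label in [0, 1] * 100]

low, high = resampled_interval(rows, midpoint_rule)
print(f"90% interval for accuracy: {low:.3f} to {high:.3f}")
```

The 5th and 95th percentiles of the resampled scores bound the central 90% of outcomes, which matches the whiskers described above.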

Feature importances are shown below.

Person job, person seniority, and company sector are represented as multiple one-hot encoded variables.

Thus, when the API call returns too little data to pass a lead through the model, these are the features that the client needs to obtain manually from the lead, in order of priority.

So, which leads should be targeted?

The point of all this is ultimately to make a decision about which leads to target. This means I needed to set a threshold on the predicted probability of conversion. Any threshold I choose will represent a trade-off between 1) the certainty that leads above this threshold will convert, and 2) the number of leads above this threshold. This trade-off is represented in the graph below by the red and black lines (precision and % of leads, respectively). Side note: many thanks to this blog post by Slater Stich for making available code that was very helpful in generating these plots.

90% confidence intervals were established by retraining and retesting the model by taking random training/testing splits from the original dataset.

Another brief note: while converted and unconverted leads were balanced in the training data in order to maximize unbiased model performance, this graph assumes that the true rate of conversion is 20%, which is a rough guideline from the client company.
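One standard way to re-express precision at an assumed base rate, which may or may not be the calculation behind the graph, is to measure the true and false positive rates at each threshold and reweight them by the assumed conversion rate. The sketch below does that in plain Python; `tradeoff_at` and the toy scores are hypothetical.

```python
def tradeoff_at(threshold, probs, labels, base_rate=0.2):
    """Return (precision, share of leads passing) at an assumed conversion rate."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    tpr = sum(p >= threshold for p in pos) / len(pos)  # converted leads passing
    fpr = sum(p >= threshold for p in neg) / len(neg)  # unconverted leads passing
    # Reweight the two groups as if converted leads made up `base_rate` of all leads.
    share_passing = base_rate * tpr + (1 - base_rate) * fpr
    if share_passing == 0:
        return float("nan"), 0.0
    return base_rate * tpr / share_passing, share_passing

# Toy predicted probabilities on a balanced test set (4 converted, 4 unconverted).
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

precision, share = tradeoff_at(0.55, probs, labels)
print(f"precision at 20% base rate: {precision:.3f}, share passing: {share:.2f}")
```

On this toy set, 3 of the 4 leads above 0.55 are converted, so the raw precision on the balanced data is 75%. After reweighting to a 20% conversion rate, the precision falls to about 43%, which is why the adjustment matters.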

So what’s the optimal trade-off? Well, this really is a business decision that should depend on business metrics. The client company could have simply provided a desired precision based on their own internal accounting, and I could have found the corresponding threshold. But business metrics aren’t static. So, I wanted to go one step further and provide a risk model that dynamically optimizes the threshold as business data gets updated.

In the simplest sense, the metrics that matter are 1) the number of incoming leads, 2) the sales team capacity to target leads, 3) the expected revenue from converting a lead, and 4) the expected cost of targeting a lead. For instance, the greater the ratio of capacity vs incoming leads, the more lenient the threshold should be. Once there is no bottleneck in capacity, the threshold also becomes more lenient as the expected revenue-to-cost ratio increases.

I used the following values:

The number of incoming leads and the sales team capacity are approximate estimates from the client company, whereas the expected revenue and cost are made-up values.

I plugged these numbers into a very basic risk model composed of the following three steps:

  1. At any given threshold, find the number of targetable leads. This is equal to the lesser of the sales team capacity and the number of leads passing this threshold.
  2. At this same threshold, find the expected profit per lead. This is equal to the precision times the expected revenue per lead, minus the cost per lead.
  3. The expected profit is equal to the number of targetable leads times the expected profit per lead, that is, the output of step 1 times the output of step 2.
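The three steps translate almost directly into code. The sketch below is a minimal plain-Python version; the function names, the precision and pass-rate curve, and the business numbers are all invented for illustration and are not the client’s values.

```python
def expected_profit(precision, share_passing, n_leads, capacity, revenue, cost):
    """Expected profit at one threshold, following the three steps."""
    # Step 1: targetable leads are capped by sales team capacity.
    targetable = min(capacity, n_leads * share_passing)
    # Step 2: expected profit per targeted lead.
    profit_per_lead = precision * revenue - cost
    # Step 3: total expected profit.
    return targetable * profit_per_lead

def best_threshold(curve, n_leads, capacity, revenue, cost):
    """Pick the threshold with the highest expected profit.

    `curve` is a list of (threshold, precision, share of leads passing).
    """
    def profit(row):
        _, precision, share = row
        return expected_profit(precision, share, n_leads, capacity, revenue, cost)
    return max(curve, key=profit)[0]

# Invented curve: precision rises and the share of leads passing falls
# as the threshold increases.
curve = [(0.3, 0.30, 0.70), (0.5, 0.50, 0.45), (0.7, 0.70, 0.25), (0.9, 0.85, 0.05)]

# Invented business inputs.
inputs = dict(n_leads=1000, capacity=250, revenue=1000, cost=300)

for threshold, precision, share in curve:
    profit = expected_profit(precision, share, **inputs)
    print(f"threshold {threshold}: expected profit {profit:,.0f}")
print("best threshold:", best_threshold(curve, **inputs))
```

With these toy inputs, a threshold that is too low wastes capacity on leads that barely break even, while one that is too high leaves most of the sales team idle.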

Below are the model’s results, showing expected profit across thresholds.

90% confidence intervals were established by retraining and retesting the model by taking random training/testing splits from the original dataset. The threshold producing maximal profit is shown as a dotted line. The proportion of hot leads corresponds to the median ratio of leads in the testing dataset that surpass this threshold.

As you can see, the optimal threshold occurs at around 0.7. About a quarter of leads surpass this threshold. These are the “hot leads” — the ones that should be targeted. It’s also easy to see that you would expect substantial losses from setting the threshold too high or too low.

Of course, there are all sorts of oversimplifications embedded here. Not all leads will make the same purchase size nor will they all cost the same to target, and it’s virtually impossible to perfectly predict sales team capacity or the number of incoming leads. As the next section illustrates, one thing that is quite nice about constructing this basic risk model is that the threshold adjusts dynamically as business conditions change.

Variable thresholding

Let’s say I’m the client, and I run a marketing campaign that produces a flood of new incoming leads. I continually update the risk model as the lead count grows:

The graphs show that as the total lead count grows, the threshold tightens up, leading to greater profits and a shrinking proportion of hot leads.

From here, growing the sales team capacity will cause the threshold to progressively relax.
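Both effects can be reproduced with a toy calculation. The sketch below applies the three-step profit formula to an invented precision and pass-rate curve with made-up revenue, cost, lead counts, and capacities; none of these numbers come from the client.

```python
# Invented curve: (threshold, precision, share of leads passing).
CURVE = [(0.3, 0.30, 0.70), (0.5, 0.50, 0.45), (0.7, 0.70, 0.25), (0.9, 0.85, 0.05)]
REVENUE, COST = 1000, 100  # made-up revenue and cost per lead

def optimum(n_leads, capacity):
    """Return (threshold, expected profit, share of hot leads) at the best threshold."""
    def profit(row):
        _, precision, share = row
        return min(capacity, n_leads * share) * (precision * REVENUE - COST)
    best = max(CURVE, key=profit)
    return best[0], profit(best), best[2]

# A flood of new leads with a fixed sales team.
growing_leads = [optimum(n, capacity=250) for n in (300, 1000, 5000)]
# Then a growing sales team at the highest lead count.
growing_team = [optimum(5000, capacity=c) for c in (250, 1250, 2500)]

for label, results in (("more leads", growing_leads), ("more capacity", growing_team)):
    for threshold, profit, share in results:
        print(f"{label}: threshold {threshold}, profit {profit:,.0f}, hot leads {share:.0%}")
```

With these invented numbers, the best threshold climbs from 0.5 to 0.7 to 0.9 as the lead count grows, and falls back from 0.9 to 0.7 to 0.5 as capacity expands.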

And so on, and so forth. In summary, threshold optimization is predicted to generate substantial value for the company.


The exercises above show the power of machine learning combined with some basic business intelligence. I created a platform for a client company to optimize lead scoring through automated modeling, prediction, and thresholding. By taking into account current business conditions, this platform should drive sales efficiency and maximize company profit.