Creating a sales lead scoring model

Published in MediBuddy Product and Technology Blog

5 min read · Feb 7, 2019

“We are calling every potential lead in our domain, but the conversion isn’t moving!” If you supply potential leads to your organization, you have probably heard someone say this. The idea of this post is not to explain what lead scoring or lead nurturing is, but to share the steps for statistically identifying potential leads from the pool of existing customers.

At DocsApp, initially when the number of customers was low, having a scoring model didn’t help. The model came into action when the user base grew and the allocation of resources to a certain section became necessary. I’ll walk you through the challenges we faced and the steps we took to create an accurate scoring model.

The main challenge of lead scoring lies in determining the right score values to assign to leads based on their behavior or demographics. How do you create a scorecard for a user that rests not on gut feeling and assumptions, but on actual data supporting every assigned point?

What’s the most challenging aspect of designing a scorecard?
Building scoring criteria from past data that can evolve with business requirements.

Initial approach at building a scorecard

As per basic lead scoring guidelines, two major factors contribute to a scorecard's logic:
1. Demographic/Ability
2. Activity/Intent

Lead scoring is a process that first needs to be set in motion and can then be refined incrementally.

Without accounting for previous sales data, we built a scorecard around these two factors, based on our users’ behavior. The score for a given demographic factor stayed the same for a customer over time, but we needed the activity score to be more dynamic.

One problem we faced with this strategy: say a customer has accumulated an activity score of x. Introducing a new activity/intent scoring parameter could then mean running back through the user’s history to modify the score accordingly.
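To make the recomputation problem concrete, here is a minimal sketch of such a rule-based scorecard. The attribute names, event names and point values are all hypothetical (the original scorecard is not reproduced in this post); only the split into a static demographic part and a dynamic activity part comes from the text.

```python
from collections import Counter

# Hypothetical point values, chosen by gut feeling rather than data.
DEMOGRAPHIC_POINTS = {("city_tier", "metro"): 10, ("age_band", "25-40"): 5}
ACTIVITY_POINTS = {"opened_app": 2, "started_chat": 5}


def demographic_score(profile):
    """Static part: depends only on the user's attributes."""
    return sum(DEMOGRAPHIC_POINTS.get(item, 0) for item in profile.items())


def activity_score(events, points=ACTIVITY_POINTS):
    """Dynamic part: computed from the user's full event history."""
    counts = Counter(events)
    return sum(points.get(name, 0) * n for name, n in counts.items())


profile = {"city_tier": "metro", "age_band": "25-40"}
events = ["opened_app", "opened_app", "started_chat", "viewed_plan"]

score_v1 = demographic_score(profile) + activity_score(events)

# Adding a new activity parameter changes the score of every user who
# already performed that activity, so a stored running total is stale
# and the history has to be replayed.
points_v2 = {**ACTIVITY_POINTS, "viewed_plan": 4}
score_v2 = demographic_score(profile) + activity_score(events, points_v2)
```

If only the accumulated total had been stored, there would be no way to get from `score_v1` to `score_v2` without going back to the raw events.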

Moreover, with no data to back up the scoring logic, our results were as follows:

X-axis: Score. Y-axis: Conversion percentage

So, what exactly is the problem here? In a working lead scoring model, higher scores should correspond to higher conversion percentages.

So a scorecard based on insights alone might not be the best solution for your sales. How do you actually create a scorecard statistically, based on data?

Statistical approach

We couldn’t find an existing method that matched our problem exactly. We were, however, able to relate our use case to an existing real-world problem. The core idea is the same in both cases, i.e. identifying users for a certain outcome based on their ability and intent.

We looked at credit scoring models and at how the same strategy could be adapted to improve our lead quality. A brief note on credit scoring:

The credit score is a numeric expression measuring people’s creditworthiness. The banking usually utilizes it as a method to support the decision-making about credit applications

There is an excellent blog post on how to calculate a precise score based on Weight of Evidence (WoE) and Information Value (IV). You can find the scoring method in:

Credit Scoring with Machine Learning (medium.com)

We took sales data consisting of approximately 25,000 records with 15 user attributes (age, location, family members, language, etc.) as our initial data set.
We experimented with different bucketings of each attribute to find the ones with strong IV. Once you have calculated the information value for all attributes, you can decide which ones to keep based on their IV.
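As a rough illustration of this step, the sketch below computes WoE per bucket and the IV of one bucketed attribute using the standard definitions: WoE = ln(% of converted leads in the bucket / % of non-converted leads in the bucket), and IV = the sum over buckets of (% converted − % non-converted) × WoE. Treating "converted" as the positive class, the function name `woe_iv` and the `smoothing` parameter are my assumptions here, not details taken from our pipeline or from the linked post.

```python
import math
from collections import defaultdict


def woe_iv(buckets, converted, smoothing=0.5):
    """Return ({bucket: WoE}, IV) for one bucketed attribute.

    buckets:   bucket label per lead, e.g. "18-25"
    converted: 1 if the lead converted, 0 otherwise
    smoothing: small count added to every cell to avoid log(0)
               in sparse buckets (an assumption of this sketch)
    """
    good = defaultdict(float)  # converted leads per bucket
    bad = defaultdict(float)   # non-converted leads per bucket
    for b, y in zip(buckets, converted):
        if y:
            good[b] += 1
        else:
            bad[b] += 1

    labels = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(labels)
    total_bad = sum(bad.values()) + smoothing * len(labels)

    woe, iv = {}, 0.0
    for b in labels:
        pct_good = (good[b] + smoothing) / total_good
        pct_bad = (bad[b] + smoothing) / total_bad
        woe[b] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[b]
    return woe, iv


# Toy data: bucket "a" converts more often than bucket "b".
toy_buckets = ["a"] * 4 + ["b"] * 4
toy_converted = [1, 1, 1, 0, 1, 0, 0, 0]
toy_woe, toy_iv = woe_iv(toy_buckets, toy_converted)
```

Running the same function over several candidate bucketings of an attribute and comparing the resulting IV is one way to do the experimentation described above.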

Our dataset yielded the following information values.

Attribute names and information values have been modified for privacy reasons

The attributes with the strongest IV were Attributes A, B, C and D.
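Picking attributes by IV can be sketched as a simple ranking with a cut-off. The values below are made up for illustration, and the 0.1 threshold is an assumption based on a commonly quoted rule of thumb from credit scoring (below 0.02 not predictive, 0.02 to 0.1 weak, 0.1 to 0.3 medium, above 0.3 strong), not a number from our analysis.

```python
def rank_by_iv(iv_by_attribute, min_iv=0.1):
    """Sort attributes by IV (descending) and keep those at or above min_iv."""
    ranked = sorted(iv_by_attribute.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, iv) for name, iv in ranked if iv >= min_iv]


# Made-up values for illustration only.
example_iv = {"attribute_e": 0.03, "attribute_a": 0.42, "attribute_b": 0.31}
selected = rank_by_iv(example_iv)
```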

The next step was to generate a scorecard. The score for each attribute can be calculated with the formula given in the post linked above. We generated the scorecard by tweaking the base odds, the PDO (points to double the odds) and the base points.
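For readers who want to see how those three knobs interact, here is a sketch of the scaling commonly used in credit scorecards: factor = PDO / ln(2), offset = base points − factor × ln(base odds), and score = offset + factor × ln(odds). I am assuming this standard form; the linked post may write it slightly differently. The per-attribute formula further assumes a logistic regression fitted on WoE-encoded attributes, with `beta` and `intercept` as its coefficients. All function names and numbers are illustrative.

```python
import math


def scaling(pdo, base_points, base_odds):
    """Return (factor, offset) so that base_odds maps to base_points
    and every doubling of the odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = base_points - factor * math.log(base_odds)
    return factor, offset


def score_from_odds(odds, factor, offset):
    """Total score for given odds of conversion (converted : not converted)."""
    return offset + factor * math.log(odds)


def attribute_points(woe, beta, intercept, n_attributes, factor, offset):
    """Points for one attribute bucket; the intercept and offset are
    spread evenly across the n_attributes in the scorecard."""
    return (beta * woe + intercept / n_attributes) * factor + offset / n_attributes


# Illustrative settings: odds of 1:1 score 50, and 10 points double the odds.
factor, offset = scaling(pdo=10, base_points=50, base_odds=1)
score_at_base = score_from_odds(1, factor, offset)
score_at_double = score_from_odds(2, factor, offset)
```

Summing `attribute_points` over the buckets a lead falls into gives the same total as `score_from_odds` applied to the odds predicted by the model, which is what makes the scorecard additive.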

Yay! Finally, we had a scorecard. Now the quality of leads must improve, right? The question to ask now is how to validate the generated scorecard.

The scorecard holds up if increasing score ranges show increasing conversion for your leads. To validate this, we applied the new scoring to the same dataset we had used for scorecard version 1 and checked whether conversion rose with the score.

The extreme bump in the 75–100 score range can be attributed to the low support % (few leads) in that group. Comparing score against conversion in the past data, conversion is significantly higher in the high score ranges than in the low ones.
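The validation itself is simple to sketch: group leads into score bands, then compute the support % (share of all leads in the band) and the conversion % for each band, and check that conversion does not drop as the score rises. The band width of 25 and the toy data are assumptions for illustration, not our real numbers.

```python
def conversion_by_band(scores, converted, band_width=25):
    """Return [(band_start, support_pct, conversion_pct)] sorted by band."""
    totals, wins = {}, {}
    for s, y in zip(scores, converted):
        band = int(s // band_width) * band_width
        totals[band] = totals.get(band, 0) + 1
        wins[band] = wins.get(band, 0) + (1 if y else 0)
    n = len(scores)
    return [
        (band, 100.0 * totals[band] / n, 100.0 * wins[band] / totals[band])
        for band in sorted(totals)
    ]


def is_monotonic(rows):
    """True if conversion % never decreases from one band to the next."""
    rates = [conv for _, _, conv in rows]
    return all(a <= b for a, b in zip(rates, rates[1:]))


# Toy data for illustration only.
toy_scores = [10, 20, 30, 40, 60, 70, 80, 90]
toy_outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
rows = conversion_by_band(toy_scores, toy_outcomes)
```

Reporting support % next to conversion % matters: a band holding only a handful of leads can show a dramatic conversion rate that says little about the scorecard.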

So, we have a scoring method whose scores track conversion. Everything improves with practice over time, and the scoring will grow more precise with more refined data. With more data around the model, it becomes possible to explore and find more attributes with higher predictive power.

Please drop in your comments and thoughts on the article. To know more about DocsApp, check out our website. If you like what we are doing and want to join us, feel free to write to us at careers@docsapp.in

Written by Vibhor Chaturvedi