Accelerate Growth With Internal Machine Learning

How to build a tool that is useful for conversions

William D'Souza
Published in Geek Culture
9 min read · May 12, 2021


Photo by Nick Fewings on Unsplash

Machine learning has had a large impact on many industries, and its study and practice have produced discoveries and innovations that reach many communities. Practitioners are building applications everywhere, including tools for government, medical research and development, and commerce. Everyone wants to use it to their benefit, whether it be financial institutions assessing levels of risk or technology companies building creative products tailored for their users.

Today, we are constantly bombarded with advertisements promoting the use of artificial intelligence to better serve consumers. Recommendation systems find music we enjoy, and it’s almost normal now to speak with customer service bots to solve our problems. Teams of data scientists are constantly working on sourcing new data to develop and enhance models.

In your organization, it isn’t only your external customers who may benefit from data science products; your internal customers can too. There is plenty of room to implement simple and effective models that will go a long way for users. It’s essential to have tools that leverage the data you store in a way that is useful across your organization. They can even help accelerate your goals.

A problem that is more common than you think

Photo by Olav Ahrens Røtne on Unsplash

For startups that are still homing in on their sales cycles and dealing with a large influx of leads, identifying which leads have the most potential can be a daunting task. Increases in marketing spend will bring in more prospects, and with them comes more garbage that needs to be sorted out. Your startup may have a small sales team that can only get through so many leads in a day, and on top of that, you are looking to convert customers as quickly as possible so you can decrease your time to cash flow. Having humans sort through these lists is inefficient: the cost of labor adds up, and that time can be spent productively elsewhere. Machines can do these tasks more easily, more quickly, and more intelligently.

Start giving your leads a score

I probably don’t have to explain what a lead scoring model is… but for those who may need an explanation, it is simply a way to assign a value that quantifies the potential of a prospect. Many organizations already practice this in some form, whether it is quick and dirty or something much more intricate. We already tend to do this in our heads; humans rank things all the time! If your system does not leverage the power of machine learning, I would highly recommend starting to experiment with it in your sales cycle.

The score should be simple for anyone to understand. After all, the people who will be acting on it should grasp it easily, whether it is absolute or relative. The output should let the user prioritize their list of leads, monitor changes in scores day-to-day, and intuitively recognize patterns in cohorts, so that they are better informed when new leads arrive with scores similar to those in previous cohorts.

A comparison of approaches to scoring your leads

Photo by Jens Lelie on Unsplash

There are definitely many ways that you can score your leads, but for the sake of the topic, we only need to focus on two approaches for comparison.

The Simple Approach

There’s an easy way to score your leads that can work, and it involves some basic domain knowledge and analysis. Depending on what is important in your domain, you can start by listing the most important actions that are driven by your product. First, look at the leads that have converted and analyze the actions they took around the features of your product to establish which ones matter. From there you can develop thresholds based on the most important actions. You may find, for example, that leads that completed certain actions converted at a significantly higher rate than those that didn’t. This is valuable information, and it is easy to build on by incorporating other actions that you care about. You can then use what you have learned to start ranking your leads. There are some benefits to doing this:

  1. You can control what the inputs are, which creates focused talking points with your prospects that conversations can be centered around.
  2. It requires simple analyses that can be done quickly, allowing you to start experimenting faster.
  3. You can develop an understanding of what you should be focused on getting your leads to do, and push for what you know is successful.
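As a rough illustration of the simple approach, the sketch below compares conversion rates for leads that did and did not complete each action, then scores a lead by counting the "important" actions it has completed. The lead records, action names, and the `min_lift` threshold are all made up for the example; your own domain knowledge should decide what goes in.

```python
# Hypothetical historical leads: the actions each one completed and whether it converted.
leads = [
    {"actions": {"created_project", "invited_teammate"}, "converted": True},
    {"actions": {"created_project"}, "converted": True},
    {"actions": {"created_project", "invited_teammate"}, "converted": True},
    {"actions": {"invited_teammate"}, "converted": False},
    {"actions": set(), "converted": False},
    {"actions": {"viewed_pricing"}, "converted": False},
]


def conversion_rate_by_action(leads):
    """Return {action: (rate_with_action, rate_without_action)}."""
    all_actions = set().union(*(lead["actions"] for lead in leads))
    rates = {}
    for action in sorted(all_actions):
        with_action = [l["converted"] for l in leads if action in l["actions"]]
        without = [l["converted"] for l in leads if action not in l["actions"]]
        rate_with = sum(with_action) / len(with_action) if with_action else 0.0
        rate_without = sum(without) / len(without) if without else 0.0
        rates[action] = (rate_with, rate_without)
    return rates


def rule_based_score(lead_actions, rates, min_lift=0.25):
    """Count the completed actions whose conversion lift clears the threshold."""
    important = [a for a, (w, wo) in rates.items() if w - wo >= min_lift]
    return sum(1 for a in important if a in lead_actions)


rates = conversion_rate_by_action(leads)

# Rank a few hypothetical open leads by their rule-based score, highest first.
open_leads = {
    "lead_a": {"created_project", "invited_teammate"},
    "lead_b": {"viewed_pricing"},
    "lead_c": {"created_project"},
}
ranked = sorted(
    open_leads,
    key=lambda name: rule_based_score(open_leads[name], rates),
    reverse=True,
)
```

Note that the rates are computed only from actions you chose to look at, which is exactly where the bias discussed next creeps in.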

Although this approach can work, it is not the best way to do this. In fact, you are probably introducing a level of survivorship bias by doing this. It can lead to tunnel vision, as you may end up putting all your eggs into a single basket. Another problem is that the approach is short-sighted: assuming that all, or a small number, of actions are important and lead to a conversion is a flawed way of thinking. Looking at what works without understanding what doesn’t work won’t lead to any long-term improvements. Your product may also be constantly changing, so the upkeep is tedious. Instead of dwelling on the issues, let’s just focus on a better alternative.

The Cool Kids Approach

If your business generates a large number of leads, this will work great! If your business generates a smaller number of leads, you will need a lot of historical data to get this to work. It will still be possible to do this but you may run into an issue with stale and outdated data that won’t be a clear representation of your current customers. Building a machine learning model to rank your leads will benefit you greatly; it will give a more accurate and recent representation of what works for your product and will overall capture patterns better. The model will understand what works and what doesn’t work, and depending on the type of model you use (if it is not a black box) you can also gain interpretability when communicating with stakeholders.

To do this, it’s first important to identify all the touchpoints your prospects will deal with. These include, but are not limited to, feature usage, customer support messages, email marketing, and sales touchpoints. The most important is feature usage, as it is the best indication of the interaction between your customer and your product. If you are not tracking your customers’ paths or actions, then you really should be. The advantages of scoring your leads via a machine learning model far outweigh any reason not to do it. Some of these advantages are:

  1. Can better identify quality leads with more accurate scores
  2. Can be trained over a sliding window, so that it is always representative of what is currently working/not working
  3. A model can produce scores that make it easy to digest and prioritize
  4. Feature importance can be analyzed, and the model may be interpretable depending on which one you choose
  5. Can be used as a root for more powerful applications that can lead to new customers quicker
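The sliding window mentioned in the list above could be selected along these lines. This is only a sketch: the `created` field, the 180-day window, and the 30-day lag (which leaves out leads too recent to have a known outcome) are assumptions, not recommendations.

```python
from datetime import date, timedelta


def sliding_window(leads, as_of, window_days=180, lag_days=30):
    """Keep leads created inside the window, skipping the most recent lag_days."""
    newest = as_of - timedelta(days=lag_days)
    oldest = newest - timedelta(days=window_days)
    return [lead for lead in leads if oldest <= lead["created"] <= newest]


# Hypothetical leads with creation dates.
history = [
    {"id": "old", "created": date(2019, 1, 1)},     # too stale to represent today
    {"id": "recent", "created": date(2021, 1, 15)}, # inside the window
    {"id": "fresh", "created": date(2021, 5, 1)},   # outcome not known yet
]

training_leads = sliding_window(history, as_of=date(2021, 5, 12))
```

Retraining on a schedule with a window like this keeps the model focused on what has worked recently rather than on everything that has ever worked.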

The model you are building is most likely a supervised classification model. Your input data should be all the customer touchpoints. There is much more that you can do but that is just the base you are looking to create. Depending on the model chosen, you may be able to extract the probabilities for the predictions. If not, you can still predict conversion, but it will be harder to prioritize your leads. From here, there is so much more that can be done, from interpretability and feature importance to building automated workflows to push conversion faster.
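To make the idea concrete, here is a minimal sketch of a supervised classifier that outputs a conversion probability for each lead and ranks leads by it. It uses a tiny logistic regression written in plain Python only to stay self-contained; in practice you would likely reach for a machine learning library. The feature columns (feature sessions, support messages, emails opened, sales calls) and all the numbers are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def conversion_probability(features, w, b):
    """Predicted probability that a lead converts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Hypothetical touchpoint counts per historical lead:
# [feature_sessions, support_messages, emails_opened, sales_calls]
X = [
    [5, 1, 3, 1],
    [4, 0, 2, 1],
    [3, 2, 4, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 2, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = converted, 0 = did not convert

w, b = train_logistic(X, y)

# Score open leads and sort them so the most promising come first.
open_leads = {"lead_a": [4, 1, 3, 1], "lead_b": [1, 0, 1, 0]}
scores = {name: conversion_probability(f, w, b) for name, f in open_leads.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the output is a probability rather than a yes/no label, the list can be sorted, and the learned weights give a first, rough look at which touchpoints the model leans on.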

Try to be a teacher (even if you suck at it), it’s important in the long term

Photo by Element5 Digital on Unsplash

There are two main concerns with building a model that will score your leads. Since real people will be working off the output of your model, it can create potential (and natural) issues. The first is that you don’t want scores to be artificially inflated. If people understand what moves the needle, they may tend to push only for that, which will in turn artificially increase the scores. It is important to note that the score is only a representation of what is working, and a person may not know exactly how the model works. If they are not well versed in statistics, it can be hard to explain the dangers of assuming causation. The score is not a KPI; it is not meant to be intentionally increased in a gamified manner.

The second major concern is the danger of a feedback loop. There is a good chance that your training data has a slight lag (it may not include leads from your current month), but it is still a good representation of what has worked in the recent past. When you output these scores to your team and they act on them, the resulting observations will be included in your training data the next time you train your model. From here it is easy to get stuck in a feedback loop and only attribute success to what you have already started calling success. To combat this, it is important to train your team to follow an exploration vs exploitation strategy. It is natural for people to want to go for the easy wins, since it is a lot easier to convert a 95% lead than a 35% lead. This doesn’t mean that the lower-scored lead is bad. Your team should still act on such leads so they can try to grab a win, and doing so will also benefit the model long term. If you don’t allow the model to learn, you get stuck further in the feedback loop. Your team should exploit the higher-scored leads when they are in a crunch, and explore leads with lower scores when they have more free time.
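One way to picture an exploration vs exploitation split is to fill most of a rep's daily list with the top-scored leads and reserve a few slots for randomly chosen lower-scored ones. The sketch below assumes a hypothetical `explore_share` knob that the team could set to 0 during a crunch and raise when there is more free time; the scores are made up.

```python
import random


def pick_leads(scores, n, explore_share=0.2, rng=None):
    """Pick n leads: mostly the top scores, plus a few random lower-scored ones."""
    if rng is None:
        rng = random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = min(n, len(ranked))
    n_explore = int(round(n * explore_share))
    n_exploit = n - n_explore
    exploit = ranked[:n_exploit]          # easy wins
    rest = ranked[n_exploit:]             # everything not already picked
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit + explore


# Hypothetical scores for ten open leads.
scores = {f"lead_{i}": s for i, s in enumerate(
    [0.95, 0.90, 0.82, 0.77, 0.60, 0.52, 0.35, 0.30, 0.22, 0.10]
)}

todays_list = pick_leads(scores, n=5, explore_share=0.2, rng=random.Random(0))
crunch_list = pick_leads(scores, n=5, explore_share=0.0)
```

The outcomes of the explored leads are what give the next training run something new to learn from.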

A healthy amount of work will go a long way

Photo by Joshua Earle on Unsplash

Because the idea of scoring leads is tied so closely to conversion, it may seem daunting and scary to do something like this. It’s important to test it out and make sure your input data, scoring metrics, and methodologies are accurate and acceptable. It is not a turn-key solution, as it requires involvement beyond just building the model. Every time you train the model, you may notice an adjustment phase among those using it. Just as the model can make connections, so can humans. People will get used to recognizing patterns associated with scores and will start drawing conclusions from them, so when you retrain the model and the scores adjust, the patterns they once relied on are altered. What worked yesterday may not work today, and the model will probably catch that faster than humans would!

Creating this base model paves the way for more successful applications that can work in conjunction with it. If you know what works, you have created the opportunity for automation. If your lead just went from 30% to 90% overnight, why not send them a promotion first thing in the morning?
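An automation like that could start as a simple comparison of yesterday's and today's scores. The sketch below is one possible shape for it; the `min_jump` and `min_score` thresholds and the lead names are hypothetical, and what happens to the returned leads (an email, a task for a rep) is left open.

```python
def leads_to_promote(yesterday, today, min_jump=0.5, min_score=0.8):
    """Return leads whose score rose sharply overnight and is now high."""
    return sorted(
        lead
        for lead, score in today.items()
        if lead in yesterday
        and score >= min_score
        and score - yesterday[lead] >= min_jump
    )


# Hypothetical daily score snapshots.
yesterday = {"a": 0.30, "b": 0.85, "c": 0.40}
today = {"a": 0.90, "b": 0.88, "c": 0.55, "d": 0.95}

promote = leads_to_promote(yesterday, today)
```

Here only lead `a` qualifies: `b` was already high, `c` rose but not enough, and `d` is new, so there is no earlier score to compare against.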



William D'Souza

providing solutions for common data problems @ Kizmet Solutions.