How Universal Avenue built a Machine Learning model to accelerate B2B sales

This is the story of how Universal Avenue built a machine learning model to qualify leads efficiently for its sales processes, using owned and third-party data. This article is a summary of a longer paper presented at the 2017 ESOMAR Big Data World by Jonatan Hedin and Stephen Kirk.

The problem: going after the right customer again and again

Eric is a 26-year-old, hard-working sales rep in a large European city. He spends his days presenting his products — mainly digital B2B services — to potential new clients, and chasing down fresh prospects that will eventually result in more meetings.

“Once I’m sitting down with the owner of a small or medium-sized business, I know I have a 50/50 chance of closing the deal. The problem is that, at the end of the day, I spend most of my time going after people who will not take the time to sit down with me.”

When you repeat manual work thousands of times, saving minutes or even seconds becomes extremely valuable. The prospecting part of sales is no exception. As a tech enabler of direct sales to small and medium-sized businesses, Universal Avenue had collected data on over 50 000 sales attempts through its mobile and web apps by the spring of 2017. For sales reps around the world, reaching out to dozens of companies every day in order to end up with a handful of customers is standard procedure. Universal Avenue’s mission is to simplify this process for sales companies and their sales reps.

A bit of background: Direct Sales in 2017

Consumers have been quick to adopt buying goods and services online. Since the early 2000s, B2C e-commerce has all but put physical book shops out of business, revolutionized the value chains of the entertainment business and emptied shopping malls around the world. This development has both been driven by and enabled new, clever digital marketing channels.

Meanwhile, sales to small and medium-sized businesses (SMBs) look very different. These companies cannot be reached through clever digital advertising the way consumers can, and their wallets are simply not big enough to justify in-house key account managers targeting them. As such, sales to these companies often depend on a combination of phone calls, email chains and face-to-face meetings over a cup of coffee. Since the process is so labor-intensive, large areas of potential markets are left unserved by digital service providers. It is of little surprise that many hairdressers still handle bookings through a phone and a physical calendar.

The sales process from raw prospecting to closed sales

To reach this range of traditionally “unprofitable” customers, what if we turn the perpetual question of “which customers are profitable to go after with my portfolio?” upside down? Instead, we suggest asking “which digital services are valuable enough for small businesses to justify a sizeable chunk of their wallet?”. While exploring this second question, we collected valuable data that we used to build a machine learning model telling us which customers would be interested in a specific product.

The heavy lifting: Finding the right data and getting it in the right place

With the goal of improving the efficiency of salespeople by pre-scoring the raw leads they receive, we set out to build a machine learning model that scores venues based on our internal and external data sets.

By early 2017, we had collected data on roughly 20 000 successful and 30 000 unsuccessful sales attempts across five different markets. Some customers bought multiple products, and for some we had collected additional data, such as survey results. We also have access to activity data (such as records of any visits we have made) and various external data sources containing, for example, financial data.

A typical problem in sales is filtering a large number of leads to identify which are worth contacting. For the salesperson, a large proportion of “stale” leads results in lower commissions. For the potential customer, irrelevant pitches waste time.

A short outline of machine learning — the email spam example

Machine learning can be described as a computer learning from data without being explicitly programmed. With machine learning, we can leverage features of the data and detect patterns that would be hard to infer manually from a large data pool. Applications include email spam detection, credit card fraud detection, and optical character recognition.

So, how is machine learning applicable in these scenarios? Consider the case of email spam detection. Any given email has a set of features (parameters) associated with it, such as the length of the email, the content of the message, the sender, and the time the email was sent. The average spam email differs from the average legitimate email: it might be sent at odd times and/or frequently contain words or phrases associated with spam (such as ‘mortgage’, ‘watches’ or ‘extra income’). While a skilled operator might be able to write manual rules that detect spam emails (say, by counting frequencies of certain words) with decent accuracy and without too many false positives, there is often much more information to leverage in the underlying data.

Machine learning today — the email example

A machine learning model can be built by taking a large sample of emails that have been manually classified as either “spam” or “not spam”. These records are fed to a machine learning algorithm, which uses the data to train itself. After this, the model can classify new emails as “spam” or “not spam” based on their features.
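As a minimal sketch of this idea (not the model described later in this article), the training and classification steps might look as follows, with a made-up, hand-labelled toy corpus and scikit-learn as an assumed library choice:

```python
# Illustrative sketch only: train a simple spam classifier on a tiny,
# manually labelled sample, then score new incoming emails.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "cheap watches and extra income, act now",
    "mortgage rates dropped, claim extra income today",
    "meeting moved to 3pm, see agenda attached",
    "lunch tomorrow? also sending the quarterly report",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Train on the manually classified records ...
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# ... then classify a new email based on its features (word counts).
print(model.predict(["extra income from cheap mortgage watches"])[0])
```

In practice the training sample would contain many thousands of emails, and the features would go well beyond raw word counts.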

So, how do you create this initial classification, or improve it over time? While some manual labor might be needed initially, you are most likely helping your mail provider with this today whenever it asks you whether an email is spam.

Contrast this with the rules-based approach described earlier: assuming the data scientist has built features that make sense, the machine learning model tends to detect insights hidden in the data and adapts over time as more data is collected. However, building “features that make sense” turns out to be a large task in itself.

Feature engineering

In many cases, the machine learning algorithm cannot make meaningful comparisons on its own. The underlying data needs to be translated into information the algorithm understands, which often requires applying domain knowledge. Generally speaking, this is the most time-consuming step in building a machine learning pipeline.

“Coming up with features is difficult, time-consuming, [and] requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” (Andrew Ng, Machine Learning and AI via Brain Simulations, 2011)

Different data sources often require different ways of constructing features, and how a feature is constructed affects the outcome of the model. Here, we will show how we constructed a feature from the industry sector codes used by companies in Europe.

An example of a five-level NACE code breakdown for a Swedish hairdresser

In the European market, we generally have access to companies’ NACE codes, which determine which industry sector they belong to. The code is hierarchical, with the first digits representing a broader category and each additional digit providing additional detail.

From this, it becomes obvious that companies sharing the first four digits (in the example above, companies operating in the hair and beauty industry) are more similar to each other than ones only sharing the first digit (service companies) — and that the similarity is hierarchical from left to right (that is, a company with a NACE code of 96021 is not in the same industry as one with a NACE code of 46021).
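This left-to-right hierarchy can be captured in code. The helper below is hypothetical (not from the paper) and simply measures similarity as the length of the shared digit prefix:

```python
# Hypothetical helper illustrating the left-to-right hierarchy of NACE
# codes: similarity is the number of leading digits two codes share.
def nace_similarity(code_a: str, code_b: str) -> int:
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared

# Hairdresser (96021) vs. beauty care (96022): four shared digits.
print(nace_similarity("96021", "96022"))  # 4
# Hairdresser (96021) vs. wholesale trade (46021): the codes differ in
# the very first digit, so the matching tail digits do not count.
print(nace_similarity("96021", "46021"))  # 0
```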

However, this is not something the model intuitively understands, and we need to “translate” it for the model. This is a clear example of why domain knowledge is necessary in feature engineering and, by extension, in building successful machine learning pipelines — without understanding how the industry classification works, useful features cannot be built.

While the model does not understand these seemingly arbitrary digits, it would understand if we could somehow tell it that “company 1 is more similar to company 2 than to company 3”, and so on. So what we did was calculate how similar every company is to every other company, and then map these similarities to three dimensions.

Example of venues (companies) with their corresponding NACE codes.

Above, we see one company per row, and the industry sectors that company operates in. After calculating how similar these companies are to each other (a step omitted here for brevity), we map the similarities to three dimensions using principal component analysis. This results in a mapping where venues operating in similar business sectors are “closer” to each other in a space (in our case, a three-dimensional one), allowing the model to draw conclusions from an industry perspective. Another advantage of mapping to fewer dimensions is that the result becomes easy to visualize in a plot. In the plot below, there are clear clustering effects where companies operating in similar industries are closer to each other.
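Since the article omits the exact similarity computation, the sketch below substitutes a simple assumption: each venue is one-hot encoded over the NACE codes it operates under, and that binary matrix is reduced to three dimensions with PCA. The codes and venues are made up for illustration:

```python
# Sketch under assumptions: one-hot encode venues on their NACE codes,
# then reduce to three dimensions with principal component analysis.
import numpy as np
from sklearn.decomposition import PCA

nace_codes = ["96021", "96022", "96040", "46021", "47111"]
venues = [  # one row per venue, listing the sectors it operates in
    ["96021"],            # hairdresser
    ["96021", "96022"],   # hair and beauty salon
    ["46021", "47111"],   # wholesale and retail trade
    ["47111"],            # grocery store
]

# Binary venue-by-sector matrix (1 = venue operates in that sector).
X = np.array([[1 if c in v else 0 for c in nace_codes] for v in venues])

# Each venue becomes a point in 3D space; venues with similar sector
# profiles end up closer to each other.
coords = PCA(n_components=3).fit_transform(X)
print(coords.shape)  # (4, 3)
```

The resulting three coordinates per venue are what can then be fed to the model as industry features, and plotted as in the figure below.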

Plot of the NACE codes mapped to three dimensions using principal component analysis. Companies that operate in similar industries are closer to each other in the 3D space.

Machine learning model — results

So, what are the results? By extracting the importance of the features — that is, how much impact each feature had on the model’s predictions — in our preliminary model, the two main takeaways are:

· Which industry sectors the companies operate in, and whether we have made sales in the area previously, affect the outcome the most.

· The size of the company (as measured by its turnover) seems to matter much less.
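The article does not name the model family used, so as an assumed stand-in, the sketch below fits a random forest on synthetic data and reads off its per-feature importances; the feature names and the relationship in the data are invented for illustration:

```python
# Illustrative only: extracting feature importances from a fitted model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
# Made-up features: industry coordinate, prior sales in area, turnover.
X = rng.normal(size=(n, 3))
# Synthetic outcome driven mostly by the first two features.
y = (X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
names = ["sector", "prior_sales", "turnover"]
for name, imp in zip(names, clf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

On this synthetic data, the turnover feature receives a much smaller importance than the other two, mirroring the takeaways above.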

Validating the concept

After building the model and testing its accuracy, we scored 5000 randomly selected SMBs that we had not previously contacted. We divided these prospects into 50 batches and assigned them to freshly hired sales reps.

Sales representatives at work at the Universal Avenue call center in Stockholm.

Half of the lists were scored — that is, each venue had the model’s estimated relevance score attached to it. Every morning during the experiment, the sales reps were handed either a scored or an unscored list and asked to dial the companies on it. Salespeople found it easier to get through to venues the model had scored highly — more specifically, venues that picked up calls had, on average, a higher score than venues the salespeople were unable to reach.
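The comparison described above amounts to splitting venues by call outcome and comparing average scores. A minimal sketch, with made-up score values:

```python
# Made-up example scores: did venues that picked up the phone have a
# higher average model score than venues that never answered?
from statistics import mean

reached_scores = [0.71, 0.64, 0.80, 0.58, 0.69]    # picked up the call
unreached_scores = [0.42, 0.55, 0.38, 0.61, 0.47]  # never answered

print(f"reached:   {mean(reached_scores):.2f}")
print(f"unreached: {mean(unreached_scores):.2f}")
```

In a real validation, one would also run a significance test on the difference rather than compare raw means from a handful of calls.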

What we’re doing now and takeaways for you

At the time of writing, we are looking to fill gaps in the data sets that were uncovered while developing the model. We are also looking into matching specific products with venues to provide tailored recommendations to our customers.

From this process, we can summarize three takeaways for companies looking to pilot their first machine learning project.

· Find your outcome variable. In our case, the answer was relatively simple: we want to understand which venues to sell to, so previous sales are an appropriate outcome variable. However, the desired variable is not always directly measurable; in those cases, an appropriate proxy can often be found.

· Look at your data sources. How well does your data cover your observations? Do you have access to both internal and external sources? Do you have both the domain understanding of your data and the ability to engineer features from it? Understanding the underlying data is often necessary for a good end product.

· Find an easy experiment and build an MVP. Goals change over time in organizations, so you are often engineering for a moving target. Further, you will find insights you will want to act on during the model-building process.

About Universal Avenue

We help small and medium-sized businesses digitalize parts of their business, and help suppliers connect with customers they would be unable to reach through other channels. We’re currently hiring backend developers (Ruby, Elixir), frontend developers (we love React!), and mobile developers (iOS, Android, React Native) — drop us an email.