Enriched Data With Machine Learning

Laurent Kinet
Published in Bisnode Analytics
2 min read · Jan 10, 2018
Pierre Deville, PhD, Head of Data & Analytics @ Bisnode Analytics

Bisnode Analytics is the Group’s speedboat for Data and Analytics and has successfully tested one of the solutions it developed within Bisnode.

The proof of concept was carried out in 2017 with a machine learning solution. Working together with Bisnode in Switzerland, the team used it to crawl company web pages and detect each company’s number of employees.

Pierre Deville, Head of Data & Analytics, is one of the co-founders of Swan Insights, the Belgian Big Data company that Bisnode acquired at the beginning of 2017 and that eventually became Bisnode Analytics.

“We crawled the ‘number of employees’ for 10,000 companies in Switzerland. The goal was to collect information about how many people work at each company without having to search for it manually. What we did was crawl the data and teach the machine to recognize where and how to find the information automatically”, said Pierre.

Three different tracks to enrich the data
The team created three different tracks to collect the information and evaluated how trustworthy each track was on its own and in combination with the other two. The method could also be used to measure over time whether the number of employees in a company increases or decreases. Another example: if a sales department wants to target companies based on their size, this saves a lot of time compared with looking up the figures manually.

  1. Detect names (people on the website). An automated algorithm goes through the relevant web pages and counts the people named on them, which gives an estimate of how many people work at the company.
  2. Detect patterns in the text. The algorithm can detect patterns in the text on the web page, for example “Bisnode has 2400 employees in 2017”.
  3. The company page on social networks. The algorithm can find the social media link on the web page and follow it to check whether the company has stated its number of employees on its social channels.
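
The second track can be illustrated with a short Python sketch. The article does not describe how the team's algorithm actually works, so the regular expression below, the name `EMPLOYEE_PATTERN` and the function `extract_employee_counts` are assumptions made purely for illustration. It looks for a number directly followed by a word such as "employees" and tolerates common thousands separators.

```python
import re

# Hypothetical pattern (an assumption, not the team's actual rule):
# a number, optionally written with thousands separators such as
# "2,400", "2.400", "2 400" or "2'400", followed by a word for staff.
EMPLOYEE_PATTERN = re.compile(
    r"(\d{1,3}(?:[ ,.']\d{3})+|\d+)\s+(?:employees|staff|people)\b",
    re.IGNORECASE,
)


def extract_employee_counts(text: str) -> list[int]:
    """Return every employee count mentioned in the text, in order."""
    counts = []
    for match in EMPLOYEE_PATTERN.finditer(text):
        # Drop separators so "2,400" and "2'400" both become 2400.
        digits = re.sub(r"\D", "", match.group(1))
        counts.append(int(digits))
    return counts
```

Called on the example sentence from the list, "Bisnode has 2400 employees in 2017", the function picks up 2400 and ignores the year, because only the first number is followed by "employees". A production version would need patterns per language and many more phrasings.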

These are three different tracks to find the same information. The reason to use all of them is to check whether they yield the same result, which increases the trustworthiness of the number.
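
How the three results are compared is not detailed in the article. One possible way, shown here as a sketch under assumptions, is to take the median of the available estimates and report which tracks fall close to it. The function name `combine_tracks`, the track names and the 20% tolerance are all hypothetical.

```python
from statistics import median
from typing import Optional


def combine_tracks(estimates: dict[str, Optional[int]], tolerance: float = 0.2):
    """Combine per-track employee estimates into one value.

    `estimates` maps a track name to its estimate, or None when the
    track found nothing. Returns (consensus, agreeing_tracks), where
    agreeing_tracks lists the tracks within `tolerance` (relative) of
    the median. The tolerance is an illustrative assumption.
    """
    found = {track: n for track, n in estimates.items() if n is not None}
    if not found:
        return None, []
    center = median(found.values())
    agreeing = sorted(
        track
        for track, n in found.items()
        if abs(n - center) <= tolerance * max(center, 1)
    )
    return center, agreeing
```

The more tracks that land in the agreeing list, the more the consensus value can be trusted. A value supported by a single track, or by tracks that disagree with each other, would be a candidate for manual review.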

“This is just a start, but we are positively surprised by the current coverage. The next step is to discuss how to keep improving. One challenge is to improve the quality and become more precise. Over time, the idea is to offer this kind of machine learning to other companies and countries, as the algorithms are built to scale across different geographies and languages”, said Pierre.
