Data Scientists care about trees. Trust me, we do.

Published in bunq Blog · 4 min read · Jun 19, 2020

Machine learning and AI are hot topics in today’s data-driven day and age. Daniela, one of our Data Scientists, explains how you can use data too!

Let’s start from the very beginning

Machine Learning (an essential part of AI) ranks as one of the hottest and most promising fields of study these days. All over the world, companies of all sizes have come to acknowledge the importance of using their data to unearth hidden insights that are useful for their internal decisions, as well as for improving their users’ journeys.

In this article we’ll bring you up to speed on the ins and outs of Machine Learning and, more importantly, how a data-driven company like bunq uses it to improve your banking experience.

So what exactly is Machine Learning?

Are we actually creating self-sufficient and intelligent beings?

Well, not exactly. Although the name sounds fancy and intimidating, Machine Learning is simply about making computers learn patterns from large amounts of data.

Still a little bit confusing?

No worries, let’s explain Machine Learning using something everybody knows and is familiar with: a tree.

Why? You’ll see.

The tree of life

At bunq, we are extremely proud to be a multicultural company that employs people of more than 37 nationalities. This means we can find diverse tastes in things like music, movies or even hobbies. Now, suppose we want to create a Machine Learning model that predicts what kind of hobbies a new bunqer would like, based on our fellow bunqers’ interests. The first step would be collecting data!

Collecting data

First of all, let’s create our training dataset, which is a table that gathers our bunqers’ data. Each row of this dataset describes one person, and each column holds one attribute of that person. These attributes are commonly known as features. Some examples of features would be: age, country of birth, gender, position, etc. Since these features alone tell us little about someone’s hobbies, we should also include features that say more about behavior or preferences, such as: number of hours of free time per week or per day, whether they like outdoor activities, whether they enjoy group activities, etc. The more informative features we have, the better!

Now, let’s ask them what their favourite hobbies are. These are going to be the labels of our model. The following graph shows an example of a set of features for each person, including their labels.
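To make this concrete, here is a minimal sketch of how such a dataset could look in plain Python. Every row, value and name below (`training_data`, `FEATURES`, `split_features_and_labels`) is invented for illustration and is not real bunq data.

```python
# Hypothetical training dataset: every value below is invented for illustration.
FEATURES = [
    "age",
    "free_hours_per_week",
    "likes_outdoor_activities",
    "likes_group_activities",
]
LABEL = "hobby"

training_data = [
    {"age": 27, "free_hours_per_week": 10, "likes_outdoor_activities": False,
     "likes_group_activities": True, "hobby": "Salsa dancing"},
    {"age": 34, "free_hours_per_week": 6, "likes_outdoor_activities": True,
     "likes_group_activities": False, "hobby": "Hiking"},
    {"age": 41, "free_hours_per_week": 8, "likes_outdoor_activities": True,
     "likes_group_activities": True, "hobby": "Football"},
    {"age": 23, "free_hours_per_week": 15, "likes_outdoor_activities": False,
     "likes_group_activities": False, "hobby": "Reading"},
]


def split_features_and_labels(rows):
    """Separate what the model sees (features) from what it must predict (labels)."""
    features = [{name: row[name] for name in FEATURES} for row in rows]
    labels = [row[LABEL] for row in rows]
    return features, labels


X, y = split_features_and_labels(training_data)
print(X[0])  # the features of the first person
print(y[0])  # the label (hobby) of the first person
```

Keeping features and labels apart from the start makes it harder to accidentally let the model peek at the answer it is supposed to predict.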

Creating our tree for predictions

Having completed our dataset, a computer program can use this information to discover rules that classify all hobbies accurately. These rules can be visualized as a nice tree like the one below, where the nodes represent the features, and the branches are created based on the different values these features can take. (Note that we used only a few of the mentioned features for simplicity.)

This type of method is called a “Decision tree”, and it is one of the most commonly used Machine Learning methods, as it is highly interpretable and can deal with large and complex datasets. You can use as many features as you want, and the model will always try to find a path that takes you to the right label.
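For the curious, this is roughly how a program could discover such rules. The sketch below is a deliberately simplified greedy learner for categorical features that picks splits by Gini impurity (in the spirit of classic algorithms such as ID3 and CART). It is not the method used at bunq, the data is invented, and real projects would normally rely on a well-tested library instead of hand-written code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels in the group are identical."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature whose split leaves the purest groups of labels."""
    def weighted_impurity(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(features, key=weighted_impurity)


def build_tree(rows, labels, features):
    """Return a leaf (a label) or a node with one branch per feature value."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: nothing left to split on
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in {row[feature] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        branches[value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return {"feature": feature, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical training data, invented for illustration.
rows = [
    {"age_group": "under_30", "likes_group": True, "likes_outdoor": False},
    {"age_group": "under_30", "likes_group": False, "likes_outdoor": True},
    {"age_group": "30_plus", "likes_group": True, "likes_outdoor": True},
    {"age_group": "30_plus", "likes_group": False, "likes_outdoor": False},
    {"age_group": "under_30", "likes_group": True, "likes_outdoor": False},
    {"age_group": "30_plus", "likes_group": False, "likes_outdoor": True},
]
labels = ["Salsa dancing", "Hiking", "Football", "Reading", "Salsa dancing", "Hiking"]

tree = build_tree(rows, labels, ["age_group", "likes_group", "likes_outdoor"])
print(predict(tree, rows[0]))
```

At each step the learner asks the question that separates the hobbies most cleanly, then repeats the process on each resulting group until a group contains a single hobby or no features are left.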

Now, let’s make some predictions! Suppose our new bunqer is 27 years old and loves group activities. What should we recommend to her? Just follow the path in the tree! The answer is Salsa dancing!
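Following a path can also be written as a few lines of code. In the sketch below the tree is written by hand: only the outcome for a 27-year-old who loves group activities (Salsa dancing) comes from the example above, while the age threshold of 30, the other hobbies and all names (`hobby_tree`, `follow_path`) are assumptions made up for illustration.

```python
def node(question, test, if_yes, if_no):
    """A decision node: a yes/no question plus the subtree for each answer."""
    return {"question": question, "test": test, "yes": if_yes, "no": if_no}


# Hypothetical tree. Only the path "likes group activities and is 27 years old
# leads to Salsa dancing" is taken from the article; the rest is assumed.
hobby_tree = node(
    "Enjoys group activities?",
    lambda person: person["likes_group_activities"],
    node("Younger than 30?", lambda person: person["age"] < 30,
         "Salsa dancing", "Football"),
    node("Likes outdoor activities?",
         lambda person: person["likes_outdoor_activities"],
         "Hiking", "Reading"),
)


def follow_path(tree, person):
    """Answer one question per node until a leaf (a hobby) is reached."""
    path = []
    while isinstance(tree, dict):
        answer = tree["test"](person)
        path.append(f"{tree['question']} {'yes' if answer else 'no'}")
        tree = tree["yes"] if answer else tree["no"]
    return tree, path


# The new bunqer from the example. Her outdoor preference is unknown, and it is
# not needed because this path never asks about it.
new_bunqer = {"age": 27, "likes_group_activities": True}

hobby, path = follow_path(hobby_tree, new_bunqer)
for step in path:
    print(step)
print("Recommendation:", hobby)
```

Because the path itself is returned, anyone can read back exactly which questions led to the recommendation, which is what makes this kind of model so easy to explain.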

To check the accuracy of our model, we create a test dataset, which has exactly the same structure as the training dataset, but contains data about people who were not part of the training dataset. With our new decision tree, we predict labels for all the data points in the test dataset, and afterwards we compare the predictions with the real labels to see how often they match. If the share of correct predictions is high enough, we can deploy our new model to production.
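Comparing predictions with real labels boils down to counting matches. Here is a minimal sketch, assuming a stand-in model (`predict_hobby`, a hypothetical hand-written rule set) and an invented test dataset of people the model has never seen.

```python
def predict_hobby(person):
    """Stand-in for a trained decision tree (hypothetical rules)."""
    if person["likes_group_activities"]:
        return "Salsa dancing" if person["age"] < 30 else "Football"
    return "Hiking" if person["likes_outdoor_activities"] else "Reading"


def accuracy(predictions, true_labels):
    """Share of predictions that match the real labels, between 0.0 and 1.0."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Hypothetical test dataset: people who were NOT part of the training data.
test_data = [
    {"age": 24, "likes_group_activities": True,
     "likes_outdoor_activities": False, "hobby": "Salsa dancing"},
    {"age": 41, "likes_group_activities": True,
     "likes_outdoor_activities": True, "hobby": "Football"},
    {"age": 35, "likes_group_activities": False,
     "likes_outdoor_activities": True, "hobby": "Hiking"},
    {"age": 29, "likes_group_activities": False,
     "likes_outdoor_activities": False, "hobby": "Chess"},
]

predictions = [predict_hobby(person) for person in test_data]
true_labels = [person["hobby"] for person in test_data]

score = accuracy(predictions, true_labels)
print(f"Accuracy on the test dataset: {score:.0%}")
```

In this made-up example the model gets three out of four people right. What counts as "good enough" depends on the problem, so that threshold is a decision for the team, not something the code can settle.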

How do we use Decision trees?

In reality, at bunq we need to solve way more complex problems than this one, where datasets contain millions of rows and hundreds of features, which makes this task a real challenge. Since Decision Trees are highly explainable and have also proved to be accurate in many different scenarios, we have successfully implemented them in most of our projects. For example:

  1. To predict suspicious behavior and prevent fraudulent transactions related to money laundering, phishing, scams, etc. This way we can maintain an adequate monitoring system and can ensure your safety.
  2. To predict which users are not satisfied with our service or need more assistance when using our app, so we can always provide them with the best service.

We always follow data protection guidelines at bunq: we are fully aware of how sensitive our data is and of the ethical implications involved. This is a principle we encourage our Data Scientists and other data practitioners to embrace.

Last but not least, we encourage you to have fun and not be afraid of trying Machine Learning yourself! It’s never too late to learn about Data Science, and there are plenty of learning platforms with courses for beginners and experienced coders, so go for it!

Originally published at https://www.bunq.com.
