ML Basics: supervised, unsupervised and reinforcement learning

Gustavo Machado
Oct 6, 2016

I’ve been following the Machine Learning space for a while now, and it’s becoming a more and more frequent topic of discussion with founders who want to add ML to their products. One common source of confusion is the difference between supervised and unsupervised algorithms. Understandably, most non-technical people don’t know these names, so they tend to mix up the two types of problems and algorithms.

Another source of confusion is “reinforcement learning”, so I thought I’d post a short explanation of each of these, for those of us who haven’t mastered Machine Learning (yet?) :)

Supervised Algorithms

I’ll start with supervised algorithms, because I believe they’re the simplest to understand. With supervised algorithms, you may not know the inner relations of the data you are processing, but you do know very well what output you need from your model. For example:

“I need to be able to start predicting when users will cancel their subscriptions”.

Notice that the output of your model is already defined: “will user X cancel his/her subscription?”. What you may not know yet is HOW to tell which users will cancel. So you can use an existing set of data to “train” a model to predict this particular aspect of your users. Training usually uses part of the data to “learn”, and holds back the rest to validate the model and measure how accurate it is.

For example, say you have the usage history of 10,000 users. Of these, maybe 5,000 cancelled and 5,000 are still using your product. What you can do is take the data from 4,500 users who cancelled and 4,500 users who are still using the product (data from 9,000 users in total), and train your model with it, letting it “see” which users cancelled and which are still active. Once the model is trained, it is ready to start predicting, so now you can feed it the data of the 1,000 users you left out, except you won’t let the model see which cancelled and which didn’t. The model will do its best to predict the status of each user, and you can compare each prediction with the real value. If the model correctly predicts 891 of the 1,000 users, then it has an accuracy of 891 / 1,000 = 89.1%.
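To make the train/hold-out idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the users are synthetic, the single feature (logins per week) is made up, and the “model” is just a threshold halfway between the two class averages. A real churn model would use real data and a proper ML library.

```python
import random

def make_users(rng, n_per_class=5000):
    """Synthetic users as (logins_per_week, cancelled). Entirely made-up data."""
    cancelled = [(rng.gauss(2.0, 1.5), 1) for _ in range(n_per_class)]
    active = [(rng.gauss(6.0, 1.5), 0) for _ in range(n_per_class)]
    return cancelled, active

def train(rows):
    """'Learn' a threshold halfway between the class means (labels are visible here)."""
    cancelled_logins = [x for x, y in rows if y == 1]
    active_logins = [x for x, y in rows if y == 0]
    cancelled_mean = sum(cancelled_logins) / len(cancelled_logins)
    active_mean = sum(active_logins) / len(active_logins)
    return (cancelled_mean + active_mean) / 2

def predict(threshold, logins):
    """Predict 1 (will cancel) when usage is below the threshold.
    Assumes cancelling users log in less often, which holds for this synthetic data."""
    return 1 if logins < threshold else 0

def accuracy(predictions, labels):
    """Fraction of predictions that match the real labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

rng = random.Random(0)
cancelled, active = make_users(rng)

# 4,500 + 4,500 users to train on, 500 + 500 held out for validation.
train_rows = cancelled[:4500] + active[:4500]
test_rows = cancelled[4500:] + active[4500:]

threshold = train(train_rows)

# The model only sees the feature of held-out users, never their labels.
predictions = [predict(threshold, logins) for logins, _ in test_rows]
test_accuracy = accuracy(predictions, [label for _, label in test_rows])
print(f"held-out accuracy: {test_accuracy:.1%}")
```

The important part is the split: the labels of the 1,000 held-out users are used only to score the predictions, never to train.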

Unsupervised Algorithms

With unsupervised algorithms, you don’t yet know what you want to get out of the model. You probably suspect that there have to be some relationships or correlations in the data you have, but the data is too complex to guess at them. In these cases you normalize your data into a format that makes it meaningful to compare, and then let the model work its magic and try to find some of these relationships. One special characteristic of these models is that while they can suggest different ways to categorize or order your data, it’s up to you to research these further and unveil something useful. You can think of it as augmenting your data with information about inner relationships, which you then have to make sense of.

For example, after processing all the data about your product’s users with an unsupervised algorithm, it might come up with a way to split your users into two groups. After inspecting and comparing these two groups, you might realise that group A is in one geographic location and group B is in another. Whether you can act upon this particular segmentation is up to you to figure out. If you can’t, then maybe you can remove or re-arrange the data about users’ locations to force a different segmentation.
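As a rough illustration of the grouping described above, here is a small sketch using k-means clustering, which is one common unsupervised algorithm (the article does not prescribe a specific one). The data is synthetic and the two features (longitude and sessions per week) are hypothetical. Note that the algorithm never receives a label saying which region a user is in.

```python
import random

def normalize(points):
    """Min-max scale every column to [0, 1] so the features are comparable."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / span for v, lo, span in zip(p, lows, spans))
            for p in points]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Give each point the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points]

def kmeans(points, k, rng, iterations=50):
    """Plain k-means: alternate between assigning points and moving centers."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(points, centers), centers

rng = random.Random(1)
# Hypothetical users: (longitude, sessions_per_week). The first 100 live around
# longitude -58, the last 100 around longitude 2. The model is never told this.
users = [(rng.gauss(-58.0, 3.0), rng.gauss(5.0, 2.0)) for _ in range(100)]
users += [(rng.gauss(2.0, 3.0), rng.gauss(5.0, 2.0)) for _ in range(100)]

labels, centers = kmeans(normalize(users), k=2, rng=rng)
print("group sizes:", labels.count(0), labels.count(1))
```

The output is only a group number per user. Working out that the groups correspond to geography, and whether that is useful, is still up to you.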

Reinforcement Learning

The reason I included reinforcement learning in this article is that one might think that “supervised” and “unsupervised” encompass every ML algorithm, and they actually do not. There are algorithms that are neither supervised nor unsupervised, like reinforcement learning (RL).

Reinforcement learning is the field that studies problems and techniques in which a model feeds the results of its own actions back into itself in order to improve. To accomplish this, an RL system needs to be able to “sense” signals, automatically decide on an action, and then compare the outcome against a “reward” definition. RL tries to figure out WHAT to do to maximize these rewards, and it does this by itself (no direct instructions).

RL is not exactly supervised, because it does not rely strictly on a set of “supervised” (or labeled) data (the training set). Instead, it relies on being able to monitor the response to the actions taken, and measure it against a definition of a “reward”. But it’s not unsupervised learning either, since when we model our “learner” we know upfront what the expected reward is.
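The act, observe, and compare-against-reward loop can be sketched with an epsilon-greedy “multi-armed bandit”, one of the simplest RL settings. This is a deliberate simplification: there is no environment state to sense, only actions and rewards. The three actions and their hidden reward probabilities are hypothetical values chosen for illustration.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: mostly repeat the best-looking action,
    but try a random one a fraction `epsilon` of the time."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # learned average reward per action
    counts = [0] * n_actions       # how often each action was tried
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # The environment answers with a reward of 1 or 0. The learner never
        # sees reward_probs directly, only the outcome of its own action.
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

# Hypothetical hidden reward probabilities for three possible actions.
estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("preferred action:", best_action)
```

Nobody tells the learner which action is correct (so it is not supervised), but the reward definition is fixed upfront (so it is not unsupervised either).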