ML Basics: supervised, unsupervised and reinforcement learning

I’ve been following the Machine Learning space for a while now, and it keeps coming up in conversations with founders who want to add ML to their products. One common source of confusion is the difference between supervised and unsupervised algorithms. Most non-technical people don’t know these names, so naturally they tend to mix up these types of problems and algorithms.

Another source of confusion is “reinforcement learning”, so I thought I’d post a short explanation of each of these, for those of us who haven’t mastered Machine Learning (yet?) :)

Supervised Algorithms

“I need to be able to start predicting when users will cancel their subscriptions”.

Notice that the output of your model is already defined: “will user X cancel his/her subscription?”. What you may not know yet is HOW to tell which users will cancel. So you can use an existing set of data to “train” a model to predict this particular aspect of your users. Training usually uses one part of the data to “learn”, and holds back another part to validate and measure how accurate the model is.

For example, say you have the usage history of 10,000 users. Of these, maybe 5,000 cancelled and 5,000 are still using your product. You can take the data from 4,500 users who cancelled and 4,500 users who are still using the product (9,000 users in total) and train your model with it, letting it “see” which users cancelled and which are still active. Once the model is trained, it is ready to start predicting, so now you can feed it the data of the 1,000 users you left out, except this time you won’t let the model see which cancelled and which didn’t. The model will do its best to predict the status of each user, and you can compare each prediction with the real value. If the model correctly predicts 891 of the 1,000 users, then it has 89.1% accuracy.
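To make the train/validate split concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data is synthetic, the single feature (logins per week) is invented, and the "model" is just a cut-off halfway between the two group averages. A real project would use real usage data and a proper ML library.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for real usage history: (logins_per_week, cancelled).
# The feature and its distributions are invented for illustration.
cancelled = [(random.gauss(2.0, 1.5), True) for _ in range(5000)]
active = [(random.gauss(5.0, 1.5), False) for _ in range(5000)]

# Hold out 500 users of each kind; train on the remaining 9,000.
train = cancelled[:4500] + active[:4500]
test = cancelled[4500:] + active[4500:]

def mean(values):
    return sum(values) / len(values)

# "Training": learn a cut-off halfway between the two class averages.
avg_cancelled = mean([x for x, label in train if label])
avg_active = mean([x for x, label in train if not label])
threshold = (avg_cancelled + avg_active) / 2

def predict(logins_per_week):
    """Predict True (will cancel) when usage falls below the learned cut-off."""
    return logins_per_week < threshold

# Validation: the model sees only the feature, never the held-out label.
correct = sum(predict(x) == label for x, label in test)
accuracy = correct / len(test)
print(f"{correct} of {len(test)} correct, accuracy = {accuracy:.1%}")
```

The important part is not the toy model but the split: the 1,000 held-out users are never used during training, so the accuracy measured on them is a fair estimate of how the model would do on users it has not seen.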

Unsupervised Algorithms

With unsupervised algorithms, the output is not defined up front: there is no known answer to train against, and the algorithm looks for structure in the data on its own. For example, after processing all the data related to your product’s users with an unsupervised algorithm, it might come up with a way to split your users into 2 groups. After inspecting and comparing these two groups, you might realise that group A is in one geographic location and group B is in another. Whether you can act upon this particular segmentation of the data is up to you to figure out, and if not, then maybe you can remove or re-arrange the data about users’ locations to force a different segmentation.
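As a rough illustration of this kind of grouping, the sketch below runs a bare-bones k-means clustering in plain Python. K-means is just one common unsupervised technique, picked here as an example; the data is synthetic, and the two numeric features per user are hypothetical (you could think of them as a longitude and a latitude).

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Synthetic users described by two made-up numeric features.
# No labels are attached: the algorithm is never told which group a user is in.
users = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
users += [(random.gauss(8, 1), random.gauss(8, 1)) for _ in range(100)]
random.shuffle(users)

def distance_sq(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def k_means(points, k=2, iterations=20):
    centers = random.sample(points, k)  # start from k randomly chosen users
    for _ in range(iterations):
        # Assign every user to the nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance_sq(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

centers, groups = k_means(users)
for center, group in zip(centers, groups):
    print(f"group of {len(group)} users around ({center[0]:.1f}, {center[1]:.1f})")
```

The algorithm only hands back the groups. Working out what they mean (here, "users in two different places") and whether that is useful is still up to you.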

Reinforcement Learning

Reinforcement learning is the field that studies problems and techniques in which a model improves by feeding the results of its own actions back into itself. To accomplish this, RL needs to be able to “sense” signals, automatically decide on an action, and then compare the outcome against a “reward” definition. RL tries to figure out WHAT to do to maximize these rewards, but it does this by itself (no direct instructions).

RL is not exactly supervised, because it does not rely strictly on a set of “supervised” (or labeled) data (the training set). Instead, it relies on being able to monitor the response to the actions it takes, and measure it against a definition of a “reward”. But it’s not unsupervised learning either, since we know up front, when we model our “learner”, what the expected reward is.
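The decide, observe, feed-back loop can be sketched with one of the simplest RL setups, an epsilon-greedy "multi-armed bandit". This is a deliberately stripped-down example rather than a full RL system: there is no changing state to sense, only three possible actions, and the reward probabilities are made-up numbers that the learner never gets to see directly.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hidden from the learner: the chance that each action pays a reward of 1.
# These numbers are made up for the sketch.
reward_probability = [0.2, 0.5, 0.8]

estimates = [0.0] * 3  # learner's running estimate of each action's reward
counts = [0] * 3       # how many times each action has been tried
epsilon = 0.1          # fraction of steps spent exploring at random

for step in range(5000):
    # Decide on an action: mostly the best-looking one, sometimes a random one.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    # Observe the outcome and score it against the reward definition.
    reward = 1 if random.random() < reward_probability[action] else 0

    # Feed the result back into the model (incremental average).
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(3), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("preferred action:", best_action)
```

Nobody tells the learner which action is best, and there is no labeled training set. It only knows how a reward is defined, and it works out the rest by trying actions and watching what comes back.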


