Intuition Behind Machine Learning (Supervised) — A Primer

Avinash Sharma V
The Theory Of Everything
7 min read · Mar 29, 2017

Recently, a few people asked me about machine learning. How on earth does it work?! Their chief issue was that they could not get the idea behind machine learning; simply put, they failed to understand the concept at its core.

The general idea of "if you have data, you can train a machine" is widespread, but without knowing the concept at its core, without getting the "intuition" behind it, people rarely ask questions like "what kind of data?", "why would it work for this data?" and so on. Through this article, I intend to write a few words to help readers understand the concept: why it works, why it won't work and so on. That way, they'll develop an understanding of machine learning and grasp it better when they try to learn or read about it online.

What The Heck is Machine Learning?

Ah, I am not going to give you theoretical explanations here; you can read those pretty much anywhere on the Internet (Wikipedia would be a good place to start). So, a common question is: "if I write code with so many 'if's and 'else's, and it refers to some auxiliary data it stores and improves over time, is it learning?"

The answer: yes!!! It is indeed learning. Anything that improves over time can theoretically be considered a learning algorithm (although in machine learning, that is not the prime area of focus). "Then why not write code like that, with so many ifs and elses and other constructs? Why do we need to train an algorithm?" Well, there are a lot of use cases where the programmer doesn't know which algorithm can solve the given problem, or cannot come up with an algorithm to solve it (if-and-else hell!), or is simply lazy :p.

Hm…let me explain that a bit.

Consider a problem statement of implementing a cache. You are to implement a caching algorithm to serve pages for a better user experience, and you have limits on how much you can cache. Now, a very naive approach would be an "LRU cache". In an LRU cache, you have a data structure that stores data about recently used pages and their "recentness". Consider this your knowledge base. Then you have an algorithm that looks at this knowledge base and reacts appropriately to give a better user experience (that's the LRU algorithm, which does nothing but evict the least recently used items from the cache).
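To make that concrete, here is a minimal sketch of an LRU cache in Python (the class name and capacity are just my illustration, not a production design). The ordered dictionary plays the role of the "knowledge base" of recentness, and the eviction rule is the hand-coded "algorithm":

```python
from collections import OrderedDict

# A minimal, illustrative LRU cache sketch (not production code).
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # key -> page content, ordered from least to most recent

    def get(self, key):
        if key not in self.pages:
            return None
        self.pages.move_to_end(key)          # mark as most recently used
        return self.pages[key]

    def put(self, key, page):
        if key in self.pages:
            self.pages.move_to_end(key)
        self.pages[key] = page
        if len(self.pages) > self.capacity:  # over the limit: drop the least recently used
            self.pages.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("/home", "<html>home</html>")
cache.put("/about", "<html>about</html>")
cache.get("/home")                           # "/home" becomes the most recently used
cache.put("/blog", "<html>blog</html>")      # evicts "/about"
```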

Now, this would work well if the pages visited by users were not random and users had an affinity towards certain pages. LRU keeps caching the "hot" pages, thereby improving its quality over time and giving a great user experience (let's just say). So, this is definitely an algorithm that gets better with more usage and gives better results (user experience through caching) over time. But there is a catch: with an LRU cache, the developer had to decide on the "algorithm", its quality, and how it would solve the problem. With machine learning, we try to avoid that (1. because we are lazy :p, 2. because not every algorithm can be found and hand-coded). There are a lot of problems for which the "mathematics" is not worth working out ourselves. Take object recognition, for example: imagine having to find the mathematics of identifying an object based on its colour, size, an equation of its shape and so on, come up with an equation that does it, and then program it. We would rather find it automatically if possible.

Machine learning is the rescue squad for this kind of problem. Using machine learning, you can let a computer figure out the mathematics, if any, behind the problem. In more formal words: "find the function y = f(x) which yields the y we are expecting for the x we have" (think of supervised learning, for those who are well read on the subject).

Sounds cool? So basically, I can let computers find the equation by themselves if I tell them what output I want for what input? Yes! You can use machine learning to find a function which produces the output you are looking for, for a given input.
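As a toy sketch of that idea (assuming scikit-learn is installed; the numbers are made up purely for illustration), this is what "tell the machine what output I want for what input" looks like in code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up (x, y) pairs: we never tell the machine that f(x) = 2x + 1,
# we only show it inputs and the outputs we expect.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(x, y)                              # "learn" f from the examples

print(model.predict(np.array([[5.0]])))      # ~11.0, without ever hand-coding f
```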

Does that work for every use case?

Well, it's pretty much mathematics, so: if there exists a function that solves the use case mathematically, then you can find it, given that you have quality data to learn it from.

How much data? Well, it's not about how much data (though more data may certainly help in practical cases); it's more about quality data. What does this mean? Let's explain it through an example.

Consider the two functions y = x and y = |x|

Clearly, mathematically, these two are different.

Now, if I tell you to find the function from the following data set:

1 should give 1

2 should give 2

3 should give 3

… up to infinity

And then I ask you what the output will be for -1, you would say -1. You found the function y = x.

Would you be able to find y = |x| from the above? You would probably end up assuming that it's y = x. But I gave you a lot of data! A lot, up to infinity! The point here is that the function behaves differently when the input is negative; to capture that characteristic of the function, you also need some samples with negative inputs. That's what I meant by "quality" data. Another point to note: you don't need "a lot of data" if you have "quality data" capturing the complete characteristics of the function (forget deep learning and such, let's focus on classical, hand-coded machine learning).

For example,

If I tell you,

-1 should give 1

-2 should give 2

0 should give 0

1 should give 1

2 should give 2

3 should give 3

Then you would probably be able to conclude that the function is y = |x|. I didn't give you a lot of data (definitely not up to infinity), but I gave you "defining" data which captured the characteristics of the function I am looking for.
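Here is a rough sketch of the same point in code (again assuming scikit-learn; the choice of a decision tree is just mine, for illustration). Trained only on non-negative samples, the model has no idea what to do with a negative input, but a few "defining" negative samples fix that:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Train only on non-negative samples of y = |x| ...
x_positive = np.array([[0.0], [1.0], [2.0], [3.0]])
y_positive = np.abs(x_positive).ravel()

model = DecisionTreeRegressor().fit(x_positive, y_positive)
print(model.predict([[-2.0]]))   # not 2.0: negative inputs were never seen

# ... now add a few "defining" negative samples and retrain the same model.
x_mixed = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]])
y_mixed = np.abs(x_mixed).ravel()

model = DecisionTreeRegressor().fit(x_mixed, y_mixed)
print(model.predict([[-2.0]]))   # 2.0: the negative side of |x| is now represented
```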

Then why do they use a lot of data?

In real-world cases, when you have a lot of data and you don't know the function you are looking for, you certainly don't know which subset of the data will help you find the function (or define its characteristics) better (there are techniques for all that, but let's stay beginners for the moment). And you certainly don't know whether the data you have is of good enough quality to find the function (think of the y = x and y = |x| example).

Hmm, so I have data and I want to find the function. Will I be able to do that with machine learning? Just train, train, train!?

Well, yes and no! Yes, if the data you have is good enough to define the characteristics of the function you are looking for; no, if it is not. (Again, let's stay beginners for this argument.)

There can be cases where you are not able to find any relationship between the inputs and outputs you have. What?! Why!? Probably you are missing something. Imagine trying to find out the amount of water you can fill into a cylindrical vessel knowing only the radius of its base! That's what is indirectly happening in such a case.
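The cylinder analogy is easy to check: with only the radius as input, the same input maps to many different outputs, so no function of the radius alone can exist. A quick sketch:

```python
import math

# Same input (radius), two different answers: volume also depends on the missing height.
radius = 2.0
for height in (1.0, 5.0):
    print(math.pi * radius ** 2 * height)    # ~12.57 vs ~62.83
```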

OK, so with (supervised) machine learning, we are trying to find a mathematical equation that solves the problem we are looking at.

A lot of data is less important than the "right" set of data.

What about algorithms?

There are a lot of algorithms. Why a lot? If it's all about data, then we shouldn't need many (hmm, nice, you are thinking like the founders of deep learning did). Well, yes: if it's all about data, the algorithm shouldn't matter that much (don't get me wrong here, I am not talking about using the same algorithm for everything!). There should be a way to find what we need using any algorithm devised for the problem (for example, classification algorithms for classification).

While this is true in theory, in practical machine learning, because "data quality is unknown", "there is not enough data", "there is not enough computing power", "we need it trained faster" and so on, we use specialised hand-coded algorithms which get us to the required function faster, probably with less data. But in theory, the choice of algorithm (from the set of algorithms suited to a problem) matters far less than the effectiveness of the data (every algorithm for a given kind of problem will solve it given enough data and time, but the amount of data/time may vary). Well, think about humans! We have one algorithm to do everything we do!!! (Well, it's not like we have already discovered what that one great algorithm is, though deep learning has made great advancements. I will talk about it in another article in the future.) I'll probably explain more on this in another article comparing regression and classification algorithms and their differences.
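As a small illustration of "the data matters more than the algorithm" (assuming scikit-learn; the two models are arbitrary picks of mine), two quite different algorithms trained on the same "defining" data both recover y = |x| at a training point:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

# The same "defining" data for y = |x|, fed to two quite different algorithms.
x = np.array([[-3.0], [-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]])
y = np.abs(x).ravel()

for model in (DecisionTreeRegressor(), KNeighborsRegressor(n_neighbors=1)):
    model.fit(x, y)
    print(type(model).__name__, model.predict([[-2.0]]))  # both give 2.0
```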

Though the above write-up leans towards supervised learning, it should give you a feel for what machine learning is all about, and that it is not magic, but mathematics. Hm, let's say Magematics.
