How Machine Learning works

A brief summary of what Machine Learning is: mentioned by everyone, explained by few.

Published in The Blog of a Computer Scientist, Aug 29, 2019

On several occasions people have asked me: “But in the end, what is this Machine Learning? What does it mean?”
The mainstream newspapers, especially in recent times, talk about it all the time. Unfortunately they talk about it (sometimes rightly so) as if it were a black box and/or as if there were something “magical” behind it. The term is often used as a synonym for Artificial Intelligence, although Machine Learning is only a branch of the latter.

Giving a detailed answer is not simple and is beyond the scope of this article. My purpose here is just to clarify the ideas a little. For more details, look on the web, where you can find books, scientific articles and technical examples that are far more exhaustive. I will therefore try to touch on the key points by answering simple questions for those who know nothing about this topic (although I do assume that readers know some basic computer science concepts). I hope that the following answers can resolve some doubts.

Is there a formal definition?

One of the most cited definitions of Machine Learning is the one given by the American computer scientist Tom M. Mitchell, author of the book “Machine Learning” (1997):

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

When can we say that a program learns?

First of all, it is worth noting (trivially) that Machine Learning is rendered in Italian as “apprendimento automatico”, literally “automatic learning”. Considering also the previous definition, a program is said to “learn” when, after gaining experience (perhaps from its mistakes), it manages to perform a task better than before, just as a human being would. The main activity of machine learning is therefore to develop models (algorithms) capable of learning from data.
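To make the definition concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the task T is deciding whether a number is “large” (50 or more), the performance P is the accuracy on a small made-up test set, and the experience E is a list of labelled examples used by a very simple nearest-neighbour learner.

```python
def predict(experience, x):
    """1-nearest-neighbour: copy the label of the closest known example."""
    _, label = min(experience, key=lambda example: abs(example[0] - x))
    return label


def performance(experience, test_set):
    """P: fraction of test inputs that are labelled correctly."""
    correct = sum(predict(experience, x) == label for x, label in test_set)
    return correct / len(test_set)


# T: decide whether a number is "large" (>= 50). All data below is made up.
test_set = [(5, False), (20, False), (35, False), (45, False),
            (55, True), (65, True), (80, True), (95, True)]

# E: labelled examples the program has seen so far.
little_experience = [(10, False), (70, True)]
more_experience = little_experience + [(48, False), (52, True)]

p_before = performance(little_experience, test_set)
p_after = performance(more_experience, test_set)
print(p_before, p_after)
```

With these particular numbers the accuracy goes from 0.875 to 1.0 once the two extra examples are added: in Mitchell's terms, the performance at T, as measured by P, has improved with experience E.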

Why is Machine Learning connected to Artificial Intelligence?

Precisely because it gives machines the ability to learn as a human being would, machine learning can be seen as a branch of artificial intelligence. Machine learning therefore gives us what is called Weak AI, as opposed to what is called Strong AI.
The substantial difference between the two is as follows:

Strong AI: Machines that are a real replica of human intelligence, if not superior to it. They are genuinely self-aware. (Science fiction)
Weak AI: Machines that can perform tasks associated with intelligence. They have no consciousness. (Reality)

What changes from Traditional Programming?

In machine learning, mathematical models are used to build “complex” algorithms, or algorithms that would be almost impossible to write with traditional programming alone.

In traditional programming, the computer is given a program (software) and its data (input), and produces an output as the result of processing. With machine learning, on the other hand, the computer is given the input and (where available) the expected output; it does not just process the data but learns from it, and returns a model (software) as output. In a sense, while in traditional programming we tell the computer what to do, in machine learning we tell the computer how to learn and then do. It is precisely this difference between the two approaches that distinguishes rule-based systems (rule-based expert systems) from machine learning models. One last thing to clarify is that many of the machine learning algorithms used today have existed for decades. Two factors have made them actually useful and usable today: the computational power of modern computers and the large amount of data now available (the so-called Big Data) for training machine learning models.
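The contrast between writing a rule and learning one can be sketched in a few lines of Python. The spam example, the function names and the data are all hypothetical; the only point is that in the first case the programmer writes the rule, while in the second the rule (here just a threshold) is derived from labelled examples.

```python
# Traditional programming: the programmer writes the rule by hand.
def is_spam_rule(num_links):
    return num_links > 5  # threshold chosen by the programmer (hypothetical)


# Machine learning: the rule is derived from input data plus known outputs.
def learn_threshold(examples):
    """Return the threshold that misclassifies the fewest examples.

    Each example is a pair (num_links, is_spam).
    """
    candidates = sorted({x for x, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum((x > t) != label for x, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Made-up training data: (number of links in the message, is it spam?)
training_data = [(0, False), (1, False), (2, False),
                 (3, True), (4, True), (7, True)]

threshold = learn_threshold(training_data)  # this is the learned "model"


def is_spam_learned(num_links):
    return num_links > threshold


print(threshold, is_spam_rule(3), is_spam_learned(3))
```

On this toy data the learned threshold is 2, so a message with 3 links is flagged by the learned model but not by the hand-written rule. If the data changes, the learned threshold changes with it, while the hand-written rule must be edited by a person.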

What problems does it try to solve?

As you read on, you may realize that some of the applications on your smartphone already solve, without you noticing, some of the problems listed below. Machine learning is applied in more and more sectors, and its use in the most diverse fields is becoming a de facto standard. Needless to say, giants like Facebook, Netflix, Google, Amazon, Apple and many others do research and development on machine learning and make massive use of it in their systems. There are even processors today, such as the ARM processors based on Project Trillium, that integrate machine learning capabilities.

Once you have seen what the problems are, it should not be too difficult to guess where machine learning is applied today, in particular by the companies listed above.
The problems can be summarized in three types: Classification, Regression and Clustering.

Classification: The objective of classification is to create a model that can categorize (label) its inputs. For every input given to the model, the output is the probability that the input belongs to a specific category. When there are only two categories we speak of binary (dichotomous) classification. Examples of such problems are:

Distinguish healthy patients from pathological ones.
Recognize a particular voice or sound.
Recognize faces, objects, animals and landscapes in an image.
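As a rough illustration of a classifier that returns a probability, here is a sketch of binary classification for the first example above. It uses logistic regression trained with plain gradient descent, which is only one of many possible models (the article does not prescribe one). The single feature (body temperature), the data, the offset, the learning rate and the number of steps are all assumptions made up for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, learning_rate=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Made-up data: body temperature in Celsius, 0 = healthy, 1 = pathological.
OFFSET = 37.5  # subtracted so the feature is roughly centred around zero
temperatures = [36.4, 36.6, 36.8, 37.0, 38.0, 38.5, 39.0, 39.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train([t - OFFSET for t in temperatures], labels)


def probability_pathological(temperature):
    """Probability that the input belongs to the 'pathological' category."""
    return sigmoid(w * (temperature - OFFSET) + b)


def classify(temperature):
    return "pathological" if probability_pathological(temperature) >= 0.5 else "healthy"


print(classify(36.5), classify(39.0))
```

The model outputs a probability, and the label is obtained by comparing it with a cut-off (0.5 here, itself an arbitrary choice).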

Regression: The objective of regression is to create a model that can predict a value for a new input. Each input is mapped to an output value by a function that approximates the “relationship” (the curve) between input and output data. Classic problems of this type are:

Forecast of the performance of a security on the stock exchange.
Forecast of the selling price of a house.
Forecast of a company’s turnover.
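The house price example can be sketched with the simplest possible regression model, a straight line fitted by ordinary least squares. The sizes and prices below are invented (and deliberately lie exactly on a line so the result is easy to check by hand); real data would be noisy and would normally involve many more features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house size in square metres, selling price in thousands.
sizes = [50, 80, 100, 120]
prices = [120, 180, 220, 260]

slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predicted price (in thousands) for a house of the given size."""
    return slope * size + intercept


print(slope, intercept, predict_price(90))
```

Here the fitted line is price = 2 * size + 20, so a house of 90 square metres, which was not in the data, gets a predicted price of 200.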

Clustering: The goal of clustering (grouping) is to partition the input data into homogeneous groups. The elements of a group are similar to one another and dissimilar from the elements of other groups. In this way we obtain a model that, given a new input, can assign it to one of the groups. Classic problems of this type are:

Organize users into groups to apply the same marketing strategy to users belonging to the same group.
Group the population based on specific parameters (income, educational qualifications, financial position, etc.).
Segment a digital image based on different colors or shapes.
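A minimal sketch of clustering follows, using k-means on a single feature. k-means is one common clustering algorithm, chosen here as an assumption since the article does not name a specific one. The income values, the number of groups and the naive initialisation (the first k values) are invented for illustration.

```python
def assign(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))


def k_means(values, k, iterations=10):
    """Very small 1-D k-means with a fixed number of iterations."""
    centroids = list(values[:k])  # naive initialisation: the first k values
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in values:
            groups[assign(centroids, x)].append(x)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


# Made-up data: yearly income in thousands. No labels are provided.
incomes = [20, 22, 25, 60, 62, 65]

centroids = k_means(incomes, k=2)
groups = [assign(centroids, x) for x in incomes]
print(centroids, groups)

# A new input is associated with the nearest group.
print(assign(centroids, 58))
```

No labels are given to the algorithm: the two groups (lower and higher incomes) emerge from the data alone, and a new value such as 58 is then assigned to the higher-income group.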

What are the approaches?

There are mainly two approaches to machine learning: Supervised and Unsupervised.

Supervised Learning
In supervised learning the machine learns from what is called the Training Set, which contains the known relationship between inputs (training data) and outputs (labels or known values associated with the training data). A good training set also requires a good data mining and data pre-processing phase. Once the model has been trained, its accuracy can be assessed using a Testing Set. This is similar to the Training Set, but its purpose is not to train the machine learning algorithm: it is to evaluate the model obtained from the training phase. After training and evaluation, the resulting model can be used so that, given a new target (the data being investigated), it returns the value associated with that target (a label or a predicted value). Examples of problems solved through supervised learning are Classification and Regression.
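The train, evaluate and use workflow described above can be sketched as follows. The nearest-centroid classifier, the cat/dog data (weight in kg, height in cm) and the way the data is split are assumptions made for illustration; in practice the data would usually be shuffled before splitting and would be far larger.

```python
import math


def train(training_set):
    """Compute one centroid (mean point) per label."""
    sums = {}
    for (weight, height), label in training_set:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += weight
        total[1] += height
        total[2] += 1
    return {label: (w / n, h / n) for label, (w, h, n) in sums.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    return min(model, key=lambda label: math.dist(model[label], point))


def accuracy(model, testing_set):
    hits = sum(predict(model, point) == label for point, label in testing_set)
    return hits / len(testing_set)


# Made-up labelled data: ((weight in kg, height in cm), label).
dataset = [((4, 25), "cat"), ((20, 55), "dog"), ((5, 24), "cat"),
           ((25, 60), "dog"), ((3, 23), "cat"), ((30, 58), "dog"),
           ((4, 26), "cat"), ((22, 52), "dog")]

training_set = dataset[:6]  # used to fit the model
testing_set = dataset[6:]   # held out, used only for evaluation

model = train(training_set)
test_accuracy = accuracy(model, testing_set)
new_label = predict(model, (6, 28))  # a new, unlabelled input
print(test_accuracy, new_label)
```

The important design point is that the Testing Set is never used during training, so the accuracy measured on it says something about how the model behaves on data it has not seen.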

Unsupervised Learning
With unsupervised learning a Training Set is still used, but without providing any relationship between input and output. The relationship is missing because we do not know, a priori, what it is. In other words, we do not have a reference target (a known output) for the input data. In this case the model is left to find the relationships, if any, intrinsic to the structure of the data. Given a new target, you get the value associated with it (a probability or the identifier of a cluster). Unsupervised learning is used for data clustering (and visualization) problems.

Conclusion

Strictly speaking, semi-supervised learning and reinforcement learning should also be described, but for the sake of simplicity I preferred to omit them.

This short article certainly cannot be exhaustive, as already stated. However, I hope it can be a starting point, a clarification and an introduction for those who knew nothing about the topic, or who had doubts about some aspects of a subject as interesting and at the same time as complex as Machine Learning.

NB: For those who have read this far and are intrigued, or who want more in-depth reading with more concrete, real examples of application and implementation, I recommend the interesting series of articles by Adam Geitgey called Machine Learning is Fun!.
For more academic references, I cannot fail to mention a milestone in Machine Learning: Christopher Bishop's book Pattern Recognition and Machine Learning (Information Science and Statistics), used in many university courses on the subject.
Finally, for those who want to try first hand one of the many machine learning examples available online, the IBM website has a Visual Recognition service (available at this link) where you can test image classification.