Supervised Machine Learning Scheme
Definition
Supervised Machine Learning is machine learning in which a human helps or supervises the learning process.
Explanation
The human helps by sorting and labelling the data that is fed to the computer. In other words, the human makes sure that the computer receives clean data and does not have to work out for itself what the data represents.
For example, if you are using pictures, you would label each picture with a description of what it shows; this could be as simple as “dog” or “cat”. In addition to labelling, supervised machine learning benefits from making all the images the same shape, resolution and file type (jpg, png, etc.).
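One common way to attach such labels is to sort the images into one folder per class, so that the folder name is the label. The sketch below assumes that layout; the file paths and the helper name label_from_path are made up for illustration.

```python
from pathlib import PurePosixPath


def label_from_path(path):
    """Return the name of the folder that directly contains the image."""
    return PurePosixPath(path).parent.name


# Hypothetical file paths, one folder per class.
image_paths = [
    "pets/dog/001.jpg",
    "pets/dog/002.jpg",
    "pets/cat/001.jpg",
]

# Pair every image with the label taken from its folder.
labelled = [(path, label_from_path(path)) for path in image_paths]
for path, label in labelled:
    print(f"{path} -> {label}")
```

Keeping the label in the folder name means the label cannot drift away from the file, which is one reason this layout is popular.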
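Supervised learning addresses two kinds of problem: classification, which assigns each item to one of the groups identified in the labelled data, and regression, which predicts a numeric value. In both cases the computer uses what it has learned from the labelled data to give an answer for a query it has not seen before.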
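To make the two problem types concrete, here is a minimal sketch in plain Python. Classification returns a label for an unseen query, while regression returns a number. The data is made up, and the two methods used (nearest neighbour and a least-squares line) are only simple stand-ins chosen for illustration.

```python
def nearest_neighbour_label(examples, query):
    """Classification: return the label of the training example closest to the query."""
    _, closest_label = min(examples, key=lambda ex: abs(ex[0] - query))
    return closest_label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up labelled data: body weight in kg and the class it belongs to.
weights_with_labels = [(3.5, "cat"), (4.2, "cat"), (18.0, "dog"), (25.0, "dog")]
predicted_class = nearest_neighbour_label(weights_with_labels, 20.0)

# Made-up labelled data: age in months and weight in kg.
ages = [1, 2, 3, 4]
weights = [3, 5, 7, 9]
slope, intercept = fit_line(ages, weights)
predicted_weight = slope * 5 + intercept

print(predicted_class, predicted_weight)
```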
There are four main parts to Supervised Machine Learning:
Data preparation: this is where a human sorts and labels the data that will be used for training, so that the machine can learn from it how the human has classified the data.
Training the model: various machine learning tools can be used to train the model. During this course the fastai API will be used, which allows us to train using different approaches; in the first part of the course these will be neural networks. The neural networks identify patterns in the data and use those patterns to automate the process.
Testing the model: once the model has been trained it needs to be tested. This can be done by preparing a set of data that the model has not seen but whose labels are known. The test is to see whether the model assigns the data to the same groups as the human did.
Tuning the model: since the model is bound to make mistakes, there are ways to fine-tune its accuracy, and during the introduction course we will see just how accurate the results can be.
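The four parts can be sketched end to end in a few lines of plain Python. This is only an illustration under stated assumptions: it swaps the neural network for a much simpler k-nearest-neighbour classifier, the data points are randomly generated rather than real images, and every name in it is made up. It also holds back a separate validation set for tuning, which is a common practice rather than something described above.

```python
import random
from collections import Counter


def make_labelled_points(rng, n_per_class):
    """Data preparation: build labelled examples, the same number per class."""
    centres = {"cat": (0.0, 0.0), "dog": (6.0, 6.0)}  # made-up feature clusters
    data = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def predict(train, point, k):
    """Classify a point by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda ex: squared_distance(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, examples, k):
    """Fraction of examples whose predicted label matches the human label."""
    correct = sum(predict(train, point, k) == label for point, label in examples)
    return correct / len(examples)


rng = random.Random(0)
data = make_labelled_points(rng, n_per_class=50)

# Hold back labelled data that the model does not see during training.
train, validation, test = data[:60], data[60:80], data[80:]

# Training: for k-nearest-neighbour this is simply keeping the training set.
# Tuning: pick k on the validation set so the test set stays untouched.
best_k = max([1, 3, 5], key=lambda k: accuracy(train, validation, k))

# Testing: measure once on data the model has never seen.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Tuning on the validation set rather than the test set keeps the final test score an honest estimate of how the model behaves on unseen data.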
Data preparation
The preparation of the data is the most important aspect of Supervised Machine Learning. In Supervised Machine Learning projects, data wrangling can take up most of the time and resources. The more clearly the data is defined and sorted, the more effective the learning process can be.
For image classification it is beneficial to make sure that the images are all the same file type (this is not essential, but it does resolve some issues that pop up in classification models). The images should all have the same number of channels; more specifically, they should all be RGB and not mixed with greyscale, CMYK, indexed colour, etc. Checking for these aspects while you are preparing the data is especially important when you are collecting your own data.
If you are using other kinds of data, such as text or tables, the same principle applies: make sure the data you are using is all in the same format and language. Don’t mix languages in your raw data sets.
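A quick audit along these lines might look like the sketch below. The file names are hypothetical. The extension check uses only the Python standard library; the colour mode check assumes the third-party Pillow package is installed, which is why it is imported inside the function.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(paths):
    """Count files per lower-cased extension so a stray file type stands out."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)


def non_rgb_images(paths):
    """Return (path, mode) for every image whose colour mode is not RGB.

    Assumes Pillow is installed; it is imported here so that the
    extension check above still runs without it.
    """
    from PIL import Image

    flagged = []
    for p in paths:
        with Image.open(p) as img:
            if img.mode != "RGB":
                flagged.append((p, img.mode))
    return flagged


# Hypothetical file names.
paths = ["cats/001.jpg", "cats/002.JPG", "cats/003.png", "dogs/001.jpg"]
counts = extension_counts(paths)
print(counts)
```

Pillow reports greyscale images as mode L, CMYK images as CMYK and indexed colour images as P, and its Image.convert("RGB") method can bring a flagged image into line with the rest.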
If you download data sets from other sources it is advisable to validate the data and check that the data sets are clean.
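What counts as clean depends on the project, but two checks that are often useful are looking for duplicate files and for labels that are missing or misspelled. The sketch below shows both using only the standard library; the file names, contents and labels are made up, and in practice the content would be the bytes read from each file.

```python
import hashlib


def find_duplicates(items):
    """Group names by the SHA-256 of their content; return groups with more than one name."""
    by_digest = {}
    for name, content in items:
        digest = hashlib.sha256(content).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    return [names for names in by_digest.values() if len(names) > 1]


def find_unknown_labels(records, allowed_labels):
    """Return records whose label is missing or not one of the expected classes."""
    return [(name, label) for name, label in records if label not in allowed_labels]


# Made-up stand-ins for file contents.
items = [("a.jpg", b"image-1"), ("b.jpg", b"image-2"), ("c.jpg", b"image-1")]
duplicates = find_duplicates(items)

# Made-up labels, including one typo and one missing label.
records = [("a.jpg", "cat"), ("b.jpg", "dgo"), ("c.jpg", None)]
unknown = find_unknown_labels(records, allowed_labels={"cat", "dog"})

print(duplicates)
print(unknown)
```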
Balanced data is another important aspect: you should try to collect the same amount of data for each group. If you are training a model with dogs and cats and you have 50 images of cats, make sure that you have between 48 and 52 images of dogs. Unbalanced data can lead to some nasty surprises. There are ways to deal with unbalanced data, but that is for later.
Data diversity is another important aspect of collecting data. A computer can only learn diversity if it is fed a diverse array of images. If you only use one subject, then when the computer is asked to recognise a similar object it will fail, because without diversity in the set of samples it has not learned that other objects belonging to the same classification can look different.
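Checking the balance is easy to automate. The sketch below counts the examples per class and compares the smallest class with the largest; the label list and the 0.9 cut-off are assumptions chosen for illustration, not fixed rules.

```python
from collections import Counter


def class_balance(labels):
    """Return the count per class and the ratio of the smallest class to the largest."""
    counts = Counter(labels)
    ratio = min(counts.values()) / max(counts.values())
    return counts, ratio


# Made-up label list: 50 cats and 48 dogs.
labels = ["cat"] * 50 + ["dog"] * 48
counts, ratio = class_balance(labels)

# Hypothetical cut-off: treat the data as balanced if the ratio is at least 0.9.
is_balanced = ratio >= 0.9
print(counts, f"ratio = {ratio:.2f}", "balanced" if is_balanced else "unbalanced")
```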
Conclusion
Supervised Machine Learning can be extremely accurate when classifying data. The extent of the accuracy depends on how accurately the human prepared and labelled the input data. The better the labels and the more diverse the images, the better the model performs.
We are left with one question: is there such a thing as unsupervised learning?
Resources and further reading
https://towardsdatascience.com/machine-learning-from-first-principles-51a5e75a3c47#0f96 (Connor Brenton)
https://en.wikipedia.org/wiki/Supervised_learning
https://towardsdatascience.com/a-brief-introduction-to-supervised-learning-54a3e3932590 (A Brief Introduction to Supervised Learning, Aidan Wilson, Sept 2019)
https://deepai.org/machine-learning-glossary-and-terms/supervised-learning (DeepAI)
https://www.geeksforgeeks.org/supervised-unsupervised-learning/ (Supervised and Unsupervised Learning, GeeksforGeeks)