In “A Few Useful Things to Know About Machine Learning”, Pedro Domingos states that developing successful machine learning algorithms requires a substantial amount of “black art” that is difficult to find in textbooks. In other words, one needs to learn witchcraft or develop some intuition to design the best algorithms. Below, I summarize the pitfalls he mentions and how to avoid them.
1. Learning = Representation + Evaluation + Optimization
There are three major components of machine learning: 1) choosing the right representation, i.e. the classification/regression model; 2) choosing the right evaluation (cost) function; and 3) choosing the right optimizer. The combination of these three choices determines the performance on test data.
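To make the three components concrete, here is a minimal sketch in plain Python. The choices in it are my own assumptions for illustration, not something Domingos prescribes: a straight line as the representation, mean squared error as the evaluation, and gradient descent as the optimizer. The names `predict`, `mse` and `fit` are hypothetical.

```python
# Representation: a straight line y = w * x + b (assumed model family).
def predict(w, b, x):
    return w * x + b

# Evaluation: mean squared error over the dataset.
def mse(w, b, xs, ys):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimization: plain gradient descent on the MSE.
def fit(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated by y = 2x + 1, so the fit should recover w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, mse = {mse(w, b, xs, ys):.6f}")
```

Swapping any one of the three functions (a different model, a different cost, a different search) gives a different learner, which is the point of the decomposition.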
2. It is generalization that counts
Generalization is the ultimate goal of designing a machine learning algorithm: what counts is the performance on data that was not used for training. The importance of cross-validation cannot be overemphasised.
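As a rough sketch of why held-out data matters, the snippet below compares training accuracy with 5-fold cross-validated accuracy. Everything in it is an assumption made for illustration: the toy 1-D dataset with 20% label noise, the 1-nearest-neighbour classifier, and the helper names `k_fold_indices` and `cross_val_accuracy`.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train, x):
    """1-NN on 1-D inputs: copy the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, test):
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return hits / len(test)

def cross_val_accuracy(data, k=5):
    """Average accuracy over k folds, each fold held out once."""
    scores = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        scores.append(accuracy(train, test))
    return sum(scores) / k

# Toy data: label is (x > 0.5), with 20% of the labels flipped.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.random()
    label = x > 0.5
    if rng.random() < 0.2:
        label = not label
    data.append((x, label))

train_acc = accuracy(data, data)          # each point is its own neighbour
cv_acc = cross_val_accuracy(data, k=5)    # neighbours come from other folds
print(f"training accuracy:        {train_acc:.2f}")
print(f"cross-validated accuracy: {cv_acc:.2f}")
```

The training accuracy is perfect by construction, because 1-NN memorizes the data. Only the cross-validated number says anything about new examples.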
3. Data alone is not enough
Feeding raw data to the algorithm might not be a good approach. The programmer should have some prior knowledge about the data, so that it can be represented in a better way. Prior knowledge and assumptions are helpful.
4. Overfitting has many faces
One should keep in mind that overfitting is a matter of variance rather than bias. Noise in the training data can make overfitting worse, but it is not necessarily the cause: overfitting can occur even when the data is noise-free. When many hypotheses are tested, it might be a good idea to control the false discovery rate.
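One face of overfitting is testing so many hypotheses that one of them looks good by chance. The sketch below is my own toy setup, not taken from the paper: the labels are coin flips, the 200 "rules" are random guesses, and we pick the rule that scores best on a small training set. The name `random_rule` and all the sizes are hypothetical.

```python
import random

rng = random.Random(0)
n_train, n_test, n_rules = 30, 1000, 200

# Labels are coin flips, so there is nothing real to learn.
y_train = [rng.random() < 0.5 for _ in range(n_train)]
y_test = [rng.random() < 0.5 for _ in range(n_test)]

def random_rule():
    """A hypothetical 'rule': fixed random guesses for every example."""
    return ([rng.random() < 0.5 for _ in range(n_train)],
            [rng.random() < 0.5 for _ in range(n_test)])

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

rules = [random_rule() for _ in range(n_rules)]
train_scores = [accuracy(r[0], y_train) for r in rules]

# Select the rule that happens to look best on the training set.
best = max(range(n_rules), key=lambda i: train_scores[i])
best_train = train_scores[best]
best_test = accuracy(rules[best][1], y_test)
mean_train = sum(train_scores) / n_rules

print(f"average rule on training data: {mean_train:.2f}")
print(f"best rule on training data:    {best_train:.2f}")
print(f"same rule on fresh data:       {best_test:.2f}")
```

The selected rule is a false discovery: its training score is inflated purely by the selection step, and on fresh data we should expect it to fall back towards chance.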
5. Intuition fails in high dimensions
It is an obvious fact that we cannot visualize a classification/regression problem in our minds when the data has more than 3 dimensions. It would be a good idea to look at dimensionality reduction algorithms when possible.
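A small calculation shows how badly low-dimensional intuition transfers. The sketch below computes which fraction of a unit hypercube is covered by the largest ball that fits inside it, using the standard formula for the volume of a d-dimensional ball. The function name `inscribed_ball_fraction` is my own.

```python
import math

def inscribed_ball_fraction(d):
    """Volume of the radius-0.5 ball in d dimensions; the unit cube has volume 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

for d in (2, 3, 5, 10, 20):
    print(f"d = {d:2d}: ball fills {inscribed_ball_fraction(d):.6f} of the cube")
```

In 2 dimensions the circle fills pi/4 of the square, and in 3 dimensions the sphere fills pi/6 of the cube. By 10 dimensions the fraction is below 1%, so almost all of the volume sits in the corners, which is not what our 3-D intuition suggests.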
6. Theoretical guarantees are not what they seem
In practice, algorithms are judged by their measured performance, yet even 100% test performance doesn’t guarantee the performance on new data. We haven’t trained the algorithm with an infinite number of examples, and even then it might not be possible to design the perfect algorithm mathematically.
7. Feature engineering is the key
The most effort and time should be put into preparing the data with good feature engineering. That is more important than designing a good machine learning algorithm. (Feature engineering was mentioned at our School of AI lecture.)
8. More data beats a cleverer algorithm
Ironic but true. More data makes the algorithm cleverer, not the design of the algorithm. That is probably because more data helps generalization.
9. Learn many models, not just one
The truth is that you cannot choose the best machine learning algorithm without trying the possibilities. (There are also algorithms that combine several models instead of picking just one. Check AdaBoost and AdaNet in the School of AI lecture.)
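To see why combining models can pay off, here is an idealized calculation. It assumes that the models make independent errors and that each one is right with the same probability `p`; real models are correlated, so treat this as an upper-end illustration rather than a prediction. The function name `majority_accuracy` is hypothetical.

```python
from math import comb

def majority_accuracy(n, p):
    """Probability that a majority of n independent voters is right.

    Assumes n is odd and every voter is right with probability p.
    """
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 11, 51):
    print(f"{n:2d} models at 60% each -> majority vote: {majority_accuracy(n, 0.6):.3f}")
```

With three models at 60% accuracy the majority vote is right 64.8% of the time, and the figure keeps growing as more independent models are added. AdaBoost does not use a plain majority vote, but the intuition that many weak models can beat a single one is the same.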
10. Simplicity doesn’t imply accuracy
The number of parameters to set (or the number of layers in a neural network) doesn’t have any direct connection with the accuracy of the results. One shouldn’t assume that adding more specific parameters (or adding more layers) would increase the accuracy.
11. Representable doesn’t imply learnable
The fact that even a very complex data distribution can somehow be represented doesn’t mean that the distribution can be learned.
12. Correlation does not imply causation
This fact comes from probability theory. Correlation (a normalized covariance) and causation are not the same: two variables can be strongly correlated without either one causing the other. (To learn more about probability theory, I would recommend watching my Optimal Estimation in Dynamic Systems lecture series.)
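A quick simulation illustrates the difference. In this sketch, which is my own toy example, a hidden variable `z` drives both `x` and `y`, so they are correlated although neither causes the other. When we then set `x` ourselves, ignoring `z`, the correlation with `y` disappears. The noise levels and the helper `pearson` are assumptions for illustration.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(1)
n = 2000

z = [rng.gauss(0, 1) for _ in range(n)]       # hidden common cause
x = [zi + rng.gauss(0, 0.5) for zi in z]      # x depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]      # y depends on z only
corr_observed = pearson(x, y)

# Intervention: we choose x ourselves, independently of z. y is untouched.
x_forced = [rng.gauss(0, 1) for _ in range(n)]
corr_intervened = pearson(x_forced, y)

print(f"observed correlation of x and y: {corr_observed:.2f}")
print(f"after forcing x by hand:         {corr_intervened:.2f}")
```

If `x` really caused `y`, changing `x` by hand would change `y` too. Here it does not, even though the observed correlation is strong.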
To learn more, follow my free resources:
- The monthly newsletter
- YouTube channel of lectures
- The School of AI blog
- AI courses on demand
- Follow on Twitter
Image sources (except Beril’s image with Siraj): Pixabay