Notes for “A Few Useful Things to Know about Machine Learning”

P. Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.

This paper focuses on “the most mature and widely used” machine learning type: classification.

1. Learning = representation + evaluation + optimization

hypothesis space of the learner: the set of classifiers that it can possibly learn

An evaluation function is needed to distinguish good classifiers from bad ones

an efficient optimization method is needed to search the hypothesis space for the highest-scoring classifier
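
To make the three components concrete, here is a minimal sketch in Python (my own illustration using scikit-learn, not code from the paper): decision trees as the representation, accuracy as the evaluation function, and a grid search over tree depth as the optimizer.

```python
# Sketch: learning = representation + evaluation + optimization.
# Representation: decision trees (the hypothesis space is all trees
# up to the depths searched over). Evaluation: accuracy.
# Optimization: exhaustive grid search over tree depth.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),      # representation
    param_grid={"max_depth": [1, 2, 3, 4, 5]},   # part of the hypothesis space searched
    scoring="accuracy",                          # evaluation function
    cv=5,                                        # scored by cross-validation
)
search.fit(X, y)  # optimization: pick the highest-scoring classifier
print(search.best_params_, search.best_score_)
```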

2. It’s generalization that counts

goal: to generalize beyond the examples in the training set

separate the training set and the test set

cross-validation
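
A small sketch of both ideas (my own example with scikit-learn; the dataset and model are arbitrary choices): hold out a test set once, and use k-fold cross-validation on the training portion.

```python
# Sketch: hold out a test set, then cross-validate on the training data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000)
# 5-fold cross-validation: each training example serves as
# validation data exactly once.
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("cv accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# The held-out test set is touched only once, at the very end.
clf.fit(X_train, y_train)
print("test accuracy: %.3f" % clf.score(X_test, y_test))
```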

generalization => we don't have access to the function we actually want to optimize (test error), only a training-set surrogate => we may not need to fully optimize the surrogate [a local optimum of training error may generalize better than the global optimum]

3. Data alone is not enough

No free lunch theorem: “state[s] that any two optimization algorithms are equivalent when their performance is averaged across all possible problems”. Applied to learning: no learner can beat random guessing over all possible functions to be learned, so generalization requires knowledge beyond the data.

one of the key criteria for choosing a representation is which kinds of knowledge are easily expressed in it.

“Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs.”

4. Overfitting has many faces

decomposing generalization error into bias and variance

a linear learner has high bias

decision trees don't have high bias, but they suffer from high variance
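
A rough way to see this empirically (my own sketch, not the paper's experiment): train each learner on bootstrap resamples of the same data and measure how much its predictions disagree; a linear learner should disagree with itself far less than an unpruned decision tree.

```python
# Sketch: variance as disagreement between models trained on
# bootstrap resamples. Linear models should vary little (high bias,
# low variance); deep trees should vary a lot (low bias, high variance).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)
X_probe = X[:100]  # fixed points on which to compare predictions

def prediction_variance(make_model, n_rounds=30):
    preds = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        preds.append(make_model().fit(X[idx], y[idx]).predict(X_probe))
    preds = np.array(preds)
    # Fraction of probe points where a round disagrees with the
    # majority vote -- a crude proxy for the variance term.
    majority = (preds.mean(axis=0) > 0.5).astype(int)
    return (preds != majority).mean()

print("linear:", prediction_variance(lambda: LogisticRegression(max_iter=1000)))
print("tree:  ", prediction_variance(lambda: DecisionTreeClassifier()))
```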

To combat overfitting:

1> cross-validation

2> adding a regularization term to the evaluation function [penalize classifiers with more structure; see the sketch after this list]

3> perform a statistical significance test like chi-square before adding new structure [particularly useful when data is very scarce]
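
As a concrete instance of remedy 2>, a hedged sketch with scikit-learn (the parameter name C is scikit-learn's, not the paper's): C scales an L2 penalty inside LogisticRegression's evaluation function, so smaller C penalizes large weights more and shrinks the fitted coefficients.

```python
# Sketch: adding a regularization term to the evaluation function.
# In scikit-learn's LogisticRegression, smaller C means a stronger
# L2 penalty on the weights (more structure is penalized harder).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

for C in [100.0, 1.0, 0.01]:
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    print("C=%6.2f  ||w|| = %.3f" % (C, np.linalg.norm(clf.coef_)))
# The weight norm shrinks as the penalty grows, trading variance
# for bias and thereby combating overfitting.
```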

“It’s easy to avoid overfitting (variance) by falling into the opposite error of underfitting (bias).”

common misconception: overfitting is caused by noise [severe overfitting can occur even with noise-free data]

The problem of multiple testing is closely related to overfitting.

This problem can be combated by correcting the significance tests to take the number of hypotheses into account, but this can lead to underfitting.

A better approach is to control the fraction of falsely accepted non-null hypotheses, known as the false discovery rate.
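
The standard procedure for this is Benjamini-Hochberg; the paper names the false discovery rate but not the algorithm, so the following is my own self-contained sketch:

```python
# Sketch: Benjamini-Hochberg procedure for controlling the false
# discovery rate across m hypothesis tests.
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of rejected null hypotheses."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    # Find the largest rank k with p_(k) <= (k/m) * alpha,
    # then reject the k hypotheses with the smallest p-values.
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest passing rank (0-indexed)
        rejected[order[: k + 1]] = True
    return rejected

# Example: ten tests; the three smallest p-values survive the correction.
pvals = [0.001, 0.004, 0.012, 0.03, 0.09, 0.2, 0.3, 0.5, 0.7, 0.9]
print(benjamini_hochberg(pvals))
```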

TBD
