How does Orthogonalization relate to Machine Learning?

Rajath Bharadwaj
3 min read · Jan 21, 2019
Image from Coursera — Andrew Ng

For a supervised learning system to do well, you usually need to tune the knobs of your system to make sure that four things hold true.

First, you usually have to make sure that you're doing well on the training set: performance on the training set needs to pass some acceptability threshold. For some applications, this might mean performing comparably to human-level performance. But this will depend on your application and the dataset you're using.

Second, after doing well on the training set, you then hope that this leads to also doing well on the dev set (development set). You then hope that this also does well on the test set. And finally, you hope that doing well on the test set according to the cost function results in your system performing well in the real world. So you hope that your model accurately recognizes the dog pictures that users submit to the app.

To relate this back to the radio-tuning example: if the sound of your radio was too noisy, you wanted one knob to turn in order to tune in to the desired frequency of the radio channel. You don't want to have to carefully adjust five different knobs that each affect several things at once; you just want one knob that affects only the frequency of the channel. In a similar way, if your algorithm is not fitting the training set well on the cost function, you want one specific set of knobs that you can use to tune your algorithm until it fits the training set well.

So the knobs you use to tune these are listed below (a small diagnostic sketch follows the list):

1. Fit the training set well on the cost function

- If it doesn't fit well, a bigger neural network or switching to a better optimization algorithm might help.

2. Fit the dev (development) set well on the cost function

- If it doesn't fit well, regularization or a bigger training set might help.

3. Fit the test set well on the cost function

- If it doesn't fit well, a bigger dev set might help.

4. Perform well in the real world

- If it doesn't perform well, either the distribution of the dev set isn't set correctly or the cost function isn't measuring the right thing.

In contrast, if you find that your algorithm is not fitting the dev set well, then there is a separate set of knobs that can be used to fine-tune. So for example, if your algorithm is doing well on the training set but not on the dev set, then you have a set of knobs around regularization that you can use to try to make it satisfy the second criterion.
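For instance, the regularization knob might look like this in code. This is a minimal sketch assuming TensorFlow/Keras; the layer sizes, L2 strength, and dropout rate are placeholder values you would tune yourself.

```python
from tensorflow import keras

# A sketch of the regularization knob: L2 weight penalties and dropout
# both push the model toward generalizing to the dev set, without
# touching the knobs that control the other criteria.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-3)),
    keras.layers.Dropout(0.3),  # drops 30% of activations during training
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```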

So the analogy is, now that you’ve tuned the frequency of your radio set, if the volume of the radio isn’t quite right, then you want a different knob in order to tune the volume of the radio. And you want to do this hopefully without affecting the frequency of your radio station.

Getting a bigger training set would be another knob you could use to help your learning algorithm generalize better to the dev set. Now, having adjusted the frequency and volume of your radio, what if it doesn't meet the third criterion? What if you do well on the dev set but not on the test set? If that happens, then the knob to turn is probably to get a bigger dev set, because if it does well on the dev set but not the test set, it probably means you've over-tuned to your dev set, and you need to go back and find a bigger dev set.
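One common way to get that bigger dev set is simply to re-split your held-out data so the dev set gets the larger share. A minimal sketch using scikit-learn, with made-up data shapes and an arbitrary 70/30 split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the data you reserved for dev + test (hypothetical shapes).
X_holdout = np.random.rand(10_000, 20)
y_holdout = np.random.randint(0, 2, size=10_000)

# Give the dev set the larger share (70% dev / 30% test here), so tuning
# decisions rest on more data and are less likely to over-fit noise.
X_dev, X_test, y_dev, y_test = train_test_split(
    X_holdout, y_holdout, test_size=0.3, random_state=42
)
```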

And finally, if it does well on the test set but isn't delivering happy users of your dog picture app, then that means you need to go back and change either the dev set or the cost function.
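One concrete way to change the cost function is to re-weight the mistakes your users actually care about. A minimal sketch, again assuming Keras; the data is a random stand-in, and the 10x penalty on the dog class is a made-up number:

```python
import numpy as np
from tensorflow import keras

# Stand-in data: 20 features, binary label where 1 = "dog" (hypothetical).
X_train = np.random.rand(1_000, 20)
y_train = np.random.randint(0, 2, size=1_000)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Changing the cost function: mistakes on dog pictures (class 1) now cost
# 10x more than mistakes on non-dogs. The 10x figure is illustrative.
model.fit(X_train, y_train, epochs=2, class_weight={0: 1.0, 1: 10.0})
```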

That’s all guys — if you’ve made it this far, comment below.
