Thank you Aamir!
- Unsupervised methods can be tackled in the same way once you have defined a good way to judge their performance. If you are clustering items, for example, how will you judge whether your clustering works from a product standpoint? Once you are there, the rest flows nicely.
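As a concrete illustration of picking a proxy metric before the product metric exists, here is a minimal sketch using scikit-learn's silhouette score to judge a clustering. The toy blob data stands in for whatever item features you actually have (hypothetical); silhouette is just one possible stand-in until you can measure product impact directly.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data standing in for your item features (hypothetical).
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Cluster, then score: silhouette is in [-1, 1], higher means
# tighter, better-separated clusters.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
score = silhouette_score(X, km.labels_)
print(f"silhouette: {score:.2f}")
```

Once a product-level signal exists (click-through on cluster pages, manual label audits, etc.), it should replace or complement a geometric score like this one.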
- I personally am gradually moving away from Notebooks for things that are not prototyping, as I have found it easier to iterate, test, and deploy using other tools.
- In general, yes. How you decide what goes in each split is crucial as we describe in the article. The actual percentage breakdown changes based on how much data you have.
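To make the split idea concrete, here is a small sketch using scikit-learn's `train_test_split` twice to carve out train/validation/test sets. The 80/10/10 breakdown and the toy labels are purely illustrative; as the comment notes, the percentages should change with dataset size, and deciding *which* examples go in each split (e.g. stratifying by label, or grouping by user) matters more than the exact ratios.

```python
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 1000 examples with binary labels.
data = list(range(1000))
labels = [i % 2 for i in data]

# First hold out a test set, then split the remainder into train/val.
# Stratifying keeps label proportions consistent across splits.
train_val, test, y_train_val, y_test = train_test_split(
    data, labels, test_size=0.10, stratify=labels, random_state=0)
train, val, y_train, y_val = train_test_split(
    train_val, y_train_val, test_size=1 / 9, stratify=y_train_val,
    random_state=0)

print(len(train), len(val), len(test))  # 800 100 100
```

With more data you can shrink the validation and test shares (e.g. 98/1/1 on millions of examples), since even 1% is plenty to estimate performance.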
- No matter how complex the model is, the answer is often in the data. Look at your input data, your pre-processed data, your post-processed data, your post-processed labels, etc…
- I would look at the errors to see which dataset has limiting factors. In most cases, runtime per example does not vary between train and test.