Jul 10, 2017 · 1 min read
It’s not a good idea, because the classes would be highly imbalanced. Imagine we have 100k products in our online shop and a user has bought only one of them. If we add the other 99,999 products as negative samples, we get one positive example against 99,999 negatives, and even good algorithms can struggle with that.
There are several ways to solve this problem.
1) The easiest, but least effective, way is to add random objects as negative samples, drawn either uniformly from all objects or with probabilities proportional to their popularity.
2) If we use some algorithm 1, we can add as negative samples some of the recommendations from a simpler algorithm 2. This helps to avoid correlations and overfitting, and it works really well.
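
Both options above can be sketched in a few lines of Python with NumPy. This is only an illustration under some assumptions: items are integer ids from 0 to n_items - 1, `popularity` is a hypothetical array of per-item purchase counts, and `scores` is a hypothetical array of per-item scores produced by the simpler algorithm 2 for one user. The function names are made up for this example.

```python
import numpy as np

def sample_negatives(n_items, positives, k, popularity=None, rng=None):
    """Option 1: draw k distinct random negatives the user has not bought."""
    if rng is None:
        rng = np.random.default_rng()
    pos = np.asarray(sorted(positives), dtype=int)
    # Uniform weights, or weights proportional to item popularity (assumed counts).
    if popularity is None:
        weights = np.ones(n_items, dtype=float)
    else:
        weights = np.asarray(popularity, dtype=float).copy()
    weights[pos] = 0.0  # never sample an item the user already bought
    probs = weights / weights.sum()
    return rng.choice(n_items, size=k, replace=False, p=probs)

def negatives_from_simple_model(scores, positives, k):
    """Option 2: use the top-k recommendations of a simpler model as negatives."""
    scores = np.asarray(scores, dtype=float).copy()
    pos = np.asarray(sorted(positives), dtype=int)
    scores[pos] = -np.inf  # exclude items the user already bought
    # Highest-scored remaining items come first.
    return np.argsort(-scores, kind="stable")[:k]

# Example: 100k products, the user bought only item 42.
rng = np.random.default_rng(0)
uniform_neg = sample_negatives(100_000, {42}, k=10, rng=rng)
hard_neg = negatives_from_simple_model(rng.random(100_000), {42}, k=10)
```

With k around 10 we get roughly one positive per ten negatives instead of one per 99,999. How many negatives to draw per positive is a tuning choice, not something fixed by either option.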
