Daniil Korbut
Jul 10, 2017 · 1 min read

It’s not a good idea, because the classes would be very imbalanced. Imagine we have 100k products in our online shop and a user has bought only one of them. If we add the other 99,999 products as negative samples, the task becomes extremely imbalanced, and even good algorithms can struggle with it.

There are several ways to solve this problem.

1) The easiest, but least effective, way is to add random objects as negative samples, drawn either uniformly from all objects or with probabilities proportional to their popularity.

2) If we are training algorithm 1, we can use as negative samples some of the recommendations produced by a simpler algorithm 2. This helps to avoid correlations and overfitting, and it works really well.
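
Both options can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names, the `popularity` counts, the `+ 1` smoothing and the `baseline_top` ranking are all assumptions made up for the example, and "algorithm 2" is represented simply by a ranked list of item ids.

```python
import random
from collections import Counter


def sample_random_negatives(all_items, positives, k, popularity=None, seed=0):
    """Option 1: draw k distinct negatives, uniformly or weighted by popularity."""
    rng = random.Random(seed)
    # Never use an item the user actually bought as a negative.
    candidates = [item for item in all_items if item not in positives]
    k = min(k, len(candidates))
    if popularity is None:
        return rng.sample(candidates, k)
    # Weighted sampling without replacement: give each item a random key
    # u ** (1 / w) and keep the k largest keys. The "+ 1" is an assumed
    # smoothing so that items with no recorded purchases can still be drawn.
    keyed = [
        (rng.random() ** (1.0 / (popularity.get(item, 0) + 1)), item)
        for item in candidates
    ]
    keyed.sort(reverse=True)
    return [item for _, item in keyed[:k]]


def sample_baseline_negatives(baseline_ranking, positives, k):
    """Option 2: take the top-k items recommended by a simpler model
    that the user did not buy."""
    negatives = [item for item in baseline_ranking if item not in positives]
    return negatives[:k]


items = list(range(10))                       # hypothetical catalogue of 10 products
bought = {3}                                  # the single positive example
popularity = Counter({0: 50, 1: 30, 2: 5})    # hypothetical purchase counts

uniform_negs = sample_random_negatives(items, bought, k=4)
popular_negs = sample_random_negatives(items, bought, k=4, popularity=popularity)

baseline_top = [3, 0, 1, 7, 2]                # hypothetical output of "algorithm 2"
baseline_negs = sample_baseline_negatives(baseline_top, bought, k=3)
```

In both cases the number of negatives per user is capped at `k`, so one purchase is paired with a handful of negatives instead of 99,999.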

Written by Daniil Korbut, Deep Learning Researcher at Insilico Medicine