Submitted by Awekonti t3_zqitxl in MachineLearning
cnapun t1_j10a9jz wrote
Reply to comment by hawkxor in [D] Deep Learning based Recommendation Systems by Awekonti
In my experience, negative sampling is super application-dependent (especially for retrieval), sadly. FB published a paper on how they train a search retrieval model without hard negatives, while Amazon combined hard negatives with easy negatives for product search (the FB paper mentions they tried this and it didn't help, though they did do some other things with hard negatives). Both use a hinge loss, whereas other places more often use softmax. I'm a fan of random negatives (and distance-weighted sampling), but eventually we found that mixed negatives plus softmax with sample probability correction work a little better in a lot of cases.
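To make the "softmax with sample probability correction" part concrete, here's a minimal sketch (PyTorch; names and shapes are made up, not any particular production system) of an in-batch softmax where each candidate's logit is corrected by the log of its sampling probability, i.e. the logQ correction described in the Google paper linked below:

```python
# Minimal sketch of in-batch softmax with sample probability correction
# ("logQ correction"): subtract log(sampling probability) from each
# candidate's logit so popular items aren't over-penalized as negatives.
# All names/shapes are illustrative.
import torch
import torch.nn.functional as F

def inbatch_softmax_loss(query_emb, item_emb, item_log_q, temperature=0.05):
    # query_emb, item_emb: [B, D] embeddings for B positive (query, item) pairs
    # item_log_q: [B] log sampling probability of each item (e.g. empirical frequency)
    logits = query_emb @ item_emb.T / temperature   # [B, B]; diagonal entries are the positives
    logits = logits - item_log_q.unsqueeze(0)       # logQ correction, broadcast over queries
    labels = torch.arange(query_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```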
One of the big challenges is that there are so many possible hyperparameters here: do you concatenate negatives into one softmax or sum separate losses, how many in-batch negatives do you use, can items drawn from a different distribution than the positives be used as in-batch negatives, what ratio of in-batch to random negatives do you use, and so on. Depending on the application, different configurations can yield better or worse results.
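And a rough sketch of the "concatenate negatives" option: in-batch negatives and extra uniformly sampled negatives scored in a single softmax, with the in-batch-to-random ratio set simply by how many random items you draw. Again purely illustrative, all names are assumptions:

```python
# Rough sketch of "concatenate negatives": score in-batch negatives and extra
# uniformly sampled random negatives in one softmax. The in-batch vs. random
# ratio is just how many random items you sample per batch.
import torch
import torch.nn.functional as F

def mixed_negatives_loss(query_emb, pos_item_emb, rand_item_emb, temperature=0.05):
    # query_emb, pos_item_emb: [B, D]; rand_item_emb: [N, D] random items shared across the batch
    inbatch_logits = query_emb @ pos_item_emb.T    # [B, B]; diagonal = positives
    random_logits = query_emb @ rand_item_emb.T    # [B, N]; all treated as negatives
    logits = torch.cat([inbatch_logits, random_logits], dim=1) / temperature
    labels = torch.arange(query_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```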
Some not-super-recent papers I can think of:
https://research.google/pubs/pub50257/
https://arxiv.org/abs/1706.07567
https://arxiv.org/abs/2010.14395
https://arxiv.org/abs/1907.00937 (3.2)
https://arxiv.org/abs/2006.11632 (2.2/2.4,6.1)
hawkxor t1_j10ind9 wrote
Thank you!