
cnapun t1_j0z2van wrote

User behavior is pretty stochastic and not well captured in the datasets available to academia. There's also a second class of papers that explores ranking rather than candidate generation; imo those are usually more interesting, but it's also harder to find good data for them in academia.

I take all results in papers on embeddings/two-tower models (for retrieval) with a grain of salt because, in my experience, the number one thing that matters for these in practice is negative sampling, but people rarely do ablations on it (see this paper showing that metric learning hasn't really progressed as much as papers would have you think). They can still be good to read for ideas, though.
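To make the setup concrete: a minimal two-tower retrieval sketch with in-batch softmax, where the choice of negatives is exactly the knob the comment says matters most. Everything here (random linear "towers", shapes, seed) is illustrative, not from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-tower retrieval: a user tower and an item tower each map features to
# embeddings, and relevance is a dot product. The "towers" here are just
# random linear maps for illustration.
d_feat, d_emb, batch = 8, 4, 16
W_user = rng.normal(size=(d_feat, d_emb))
W_item = rng.normal(size=(d_feat, d_emb))

user_x = rng.normal(size=(batch, d_feat))
item_x = rng.normal(size=(batch, d_feat))  # item_x[i] is user_x[i]'s positive

u = user_x @ W_user  # (batch, d_emb) user embeddings
v = item_x @ W_item  # (batch, d_emb) item embeddings

# In-batch softmax loss: for each user, the matching item is the positive
# and the other items in the batch serve as negatives.
scores = u @ v.T                             # (batch, batch) similarity matrix
scores -= scores.max(axis=1, keepdims=True)  # numerical stability
log_p = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.diag(log_p).mean()
```

Swapping what fills the off-diagonal of `scores` (in-batch items, uniform random items, mined hard negatives) is where the papers below mostly differ.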

18

import_social-wit t1_j0zqthl wrote

Negative sampling has been getting more traction in ranking over the past few years; some papers now focus solely on the impact of sampling methods.

3

hawkxor t1_j103k20 wrote

This is an area of interest for me. Do you have a good/recent paper rec on that subject?

1

cnapun t1_j10a9jz wrote

In my experience, negative sampling is sadly super application-dependent (especially for retrieval). FB had a paper discussing how they train a search retrieval model without hard negatives, while Amazon used hard negatives combined with easy negatives in product search (the FB paper mentions they tried this but it didn't help, and did some other stuff instead). Both use hinge loss, but other places use softmax more often. I'm a fan of random negatives (and distance-weighted sampling), but eventually we found that mixed negatives + softmax with sample probability correction work a little better in a lot of cases.
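The "softmax with sample probability correction" here usually means the logQ correction from sampled softmax: each sampled negative's logit is reduced by the log of its sampling probability, so popular items (which show up as in-batch negatives far more often) aren't over-penalized. A sketch of that, combined with mixed negatives; the popularity distribution, scores, and sample sizes are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)

def corrected_softmax_loss(pos_score, neg_scores, neg_sampling_probs):
    """Sampled-softmax loss with logQ correction: subtract log q(item) from
    each sampled negative's logit to compensate for non-uniform sampling."""
    neg_logits = neg_scores - np.log(neg_sampling_probs)
    logits = np.concatenate([[pos_score], neg_logits])
    logits -= logits.max()  # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Mixed negatives: in-batch negatives arrive ~ popularity q, random
# negatives are drawn uniformly from a corpus of n items.
n = 1000
popularity = rng.dirichlet(np.ones(n))          # made-up item popularity
in_batch_idx = rng.choice(n, size=8, p=popularity)
random_idx = rng.choice(n, size=8)

scores = rng.normal(size=n)  # pretend model scores for all items
pos_score = 2.0
neg_idx = np.concatenate([in_batch_idx, random_idx])
q = np.concatenate([popularity[in_batch_idx], np.full(8, 1.0 / n)])
loss = corrected_softmax_loss(pos_score, scores[neg_idx], q)
```

Since every q < 1, the correction boosts negative logits, and it boosts rare items' logits more, compensating for how rarely they're sampled as negatives.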

One of the big challenges is that there are so many possible hyperparams here: do you concatenate negatives or sum separate losses? How many in-batch negatives do you use? If you have items drawn from a different distribution than the positives, can you use them as in-batch negatives? What's the ratio of in-batch to random negatives? Depending on the application, different configurations can yield better or worse results.
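The first of those choices (concatenate negatives vs. sum separate losses) can be made concrete; the scores and pool sizes below are hypothetical:

```python
import numpy as np

def softmax_loss(pos, negs):
    # standard softmax cross-entropy with one positive and a pool of negatives
    logits = np.concatenate([[pos], negs])
    logits -= logits.max()  # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

pos = 1.5
in_batch_negs = np.array([0.2, 0.9, -0.3])  # made-up scores
random_negs = np.array([-1.0, 0.1])

# Option A: concatenate both negative pools into one softmax
loss_concat = softmax_loss(pos, np.concatenate([in_batch_negs, random_negs]))

# Option B: one softmax per pool, losses summed (optionally weighted)
loss_sum = softmax_loss(pos, in_batch_negs) + softmax_loss(pos, random_negs)
```

The two are not equivalent: summing per-pool losses effectively counts the positive once per pool, so it always yields a strictly larger loss than the concatenated version, and the gradients weight the pools differently.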

Some not-super-recent papers that come to mind:

https://research.google/pubs/pub50257/

https://arxiv.org/abs/1706.07567

https://arxiv.org/abs/2010.14395

https://arxiv.org/abs/1907.00937 (3.2)

https://arxiv.org/abs/2006.11632 (2.2/2.4,6.1)

5

import_social-wit t1_j106awh wrote

Sure, do you want something specific about the sampling stuff or ranking in general?

1

hawkxor t1_j1096ot wrote

Ha, I'll take both if you have favorites! But I was asking about the sampling stuff.

1