koolaidman123 t1_j6x2b05 wrote
Reply to comment by [deleted] in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Sure, you can rank in multiple ways, but:
- the InstructGPT paper trains its reward model strictly on pairwise comparisons
- asking annotators to rank an arbitrary number of passages from 1 to k in one shot is much harder and noisier than asking them for pairwise comparisons
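For anyone wondering what "pairwise" means in practice, here is a minimal sketch, assuming a Bradley-Terry style logistic loss of the kind commonly used for reward models: a best-to-worst ranking of k items expands into C(k, 2) (preferred, rejected) pairs, and each pair contributes -log(sigmoid(r_preferred - r_rejected)). The function names and the toy reward values are hypothetical and only for illustration; this is not the paper's code.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_ids):
    """Expand a best-to-worst ranking into all (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of each pair
    is always the higher-ranked item. A ranking of k items yields C(k, 2) pairs.
    """
    return list(combinations(ranked_ids, 2))

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_w - r_l)), rewritten as softplus(-(r_w - r_l))."""
    return softplus(-(reward_preferred - reward_rejected))

def mean_pairwise_loss(rewards, ranked_ids):
    """Average the pairwise loss over every pair implied by one ranking."""
    pairs = ranking_to_pairs(ranked_ids)
    total = sum(pairwise_loss(rewards[w], rewards[l]) for w, l in pairs)
    return total / len(pairs)

# Hypothetical scalar rewards for three passages, ranked a > b > c.
rewards = {"a": 2.0, "b": 1.0, "c": -0.5}
print(ranking_to_pairs(["a", "b", "c"]))
print(mean_pairwise_loss(rewards, ["a", "b", "c"]))
```

The loss only ever looks at one pair at a time, which is why a pairwise labeling setup fits it directly: each annotator judgment maps onto exactly one term, whereas a full 1-to-k ranking has to be internally consistent across all C(k, 2) pairs at once.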