[deleted] t1_j6x0948 wrote
Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
[deleted]
koolaidman123 t1_j6x2b05 wrote
Sure, there are multiple ways to collect rankings, but:
- The InstructGPT paper uses a strictly pairwise ranking loss for its reward model.
- Asking annotators to rank k passages from 1 to k in one shot is much harder, and much noisier, than asking for pairwise comparisons.
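
For anyone who wants to see what "pairwise" means concretely, here is a minimal sketch in plain Python. It assumes the usual logistic form of the loss, -log(sigmoid(s_preferred - s_rejected)). The function names `pairwise_ranking_loss` and `ranking_to_pairs` are made up for illustration and are not taken from the paper or any library.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(score_preferred, score_rejected):
    """Logistic pairwise loss: -log(sigmoid(score_preferred - score_rejected)).

    Written as softplus(-diff) in a numerically stable way, so large
    score gaps do not overflow math.exp.
    """
    diff = score_preferred - score_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ranking_to_pairs(ranked_items):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first element of each
    pair is always the higher-ranked item.
    """
    return list(combinations(ranked_items, 2))


# Hypothetical annotator ranking of three responses, best first.
pairs = ranking_to_pairs(["a", "b", "c"])

# Hypothetical reward-model scores for each response.
scores = {"a": 1.5, "b": 0.2, "c": -0.7}
mean_loss = sum(
    pairwise_ranking_loss(scores[p], scores[r]) for p, r in pairs
) / len(pairs)
```

The loss is small when the preferred response is scored well above the rejected one, and grows when the order is flipped. Note that a full ranking of k items can always be broken down into k*(k-1)/2 pairs this way, so the pairwise loss applies however the labels were collected.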