koolaidman123 t1_j6x2b05 wrote on February 2, 2023 at 2:37 PM

Reply to comment by [deleted] in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta

sure? you can have multiple ways of ranking, but:

- the InstructGPT paper trains its reward model strictly on pairwise comparisons
- asking annotators to rank k passages from 1 to k in one shot is much harder and noisier than asking for pairwise comparisons
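
for anyone who wants to see what "pairwise" means concretely, here is a minimal sketch of a Bradley-Terry style pairwise loss on reward scores, plus a helper that breaks a full ranking into pairs. it is only an illustration, not the paper's code: the names `pairwise_loss` and `ranking_to_pairs` are made up for this example, and the reward scores are assumed to be plain floats coming from some reward model.

```python
import math
from itertools import combinations


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """Return -log(sigmoid(r_preferred - r_rejected)).

    Computed as softplus(-diff) in a numerically stable form.
    """
    diff = r_preferred - r_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ranking_to_pairs(ranked_items):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    A ranking of k items yields k * (k - 1) / 2 pairs.
    """
    return list(combinations(ranked_items, 2))


if __name__ == "__main__":
    # Hypothetical reward scores for three responses, listed best to worst.
    scores = [1.3, 0.2, -0.7]
    pairs = ranking_to_pairs(scores)
    mean_loss = sum(pairwise_loss(w, l) for w, l in pairs) / len(pairs)
    print(pairs)
    print(mean_loss)
```

the loss is small when the preferred response already scores higher than the rejected one and grows when the order is flipped, so a single noisy label only affects the pairs it touches.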