[deleted] t1_j6x0948 wrote on February 2, 2023 at 2:23 PM

Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta

[deleted]

koolaidman123 t1_j6x2b05 wrote on February 2, 2023 at 2:37 PM

sure? you can have multiple ways of ranking, but:

the InstructGPT paper strictly uses pairwise ranking
asking annotators to rank however many passages from 1 to k in one shot is much more difficult, and more subject to noise, than asking for pairwise comparisons
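
to make the pairwise point concrete, here's a rough sketch (not the paper's code) of how pairwise comparisons feed a logistic ranking loss of the form -log(sigmoid(r_winner - r_loser)). it also shows that a full ranking of k items expands into k*(k-1)/2 pairs if you do collect one. the names `ranking_to_pairs` and `pairwise_loss`, the item labels, and the scores are all made up for illustration; a real reward model would produce the scores.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of each pair
    is always the one ranked higher.
    """
    return list(combinations(ranked, 2))


def pairwise_loss(scores, pairs):
    """Mean of -log(sigmoid(score[winner] - score[loser])) over all pairs."""
    total = 0.0
    for winner, loser in pairs:
        margin = scores[winner] - scores[loser]
        # -log(sigmoid(m)) is the same as log(1 + exp(-m))
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


# Hypothetical example: four completions ranked best to worst.
ranked = ["a", "b", "c", "d"]
pairs = ranking_to_pairs(ranked)

# Made-up reward scores that agree with the ranking.
scores = {"a": 2.0, "b": 1.0, "c": 0.5, "d": -1.0}
loss = pairwise_loss(scores, pairs)
print(len(pairs), loss)
```

when every score is equal the loss is log(2) per pair, and it drops below that as the scores start agreeing with the preferences. this sketch skips numerical-stability handling for very large negative margins.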