
firejak308 t1_ja16y0h wrote

My main concern with this is how the "Reply as Assistant" texts are generated. That task is orders of magnitude more difficult than labeling an existing reply/prompt or coming up with a new prompt, because it often requires doing background research about the question and summarizing it effectively. If I were to actually try to fill out one of the Reply as Assistant tasks, I would much rather just copy-paste the Google Knowledge Panel or the Wikipedia summary or the ChatGPT output. How do we know that people aren't doing those kinds of things, which could introduce plagiarism concerns?

5

coconautico OP t1_ja1gd4g wrote

Indeed! Many contributors are just copying and pasting answers, either out of laziness or because they don't realize they're not supposed to. But you know what? That's okay, thanks to the magic of large-scale ranking. Let me explain.

If we had an LLM that just "read" text indiscriminately, we would end up with a model that could hardly be better than the average human (since the average human is, by definition, average). But the moment we have multiple answers per question, with hundreds of people upvoting/downvoting and ranking them by relative quality (plus a few moderators, as on Reddit), we end up with a set of fairly high-quality question-answer pairs that are better than the average human answer, in much the same way that a set of weak classifiers can be combined into a strong one (e.g., AdaBoost).
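To make that concrete, here's a minimal Python sketch of one way vote-based ranking could filter crowd-sourced replies. The scoring function and data shapes are my own assumptions for illustration, not the actual Open Assistant pipeline: it scores each candidate reply with a Wilson lower bound on its upvote ratio and keeps only the top-scoring reply per prompt.

```python
import math

def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

# Hypothetical crowd-sourced candidates: (prompt, reply, upvotes, downvotes)
candidates = [
    ("What is AdaBoost?", "A boosting algorithm that combines weak learners.", 40, 3),
    ("What is AdaBoost?", "idk, some ML thing", 2, 15),
    ("What is AdaBoost?", "Copy-pasted encyclopedia intro...", 12, 9),
]

# Keep only the highest-scoring reply for each prompt.
best_per_prompt: dict[str, tuple[str, float]] = {}
for prompt, reply, up, down in candidates:
    score = wilson_lower_bound(up, down)
    if prompt not in best_per_prompt or score > best_per_prompt[prompt][1]:
        best_per_prompt[prompt] = (reply, score)

for prompt, (reply, score) in best_per_prompt.items():
    print(f"{prompt!r} -> {reply!r} (score={score:.2f})")
```

The Wilson lower bound (rather than a raw upvote ratio) is used here so that a reply with 2 up / 0 down doesn't outrank one with 40 up / 3 down; any comparable confidence-aware score would serve the same purpose.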

10