--algo t1_j40j84w wrote

We are both right and wrong. To be pedantic, both are based on this paper, https://arxiv.org/abs/2203.02155, but with different training data.

1

Hyper1on t1_j43crwx wrote

That's the InstructGPT paper, which is right for ChatGPT, but Copilot is based on Codex, which does not use RLHF.

3