FerretDude t1_izoa26g wrote on December 10, 2022 at 4:36 PM

Reply to comment by cfoster0 in [R] Illustrating Reinforcement Learning from Human Feedback (RLHF) by robotphilanthropist

It's already being used in production with a number of our partners. We have some chonky models coming out really soon. Expect things well into the tens of billions in the coming months.

cfoster0 t1_izrdeii wrote on December 11, 2022 at 7:07 AM

Who? Who's even using RLHF in production yet, besides OpenAI (and maybe Cohere)?

FerretDude t1_izs8wj1 wrote on December 11, 2022 at 1:49 PM

Not allowed to share. Many groups are looking into using RLHF in production, though.

cfoster0 t1_izuxn52 wrote on December 12, 2022 at 1:00 AM

Did y'all stop doing work out in the open? That's a shame. End of an era, I guess.

FerretDude t1_izyu3ka wrote on December 12, 2022 at 9:17 PM

RLHF is a bit tricky because you have to work with either data vendors or groups that have access to feedback data. Eventually we'll rely more on crowdsourcing, I think.