Submitted by robotphilanthropist t3_zh2u3k in MachineLearning
New HuggingFace blog post on RLHF: https://huggingface.co/blog/rlhf
Motivated by ChatGPT and the lack of conceptually focused resources on the topic.
This is a really nice write up, thank you.
I'm interested in what your thoughts are on prompt manipulation and "reasoning" your way around ChatGPT's ethical responses (and how those responses were even added during training). What direction do you see as best to combat these issues?
Also, have you looked at incorporating queries to external sources for information by decomposing problems in order to reason about them? The quality of ChatGPT made me think of Binder https://lm-code-binder.github.io/ and how powerful a combination they could be. A benefit of Binder is that the chain of reasoning is encoded in the intermediate steps and queries, which can be debugged and audited.
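To make concrete the kind of thing I mean, a decompose-then-query loop might look roughly like the sketch below (the helper names are made up for illustration, not Binder's actual API):

```python
# Hypothetical decompose-then-query loop; the `llm` and `knowledge_base` helpers
# are placeholders, not Binder's actual API. The point is that every intermediate
# sub-question and lookup is recorded, so the chain of reasoning can be audited.
def answer_with_audit_trail(question, llm, knowledge_base):
    trail = []
    facts = []
    for sub_question in llm.decompose(question):       # LLM splits the problem up
        result = knowledge_base.query(sub_question)     # external lookup, not free generation
        trail.append({"sub_question": sub_question, "result": result})
        facts.append(result)
    answer = llm.compose(question, facts)               # final answer grounded in the lookups
    return answer, trail                                # trail can be debugged and audited
```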
Something ChatGPT lacks is the ability to properly explain itself. You can ask it to explain its last output, but you can also ask it to lie to you, and it does.
If you ask it to lie to you convincingly, who is to say it isn't?
Can a conversationally trained LLM ever be used in a production application (as many are beginning to do) without a more rigorous rule based framework around it?
I’ve been wondering: why/how is it better to train a reward model on human preferences and then do RL, rather than just doing supervised fine-tuning on that human data? Is there an intuition, an empirical finding, or a logistical reason?
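For reference, my rough mental model of the reward-modelling step is the pairwise ranking loss below (just a sketch, not taken from the post; `reward_model` is a placeholder for anything that maps a tokenized prompt+response to a scalar score):

```python
import torch.nn.functional as F

# Sketch of the preference-ranking loss typically used for RLHF reward models:
# push the score of the human-preferred response above the rejected one.
def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)      # scalar score for the preferred response
    r_rejected = reward_model(rejected_ids)  # scalar score for the rejected response
    # loss = -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

My understanding is that the RL step then optimizes the policy against this learned scalar reward (with a KL penalty against the initial model) rather than imitating labels directly, but I'd love to hear whether that's the right intuition.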
Are there any plans to reproduce WebGPT as part of the InstructGPT reproduction, seeing as ChatGPT appears to either already have such functionality or to be receiving it soon?
About this bit
> At the moment, TRLX has an API capable of production-ready RLHF at the scales required for LLM deployment (e.g. 33 billion parameters). Future versions of TRLX will allow for language models up to 200B parameters. As such, interfacing with TRLX is optimized for machine learning engineers with experience at this scale.
Has TRLX been used to tune models in production already? Or if not, what did the blog post mean by "capable of production-ready RLHF"? I haven't seen any RLHF-ed models built on open source software yet, much less a 33B parameter one.
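For context, my mental model of the TRLX interface is roughly the snippet below (a sketch from memory of the CarperAI trlx README; the exact signature may well differ):

```python
import trlx

# Toy reward function: score each generated sample with a float.
# (Placeholder logic; a real setup would call a trained preference/reward model.)
def reward_fn(samples, **kwargs):
    return [float(len(sample)) for sample in samples]

# Online RLHF (PPO) against the reward function, starting from a small base model.
trainer = trlx.train(
    "gpt2",                          # base model to fine-tune
    reward_fn=reward_fn,             # maps generated text to scalar rewards
    prompts=["The movie was"],       # prompts sampled during rollouts
    eval_prompts=["The movie was"],  # prompts used for periodic evaluation
)
```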
EDIT: Also hi @FerretDude
It's already being used in production with a number of our partners. We have some chonky models coming out really soon. Expect things well into the tens of billions in the coming months.
Who? Who's even using RLHF in production yet, besides OpenAI (and maybe Cohere)?
Not allowed to share, but many groups are looking into using RLHF in production.
Did y'all stop doing work out in the open? That's a shame. End of an era, I guess.
RLHF is a bit tricky because you have to work either with data vendors or with groups that have access to feedback data. Eventually we'll rely more on crowdsourcing, I think.
Nit: Elo is a name, not an acronym
Really insightful, thanks.
FerretDude t1_izka011 wrote
Team lead at Carper here, happy to answer questions.