New HuggingFace blog post on RLHF: https://huggingface.co/blog/rlhf

Motivated by ChatGPT and the lack of conceptually focused resources on the topic.

Comments

You must log in or register to comment.

FerretDude t1_izka011 wrote on December 9, 2022 at 6:50 PM

Team lead at Carper happy to answer questions

ReginaldIII t1_izl0quh wrote on December 9, 2022 at 9:46 PM

This is a really nice write up, thank you.

I'm interested what your thoughts are on prompt manipulation and "reasoning" your way around ChatGPT's ethical responses (and how those responses were even added during training). What direction do you see being best to combat these issues?

Also, have you looked at incorporating querying external sources for information by decomposing problems to reason about them? The quality of ChatGPT made me think of Binder https://lm-code-binder.github.io/ and how powerful a combination they could be. A benefit of Binder is the chain of reasoning is encoded in the intermediate steps and queries which can be debugged and audited.

Something ChatGPT lacks is that ability to properly explain itself. You can ask it to explain it's last output, but you can also ask it to lie to you and it does.

If you ask it to lie to you convincingly, who is to say it isn't?

Can a conversationally trained LLM ever be used in a production application (as many are beginning to do) without a more rigorous rule based framework around it?

bigblueboo t1_iznegx5 wrote on December 10, 2022 at 11:58 AM

I’ve been wondering, why/how is it better to train a reward model on human preferences and do RL then just doing supervised fine tuning on that human data? Is there an intuition, empirical finding, logistical reason?

zaptrem t1_izn4krn wrote on December 10, 2022 at 9:38 AM

Are there any plans to reproduce WebGPT as part of the InstructGPT reproduction seeing as ChatGPT appears to already have or will be receiving such functionality soon?

cfoster0 t1_izlys6v wrote on December 10, 2022 at 2:01 AM

About this bit

> At the moment, TRLX has an API capable of production-ready RLHF at the scales required for LLM deployment (e.g. 33 billion parameters). Future versions of TRLX will allow for language models up to 200B parameters. As such, interfacing with TRLX is optimized for machine learning engineers with experience at this scale.

Has TRLX been used to tune models in production already? Or if not, what did the blog post mean by "capable of production-ready RLHF"? I haven't seen any RLHF-ed models built on open source software yet, much less a 33B parameter one.

EDIT: Also hi @FerretDude

FerretDude t1_izoa26g wrote on December 10, 2022 at 4:36 PM

It's already being used in production with a number of our partners. We have some chonky models coming out really soon. Expect things well into the tens of billions in the coming months.

cfoster0 t1_izrdeii wrote on December 11, 2022 at 7:07 AM

Who? Who's even using RLHF in production yet, besides OpenAI (and maybe Cohere)?

FerretDude t1_izs8wj1 wrote on December 11, 2022 at 1:49 PM

Not allowed to share, many groups are looking into using RLHF in production though

cfoster0 t1_izuxn52 wrote on December 12, 2022 at 1:00 AM

Did y'all stop doing work out in the open? That's a shame. End of an era, I guess.

FerretDude t1_izyu3ka wrote on December 12, 2022 at 9:17 PM

RLHF is a bit tricky because you have to either work with data vendors or groups that have access to feedback data. Eventually we'll rely more on crowd sourcing I think.

Operation_Ivy t1_izma0z2 wrote on December 10, 2022 at 3:36 AM

Nit: Elo is a name, not an acronym

altered-bot t1_izo2xqd wrote on December 10, 2022 at 3:45 PM

Really insightful, thanks.