Hey all!

My name is Louis Castricato. I lead CarperAI, a large FOSS group that recently released a library for doing distributed RLHF.

Today, during Scale's TransformX conference, we announced a project to reimplement InstructGPT, release all the datasets under the MIT license, and release our checkpoints/models.

I'm super interested in the democratization of large-scale RLHF, as I feel it's a relatively unexplored space in the open source community.

To that end, we'd love to get the subreddit and community more involved in our task selection process for our instruct model. We'll be hosting a panel on this in a few weeks, so I'm curious, r/machinelearning: what kinds of tasks would you love to see an instruct model tuned on if you had infinite resources?

Here is our instruct announcement: https://carper.ai/instruct-gpt-announcement/

And here is a link to our discussion panel on the CarperAI Discord: https://discord.gg/cCR3xEAt?event=1029746950305751141

Excited to hear your thoughts!

Comments

visarga t1_it323xj wrote on October 20, 2022 at 4:00 PM

#154,522

I'd like to see information extraction from semi-structured documents like receipts, invoices, forms, contracts, screenshots (apps), etc. The format would be question answering: you prompt with a document transcribed as text plus a question, and get the value in return.
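A minimal sketch of what that prompt format could look like. The function name, the "Document / Question / Answer" layout, and the receipt text are all made up for illustration; nothing here comes from an existing dataset or library.

```python
def build_extraction_prompt(document_text: str, question: str) -> str:
    """Format a transcribed document and a question as a single QA prompt.

    The layout is an assumption; any consistent template would work.
    """
    return (
        "Document:\n"
        f"{document_text.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Hypothetical receipt transcription, for illustration only.
receipt = """ACME STORE
Coffee beans   12.50
TOTAL          12.50"""

prompt = build_extraction_prompt(receipt, "What is the total amount?")
print(prompt)
```

The model would be expected to complete the text after "Answer:" with just the extracted value.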

FerretDude OP t1_it363tv wrote on October 20, 2022 at 4:26 PM

#154,773

Replying to visarga (#154,522)

Yeah I think a more general format for information extraction could potentially be useful

DigThatData t1_it43hfu wrote on October 20, 2022 at 8:01 PM

#157,107

> FerretDude

sus.

ivalm t1_it6c261 wrote on October 21, 2022 at 6:49 AM

#162,636

Task-oriented dialogues that follow a description of the task.

An example from the medical domain:

————

Ask what symptoms the patient is feeling. For each symptom, ascertain the symptom duration, whether it is worsening or improving, and whether there are alleviating or aggravating factors. For symptoms with an uncertain location, ascertain the symptom location.

Doctor: Hi, what brings you in here today?

Patient: I’ve been having a sore throat and pain in my face for the past week.

Doctor: [start generation]

—————

Such tasks are particularly hard after multiple dialogue turns, because the model needs to stay coherent throughout the conversation. This is something that, e.g., davinci-2 is much better at than davinci.
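The prompt layout in the example above could be assembled roughly like this. It is only a sketch: the function name and the "Speaker: utterance" convention are assumptions, and the open "Doctor:" line at the end stands in for the [start generation] marker.

```python
from typing import List, Tuple


def build_dialogue_prompt(
    task_description: str,
    turns: List[Tuple[str, str]],
    next_speaker: str = "Doctor",
) -> str:
    """Join a task description and the dialogue so far into one prompt.

    The prompt ends with an open turn for `next_speaker`, which is
    where the model would start generating.
    """
    lines = [task_description.strip(), ""]
    for speaker, utterance in turns:
        lines.append(f"{speaker}: {utterance}")
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


task = (
    "Ask what symptoms the patient is feeling. "
    "For each symptom ascertain symptom duration."
)
history = [
    ("Doctor", "Hi, what brings you in here today?"),
    ("Patient", "I've been having a sore throat and pain in my face for the past week."),
]

dialogue_prompt = build_dialogue_prompt(task, history)
print(dialogue_prompt)
```

For a multi-turn evaluation, each generated doctor turn and the next patient turn would be appended to the history and the prompt rebuilt, so the task description stays at the top throughout.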

FerretDude OP t1_it6yum2 wrote on October 21, 2022 at 11:49 AM

#163,934

Replying to ivalm (#162,636)

Ohhh that's a great idea!

Ok-Zombie2406 t1_it7vw7b wrote on October 21, 2022 at 3:58 PM

#166,883

Is there any way to find bias with RLHF models?

FerretDude OP t1_it8sw9v wrote on October 21, 2022 at 7:38 PM

#169,798

Replying to Ok-Zombie2406 (#166,883)

Don't think I understand the question...

Ok-Zombie2406 t1_itb3k3l wrote on October 22, 2022 at 7:49 AM

#175,914

Replying to FerretDude (#169,798)

How can we measure the human-induced bias in RLHF models?