Submitted by FerretDude t3_y8y8cm in MachineLearning
Hey all!
My name is Louis Castricato. I lead CarperAI, a large FOSS group that recently released a library for doing distributed RLHF.
We just announced a project today, during Scale's TransformX conference, to reimplement InstructGPT, release all the datasets under the MIT license, and publish our checkpoints/models.
I'm super interested in the democratization of large-scale RLHF, as I feel it's a relatively unexplored space in the open-source community.
To that end, we'd love to get the subreddit and community more involved in the task selection process for our instruct model. We'll be hosting a panel on this in a few weeks, so I'm curious, r/MachineLearning: what kinds of tasks would you love to see an instruct model tuned on if you had infinite resources?
Here is our instruct announcement: https://carper.ai/instruct-gpt-announcement/
And a link to our discussion panel on the CarperAI discord: https://discord.gg/cCR3xEAt?event=1029746950305751141
Excited to hear your thoughts!
visarga t1_it323xj wrote
I'd like to see information extraction from semi-structured documents like receipts, invoices, forms, contracts, app screenshots, etc. The format would be question answering: you prompt with a document transcribed as text plus a question, and get the value in return.
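To make the format concrete, here is a rough sketch of what a single example could look like. The `ExtractionExample` fields, the `build_prompt` helper, the prompt template, and the sample receipt are all made up for illustration; nothing here is an actual CarperAI dataset schema.

```python
from dataclasses import dataclass


@dataclass
class ExtractionExample:
    """One hypothetical training example for document QA extraction."""
    document: str  # text transcription of a receipt, invoice, form, etc.
    question: str  # what to extract
    answer: str    # the target value the model should return


def build_prompt(document: str, question: str) -> str:
    """Join the document and question into one prompt (assumed template)."""
    return (
        f"Document:\n{document.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Invented sample data, only to show the shape of an example.
example = ExtractionExample(
    document="ACME Hardware\nDate: 2022-10-20\nHammer 12.50\nTotal 12.50",
    question="What is the total?",
    answer="12.50",
)

prompt = build_prompt(example.document, example.question)
print(prompt)
```

The model would be tuned to continue the prompt with `example.answer`, so the same template could cover receipts, contracts, or screenshots as long as they are transcribed to text first.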