One of the biggest AI discoveries over the past year has been the importance of human feedback for building next-gen LLMs — but I still see a lot of confusion around how RLHF works at a fundamental level.

I wrote a blog to get into the details here: https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1

Comments

CLLBJ16 t1_j4469n2 wrote on January 13, 2023 at 1:14 AM

Do you know of any applications of RLHF to structured (tabular) data? I have been reading RLHF-related work recently, but most of it is in NLP and CV, and so far I haven't found what I'm looking for.

sabertoothedhedgehog t1_j468gio wrote on January 13, 2023 at 1:29 PM

Very easy-to-understand, well-written summary. Many thanks!

RebornHugo t1_j473chu wrote on January 13, 2023 at 4:57 PM

Pretty useful blog. Thank you Edwin.

V1r3s1nnumr1s t1_j47bp7s wrote on January 13, 2023 at 5:48 PM

How do you become a labeller for Surge? I have some background in maths, and my native tongue is French, but I also speak English, and I would be interested in working for Surge.

LetGoAndBeReal t1_j48gyhr wrote on January 13, 2023 at 10:03 PM

My main comment is that this article was super useful and easy to understand.

My smaller comment is that the pattern of repeating the content in those bordered areas interrupts the flow and is pretty annoying. So, my vote would be to drop that, and you have yourself a near-perfect article.

ureepamuree t1_j4pm1u2 wrote on January 17, 2023 at 11:10 AM

Is it okay to call RLHF a synonym for Active Learning in RL?