Submitted by BB4evaTB12 t3_10a7qmi in MachineLearning

One of the biggest AI discoveries over the past year has been the importance of human feedback for building next-gen LLMs — but I still see a lot of confusion around how RLHF works at a fundamental level.

I wrote a blog post that gets into the details here: https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1

14

Comments

CLLBJ16 t1_j4469n2 wrote

Do you know of any applications of RLHF to structured (tabular) data? I have been reading up on RLHF-related work recently, but most of it is in NLP and CV, and so far I haven't found what I'm looking for.

1

RebornHugo t1_j473chu wrote

Pretty useful blog. Thank you Edwin.

1

V1r3s1nnumr1s t1_j47bp7s wrote

How do you become a labeller for Surge? I have some background in maths and my native language is French, but I also speak English, and I would be interested in working for Surge.

1

LetGoAndBeReal t1_j48gyhr wrote

My main comment is that this article was super useful and easy to understand.

My smaller comment is that the pattern of repeating the content in those bordered areas interrupts the flow and is pretty annoying. So, my vote would be to drop that, and you have yourself a near-perfect article.

1

ureepamuree t1_j4pm1u2 wrote

Is it okay to call RLHF a synonym for active learning in RL?

1