JoeHenzi t1_j50pbv9 wrote on January 19, 2023 at 4:19 PM

Reply to comment by JClub in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

I'll take a look, thanks again. At the very least I'm building up a dataset that could be interesting to analyze or crunch. I'd love to implement a GA (genetic algorithm) to explore the space, and I have the example code from ChatGPT, but I need to dive deeper. As I may have mentioned in my GH comment, when I try to do predictions around parameters I end up blocking or slowing the API call, so either my code is trash (likely!) or I'm trying to do too much at once.

Using a T5-like model to produce summaries is on my short-term list, but I was trying to run them at bad times and making too many changes at once.

Thanks again for sharing. I'm enjoying playing in this space and love finding people willing to share. (Unlike OpenAI, which is slowly closing off its toys from the world.)

JoeHenzi t1_j4yowtu wrote on January 19, 2023 at 4:25 AM

Reply to [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

Taking a look. I want to implement this in my application to explore the parameter space and shoot for an optimum, but I'm finding that ChatGPT has gotten very cagey on the topic lately. I explored Genetic Algorithms with it, which it suggested would be less computationally expensive, but then it declined to really help me get to coding it.
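
For anyone curious what a genetic algorithm over a parameter space might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the parameter names and ranges in `BOUNDS`, the stand-in `score` function (a real application would replace it with whatever rates a parameter set, e.g. human feedback), and the population and mutation settings. It is not the code ChatGPT produced, nor anything from the application.

```python
import random

# Hypothetical search space: names and ranges are illustrative only.
BOUNDS = {"temperature": (0.0, 1.0), "top_p": (0.0, 1.0)}


def score(params):
    # Stand-in fitness with an arbitrary peak at temperature=0.7, top_p=0.9.
    # Replace with a real evaluation of the parameter set.
    return -((params["temperature"] - 0.7) ** 2 + (params["top_p"] - 0.9) ** 2)


def random_individual(rng):
    # Sample each parameter uniformly within its bounds.
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.2, sigma=0.1):
    # With probability `rate`, add Gaussian noise and clamp to the bounds.
    out = {}
    for k, (lo, hi) in BOUNDS.items():
        v = ind[k]
        if rng.random() < rate:
            v = min(hi, max(lo, v + rng.gauss(0.0, sigma)))
        out[k] = v
    return out


def evolve(fitness, pop_size=30, generations=40, elite=4, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness, best first.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Elitism: the top `elite` individuals survive unchanged.
        pop = pop[:elite] + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve(score)
    print(best, score(best))
```

Note that each generation calls `fitness` on the whole population, so if scoring means a slow API call, the cost grows with `pop_size * generations`; caching scores or shrinking the population would be the first things to try.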

EDIT: This is exactly my use case...