blazejd OP t1_ix7kazf wrote on November 21, 2022 at 10:21 AM

Reply to comment by Kylaran in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd

Glad to hear a non-ML perspective on it! Initializing with language models and then using RL for feedback makes a lot of sense. Could you share any particular papers that I could look into?