
meyerhot t1_j1cfyrj wrote

I am really interested in this and have been looking into doing some sort of fine-tuning on an LLM like GLM or Bloom. I had this idea for human-in-the-loop training in grad school, but I couldn't work out how to assign a reward given to a whole sentence back to its individual tokens, since text generation happens token by token.
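One common way to handle this credit-assignment problem, offered here as an assumption and not as anything the commenter built, is to treat each generated sentence as one episode. The sentence-level (e.g. human) score is placed as a sparse reward on the final token, and discounted returns carry it back to every earlier token. Those returns then weight the token log-probabilities in a REINFORCE-style loss. PPO-based RLHF setups use a similar scheme and often subtract a per-token KL penalty against the original model. The function names and the plain-float log-probs below are hypothetical. A real fine-tuning loop would use framework tensors so the loss can be backpropagated.

```python
from typing import List, Optional


def token_returns(
    seq_reward: float,
    num_tokens: int,
    gamma: float = 1.0,
    kl_penalties: Optional[List[float]] = None,
) -> List[float]:
    """Spread one sentence-level reward over the tokens that produced it (hypothetical helper)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    if kl_penalties is None:
        kl_penalties = [0.0] * num_tokens
    if len(kl_penalties) != num_tokens:
        raise ValueError("need exactly one penalty per token")

    # Per-token rewards: subtract the optional penalty at every step,
    # and add the sentence score only at the final token (sparse reward).
    rewards = [-p for p in kl_penalties]
    rewards[-1] += seq_reward

    # Discounted return G_t = r_t + gamma * G_{t+1}, computed right to left,
    # so earlier tokens receive (possibly discounted) credit for the outcome.
    returns = [0.0] * num_tokens
    running = 0.0
    for t in reversed(range(num_tokens)):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(
    token_logprobs: List[float], returns: List[float], baseline: float = 0.0
) -> float:
    """REINFORCE objective: -sum_t (G_t - b) * log pi(token_t | prefix)."""
    if len(token_logprobs) != len(returns):
        raise ValueError("log-probs and returns must have the same length")
    return -sum((g - baseline) * lp for g, lp in zip(returns, token_logprobs))


if __name__ == "__main__":
    # A 4-token sentence that a human rated +1.0, with discount 0.5.
    print(token_returns(1.0, 4, gamma=0.5))
    print(reinforce_loss([-1.0, -2.0], [1.0, 1.0]))
```

With `gamma=1.0` every token gets the full sentence reward; a smaller `gamma` gives tokens near the end of the sentence more credit, and the optional baseline reduces variance without changing the expected gradient.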
