meyerhot t1_j1cfyrj wrote
I'm really interested in this and have been looking into doing some sort of fine-tuning on an LLM like GLM or Bloom. I had this idea for human-in-the-loop training back in grad school, but I couldn't figure out how to assign rewards to sentences when text generation happens token by token.
meyerhot t1_j1cg6jj wrote
Reply to comment by londons_explorer in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_
Anyone have any ideas about how they assigned rewards? Somehow take the sum of the probabilities (from the logits) of each token in the sentence and multiply that by the reward?
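Roughly that, as far as I can tell, except with log-probabilities rather than raw probabilities: a scalar reward for the whole sentence times the sum of the generated tokens' log-probs is the vanilla REINFORCE / policy-gradient objective. (The InstructGPT paper actually uses PPO on top of a learned reward model, so treat this as a simplification.) A minimal PyTorch sketch of the idea, with toy shapes and a made-up reward value, none of it their actual code:

```python
# REINFORCE-style sketch: one scalar reward for the whole sentence
# scales the log-probability of the sequence the model generated.
import torch
import torch.nn.functional as F

def reinforce_loss(logits: torch.Tensor,
                   generated_ids: torch.Tensor,
                   reward: float) -> torch.Tensor:
    """logits: (seq_len, vocab_size) from the policy model.
    generated_ids: (seq_len,) tokens the model actually sampled.
    reward: scalar score for the whole sentence (e.g. from a human
    rater or a learned reward model)."""
    log_probs = F.log_softmax(logits, dim=-1)                 # (seq_len, vocab)
    # Log-prob of each token the model chose at each step.
    chosen = log_probs.gather(1, generated_ids.unsqueeze(1)).squeeze(1)
    # Sum of per-token log-probs = log-prob of the whole sequence.
    # Scaling by the reward and negating gives a loss to minimize.
    return -reward * chosen.sum()

# Hypothetical usage with random "model output", just to show shapes.
seq_len, vocab_size = 12, 50_000
logits = torch.randn(seq_len, vocab_size, requires_grad=True)
generated_ids = torch.randint(vocab_size, (seq_len,))
loss = reinforce_loss(logits, generated_ids, reward=0.8)
loss.backward()  # a positive reward pushes up the chosen tokens' log-probs
```

The per-token credit assignment falls out for free here: every token in the sentence shares the sentence-level reward through the sum, which is exactly why you don't need a separate reward for each token.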