meyerhot t1_j1cg6jj wrote on December 23, 2022 at 7:12 AM

Reply to comment by londons_explorer in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_

Does anyone have any ideas about how they assigned rewards? Do they somehow take the sum of the log-probabilities (computed from the logits) of each token in the sentence and multiply that by the reward?
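
To make the guess concrete, here is a rough sketch of what that would look like if "prob(logits)" is read as the log-softmax of each step's logits, which is the usual REINFORCE-style policy gradient setup. This is only an illustration of the idea in the question, not a confirmed description of how the model was actually trained. The function names, the toy logits, and the reward value are all made up for the example.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one step's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_log_prob(step_logits, token_ids):
    # Sum the log-probability of the sampled token at every step.
    return sum(
        log_softmax(logits)[tok]
        for logits, tok in zip(step_logits, token_ids)
    )

def reinforce_loss(step_logits, token_ids, reward):
    # REINFORCE-style surrogate loss: minimizing it raises the
    # log-probability of sequences with positive reward.
    return -reward * sequence_log_prob(step_logits, token_ids)

# Toy example (assumed values): 2 generated tokens, vocabulary of 3.
step_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
token_ids = [0, 2]   # tokens that were actually sampled
reward = 1.5         # one scalar score for the whole sentence

loss = reinforce_loss(step_logits, token_ids, reward)
print(loss)
```

In a real setup the logits would come from the language model and the loss would be backpropagated with an autodiff framework. The sketch only shows how one scalar reward for the whole sentence gets spread across the per-token log-probabilities.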