disastorm

disastorm t1_je8lm7w wrote

I have a question about reinforcement learning, or more specifically gym-retro (I know gym is pretty old now, I guess).

In gym-retro, when you give the AI a reward, is it actually looking at a set of variables and saying something like "I pressed this button while all of these variables had these values and got this reward, so I should press it when the variables are similar"? Or is it just saying "I pressed this button and got this reward, so I should press it more often"?
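To make the two readings concrete, here is a minimal sketch that contrasts a learner that ignores the game state with one that conditions on it. Everything in it is a hypothetical toy: the `reward` function, the two-state "game", and the learning rate are assumptions for illustration, not gym-retro's API. Standard RL algorithms generally take the second form, with the policy or value estimate being a function of the observation.

```python
from collections import defaultdict

def reward(state, action):
    # Hypothetical toy game: pressing button `action` pays off
    # only when it matches the current state.
    return 1.0 if action == state else 0.0

# Every (state, action) pair, visited equally often.
experience = [(s, a, reward(s, a)) for s in (0, 1) for a in (0, 1)] * 50

lr = 0.1  # assumed learning rate
value_by_action = defaultdict(float)        # "I pressed this and got a reward"
value_by_state_action = defaultdict(float)  # "I pressed this in this situation"

for s, a, r in experience:
    # Same running-average update; only the lookup key differs.
    value_by_action[a] += lr * (r - value_by_action[a])
    value_by_state_action[(s, a)] += lr * (r - value_by_state_action[(s, a)])

def best_action(state):
    # Pick the button with the highest estimated value in this state.
    return max((0, 1), key=lambda a: value_by_state_action[(state, a)])
```

In this toy, the state-blind learner ends up valuing both buttons at roughly one half, because each button is right only half the time overall, while the state-conditioned learner picks the matching button in each state. With pixel observations, a neural network would stand in for the table so that similar screens lead to similar choices.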

1

disastorm t1_jd66swu wrote

Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator

I see, thanks. Is that basically the equivalent of having "top_k" = 1?

Can you explain what these mean? From what I understand, top_k means it considers only the K most likely words at each step.

I don't quite understand what top_p means. Can they be used together?
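For reference, a minimal sketch of how the two filters are commonly described: top_k keeps the K most probable tokens, and top_p keeps the smallest set of most probable tokens whose cumulative probability reaches p. The token distribution and the function names here are hypothetical, and real libraries usually work on logits rather than a plain dictionary, so treat this as an illustration only.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.append(token)
        cumulative += probs[token]
        if cumulative >= p:
            break
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}

def sample(probs, rng=random):
    """Draw one token according to the (filtered) probabilities."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution, made up for illustration.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

greedy = top_k_filter(probs, 1)     # only the single most likely token survives
nucleus = top_p_filter(probs, 0.9)  # cat, dog, fish together reach 0.9
combined = top_p_filter(top_k_filter(probs, 3), 0.8)  # top_k first, then top_p
```

With top_k = 1 only one candidate is left, so sampling always returns the same token, which matches greedy decoding. The last line shows one way the two can be used together: apply top_k first, then top_p on what remains.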

1

disastorm t1_jcwyjyv wrote

I noticed that "text-generation" models have variable output, but a lot of other models, such as chatbots, often give the exact same response for the same input prompt. Is there a reason for this? Is there a setting that would let a chatbot, for example, give variable responses, or is my understanding just wrong?

1