stjernen t1_iw20n2s wrote on November 12, 2022 at 9:37 AM

Reply to [D] Simple Questions Thread by AutoModerator

Kind of a stupid question, but I'm having a hard time understanding rewards and how to apply them.

Is the reward an input?
Is reward the process of constant retraining?
Is reward the process of labeling?
Can it only be used with an MDP?
Can it only be used in Q-learning / deep Q-learning (QL / DQL)?
I don't use CNNs or images; can it be done without them?
Lots of examples out there use «gym»; can you do it without it?
Many examples use rewards from -100 to 100; shouldn't it be -1 to 1?

I can't really wrap my head around it. I'm currently making a card-playing NN and have had success using features and labels. I want to take the next step, maybe into DQL.