Kind of a stupid question, but I'm having a hard time understanding rewards and how to apply them.
Is the reward an input?
Is reward the process of constant retraining?
Is reward the process of labeling?
Can it only be used with an MDP?
Can it only be used in QL / DQL?
I don't use CNNs and images; can it be done without them?
Lots of examples out there use Gym; can you do it without?
Many examples use -100 to 100 as the reward range; shouldn't it be -1 to 1?
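To make these questions concrete, here is a minimal sketch in plain Python. It assumes a hypothetical toy card game (a simplified "draw or stop" game, not the card game from this post) and uses no Gym, no CNN and no images. The reward is neither an input feature nor a label: it is a scalar the environment hands back after each action (`step` returns it), and the tabular Q-learning update uses it to nudge the estimate Q(state, action). Q-learning does assume the problem can be framed as an MDP (state, action, reward, next state). The game rules, the dealer rule, `step`, `greedy`, `train` and all hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy game: state = current hand total.
# Actions: 0 = stop, 1 = draw another card worth 1..10.

def step(total, action, rng):
    """Apply an action and return (next_total, reward, done)."""
    if action == 1:
        total += rng.randint(1, 10)
        if total > 21:
            return total, -1.0, True   # bust: lose
        return total, 0.0, False       # no reward yet, game continues
    # Stop: compare against a hypothetical dealer total in 15..21.
    dealer = rng.randint(15, 21)
    if total > dealer:
        return total, 1.0, True        # win
    if total < dealer:
        return total, -1.0, True       # lose
    return total, 0.0, True            # tie

def greedy(q, total):
    """Pick the action with the highest estimated value (ties go to 'stop')."""
    return 0 if q[total][0] >= q[total][1] else 1

def train(episodes=50_000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # q[total] = [value of stop, value of draw]
    for _ in range(episodes):
        total, done = rng.randint(2, 11), False  # hypothetical starting hand
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randint(0, 1)
            else:
                action = greedy(q, total)
            next_total, reward, done = step(total, action, rng)
            # This is where the reward is used: the Q-learning target is
            # r + gamma * max_a' Q(s', a'), or just r when the game ended.
            target = reward if done else reward + gamma * max(q[next_total])
            q[total][action] += alpha * (target - q[total][action])
            total = next_total
    return q

if __name__ == "__main__":
    q = train()
    for total in range(4, 22):
        print(total, "draw" if greedy(q, total) == 1 else "stop")
```

The "retraining" happens continuously inside the loop: every step produces a reward and one small update. With a lookup table like this, multiplying every reward by 100 just multiplies every Q-value by 100 and leaves the greedy policy unchanged, so here the range is mostly a matter of taste.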
I can't really wrap my head around it.
I'm currently making a card-playing NN and have had success using features and labels. I want to take the next step, maybe into DQL.
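Going from labeled data to DQN mostly changes where the training target comes from. A rough sketch, assuming a batch of stored transitions (state, action, reward, next state, done) and a network that outputs one Q-value per action: instead of a hand-made label, the regression target for Q(s, a) is r + gamma * max over a' of Q(s', a'), or just r when the game ended. The helper names below are hypothetical and the network itself is left out. `clip_reward` shows the optional clipping of rewards to [-1, 1] used in the original DQN Atari work; with a neural network the reward scale matters more than with a table, because large targets produce large gradients. Rescaling (for example dividing by 100) is another common choice.

```python
from typing import List, Sequence

def clip_reward(r: float, low: float = -1.0, high: float = 1.0) -> float:
    """Optionally squash a raw reward into [low, high] before training."""
    return max(low, min(high, r))

def dqn_targets(rewards: Sequence[float],
                next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> List[float]:
    """Regression targets for a batch of transitions.

    next_q_values[i] holds Q(s'_i, a') for every action a', as predicted by
    the (target) network. Terminal transitions have no next state, so their
    target is just the reward.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        if done:
            targets.append(r)
        else:
            targets.append(r + gamma * max(q_next))
    return targets

if __name__ == "__main__":
    raw = [100.0, 0.0]
    rewards = [clip_reward(r) for r in raw]
    # The network would then be fit so that Q(s, a) moves toward these
    # targets, e.g. with a squared-error loss (Q(s, a) - target) ** 2.
    print(dqn_targets(rewards, [[0.0, 0.0], [0.5, 2.0]], [True, False], gamma=0.5))
```

So the "label" still exists in DQN, but it is computed on the fly from the reward and the network's own next-state estimates rather than supplied by hand.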
stjernen (t1_iw20n2s) wrote, in reply to [D] Simple Questions Thread by AutoModerator