stjernen

stjernen t1_iw20n2s wrote

Kind of a stupid question, but I'm having a hard time understanding rewards and how to apply them.

  • Is the reward an input?
  • Is reward the process of constant retraining?
  • Is reward the process of labeling?
  • Can it only be used with an MDP?
  • Can it only be used in Q-learning (QL) / deep Q-learning (DQL)?
  • I don't use CNNs or images; can it be done without them?
  • Lots of examples out there use «gym»; can you do it without it?
  • Many examples use -100 to 100 as the reward range; shouldn't it be -1 to 1?

I can't really wrap my head around it. I'm currently making a card-playing NN and have had success using features and labels. I want to take the next step, maybe into DQL.
