waa007 t1_ix7yoy7 wrote on November 21, 2022 at 1:16 PM

Very good point!

~~It seems like applying a GAN (generative adversarial network) to NLP.~~ The main problem is how to judge, with high accuracy, how much reward or penalty should be given.

blazejd OP t1_ix8il0p wrote on November 21, 2022 at 3:52 PM

Can you rephrase the last part of your second sentence? Don't quite get what you mean.

koiRitwikHai t1_ix95poh wrote on November 21, 2022 at 6:27 PM

It meant the same as "how will you define an objective function?"

waa007 t1_ixayg3f wrote on November 22, 2022 at 2:07 AM

In general RL, the environment gives an accurate reward after the agent takes a step. In NLP, it's hard to give an accurate reward unless a real person is there to teach the agent.
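To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: `env_step` is a toy 1-D environment, and `overlap_reward` is a hypothetical word-overlap score standing in for a text reward. It is not a real metric from any library, just a rough proxy to show where the accuracy breaks down.

```python
def env_step(position, action, goal=3):
    """Toy 1-D environment (assumed for illustration).

    The reward is exact because it follows directly from the rules.
    """
    new_position = position + action
    reward = 1.0 if new_position == goal else 0.0
    return new_position, reward


def overlap_reward(generated, reference):
    """Hypothetical proxy reward for text.

    Returns the fraction of distinct reference words that also appear
    in the generated sentence. Only a rough stand-in for human judgment.
    """
    ref_words = set(reference.lower().split())
    gen_words = set(generated.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & gen_words) / len(ref_words)


# Exact reward: stepping from position 2 to the goal at position 3.
_, r_env = env_step(2, +1)

# Proxy reward: a reasonable paraphrase shares few words with the
# reference, while a scrambled copy shares all of them.
reference = "the cat sat on the mat"
r_paraphrase = overlap_reward("a feline rested on a rug", reference)
r_scrambled = overlap_reward("mat the on sat cat the", reference)

print(r_env, r_paraphrase, r_scrambled)
```

The environment reward is right by construction, but the proxy ranks the scrambled sentence above the paraphrase, which is the opposite of what a person would say.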

So I think how to give an accurate reward is the main problem.

Sorry, this has very little to do with GANs.