waa007 t1_ixayg3f wrote
Reply to comment by blazejd in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
In general RL, the environment gives the agent an accurate reward after every step it takes. In NLP, it's hard to give an accurate reward unless a real person is there to teach the agent.
So I think how to give an accurate reward is the main problem. A rough sketch of what I mean is below.
Sorry that this has so little to do with GANs.
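A minimal sketch of the kind of setup in question, assuming a PyTorch-style autoregressive LM and a hypothetical `reward_fn` that scores a whole generated sequence (defining that function accurately is exactly the open problem above):

```python
import torch

# Hypothetical pieces: `model` is any autoregressive LM returning logits of
# shape [batch, seq, vocab]; `reward_fn` maps a batch of token sequences to
# one scalar reward per sequence -- the hard part discussed above.
def reinforce_step(model, optimizer, prompt_ids, reward_fn, max_len=20):
    ids = prompt_ids
    log_probs = []
    for _ in range(max_len):
        logits = model(ids)[:, -1, :]                  # next-token logits
        dist = torch.distributions.Categorical(logits=logits)
        token = dist.sample()                          # sample, not argmax
        log_probs.append(dist.log_prob(token))
        ids = torch.cat([ids, token.unsqueeze(-1)], dim=-1)

    reward = reward_fn(ids)                            # [batch] rewards
    # REINFORCE: scale the log-probs of the sampled tokens by the reward,
    # instead of using a per-token cross-entropy target.
    loss = -(torch.stack(log_probs, dim=-1).sum(-1) * reward).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The contrast with next-word prediction is that the learning signal here is one scalar per sequence rather than a known correct token at every position, which is why reward accuracy matters so much.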
waa007 t1_ix7yoy7 wrote
Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Very good point!
This sounds like applying a GAN (generative adversarial network) to NLP. The main problem is how to judge, with any accuracy, how much reward or penalty should be given.
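One known workaround (SeqGAN-style) is to reuse the discriminator's score on a complete generated sequence as the reward for a policy-gradient update, since gradients can't flow through the discrete sampled tokens. A rough sketch, assuming hypothetical `generator` and `discriminator` modules:

```python
import torch

# Hypothetical interfaces: `generator.sample` returns a batch of discrete
# token sequences plus their summed log-probabilities (which carry grad);
# `discriminator` maps a token sequence to a "how real?" score in [0, 1].
def seqgan_generator_step(generator, discriminator, optimizer, batch_size=32):
    seqs, log_probs = generator.sample(batch_size)
    with torch.no_grad():
        # The discriminator score stands in for an exact reward signal.
        reward = discriminator(seqs)
    # Policy gradient: no backprop through the discrete sampling step.
    loss = -(log_probs * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

So the discriminator answers the "how much reward?" question implicitly, but only as well as it has learned to tell real text from generated text.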
waa007 t1_ix7s5k7 wrote
Reply to comment by Cheap_Meeting in [R] Tips on training Transformers by parabellum630
Maybe there is too little data and the model overfits, so the parameters get stuck in a local optimum. Is that possible?
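A quick way to test the overfitting hypothesis is to watch the gap between training and validation loss across epochs; a minimal sketch (just a heuristic, not from the thread):

```python
def check_overfitting(train_losses, val_losses, patience=3):
    """Return True if validation loss has risen for `patience` epochs in a
    row while training loss kept falling -- the classic overfitting sign."""
    if len(val_losses) <= patience:
        return False
    val_rising = all(
        val_losses[-i] > val_losses[-i - 1] for i in range(1, patience + 1)
    )
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    return val_rising and train_falling
```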
waa007 t1_ix7rfte wrote
- Read the title, abstract, figures, and experiments
- Go through the conclusion and figures, and skip the rest
- Read the rest, but skip the math
- Read the whole paper, but skip the parts that don't make sense
Tips from Andrew Ng.
EDIT: source is a video lecture from Andrew Ng
waa007 t1_iu3x6bn wrote
Of course, it depends on the application.
waa007 t1_ir8zhmo wrote
It’s coming
waa007 t1_ixbmst4 wrote
Reply to comment by JackandFred in [R][D] Reading ML Papers - Workflow/Advice by EndlessRevision
Yes, it's from Andrew Ng. I added the source video link above.