Submitted by SaltyStackSmasher t3_11euzja in MachineLearning
Emergency_Apricot_77 t1_jah9rb7 wrote
Reply to comment by Kaleidophon in [D] backprop through beam sampling ? by SaltyStackSmasher
Why go with BLEU though? OP didn't particularly mention optimizing sequence-level metrics. Can't we still use cross-entropy? Something like the following:
- Sample the first token, compute cross-entropy against the first gold token
- Sample the second token, compute cross-entropy against the second gold token
- Sample the third token, compute cross-entropy against the third gold token

... and so on?
This way we still have a differentiable loss, but with much better alignment between training and inference (as opposed to the current setup of teacher-forcing at training time and sampling at inference time), which is what I thought OP was going for. A rough sketch of the idea is below.
Kaleidophon t1_jalv0qs wrote
>Why go with BLEU though? OP didn't particularly mention optimizing sequence-level metrics.
From OP's post above:
>is this possible at all? I think if we can backprop through beam sampling, we can directly optimise for BLEU?