
Emergency_Apricot_77 t1_jah9rb7 wrote

Why go with BLEU, though? OP didn't particularly mention optimizing sequence-level metrics. Can't we still use cross-entropy? Something as follows:

1. Sample the first token, compute cross-entropy against the first gold token.

2. Sample the second token (conditioning on the sampled first token, not the gold one), compute cross-entropy against the second gold token.

3. Sample the third token, compute cross-entropy against the third gold token.

4. ... and so on (see the sketch below).
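Roughly something like this, as a minimal PyTorch sketch. `model`, `src`, and `bos_id` are hypothetical stand-ins for whatever encoder-decoder you're using; `model(src, tgt_in)` returning `(batch, len, vocab)` logits is an assumed interface, not any particular library's API:

```python
import torch
import torch.nn.functional as F

def sampled_ce_loss(model, src, gold, bos_id):
    """Decode free-running (feed the model its own samples), but score
    each step's distribution against the gold token at that position."""
    batch_size, gold_len = gold.shape
    tgt_in = torch.full((batch_size, 1), bos_id,
                        dtype=torch.long, device=gold.device)
    losses = []
    for t in range(gold_len):
        logits = model(src, tgt_in)[:, -1, :]          # next-token logits
        losses.append(F.cross_entropy(logits, gold[:, t]))
        # feed the *sampled* token back in, not the gold one
        next_tok = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
        tgt_in = torch.cat([tgt_in, next_tok], dim=1)
    # note: the discrete sampling step itself isn't differentiable;
    # gradients only flow through each step's cross-entropy w.r.t. the logits
    return torch.stack(losses).mean()
```

To be fair, gradients don't flow through the sampling itself, only through each step's cross-entropy, so this is closer in spirit to scheduled sampling than to a fully end-to-end differentiable objective.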


This way we still have a differentiable metric, but with much better alignment between the training and inference scenarios (as opposed to the current mismatch between teacher-forced training and sampled inference), which is what I thought OP was going for.


Kaleidophon t1_jalv0qs wrote

>Why go with BLEU, though? OP didn't particularly mention optimizing sequence-level metrics.

From OP's post above:

>is this possible at all ? I think if we can backprop through beam sampling, we can directly optimise for bleu ?
