
Emergency_Apricot_77 t1_jah9rb7 wrote

Why go with BLEU, though? OP didn't particularly mention optimizing sequence-level metrics. Can't we still use cross-entropy? Something as follows:

1. Sample the first token, compute cross-entropy against the first gold token.

2. Sample the second token (conditioning on the sampled first token, not the gold one), compute cross-entropy against the second gold token.

3. Sample the third token, compute cross-entropy against the third gold token.

4. ... and so on (see the sketch below).
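Roughly something like this, as a minimal PyTorch sketch. `model`, `src`, and `bos_id` are hypothetical stand-ins for whatever encoder-decoder you're using; `model(src, tgt_in)` returning `(batch, len, vocab)` logits is an assumed interface, not any particular library's API:

```python
import torch
import torch.nn.functional as F

def sampled_ce_loss(model, src, gold, bos_id):
    """Decode free-running (feed the model its own samples), but score
    each step's distribution against the gold token at that position."""
    batch_size, gold_len = gold.shape
    tgt_in = torch.full((batch_size, 1), bos_id,
                        dtype=torch.long, device=gold.device)
    losses = []
    for t in range(gold_len):
        logits = model(src, tgt_in)[:, -1, :]          # next-token logits
        losses.append(F.cross_entropy(logits, gold[:, t]))
        # feed the *sampled* token back in, not the gold one
        next_tok = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
        tgt_in = torch.cat([tgt_in, next_tok], dim=1)
    # note: the discrete sampling step itself isn't differentiable;
    # gradients only flow through each step's cross-entropy w.r.t. the logits
    return torch.stack(losses).mean()
```

To be fair, gradients don't flow through the sampling itself, only through each step's cross-entropy, so this is closer in spirit to scheduled sampling than to a fully end-to-end differentiable objective.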


This way we still have a differentiable metric, but with much better alignment between the training and inference scenarios (as opposed to the current mismatch between teacher-forced training and sampled inference), which is what I thought OP was going for.


Kaleidophon t1_jalv0qs wrote

>Why go with BLEU, though? OP didn't particularly mention optimizing sequence-level metrics.

From OP's post above:

>is this possible at all ? I think if we can backprop through beam sampling, we can directly optimise for bleu ?
