
new_name_who_dis_ t1_ivq7gwm wrote

To make beam search work with BERT, you'd need to change the way BERT works, which you could do, but it's probably too complicated for what you want.

What you could do instead is just use a non-deterministic output, like a Mixture Density Network. It predicts several outputs as well as their likelihoods.
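For concreteness, here's a minimal PyTorch sketch of an MDN head and its negative log-likelihood loss. The names (`MDNHead`, `mdn_nll`), the component count, and the dimensions are all placeholder assumptions, not anything from a specific library:

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Mixture Density Network head: predicts a K-component Gaussian
    mixture (weights, means, std-devs) over a continuous target."""
    def __init__(self, hidden_dim, out_dim, num_components=5):
        super().__init__()
        self.num_components = num_components
        self.out_dim = out_dim
        self.pi = nn.Linear(hidden_dim, num_components)                    # mixture weights
        self.mu = nn.Linear(hidden_dim, num_components * out_dim)          # component means
        self.log_sigma = nn.Linear(hidden_dim, num_components * out_dim)   # log std-devs

    def forward(self, h):
        K, D = self.num_components, self.out_dim
        log_pi = torch.log_softmax(self.pi(h), dim=-1)     # (B, K) log mixture weights
        mu = self.mu(h).view(-1, K, D)                     # (B, K, D)
        sigma = self.log_sigma(h).view(-1, K, D).exp()     # (B, K, D), kept positive
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, y):
    """Negative log-likelihood of targets y (B, D) under the predicted mixture."""
    comp = torch.distributions.Normal(mu, sigma)
    # sum log-probs over the output dimension, then log-sum-exp over components
    log_prob = comp.log_prob(y.unsqueeze(1)).sum(-1)       # (B, K)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()
```

At inference time the "several outputs" are just the K component means, and each component's mixture weight gives you its likelihood.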

2

natural_language_guy OP t1_ivsdjo8 wrote

If the advice is to discard BERT and go with an MDN, do you think MDNs in this case would perform better than some large generative model like T5 with beam search?

The MDN does look interesting, and it looks like there are some libraries available for it already, but I don't have much experience using deep prob. models.

1

new_name_who_dis_ t1_ivtav2j wrote

No, I'm not saying to discard BERT. You still use BERT as the encoder and use an MDN-like network as the final layer. It could still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture, it's just a different loss function for your final output that isn't deterministic but allows for multiple outputs.
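A minimal sketch of that wiring, assuming the HuggingFace `transformers` `BertModel` as the encoder and the hypothetical `MDNHead` / `mdn_nll` from the earlier comment as the head and loss:

```python
import torch.nn as nn
from transformers import BertModel

class BertWithMDN(nn.Module):
    """BERT encoder with an MDN-style final layer, trained with the
    MDN negative log-likelihood instead of a deterministic loss."""
    def __init__(self, out_dim, num_components=5):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.head = MDNHead(self.encoder.config.hidden_size, out_dim, num_components)

    def forward(self, input_ids, attention_mask):
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = enc.last_hidden_state[:, 0]   # use the [CLS] token as the sequence encoding
        return self.head(cls)

# training step (sketch):
#   log_pi, mu, sigma = model(input_ids, attention_mask)
#   loss = mdn_nll(log_pi, mu, sigma, targets)
#   loss.backward(); optimizer.step()
```

The point is that only the output layer and the loss change; the BERT encoder is used exactly as before.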

1