
natural_language_guy OP t1_ivsdjo8 wrote

If the advice is to discard BERT and go with an MDN, do you think MDNs in this case would perform better than some large generative model like T5 with beam search?

The MDN does look interesting, and it looks like there are already some libraries available for it, but I don't have much experience using deep probabilistic models.


new_name_who_dis_ t1_ivtav2j wrote

No, I'm not saying to discard BERT. You'd still use BERT as the encoder and add an MDN-like network as the final layer. That final layer could even still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture; it's just a different loss for your final output that isn't deterministic but instead allows for multiple outputs.
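A minimal sketch of what that setup could look like in PyTorch. The `MDNHead` class, the hidden size, and the choice of K=5 Gaussian components are illustrative assumptions, not anything from the thread; the point is just that the head predicts a mixture distribution and is trained with a negative log-likelihood loss:

```python
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Hypothetical mixture-density head: maps encoder features (e.g. a
    BERT pooled output) to a K-component Gaussian mixture over a scalar."""
    def __init__(self, hidden_dim: int, n_components: int = 5):
        super().__init__()
        self.pi = nn.Linear(hidden_dim, n_components)         # mixture logits
        self.mu = nn.Linear(hidden_dim, n_components)         # component means
        self.log_sigma = nn.Linear(hidden_dim, n_components)  # log std devs

    def forward(self, h):
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_loss(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of targets y under the predicted mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    sigma = log_sigma.exp()
    # Per-component Gaussian log-density log N(y | mu_k, sigma_k)
    log_prob = (
        -0.5 * ((y.unsqueeze(-1) - mu) / sigma) ** 2
        - log_sigma
        - 0.5 * math.log(2 * math.pi)
    )
    # Marginalize over components, then average over the batch
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

# Toy usage: pretend `h` is a batch of encoder features from BERT
h = torch.randn(4, 768)
head = MDNHead(768)
loss = mdn_loss(*head(h), torch.randn(4))
```

At inference time you'd sample a component from the predicted weights and then sample from that Gaussian, which is what gives you multiple plausible outputs instead of a single deterministic prediction.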
