
natural_language_guy OP t1_ivsdjo8 wrote

If the advice is to discard BERT and go with an MDN, do you think MDNs in this case would perform better than some large generative model like T5 with beam search?

The MDN does look interesting, and it looks like there are already some libraries available for it, but I don't have much experience using deep probabilistic models.


new_name_who_dis_ t1_ivtav2j wrote

No, I'm not saying to discard BERT. You'd still use BERT as the encoder and add an MDN-like network as the final layer. That final layer could even still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture; it's just a different loss for your final output that isn't deterministic but instead allows for multiple outputs.
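A minimal sketch of what that setup could look like in PyTorch. The `MDNHead` class, the hidden size, and the choice of K=5 Gaussian components are illustrative assumptions, not anything from the thread; the point is just that the head predicts a mixture distribution and is trained with a negative log-likelihood loss:

```python
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Hypothetical mixture-density head: maps encoder features (e.g. a
    BERT pooled output) to a K-component Gaussian mixture over a scalar."""
    def __init__(self, hidden_dim: int, n_components: int = 5):
        super().__init__()
        self.pi = nn.Linear(hidden_dim, n_components)         # mixture logits
        self.mu = nn.Linear(hidden_dim, n_components)         # component means
        self.log_sigma = nn.Linear(hidden_dim, n_components)  # log std devs

    def forward(self, h):
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_loss(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of targets y under the predicted mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    sigma = log_sigma.exp()
    # Per-component Gaussian log-density log N(y | mu_k, sigma_k)
    log_prob = (
        -0.5 * ((y.unsqueeze(-1) - mu) / sigma) ** 2
        - log_sigma
        - 0.5 * math.log(2 * math.pi)
    )
    # Marginalize over components, then average over the batch
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

# Toy usage: pretend `h` is a batch of encoder features from BERT
h = torch.randn(4, 768)
head = MDNHead(768)
loss = mdn_loss(*head(h), torch.randn(4))
```

At inference time you'd sample a component from the predicted weights and then sample from that Gaussian, which is what gives you multiple plausible outputs instead of a single deterministic prediction.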
