new_name_who_dis_ t1_ivtav2j wrote

No, I'm not saying to discard BERT. You still use BERT as the encoder and put an MDN-style network on top as the final layer. It could even still be a self-attention layer, just trained with the MDN loss function. An MDN (mixture density network) isn't a different architecture; it's just a different loss function for your final output, one that isn't deterministic but instead allows for multiple possible outputs (a mixture distribution rather than a single point estimate).
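To make that concrete, here's a minimal sketch of the idea under some assumptions: a small MDN head sits on top of an encoder's pooled output (e.g. BERT's [CLS] vector, stubbed here with a random tensor so the snippet runs standalone), and training minimizes the mixture negative log-likelihood. The names `MDNHead` and `mdn_nll` are illustrative, not from any particular library:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    """Maps an encoder's pooled output to a K-component diagonal-Gaussian mixture."""
    def __init__(self, hidden_dim: int, out_dim: int, n_components: int = 5):
        super().__init__()
        self.n_components = n_components
        self.out_dim = out_dim
        # One linear layer per set of mixture parameters.
        self.pi = nn.Linear(hidden_dim, n_components)                    # mixture weights
        self.mu = nn.Linear(hidden_dim, n_components * out_dim)          # component means
        self.log_sigma = nn.Linear(hidden_dim, n_components * out_dim)   # log std devs

    def forward(self, h):
        B = h.size(0)
        log_pi = F.log_softmax(self.pi(h), dim=-1)                       # (B, K)
        mu = self.mu(h).view(B, self.n_components, self.out_dim)         # (B, K, D)
        sigma = self.log_sigma(h).view(B, self.n_components, self.out_dim).exp()
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, y):
    """Negative log-likelihood of targets y under the predicted mixture (the MDN loss)."""
    y = y.unsqueeze(1)                                                   # (B, 1, D)
    # Per-component log density, summed over output dims, then mixed via logsumexp.
    comp_logprob = torch.distributions.Normal(mu, sigma).log_prob(y).sum(-1)  # (B, K)
    return -torch.logsumexp(log_pi + comp_logprob, dim=-1).mean()

if __name__ == "__main__":
    h = torch.randn(8, 768)   # stand-in for a BERT pooled output (batch of 8)
    y = torch.randn(8, 2)     # possibly multi-modal regression targets
    head = MDNHead(hidden_dim=768, out_dim=2)
    loss = mdn_nll(*head(h), y)
    loss.backward()
    print(loss.item())
```

The key point is that the encoder is untouched; only the output layer and loss change. Because the head predicts several weighted Gaussians instead of one point, it can assign probability to multiple plausible answers rather than averaging them into one.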

1