Submitted by natural_language_guy t3_ypxyud in MachineLearning
new_name_who_dis_ t1_ivtav2j wrote
Reply to comment by natural_language_guy in [D] Is there anything like beam search with BERT? by natural_language_guy
No, I'm not saying to discard BERT. You still use BERT as the encoder and put an MDN-like (mixture density network) layer on top as the final layer. It could still be a self-attention layer, just trained with the MDN loss function. An MDN isn't really a different architecture; it's mostly a different loss function for your final output: instead of one deterministic prediction, the output describes a mixture distribution, which allows for multiple plausible outputs.
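
To make the "different loss function" point concrete, here is a minimal sketch of an MDN loss in its classic form: a 1-D Gaussian mixture over a scalar target. This is an illustration under assumptions, not the exact setup for BERT. The names `mdn_nll` and `mdn_sample` are made up for this example, and in practice `logits`, `means`, and `log_sigmas` would come from a final layer on top of the encoder features rather than being passed in by hand.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def mdn_nll(logits, means, log_sigmas, target):
    """Negative log-likelihood of a scalar target under a 1-D Gaussian mixture.

    logits, means, log_sigmas: one entry per mixture component
    (hypothetical outputs of the final layer).
    """
    log_pi = log_softmax(logits)  # log mixture weights
    terms = []
    for lp, mu, ls in zip(log_pi, means, log_sigmas):
        z = (target - mu) / math.exp(ls)
        log_normal = -0.5 * z * z - ls - 0.5 * math.log(2 * math.pi)
        terms.append(lp + log_normal)
    # log-sum-exp over components, then negate
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


def mdn_sample(logits, means, log_sigmas, rng=random):
    """Draw one output: pick a component by weight, then sample from it."""
    weights = [math.exp(lp) for lp in log_softmax(logits)]
    k = rng.choices(range(len(means)), weights=weights)[0]
    return rng.gauss(means[k], math.exp(log_sigmas[k]))
```

With two components centered at -2 and 2, a target near either mode gets a low loss, while a target halfway between them gets a higher one. That is the sense in which the output is not a single deterministic prediction: sampling repeatedly with `mdn_sample` can return values from either mode.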