[N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever
Submitted by [deleted] (t3_11s58n4) on March 15, 2023 at 6:42 PM in r/MachineLearning · 210 points · 33 comments
Competitive-Rub-1958 (t1_jccyreq) wrote on March 15, 2023 at 10:59 PM · 2 points
I think I may be reading this wrong, but is FlashAttention only used for computing basic scaled QKV attention, not embedded inside their MHA module?
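For context, here is a minimal sketch of the API the comment is asking about: PyTorch 2.0's torch.nn.functional.scaled_dot_product_attention, which can dispatch to a FlashAttention kernel on supported CUDA hardware. The tensor shapes and the sdp_kernel restriction below are illustrative assumptions, not code from the release announcement.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
batch, heads, seq_len, head_dim = 2, 8, 1024, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
# The FlashAttention kernel requires half precision on CUDA
dtype = torch.float16 if device == "cuda" else torch.float32

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
v = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)

if device == "cuda":
    # PyTorch 2.0 normally picks a fused kernel automatically; this context
    # manager restricts dispatch to FlashAttention only, and errors out if
    # the inputs are not eligible for that kernel.
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
else:
    # On CPU, fall back to the default (math) implementation
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 1024, 64])
```

Note that this is the standalone functional API; whether nn.MultiheadAttention routes through it depends on its fast-path conditions, which is exactly the distinction the comment raises.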