
Competitive-Rub-1958 t1_jccyreq wrote

I think I may be reading this wrong, but is FlashAttention only used here for computing the basic scaled QKV attention, rather than being embedded inside their MHA module?
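For context, the distinction being asked about can be sketched as follows: the scaled dot-product attention kernel is the part FlashAttention optimizes, while an MHA module adds learned projections and head splitting around that kernel. A minimal NumPy sketch (all names and shapes here are illustrative, not taken from any particular codebase):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # The kernel FlashAttention implements efficiently:
    # softmax(Q K^T / sqrt(d)) V, here in a naive materialized form.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # MHA wraps the kernel with projections and head reshaping;
    # swapping in FlashAttention would only replace the inner call.
    batch, seq, d_model = x.shape
    d_head = d_model // num_heads

    def project(w):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        return (x @ w).reshape(batch, seq, num_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = project(w_q), project(w_k), project(w_v)
    out = scaled_dot_product_attention(q, k, v)
    # Merge heads back: (batch, heads, seq, d_head) -> (batch, seq, d_model)
    out = out.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
    return out @ w_o
```

So under this reading, "only for calculating basic scaled QKV attention" would mean replacing just `scaled_dot_product_attention`, leaving the surrounding MHA projections untouched.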
