
Competitive-Rub-1958 t1_jccyreq wrote

I think I may be reading this wrong, but is FlashAttention only used here for computing the basic scaled QKV attention, rather than being embedded inside their MHA module?
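For context, the distinction being asked about can be sketched as follows: the scaled dot-product attention kernel is the part FlashAttention optimizes, while an MHA module adds learned projections and head splitting around that kernel. A minimal NumPy sketch (all names and shapes here are illustrative, not taken from any particular codebase):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # The kernel FlashAttention implements efficiently:
    # softmax(Q K^T / sqrt(d)) V, here in a naive materialized form.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # MHA wraps the kernel with projections and head reshaping;
    # swapping in FlashAttention would only replace the inner call.
    batch, seq, d_model = x.shape
    d_head = d_model // num_heads

    def project(w):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        return (x @ w).reshape(batch, seq, num_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = project(w_q), project(w_k), project(w_v)
    out = scaled_dot_product_attention(q, k, v)
    # Merge heads back: (batch, heads, seq, d_head) -> (batch, seq, d_model)
    out = out.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
    return out @ w_o
```

So under this reading, "only for calculating basic scaled QKV attention" would mean replacing just `scaled_dot_product_attention`, leaving the surrounding MHA projections untouched.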
