
Thunderbird120 t1_jakbyew wrote

You're better qualified to know than nearly anyone who posts here, but is flash attention really all that's necessary to make that feasible?

24

lucidraisin t1_jakdtf7 wrote

yes

edit: it was also used to train LLaMA. there is no reason not to use it at this point, for both training and fine-tuning/inference

46
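For readers who want to try the claim above, here is a minimal sketch, assuming PyTorch 2.x, where `torch.nn.functional.scaled_dot_product_attention` dispatches to a FlashAttention kernel when the inputs allow it (CUDA device, fp16/bf16 tensors); the shapes and sizes are illustrative, not taken from the thread.

```python
# Minimal sketch (assumes PyTorch >= 2.0): scaled_dot_product_attention can
# dispatch to a fused FlashAttention kernel on CUDA with fp16/bf16 inputs,
# and falls back to the standard math implementation elsewhere.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64  # illustrative sizes
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
v = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)

# is_causal=True applies the usual decoder (autoregressive) mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # (batch, heads, seq_len, head_dim)
```

The same call works inside a training loop, since the fused kernel supports backward passes; nothing about the model code has to change beyond replacing an explicit softmax(QK^T)V attention with this call.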