Dependent_Ad5120 t1_jd3knio wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
OK, I found out why. To use flash attention, I had to use fp16. It is a bit faster than using memory_efficient attention in my test.
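For anyone wondering what I mean: here is a minimal sketch (not a definitive setup) of how one might force the flash vs. memory-efficient backends in PyTorch 2.0 via the torch.backends.cuda.sdp_kernel context manager. The tensor shapes, dtype, and is_causal flag are just illustrative assumptions; the point is that the flash kernel only dispatches for fp16/bf16 inputs on CUDA.

```python
import torch
import torch.nn.functional as F

device = "cuda"
dtype = torch.float16  # flash attention requires fp16/bf16; fp32 falls back to other backends

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(8, 16, 1024, 64, device=device, dtype=dtype)
k = torch.randn(8, 16, 1024, 64, device=device, dtype=dtype)
v = torch.randn(8, 16, 1024, 64, device=device, dtype=dtype)

# Force the flash backend (errors out if it cannot be used, e.g. with fp32 inputs).
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out_flash = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Force the memory-efficient backend for comparison.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out_mem_eff = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```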