Dependent_Ad5120 t1_jd3knio wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
OK, I found out why. To use flash attention, I had to use fp16. It is a bit faster than using memory_efficient attention in my test.
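For anyone wondering what I mean: here is a minimal sketch (not a definitive setup) of how one might force the flash vs. memory-efficient backends in PyTorch 2.0 via the torch.backends.cuda.sdp_kernel context manager. The tensor shapes, dtype, and is_causal flag are just illustrative assumptions; the point is that the flash kernel only dispatches for fp16/bf16 inputs on CUDA.

```python
import torch
import torch.nn.functional as F

device = "cuda"
dtype = torch.float16  # flash attention requires fp16/bf16; fp32 falls back to other backends

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(8, 16, 1024, 64, device=device, dtype=dtype)
k = torch.randn(8, 16, 1024, 64, device=device, dtype=dtype)
v = torch.randn(8, 16, 1024, 64, device=device, dtype=dtype)

# Force the flash backend (errors out if it cannot be used, e.g. with fp32 inputs).
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out_flash = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Force the memory-efficient backend for comparison.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out_mem_eff = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```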