Submitted by super_deap t3_11tmpc5 in MachineLearning
mike94025 t1_jcmlddm wrote
Reply to comment by cthorrez in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Better Transformer supports both training and inference today. Some optimizations are still inference-only (in particular, support for variable-sequence-length Nested Tensors), and the inference fastpath is a bit siloed, but nothing a future PyTorch update couldn't fix.
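A minimal sketch of what "supports both" means in practice, assuming PyTorch 2.0: `torch.nn.functional.scaled_dot_product_attention` (the kernel Better Transformer dispatches to, including Flash Attention when inputs allow) is differentiable, so the same fused call serves inference and training. Shapes below are illustrative, not from the thread.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(2, 8, 128, 64, requires_grad=True)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Inference or training: the fused attention kernel is the same call.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Gradients flow through the fused kernel, so it works for training too.
out.sum().backward()
```

The variable-sequence-length case mentioned above uses Nested Tensors (`torch.nested`), which at the time of the thread were supported only on the inference fastpath.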