Submitted by super_deap t3_11tmpc5 in MachineLearning
mike94025 t1_jcmlddm wrote
Reply to comment by cthorrez in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Better Transformer supports both training and inference today. Some optimizations are still inference-only (in particular, support for variable-sequence-length Nested Tensors), and the inference fastpath is a bit siloed, but nothing a future PyTorch update couldn't fix.
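A minimal sketch of what "supports both" means in practice, assuming PyTorch 2.0: `torch.nn.functional.scaled_dot_product_attention` (the kernel Better Transformer dispatches to, including Flash Attention when inputs allow) is differentiable, so the same fused call serves inference and training. Shapes below are illustrative, not from the thread.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(2, 8, 128, 64, requires_grad=True)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Inference or training: the fused attention kernel is the same call.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Gradients flow through the fused kernel, so it works for training too.
out.sum().backward()
```

The variable-sequence-length case mentioned above uses Nested Tensors (`torch.nested`), which at the time of the thread were supported only on the inference fastpath.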