
Hsemar t1_jalp8as wrote

But does FlashAttention help with auto-regressive generation? My understanding was that it avoids materializing the large QK^T attention matrix during training. At inference (one token at a time) with KV caching, this shouldn't be that relevant, right?
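The point about one-token-at-a-time decoding can be sketched with a toy KV-cache attention step (plain NumPy, illustrative names; this is not FlashAttention itself): at each decode step, the single new query only produces a length-t score vector against the cached keys, so the full t×t attention matrix is never formed in the first place.

```python
import numpy as np

def attend(q, K_cache, V_cache):
    """Single-query attention over a KV cache.

    q: (d,) query for the one new token
    K_cache, V_cache: (t, d) keys/values of all previous tokens
    """
    scores = K_cache @ q / np.sqrt(q.shape[0])  # (t,) -- a vector, not a t x t matrix
    w = np.exp(scores - scores.max())           # numerically stable softmax
    w /= w.sum()
    return w @ V_cache                          # (d,) attended output

rng = np.random.default_rng(0)
d = 4
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

# Decode 3 tokens one at a time, appending each token's key/value to the cache.
for _ in range(3):
    k, v, q = rng.normal(size=(3, d))           # toy per-token projections
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
```

Per step the memory for scores grows only linearly with the cached sequence length, which is why the training-time benefit of not materializing the attention matrix matters much less here.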
