Hsemar t1_jalp8as wrote
Reply to comment by lucidraisin in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
But does FlashAttention help with auto-regressive generation? My understanding was that it avoids materializing the large QKᵀ attention matrix during training. At inference, generating one token at a time with KV caching, that shouldn't be all that relevant, right?
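For reference, a minimal sketch (assumed shapes and names, not code from the thread) of what single-token decoding with a KV cache looks like. With a single query token the score matrix is only 1 × T, which is why the memory saving FlashAttention provides over the full T × T matrix during training or prefill is much less of a factor at this stage:

```python
# Minimal sketch: single-head attention at one decode step with a KV cache.
# Names like k_cache / v_cache are illustrative, not from any particular library.
import torch
import torch.nn.functional as F

d = 64        # head dimension (assumed)
T = 1024      # tokens generated so far (assumed)

# cached keys/values from previous decoding steps
k_cache = torch.randn(T, d)
v_cache = torch.randn(T, d)

# projections for the single new token at step T+1
q_new = torch.randn(1, d)
k_new = torch.randn(1, d)
v_new = torch.randn(1, d)

# append the new key/value to the cache
k_cache = torch.cat([k_cache, k_new], dim=0)   # (T+1, d)
v_cache = torch.cat([v_cache, v_new], dim=0)   # (T+1, d)

# attention for the new token: the score matrix is (1, T+1), not (T+1, T+1)
scores = (q_new @ k_cache.T) / d ** 0.5        # (1, T+1)
attn = F.softmax(scores, dim=-1)
out = attn @ v_cache                           # (1, d)
```

So at decode time the bottleneck is mostly reading the cache from memory rather than materializing a large attention matrix, which is the intuition behind the question.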