lucidraisin wrote
Reply to comment by fmai in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
It cannot; the compute still scales quadratically, although the memory bottleneck is now gone. However, I expect everyone to be training at 8k or even 16k context within two years, which is plenty for previously inaccessible problems. For context lengths at the next order of magnitude (say, genomics at a million base pairs), we will have to see whether linear attention (RWKV) pans out, or whether recurrent + memory architectures make a comeback.
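To make the scaling concrete, here is a minimal single-head PyTorch sketch (mine, not from the thread) contrasting standard softmax attention, whose score matrix is quadratic in sequence length, with a kernelized linear-attention variant in the style of Katharopoulos et al. (2020). Note this is just one linear-attention family for illustration; RWKV itself uses a different, recurrent formulation.

    import torch
    import torch.nn.functional as F

    def softmax_attention(q, k, v):
        # Standard attention materializes an (n, n) score matrix, so
        # compute (and, without tricks like FlashAttention, memory)
        # scales quadratically with sequence length n.
        scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)  # (n, n)
        return scores.softmax(dim=-1) @ v                          # (n, d)

    def linear_attention(q, k, v):
        # Kernelized attention: replacing softmax with a positive
        # feature map (elu(x) + 1, per Katharopoulos et al. 2020)
        # lets associativity form k^T v first, a small (d, d) matrix,
        # so the cost is O(n * d^2), i.e. linear in n.
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = k.transpose(-2, -1) @ v                            # (d, d)
        z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)   # (n, 1) normalizer
        return (q @ kv) / z                                     # (n, d)

    # Toy example: 4096 tokens, head dimension 64.
    n, d = 4096, 64
    q, k, v = (torch.randn(n, d) for _ in range(3))
    out = linear_attention(q, k, v)

The (d, d) intermediate is what makes the trick pay off: doubling the context doubles the work, rather than quadrupling it as in the softmax version.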
LetterRip wrote
Ah, I'd not seen the Block Recurrent Transformers paper before, interesting.