HateRedditCantQuitit t1_jcmdot7 wrote
Reply to comment by Spiritual-Reply5896 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I think of context as an end-to-end connected version of retrieval. You can backprop from the loss to the retrieved info, but you also want to backprop from the loss to the non-retrieved info, which would basically be equivalent to having it all in context (in a handwavy way). Which is to say that just having more context is a simple solution.
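To make the gradient-flow point concrete, here's a minimal toy sketch (my own setup, not from the linked paper or this thread; the dimensions and top-k retriever stand-in are assumptions): soft attention over the full context lets the loss backprop to every token, while hard top-k retrieval cuts off gradients to everything that wasn't retrieved.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 16
query = torch.randn(1, d)
memory = torch.randn(8, d, requires_grad=True)  # 8 candidate "documents"/tokens

# Full context: soft attention mixes every memory row into the output,
# so the loss backprops to all of them.
attn = F.softmax(query @ memory.T / d**0.5, dim=-1)
loss_full = (attn @ memory).sum()
loss_full.backward()
print((memory.grad.abs().sum(dim=-1) > 0).tolist())  # all True

memory.grad = None

# Hard retrieval: top-k index selection is non-differentiable, so only
# the retrieved rows receive any gradient from the loss.
scores = (query @ memory.T).squeeze(0)
topk = scores.topk(k=2).indices
loss_retrieved = memory[topk].sum()
loss_retrieved.backward()
print((memory.grad.abs().sum(dim=-1) > 0).tolist())  # True only for the top-k rows
```

In the full-attention case the signal reaches every row; with hard retrieval, the non-retrieved rows get no gradient at all, which is the gap I'm pointing at.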
I think everyone knows increasing context length is not 100% sufficient, but it sure is a simple, convenient solution.