HateRedditCantQuitit t1_jcmdot7 wrote
Reply to comment by Spiritual-Reply5896 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I think of context as an end-to-end connected version of retrieval. You can backprop from the loss to the retrieved info, but you also want to backprop from the loss to the non-retrieved info, which would basically be equivalent to having it all in context (in a handwavy way). Which is to say that just having more context is a simple solution.
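To make the gradient-flow point concrete, here's a minimal toy sketch (my own setup, not from the linked paper or this thread; the dimensions and top-k retriever stand-in are assumptions): soft attention over the full context lets the loss backprop to every token, while hard top-k retrieval cuts off gradients to everything that wasn't retrieved.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 16
query = torch.randn(1, d)
memory = torch.randn(8, d, requires_grad=True)  # 8 candidate "documents"/tokens

# Full context: soft attention mixes every memory row into the output,
# so the loss backprops to all of them.
attn = F.softmax(query @ memory.T / d**0.5, dim=-1)
loss_full = (attn @ memory).sum()
loss_full.backward()
print((memory.grad.abs().sum(dim=-1) > 0).tolist())  # all True

memory.grad = None

# Hard retrieval: top-k index selection is non-differentiable, so only
# the retrieved rows receive any gradient from the loss.
scores = (query @ memory.T).squeeze(0)
topk = scores.topk(k=2).indices
loss_retrieved = memory[topk].sum()
loss_retrieved.backward()
print((memory.grad.abs().sum(dim=-1) > 0).tolist())  # True only for the top-k rows
```

In the full-attention case the signal reaches every row; with hard retrieval, the non-retrieved rows get no gradient at all, which is the gap I'm pointing at.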
I think everyone knows increasing context length is not 100% sufficient, but it sure is a simple, convenient solution.