
Unlucky_Excitement_2 t1_jdavhcr wrote

Bro, what are you talking about LOL. It's context length he's discussing. There are multiple ways to extend it (I'm experimenting with all of them):

  1. flash attention
  2. strided context window
  3. finetuning on a dataset with longer sequences
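For anyone unsure what a "strided context window" looks like in practice, here is a minimal sketch under one common interpretation (an assumption, not necessarily what the commenter means): split a long token sequence into overlapping windows that fit the model's context, advancing by a fixed stride so each window carries `window - stride` tokens of context from the previous one. The function name `strided_windows` and its parameters are hypothetical.

```python
def strided_windows(tokens, window, stride):
    """Split `tokens` into overlapping chunks of at most `window` tokens.

    Hypothetical helper: each step advances by `stride` tokens, so consecutive
    chunks overlap by `window - stride` tokens. Returns (start, chunk) pairs.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        # A stride larger than the window would skip tokens entirely.
        raise ValueError("stride must not exceed window")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window]))
        # Stop once this window reaches the end of the sequence.
        if start + window >= len(tokens):
            break
        start += stride
    return windows


if __name__ == "__main__":
    # Ten fake token ids, a 4-token "context", advancing 3 tokens at a time.
    for start, chunk in strided_windows(list(range(10)), window=4, stride=3):
        print(start, chunk)
```

When this is used for evaluation, a typical choice is to score only the newest `stride` tokens of each window so overlapping tokens aren't counted twice, while the overlap still gives them some preceding context.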

KerfuffleV2 t1_jdbrkc1 wrote

Uh, did you reply to the wrong person or something? Your post doesn't have anything to do with either mine or the parent's.
