Longjumping_Essay498

Longjumping_Essay498 OP t1_j2f9r9s wrote

Let's say that for some examples we dig into these attention maps and find that a particular head seems to attend to words in a specific way. For example, in GPT some head might focus on parts of speech. Will it reliably do that for all examples? What do you think? Can we manually evaluate and categorize what each head has learned?
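To make the question concrete, here is a rough sketch of the kind of check I have in mind. Everything in it is an assumption for illustration: the attention maps are hand-written toy numbers (not real GPT output, and not causally masked), the POS tags are made up, and `pos_attention_share` and `head_consistency` are hypothetical helper names. The idea is to measure, per example, how much of a head's attention mass lands on tokens with a given tag, then look at how much that share varies across examples.

```python
from statistics import mean, pstdev

def pos_attention_share(attn, pos_tags, target_pos):
    """Average fraction of attention mass that query tokens place on keys tagged target_pos.

    attn: square list of lists, attn[i][j] = weight from query i to key j.
    pos_tags: one tag per token, same length as attn.
    """
    target_cols = [j for j, tag in enumerate(pos_tags) if tag == target_pos]
    if not target_cols:
        return None  # this example has no token with that tag
    # Normalize by the row sum so rows that do not sum exactly to 1 are still handled.
    row_shares = [sum(row[j] for j in target_cols) / sum(row) for row in attn]
    return mean(row_shares)

def head_consistency(examples, target_pos):
    """Mean and population std of the per-example share, skipping examples without the tag."""
    shares = [pos_attention_share(a, t, target_pos) for a, t in examples]
    shares = [s for s in shares if s is not None]
    return mean(shares), pstdev(shares)

# Toy, hand-written attention maps for one hypothetical head (not real model output).
examples = [
    ([[0.1, 0.8, 0.1],
      [0.2, 0.7, 0.1],
      [0.1, 0.8, 0.1]], ["DET", "NOUN", "VERB"]),
    ([[0.5, 0.25, 0.25],
      [0.5, 0.25, 0.25],
      [0.5, 0.25, 0.25]], ["VERB", "NOUN", "ADV"]),
]

avg, spread = head_consistency(examples, "NOUN")
print(f"mean share on NOUN: {avg:.2f}, spread across examples: {spread:.2f}")
```

A large spread relative to the mean would suggest the head only looks like a "noun head" on some inputs, which is exactly the reliability question above. With a real model you would replace the toy maps with extracted attention weights and tags from an actual tagger.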
