Submitted by soraki_soladead t3_zmoxp7 in MachineLearning
Rabrg t1_j0ck8uj wrote
soraki_soladead OP t1_j0cmgzs wrote
Perfect. Thank you! That explains why I couldn't find it.
EDIT: Spoke too soon. I think this covers some of the same ideas, but it isn't the one I'm remembering. It doesn't describe a method for simplifying the earlier layers of the transformer by exploiting the fact that they primarily learn bigrams. I could have sworn I read about it in an arXiv or OpenReview paper.
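For anyone unfamiliar with the "early layers mostly learn bigrams" idea: the rough intuition is that much of what the first layer(s) contribute can be approximated by a smoothed bigram table over adjacent token pairs. A minimal sketch of such a table (the function name and smoothing parameter are just illustrative, not from any particular paper):

```python
import math
from collections import Counter

def bigram_logits(tokens, vocab_size, alpha=1.0):
    """Build a smoothed bigram log-probability table from a token stream.

    table[a][b] = log P(next=b | prev=a), with add-alpha smoothing so
    unseen pairs still get nonzero probability. One crude stand-in for
    what early transformer layers are thought to mostly capture.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    table = [[0.0] * vocab_size for _ in range(vocab_size)]
    for a in range(vocab_size):
        denom = prev_counts[a] + alpha * vocab_size
        for b in range(vocab_size):
            table[a][b] = math.log((pair_counts[(a, b)] + alpha) / denom)
    return table
```

Whether you could actually swap an early layer for something like this (and how you'd wire it into the residual stream) is exactly the kind of detail the paper I'm thinking of covered.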
prohitman t1_j0dory1 wrote
This is a really interesting article!