Comments


fawkesdotbe t1_j2f9bcz wrote

Have fun: https://aclanthology.org/2022.acl-long.269/

> Adrien Bibal, Rémi Cardon, David Alfter, Rodrigo Wilkens, Xiaoou Wang, Thomas François, and Patrick Watrin. 2022. Is Attention Explanation? An Introduction to the Debate. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3889–3900, Dublin, Ireland. Association for Computational Linguistics.

27

currentscurrents t1_j2f996k wrote

Attention maps can be a type of explanation.

They tell you what the model was looking at when it generated a word or identified an image, but they don't tell you why it looked at those bits or why it made the decision it did. You can get some useful information by looking at them, but not everything you need to explain the model.
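
To make that concrete, here's a minimal sketch of pulling an attention map out of a Hugging Face model and checking which token each position attends to most. The library and checkpoint are just my choices for illustration:

```python
# Minimal sketch: inspect one attention head of a pretrained BERT model.
# Assumes `transformers` and `torch` are installed; model choice is arbitrary.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
layer, head = 5, 3                          # arbitrary layer/head to inspect
attn = outputs.attentions[layer][0, head]   # (seq_len, seq_len)

# For each query token, show which token it attends to most strongly.
for i, tok in enumerate(tokens):
    j = attn[i].argmax().item()
    print(f"{tok:>10} -> {tokens[j]}  ({attn[i, j].item():.2f})")
```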

10

Longjumping_Essay498 OP t1_j2f9r9s wrote

Let's say for some example we dig into these attention maps and find that a particular head attends to words in a certain way, e.g. in GPT some heads focus on parts of speech. Will it always do that reliably, for every example? What do you think? Can we manually evaluate and categorize what each head has learned?
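
Something like this toy sketch is what I mean by manually evaluating a head: pick a head, pick a pattern you can measure automatically, and see how stable the numbers are across sentences. I'm using the easier-to-measure "attends to the previous token" pattern rather than parts of speech here, since aligning POS tags with subword tokens adds extra noise (model, layer, and head are arbitrary choices):

```python
# Toy check of whether a GPT-2 head behaves consistently across examples,
# using the "attends to the previous token" pattern as the thing measured.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Attention maps are not a full explanation.",
    "She handed the report to the committee yesterday.",
]

layer, head = 4, 11   # hypothetical head to evaluate

for text in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        attn = model(**inputs).attentions[layer][0, head]  # (seq, seq)
    # attn.diagonal(offset=-1)[i] is the attention token i+1 pays to token i,
    # i.e. how much each position attends to its immediate predecessor.
    score = attn.diagonal(offset=-1).mean().item()
    # If this really is a "previous token" head, these numbers should be
    # high and stable; if they swing wildly, the pattern isn't reliable.
    print(f"{score:.2f}  {text}")
```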

3

IntelArtiGen t1_j2fcirf wrote

You can just say "the network evaluated that it needed to give more attention to these parts to perform the task". You can speculate about why, but you can't be sure.

6

currentscurrents t1_j2fduvv wrote

You can get some information this way, but not everything you would want to know. You can try it yourself with BertViz.
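
For instance, a minimal sketch (assumes a notebook environment, since BertViz renders interactive HTML; the checkpoint is just an example):

```python
# Sketch of visualizing attention with BertViz in a Jupyter notebook.
# Assumes `bertviz` and `transformers` are installed; model choice is arbitrary.
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer(
    "The animal didn't cross the street because it was too tired",
    return_tensors="pt",
)
outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# Interactive view of every layer and head; hover over tokens to see weights.
head_view(outputs.attentions, tokens)
```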

The information you do get can be useful, though. For example, in image processing you can use the attention map from an object classifier to see where the object is in the image.
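
Roughly like this sketch (the ViT checkpoint and image path are placeholders I picked; it averages the [CLS] token's last-layer attention over heads and upsamples it to image size):

```python
# Sketch: use a ViT classifier's [CLS] attention as a coarse localization map.
# Assumes `transformers`, `torch`, and `Pillow`; model and image are examples.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224", output_attentions=True
)

image = Image.open("cat.jpg").convert("RGB")   # hypothetical image path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last-layer attention: (batch, heads, 197, 197) for 224x224 / 16x16 patches.
attn = outputs.attentions[-1][0]            # (heads, 197, 197)
cls_attn = attn[:, 0, 1:].mean(dim=0)       # [CLS] -> patches, averaged over heads
heatmap = cls_attn.reshape(14, 14)

# Upsample to image resolution; bright regions roughly cover the object.
heatmap = torch.nn.functional.interpolate(
    heatmap[None, None], size=(224, 224), mode="bilinear"
)[0, 0]
print(heatmap.shape)                        # torch.Size([224, 224])
```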

3