
nibbels t1_j2817hu wrote

Imo, read work from Been Kim and her collaborators. They study the failure modes of post-hoc XAI methods; a recent-ish paper shows these methods don't always reveal spurious correlations. You would also probably do well to study flaws in the models themselves (underspecification, spurious correlations, etc.).
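
To make "post-hoc" concrete, here's a minimal sketch of one such method (plain gradient saliency) in PyTorch. The toy model, shapes, and target class are placeholders I made up; it just shows the general flavor of attribution that the work above scrutinizes.

```python
# Minimal sketch of a post-hoc attribution: vanilla gradient saliency.
# The model and input are toy placeholders, not from any particular paper.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3)).eval()

x = torch.randn(1, 10, requires_grad=True)   # one example with 10 features
target_class = 0

logits = model(x)
logits[0, target_class].backward()           # d(logit) / d(input)

saliency = x.grad.abs().squeeze()            # per-feature "importance" scores
print(saliency)
```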

You can also look into "inherently" interpretable models. These are models whose structure lends itself to producing its own explanations. Attention models are the usual example, and Hinton's new "forward-forward" method seems more intrinsically interpretable as well. Disclaimer: attention weights have their own issues and are not always faithful explanations.
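
As a rough illustration of why attention gets read as a built-in explanation (keeping the caveat above in mind), here is a toy PyTorch sketch; the layer size and random "tokens" are made-up placeholders.

```python
# Toy sketch: inspecting attention weights as a (fallible) built-in explanation.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=16, num_heads=1, batch_first=True)

tokens = torch.randn(1, 5, 16)                 # 1 sequence, 5 tokens, 16-dim embeddings
out, weights = attn(tokens, tokens, tokens, need_weights=True)

# weights[0, i, j] ~ how much token i attended to token j; this is often read
# as "which inputs mattered", though that reading can be misleading.
print(weights[0])
```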

If you're thinking of developing your own XAI methods, I should warn you: the field is flooded with "new" methods that are basically just tweaks on existing ones. If you do want to work on new methods, I recommend first getting a very good sense of where current methods fail (search Kim's work, and if you want more specifics, I can provide links), then testing your method in a very robust and rigorous way (one such check is sketched below).
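
One concrete version of "testing in a robust and rigorous way" is the model-randomization sanity check from that line of work: recompute the attribution after scrambling the model's weights and see whether it actually changes. A minimal sketch, assuming the same toy model and gradient saliency as above (both placeholders):

```python
# Sanity-check sketch: if an attribution barely changes after the model's
# weights are randomized, it probably isn't explaining the model.
import copy
import torch
import torch.nn as nn

def saliency(model, x, cls):
    x = x.clone().requires_grad_(True)
    model(x)[0, cls].backward()
    return x.grad.detach().squeeze()

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3)).eval()
x = torch.randn(1, 10)

randomized = copy.deepcopy(model)
for p in randomized.parameters():            # destroy whatever the model "knows"
    nn.init.normal_(p)

s_trained = saliency(model, x, cls=0)
s_random = saliency(randomized, x, cls=0)

# High similarity here is a red flag for the attribution method.
cos = torch.nn.functional.cosine_similarity(s_trained, s_random, dim=0)
print(f"cosine similarity, trained vs. randomized: {cos.item():.3f}")
```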

For a better understanding of the field, check out this paper: https://www.researchgate.net/publication/358123243_Explainable_Deep_Learning_A_Field_Guide_for_the_Uninitiated

Christoph Molnar's book, Interpretable Machine Learning, also has really good explanations of the main methods.
