JClub t1_izij5x5 wrote
Hey! I'm the author of https://github.com/JoaoLages/diffusers-interpret
I also tried to collect attention weights during the diffusion process, but the (text size, image size) matrices were too big to keep in RAM/VRAM. How did you solve that problem?
tetrisdaemon OP t1_izjm0ov wrote
Cool, nicely done repository. Are you referring to the [16, 4096-ish, 77] cross-attention matrices? I maintained a streaming sum over matrices of that size on a machine with 64GB RAM (though it does work with 32GB) and 24GB VRAM.
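The streaming sum described above can be sketched as follows. This is a minimal illustration, not the author's actual implementation: it assumes the attention maps arrive one diffusion step at a time with shape (heads, image_tokens, text_tokens), e.g. (16, ~4096, 77), so only a single accumulator of that size ever lives in memory.

```python
import numpy as np

def stream_attention_sum(step_attention_maps):
    """Accumulate cross-attention maps one diffusion step at a time.

    Each map is assumed to have shape (heads, image_tokens, text_tokens);
    only one accumulator of that size is kept in memory, instead of one
    map per diffusion step.
    """
    total = None
    n_steps = 0
    for attn in step_attention_maps:
        if total is None:
            total = np.zeros_like(attn, dtype=np.float64)
        total += attn
        n_steps += 1
    return total / n_steps  # mean attention over all steps

# Toy example: a generator yielding 3 "steps" of tiny attention maps,
# so no more than one map plus the accumulator exists at once.
maps = (np.full((2, 4, 3), s, dtype=np.float32) for s in (1.0, 2.0, 3.0))
mean_attn = stream_attention_sum(maps)
```

Feeding the maps through a generator (rather than a list) is what keeps the peak memory at roughly one map plus the accumulator.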
JClub t1_izjnf35 wrote
Damn, then this method can only run on hardware like that; the attention weights are very heavy!
tetrisdaemon OP t1_izk7fk0 wrote
Yeah, moving forward it might help to have a disk caching mode.
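One way such a disk caching mode could look is a memory-mapped array: each step's attention map is written straight to disk and read back lazily for analysis. A hypothetical sketch (the shapes and file name are illustrative, not from the actual code):

```python
import os
import tempfile
import numpy as np

# Hypothetical disk-caching mode: write each step's attention map into a
# memory-mapped file instead of holding all steps in RAM at once.
n_steps, heads, img_tokens, txt_tokens = 4, 2, 8, 3
path = os.path.join(tempfile.mkdtemp(), "attn_cache.dat")

cache = np.memmap(path, dtype=np.float32, mode="w+",
                  shape=(n_steps, heads, img_tokens, txt_tokens))
for step in range(n_steps):
    # Stand-in for a real cross-attention map produced at this step.
    attn = np.full((heads, img_tokens, txt_tokens), step, dtype=np.float32)
    cache[step] = attn  # written through to disk, not kept in RAM
cache.flush()

# Later: re-open read-only and pull one step at a time for analysis.
reloaded = np.memmap(path, dtype=np.float32, mode="r",
                     shape=(n_steps, heads, img_tokens, txt_tokens))
per_step_mean = reloaded.mean(axis=(1, 2, 3))
```

This trades memory for disk I/O, which fits the use case here since the attention maps are only inspected after generation finishes.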