JClub t1_izij5x5 wrote
Hey! I'm the author of https://github.com/JoaoLages/diffusers-interpret
I also tried to collect attention weights during the diffusion process, but the (text size, image size) matrices were too big to keep in RAM/VRAM. How did you solve that problem?
tetrisdaemon OP t1_izjm0ov wrote
Cool, nicely done repository. Are you referring to the [16, 4096-ish, 77] cross-attention matrices? I maintained a streaming sum over matrices of that size on a machine with 64GB RAM (though it does work with 32GB) and 24GB VRAM.
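The streaming sum described above can be sketched as follows. This is a minimal illustration, not the author's actual implementation: it assumes the attention maps arrive one diffusion step at a time with shape (heads, image_tokens, text_tokens), e.g. (16, ~4096, 77), so only a single accumulator of that size ever lives in memory.

```python
import numpy as np

def stream_attention_sum(step_attention_maps):
    """Accumulate cross-attention maps one diffusion step at a time.

    Each map is assumed to have shape (heads, image_tokens, text_tokens);
    only one accumulator of that size is kept in memory, instead of one
    map per diffusion step.
    """
    total = None
    n_steps = 0
    for attn in step_attention_maps:
        if total is None:
            total = np.zeros_like(attn, dtype=np.float64)
        total += attn
        n_steps += 1
    return total / n_steps  # mean attention over all steps

# Toy example: a generator yielding 3 "steps" of tiny attention maps,
# so no more than one map plus the accumulator exists at once.
maps = (np.full((2, 4, 3), s, dtype=np.float32) for s in (1.0, 2.0, 3.0))
mean_attn = stream_attention_sum(maps)
```

Feeding the maps through a generator (rather than a list) is what keeps the peak memory at roughly one map plus the accumulator.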
JClub t1_izjnf35 wrote
Damn, then this method can only run on hardware like that; the attention weights are very heavy!
tetrisdaemon OP t1_izk7fk0 wrote
Yeah, moving forward it might help to have a disk caching mode.
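One way such a disk caching mode could look is a memory-mapped array: each step's attention map is written straight to disk and read back lazily for analysis. A hypothetical sketch (the shapes and file name are illustrative, not from the actual code):

```python
import os
import tempfile
import numpy as np

# Hypothetical disk-caching mode: write each step's attention map into a
# memory-mapped file instead of holding all steps in RAM at once.
n_steps, heads, img_tokens, txt_tokens = 4, 2, 8, 3
path = os.path.join(tempfile.mkdtemp(), "attn_cache.dat")

cache = np.memmap(path, dtype=np.float32, mode="w+",
                  shape=(n_steps, heads, img_tokens, txt_tokens))
for step in range(n_steps):
    # Stand-in for a real cross-attention map produced at this step.
    attn = np.full((heads, img_tokens, txt_tokens), step, dtype=np.float32)
    cache[step] = attn  # written through to disk, not kept in RAM
cache.flush()

# Later: re-open read-only and pull one step at a time for analysis.
reloaded = np.memmap(path, dtype=np.float32, mode="r",
                     shape=(n_steps, heads, img_tokens, txt_tokens))
per_step_mean = reloaded.mean(axis=(1, 2, 3))
```

This trades memory for disk I/O, which fits the use case here since the attention maps are only inspected after generation finishes.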