JClub t1_izij5x5 wrote

Hey! I'm the author of https://github.com/JoaoLages/diffusers-interpret

I also tried to collect attention maps during the diffusion process, but the (text size, image size) matrices were too big to keep in RAM/VRAM. How did you solve that problem?

2

tetrisdaemon OP t1_izjm0ov wrote

Cool, nicely done repository. Are you referring to the [16, 4096-ish, 77] cross-attention matrices? I maintained a streaming sum over matrices of that size, on a machine with 64GB of RAM (though it also works with 32GB) and 24GB of VRAM.
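
Roughly, one such map in float32 is 16 × 4096 × 77 × 4 bytes ≈ 20 MB, so storing every map outright (say, 16 cross-attention layers × 50 denoising steps, both figures just illustrative) would be on the order of 16 GB, while summing in place keeps memory at a single map. A minimal sketch of the streaming-sum idea, assuming per-step maps shaped [heads, image tokens, text tokens]; the names and shapes here are illustrative, not the actual implementation:

```python
import torch

# Running sum of cross-attention maps across diffusion steps.
# Assumed shape per map: [heads, image_tokens, text_tokens], e.g. [16, 4096, 77].
attn_sum = None
num_maps = 0

def accumulate(attn_probs: torch.Tensor) -> None:
    """Fold one step's attention map into a running sum instead of storing every step."""
    global attn_sum, num_maps
    # Move off the GPU immediately so VRAM only ever holds the current map.
    attn_probs = attn_probs.detach().to("cpu", torch.float32)
    if attn_sum is None:
        attn_sum = torch.zeros_like(attn_probs)
    attn_sum += attn_probs
    num_maps += 1

# After the sampling loop, the average map costs the same memory as one step:
# attn_mean = attn_sum / num_maps
```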

3

JClub t1_izjnf35 wrote

Damn, then this method can only run on that kind of hardware; the attention weights are very heavy!

1

tetrisdaemon OP t1_izk7fk0 wrote

Yeah, moving forward it might help to have a disk caching mode.
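
For instance, a hedged sketch of what such a mode could look like, spilling each step's map to disk and reloading it lazily (the cache layout and helper names are hypothetical, not a feature of either repo):

```python
import os
import torch

CACHE_DIR = "attn_cache"  # hypothetical on-disk location for per-step attention maps
os.makedirs(CACHE_DIR, exist_ok=True)

def cache_step(step: int, attn_probs: torch.Tensor) -> None:
    # Write one step's cross-attention map to disk so it can leave RAM/VRAM.
    torch.save(attn_probs.detach().cpu(), os.path.join(CACHE_DIR, f"step_{step:04d}.pt"))

def load_step(step: int) -> torch.Tensor:
    # Reload a single step's map only when it is actually needed.
    return torch.load(os.path.join(CACHE_DIR, f"step_{step:04d}.pt"))
```

Peak memory then stays at one map, at the cost of disk I/O per step.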

2