tetrisdaemon OP t1_izjm0ov wrote

Cool, nicely done repository. Are you referring to the [16, 4096-ish, 77] cross-attention matrices? I maintained a streaming sum over matrices of the same size on a machine with 64GB of RAM (it also works with 32GB) and 24GB of VRAM.
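
For anyone wondering what a streaming sum looks like in practice, here is a minimal NumPy sketch. It is not the repository's code: `streaming_sum` and `fake_attention_maps` are hypothetical names, the random arrays stand in for real cross-attention maps, and the shape is shrunk so it runs quickly. The idea is to keep one running total and add each matrix to it in place, rather than keeping every matrix around.

```python
import numpy as np


def streaming_sum(matrices, shape, dtype=np.float32):
    """Add same-shaped arrays one at a time into a single running total."""
    total = np.zeros(shape, dtype=dtype)
    count = 0
    for m in matrices:
        total += m  # in-place add: only one accumulator of this shape is kept
        count += 1
    return total, count


def fake_attention_maps(num_maps, shape, seed=0):
    """Hypothetical stand-in: yields random arrays instead of real attention maps."""
    rng = np.random.default_rng(seed)
    for _ in range(num_maps):
        yield rng.random(shape, dtype=np.float32)


# Small stand-in for the [16, 4096-ish, 77] shape mentioned above.
shape = (16, 64, 77)
total, count = streaming_sum(fake_attention_maps(5, shape), shape)
mean_map = total / count  # average over everything that was streamed
```

Because the generator hands over one array at a time, peak memory is about one accumulator plus one incoming matrix, no matter how many matrices are summed. If the middle dimension were exactly 4096 and the dtype float32, one such buffer would be roughly 20 MB.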

3

JClub t1_izjnf35 wrote

Damn, then this method can only run on that kind of hardware. The attention weights are very heavy!

1

tetrisdaemon OP t1_izk7fk0 wrote

Yeah, moving forward it might help to have a disk caching mode.
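
One way such a mode might look, purely as a sketch and not something the repository is known to do, is to write each map into a memory-mapped file with `np.memmap` and read them back one at a time. The helper name `open_disk_cache`, the file name, and the small shape are all assumptions for illustration.

```python
import os
import tempfile

import numpy as np


def open_disk_cache(path, num_maps, shape, dtype=np.float32):
    """Hypothetical helper: a file-backed array with room for num_maps arrays."""
    return np.memmap(path, dtype=dtype, mode="w+", shape=(num_maps, *shape))


shape = (16, 64, 77)  # small stand-in for [16, 4096-ish, 77]
num_maps = 4
cache_path = os.path.join(tempfile.mkdtemp(), "attention_cache.dat")

# Write each map to disk as it is produced (random data stands in for real maps).
cache = open_disk_cache(cache_path, num_maps, shape)
rng = np.random.default_rng(0)
for i in range(num_maps):
    cache[i] = rng.random(shape, dtype=np.float32)
cache.flush()  # push pending writes out to the file

# Later: reopen read-only and accumulate one slice at a time.
readback = np.memmap(
    cache_path, dtype=np.float32, mode="r", shape=(num_maps, *shape)
)
total = np.zeros(shape, dtype=np.float32)
for i in range(num_maps):
    total += readback[i]
```

The trade-off is speed for footprint: the stored maps live in the file, and only the slices being touched need to be paged into RAM.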

2