Avelina9X

Avelina9X t1_jbt4o8y wrote

So the attention mechanism has O(N^2) space and time complexity relative to sequence length. However, if you are memory constrained, it is possible to get the memory requirement per token down to O(N) by computing only one token at a time and caching the previous keys and values. This is only really possible at inference time and requires that the architecture be implemented with caching in mind.
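
As a rough illustration, single-token decoding with a key/value cache could look something like the sketch below (a minimal single-head example; the names, shapes, and numbers are made up for illustration, not taken from any particular implementation):

```typescript
// Minimal sketch: single-head attention for ONE new token, reusing cached keys/values.
// Each step costs O(N) time and only an O(N) score vector is materialised,
// instead of the full N x N attention matrix.
type Vec = number[];

const dot = (a: Vec, b: Vec) => a.reduce((s, v, i) => s + v * b[i], 0);

function softmax(x: Vec): Vec {
  const m = Math.max(...x);                      // subtract max for numerical stability
  const e = x.map(v => Math.exp(v - m));
  const z = e.reduce((s, v) => s + v, 0);
  return e.map(v => v / z);
}

class KVCache {
  keys: Vec[] = [];
  values: Vec[] = [];
}

// One decoding step: attend the new query against all cached keys/values.
function attendOneToken(q: Vec, k: Vec, v: Vec, cache: KVCache): Vec {
  cache.keys.push(k);                            // cache grows by one entry per token
  cache.values.push(v);
  const scale = 1 / Math.sqrt(q.length);
  const scores = softmax(cache.keys.map(key => dot(q, key) * scale)); // O(N) scores
  const out: Vec = new Array(v.length).fill(0);  // weighted sum of cached values
  scores.forEach((w, i) => cache.values[i].forEach((val, j) => { out[j] += w * val; }));
  return out;
}

// Usage: each call reuses the cache rather than recomputing attention over the prefix.
const cache = new KVCache();
console.log(attendOneToken([1, 0], [1, 0], [0.5, 0.5], cache));
console.log(attendOneToken([0, 1], [0, 1], [1.0, 0.0], cache));
```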

1

Avelina9X OP t1_j55kcjo wrote

Yeah, that's really weird. We're documenting Google Chrome silently adding upscaling. I think it's a really worthwhile discussion for the community: figuring out what model it's using, as well as how they're implementing it in a cross-platform, GPU-agnostic way that is buttery smooth and doesn't use a ton of resources.

14

Avelina9X OP t1_j55h54h wrote

I think it's client-side, which is why I mentioned it's perhaps using a GLSL-based CNN. That's absolutely possible in WebGL2, and I've been experimenting with that sort of tech myself (not for upscaling, but just as a proof-of-concept CNN in WebGL).
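
As a rough illustration of the idea, a single 3x3 convolution pass could be written as a WebGL2 fragment shader driven from TypeScript, something like the sketch below. The element IDs and the fixed sharpen kernel are placeholders; a real CNN layer would use learned weights, multiple channels, and multiple render passes:

```typescript
// Sketch: one 3x3 convolution pass over an image texture in WebGL2.
// Assumes the page contains <canvas id="out"> and an already-loaded <img id="src">.
const vertSrc = `#version 300 es
in vec2 pos;
out vec2 uv;
void main() {
  uv = pos * 0.5 + 0.5;
  gl_Position = vec4(pos, 0.0, 1.0);
}`;

const fragSrc = `#version 300 es
precision highp float;
uniform sampler2D tex;
uniform vec2 texelSize;      // 1 / texture resolution
in vec2 uv;
out vec4 fragColor;
// Example 3x3 kernel (sharpen); a real CNN layer would use learned weights.
const float k[9] = float[9](0., -1., 0., -1., 5., -1., 0., -1., 0.);
void main() {
  vec3 acc = vec3(0.0);
  for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x)
      acc += k[(y + 1) * 3 + (x + 1)] * texture(tex, uv + vec2(x, y) * texelSize).rgb;
  fragColor = vec4(acc, 1.0);
}`;

function compile(gl: WebGL2RenderingContext, type: number, src: string): WebGLShader {
  const s = gl.createShader(type)!;
  gl.shaderSource(s, src);
  gl.compileShader(s);
  if (!gl.getShaderParameter(s, gl.COMPILE_STATUS)) throw new Error(gl.getShaderInfoLog(s) ?? "");
  return s;
}

const canvas = document.getElementById("out") as HTMLCanvasElement;
const gl = canvas.getContext("webgl2")!;
const prog = gl.createProgram()!;
gl.attachShader(prog, compile(gl, gl.VERTEX_SHADER, vertSrc));
gl.attachShader(prog, compile(gl, gl.FRAGMENT_SHADER, fragSrc));
gl.linkProgram(prog);
gl.useProgram(prog);

// Two triangles covering the whole viewport.
const buf = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buf);
gl.bufferData(gl.ARRAY_BUFFER,
  new Float32Array([-1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1]), gl.STATIC_DRAW);
const loc = gl.getAttribLocation(prog, "pos");
gl.enableVertexAttribArray(loc);
gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);

// Upload the source image as the input texture.
const img = document.getElementById("src") as HTMLImageElement;
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.pixelStorei(gl.UNPACK_FLIP_Y_WEBGL, true);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, img);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

gl.uniform1i(gl.getUniformLocation(prog, "tex"), 0);
gl.uniform2f(gl.getUniformLocation(prog, "texelSize"), 1 / img.width, 1 / img.height);
gl.drawArrays(gl.TRIANGLES, 0, 6);
```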

15