younesbelkada OP t1_j1adt9t wrote on December 22, 2022 at 9:04 PM

Reply to comment by matigekunst in [D] BLIP is now available on transformers, what are the cool apps you can build on top of it? by younesbelkada

super cool!!

younesbelkada OP t1_j14eysb wrote on December 21, 2022 at 4:21 PM

Reply to comment by SergioSV96 in [D] BLIP is now available on transformers, what are the cool apps you can build on top of it? by younesbelkada

No, actually the BLIP demo was already on the Hub, but the model architecture and weights were not in the library yet.

younesbelkada t1_ixdyvls wrote on November 22, 2022 at 6:49 PM

Reply to comment by JahrudZ in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty

Because BetterTransformer merges all of the TransformerEncoderLayer operations into a single fused operation. That fused operation is called with the appropriate weights / biases at runtime.

For int8, each linear layer is replaced by the linear layer from bitsandbytes, which works a bit differently. At runtime it decomposes the matrix multiplication into two stages, and this is done with dedicated CUDA kernels. Since this is not embedded in the fused operation from PyTorch, the two options are mutually exclusive. Please read more about int8 models here: https://huggingface.co/blog/hf-bitsandbytes-integration
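To give a rough idea of what "two stages" means, here is a toy pure-Python sketch of the decomposition as I understand it from the blog post: feature dimensions with large-magnitude activations are multiplied in full precision, everything else goes through an absmax int8 path, and the two partial results are summed. This is not the bitsandbytes code (which runs in CUDA kernels); the function names and the 6.0 threshold default are my own assumptions for illustration.

```python
# Toy sketch of a two-stage (outlier + int8) matrix multiplication.
# All names here are hypothetical; this is not the bitsandbytes API.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def absmax_quantize_rows(mat):
    """Quantize each row to integers in [-127, 127] with its own scale."""
    quantized, scales = [], []
    for row in mat:
        absmax = max(abs(v) for v in row) or 1.0  # avoid dividing by zero
        scales.append(absmax / 127.0)
        quantized.append([round(v * 127.0 / absmax) for v in row])
    return quantized, scales

def int8_decomposed_matmul(x, w, threshold=6.0):
    """Approximate x @ w; threshold=6.0 is an assumed default."""
    n, k, m = len(x), len(w), len(w[0])
    # Feature dimensions where any activation magnitude reaches the threshold.
    outlier = [j for j in range(k) if any(abs(r[j]) >= threshold for r in x)]
    regular = [j for j in range(k) if j not in outlier]
    out = [[0.0] * m for _ in range(n)]

    # Stage 1: outlier dimensions are multiplied without quantization.
    if outlier:
        part = matmul([[r[j] for j in outlier] for r in x],
                      [w[j] for j in outlier])
        for i in range(n):
            for c in range(m):
                out[i][c] += part[i][c]

    # Stage 2: remaining dimensions use integer matmul, then dequantize.
    if regular:
        xq, xs = absmax_quantize_rows([[r[j] for j in regular] for r in x])
        wq_t, ws = absmax_quantize_rows(transpose([w[j] for j in regular]))
        acc = matmul(xq, transpose(wq_t))  # integer accumulation
        for i in range(n):
            for c in range(m):
                out[i][c] += acc[i][c] * xs[i] * ws[c]
    return out

x = [[0.5, -1.0, 8.0], [1.5, 0.25, -0.5]]  # third feature is an outlier
w = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
approx = int8_decomposed_matmul(x, w)
exact = matmul(x, w)
```

Comparing `approx` with `exact` shows the small error introduced by the int8 path. The point for this thread is that the linear layer is no longer a plain matmul, so it cannot simply be swapped into the fused encoder-layer operation.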

younesbelkada t1_ixdwsh6 wrote on November 22, 2022 at 6:35 PM

Reply to comment by JahrudZ in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty

I know at least that this is mutually exclusive with int8. I have not tried it with DS though.