pitrucha t1_jckiv1q wrote

Any plans to quantize it? I saw that someone managed to do so with the 65B LLaMA model and push it from 120 GB down to 30 GB.

24

juliensalinas OP t1_jcktk8o wrote

This is maybe something I'll focus on in the future. But for the moment I find this fp16 version well suited for small budgets, as it runs on a 16GB GPU, while the native fp32 version of GPT-J requires at least 24GB of VRAM.
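For reference, a fp16 load with Transformers looks roughly like this; it's just a sketch, and the `EleutherAI/gpt-j-6B` checkpoint with its `float16` revision is an assumption here, not necessarily the exact fp16 version I'm referring to:

```python
# Sketch: loading GPT-J in half precision so it fits on a ~16GB GPU.
# Assumes the public EleutherAI/gpt-j-6B repo and its float16 branch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="float16",          # fp16 weights branch of the repo
    torch_dtype=torch.float16,   # keep weights in half precision (~12GB)
).to("cuda")

inputs = tokenizer("GPT-J is a", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```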

Also, thanks to the bitsandbytes integration in HF Transformers, you can run the model in 8-bit: https://huggingface.co/blog/hf-bitsandbytes-integration
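A minimal sketch of that 8-bit path (requires `bitsandbytes` and `accelerate` to be installed; the checkpoint name is again an assumption):

```python
# Sketch: int8 loading via the bitsandbytes integration in Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on the available GPU(s)
    load_in_8bit=True,   # quantize weights to int8 at load time
)
```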

12