Submitted by juliensalinas t3_11tqryd in MachineLearning
pitrucha t1_jckiv1q wrote
Any plans to quantize it? I saw that someone managed to do so with the 65B LLaMA model and shrink it from 120 GB to 30 GB.
juliensalinas OP t1_jcktk8o wrote
This is maybe something I'll focus on in the future. But for the moment I find this fp16 version well suited to small budgets, as it runs on a 16 GB GPU, while the native fp32 version of GPT-J requires at least 24 GB of VRAM.
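The VRAM figures above follow from simple arithmetic on GPT-J's roughly 6 billion parameters: weights alone take bytes-per-parameter times parameter count, and activations plus framework overhead add to that. A rough back-of-the-envelope sketch (the exact parameter count is the published GPT-J-6B figure; overhead is not modeled):

```python
# Rough weight-memory estimate for GPT-J (~6.05B parameters) at
# different precisions. Weights only -- activations, KV cache, and
# framework overhead come on top, which is why a ~11 GB fp16 model
# needs a 16 GB card and a ~22.5 GB fp32 model needs 24 GB.
PARAMS = 6_050_000_000  # published GPT-J-6B parameter count

def weight_gb(bytes_per_param: float) -> float:
    """Return the weight footprint in GiB for a given precision."""
    return PARAMS * bytes_per_param / 2**30

fp32_gb = weight_gb(4)  # float32: 4 bytes per parameter
fp16_gb = weight_gb(2)  # float16: 2 bytes per parameter
int8_gb = weight_gb(1)  # int8 quantized: 1 byte per parameter

print(f"fp32: {fp32_gb:.1f} GiB, fp16: {fp16_gb:.1f} GiB, int8: {int8_gb:.1f} GiB")
```

The halving from fp32 to fp16, and again to int8, is exactly the "small budget" trade-off described above.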
Also, with the bitsandbytes integration in HF Transformers you can use the model in 8 bits: https://huggingface.co/blog/hf-bitsandbytes-integration
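As described in the linked blog post, 8-bit loading in Transformers is a flag on `from_pretrained` (with `bitsandbytes` and `accelerate` installed). A minimal sketch, assuming the `EleutherAI/gpt-j-6b` Hub checkpoint and a CUDA GPU:

```python
# Sketch: load GPT-J in 8 bits via the bitsandbytes integration in
# HF Transformers. Requires: transformers, accelerate, bitsandbytes,
# and a CUDA-capable GPU. Downloads ~12 GB of weights on first run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # assumed Hub checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on the GPU
    load_in_8bit=True,   # quantize weights to int8 at load time
)

inputs = tokenizer("GPT-J is a", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation quality is largely preserved because bitsandbytes keeps outlier features in fp16 while quantizing the bulk of the weights to int8.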