
suflaj t1_jdf3j2k wrote

Unless you plan on quantizing your model or loading it layer by layer, I'm afraid ~2B parameters is the most you'll fit: at fp16 every parameter costs 2 bytes, so the weights alone for a 7B model come to roughly 13 GB. 10GB of VRAM is not really enough for CV nowadays, let alone NLP. With 8-bit quantization you can barely run a 7B model.
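
For concreteness, here's the back-of-envelope arithmetic behind those numbers, a rough sketch counting weights only (activations and the KV cache add overhead on top, so treat these as lower bounds):

```python
# Bytes per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_vram_gb(n_params_billion: float, dtype: str) -> float:
    """Approximate GB of VRAM needed just to hold the weights."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1024**3

for dtype in ("fp16", "int8", "int4"):
    print(f"7B @ {dtype}: {weight_vram_gb(7, dtype):.1f} GB")
# 7B @ fp16: 13.0 GB  -- already over a 10 GB budget
# 7B @ int8:  6.5 GB  -- fits, barely, once overhead is added
# 7B @ int4:  3.3 GB
```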

4-bit doesn't matter at the end of the day, since it isn't supported out of the box; you'd have to implement it yourself.
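
8-bit, on the other hand, is what you do get out of the box with the Hugging Face transformers + bitsandbytes stack. A minimal sketch, assuming that stack is installed (`pip install transformers accelerate bitsandbytes`); the model name is a placeholder, swap in whatever 7B checkpoint you actually mean to run:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-7b-checkpoint"  # placeholder, not a real repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # let accelerate place layers, spilling to CPU RAM if needed
)
```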
