
LetterRip t1_iusmrac wrote

With bitsandbytes' LLM.int8() you can quantize most weights in a large model to int8 while keeping a small subset of outlier features in full precision, and get essentially equivalent output. You could then also use a lookup table to further compress the weights.
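
A minimal NumPy sketch of both ideas, assuming a per-row absmax scheme and a quantile-binned lookup table. The threshold value, helper names, and binning choice are illustrative, not the actual bitsandbytes implementation:

```python
import numpy as np

def quantize_with_outliers(W, threshold=6.0):
    """Int8-quantize the bulk of a weight matrix, keeping outlier
    columns (any entry above `threshold` in magnitude) in fp16.
    A sketch of the concept, not the bitsandbytes internals."""
    outlier_cols = np.any(np.abs(W) > threshold, axis=0)
    W_fp16 = W[:, outlier_cols].astype(np.float16)      # small full-precision subset
    W_reg = W[:, ~outlier_cols]
    scales = np.abs(W_reg).max(axis=1, keepdims=True) / 127.0
    W_int8 = np.round(W_reg / scales).astype(np.int8)   # bulk of the weights
    return W_int8, scales, W_fp16, outlier_cols

def dequantize(W_int8, scales, W_fp16, outlier_cols):
    """Reassemble an approximate float32 weight matrix."""
    W = np.empty((W_int8.shape[0], outlier_cols.size), dtype=np.float32)
    W[:, ~outlier_cols] = W_int8.astype(np.float32) * scales
    W[:, outlier_cols] = W_fp16.astype(np.float32)
    return W

def lut_compress(W_int8, bits=4):
    """Optional second stage: map int8 values onto 2**bits levels
    (quantile binning here as a stand-in for k-means) and store only
    the small indices plus the lookup table."""
    levels = np.quantile(W_int8, np.linspace(0, 1, 2**bits)).astype(np.int8)
    idx = np.abs(W_int8[..., None].astype(np.int16) - levels).argmin(axis=-1)
    return idx.astype(np.uint8), levels

def lut_decompress(idx, levels):
    return levels[idx]

# demo
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128)).astype(np.float32)
W[:, 7] *= 20.0                                         # inject an outlier column
W_int8, scales, W_fp16, cols = quantize_with_outliers(W)
print("max abs error:", np.abs(W - dequantize(W_int8, scales, W_fp16, cols)).max())
idx, levels = lut_compress(W_int8)
print("LUT levels:", levels)
```

The key design point is that the handful of outlier columns, which would otherwise blow up the absmax scale and destroy precision for everything else, bypass quantization entirely, so the int8 bulk keeps a tight dynamic range.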
