
LetterRip t1_iusmrac wrote

With bitsandbytes' LLM.int8() you can quantize most weights in a large model to int8 while keeping a small subset of outlier features in full precision, and get essentially equivalent output. You could then also use a lookup table to further compress the weights.
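
A minimal NumPy sketch of both ideas, assuming a per-row absmax scheme and a quantile-binned lookup table. The threshold value, helper names, and binning choice are illustrative, not the actual bitsandbytes implementation:

```python
import numpy as np

def quantize_with_outliers(W, threshold=6.0):
    """Int8-quantize the bulk of a weight matrix, keeping outlier
    columns (any entry above `threshold` in magnitude) in fp16.
    A sketch of the concept, not the bitsandbytes internals."""
    outlier_cols = np.any(np.abs(W) > threshold, axis=0)
    W_fp16 = W[:, outlier_cols].astype(np.float16)      # small full-precision subset
    W_reg = W[:, ~outlier_cols]
    scales = np.abs(W_reg).max(axis=1, keepdims=True) / 127.0
    W_int8 = np.round(W_reg / scales).astype(np.int8)   # bulk of the weights
    return W_int8, scales, W_fp16, outlier_cols

def dequantize(W_int8, scales, W_fp16, outlier_cols):
    """Reassemble an approximate float32 weight matrix."""
    W = np.empty((W_int8.shape[0], outlier_cols.size), dtype=np.float32)
    W[:, ~outlier_cols] = W_int8.astype(np.float32) * scales
    W[:, outlier_cols] = W_fp16.astype(np.float32)
    return W

def lut_compress(W_int8, bits=4):
    """Optional second stage: map int8 values onto 2**bits levels
    (quantile binning here as a stand-in for k-means) and store only
    the small indices plus the lookup table."""
    levels = np.quantile(W_int8, np.linspace(0, 1, 2**bits)).astype(np.int8)
    idx = np.abs(W_int8[..., None].astype(np.int16) - levels).argmin(axis=-1)
    return idx.astype(np.uint8), levels

def lut_decompress(idx, levels):
    return levels[idx]

# demo
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128)).astype(np.float32)
W[:, 7] *= 20.0                                         # inject an outlier column
W_int8, scales, W_fp16, cols = quantize_with_outliers(W)
print("max abs error:", np.abs(W - dequantize(W_int8, scales, W_fp16, cols)).max())
idx, levels = lut_compress(W_int8)
print("LUT levels:", levels)
```

The key design point is that the handful of outlier columns, which would otherwise blow up the absmax scale and destroy precision for everything else, bypass quantization entirely, so the int8 bulk keeps a tight dynamic range.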
