FaceDeer t1_jc3k2oi wrote on March 13, 2023 at 7:57 PM

Reply to comment by luaks1337 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef

I'm curious, there must be a downside to reducing the bits, mustn't there? What does intensively jpegging an AI's brain do to it? Is this why Lt. Commander Data couldn't use contractions?

luaks1337 t1_jc3p8oq wrote on March 13, 2023 at 8:30 PM

Backpropagation requires a lot of accuracy so we need 16- or 32-bit while training. However, post-training quantization seems to have very little impact on the results. There are different ways in which you can quantize but apparently llama.cpp uses the most basic way and it still works like a charm. Georgi Gerganov (maintainer) wrote a tweet about it but I can't find it right now.