kkg_scorpio t1_jbz91de wrote on March 12, 2023 at 9:39 PM
Reply to comment by Upstairs_Suit_9464 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Check out the terms "quantization-aware training" and "post-training quantization".
8-bit, 4-bit, 2-bit, hell, even 1-bit inference is extremely relevant for edge devices.
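For anyone looking those terms up, here is a minimal sketch of post-training quantization. It assumes symmetric, per-tensor, uniform quantization, which is only one common scheme and not necessarily what the LLaMA 4-bit bot uses. All function names are hypothetical. The 1-bit case is handled as sign binarization with a single mean-absolute-value scale, since symmetric uniform quantization has no nonzero codes at 1 bit. Quantization-aware training would instead run this quantize/dequantize round trip ("fake quantization") in the forward pass during training, so the model learns to tolerate the rounding.

```python
import random


def quantize_symmetric(weights, bits):
    """Post-training quantization (assumed scheme: symmetric, per-tensor).
    Maps floats to integers in [-qmax, qmax] using one shared scale."""
    if bits < 2:
        raise ValueError("use binarize() for 1-bit")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clamp to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from integer codes."""
    return [v * scale for v in q]


def binarize(weights):
    """1-bit sketch: keep only the sign, plus one scale (mean |w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    # Stand-in for a weight tensor; real weights come from a trained model.
    weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
    for bits in (8, 4, 2):
        q, scale = quantize_symmetric(weights, bits)
        err = mean_abs_error(weights, dequantize(q, scale))
        print(f"{bits}-bit: mean abs error = {err:.6f}")
    signs, alpha = binarize(weights)
    err = mean_abs_error(weights, dequantize(signs, alpha))
    print(f"1-bit: mean abs error = {err:.6f}")
```

Fewer bits means a smaller model but more rounding error. Note that the symmetric 2-bit case only uses three of its four codes (-1, 0, 1), which is one reason low-bit schemes often use asymmetric ranges or per-group scales instead.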
kkg_scorpio t1_j4jz12m wrote on January 16, 2023 at 6:13 AM
Reply to comment by [deleted] in Post exposure rabies shots protection? by [deleted]
Isn't a natural rabies infection 100% lethal?