Submitted by MahmoudAbdAlghany t3_zg71j1 in deeplearning
suflaj t1_izfw75x wrote
Reply to comment by Remi_Coulom in What framework can I use to quantize a deep learning model to specific bit-widths? by MahmoudAbdAlghany
They do, but they use bigger registers, so ultimately, unless you can hand-optimize the code to pack operations together, you will see no benefit from it. That would at least imply writing your own CUDA kernels.
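To make "packing" concrete, here is a minimal sketch in plain Python, not a CUDA kernel. The function names are hypothetical and only illustrate the idea: two signed 4-bit values share one byte, and every read needs extra mask, shift, and sign-extension steps. A hand-written kernel would have to fold that work into the arithmetic itself to gain anything.

```python
from typing import List, Tuple


def pack_int4_pair(lo: int, hi: int) -> int:
    """Pack two signed 4-bit values (range -8..7) into a single byte."""
    for v in (lo, hi):
        if not -8 <= v <= 7:
            raise ValueError(f"{v} does not fit in a signed 4-bit integer")
    # Keep the low 4 bits of each value (two's complement) and
    # store `hi` in the upper nibble, `lo` in the lower nibble.
    return ((hi & 0xF) << 4) | (lo & 0xF)


def unpack_int4_pair(byte: int) -> Tuple[int, int]:
    """Recover the two signed 4-bit values from one packed byte."""
    def sign_extend(nibble: int) -> int:
        # Nibbles 8..15 represent the negative values -8..-1.
        return nibble - 16 if nibble >= 8 else nibble

    return sign_extend(byte & 0xF), sign_extend((byte >> 4) & 0xF)


def pack_weights(values: List[int]) -> bytes:
    """Pack a list of 4-bit weights, padding with 0 if the count is odd."""
    padded = values + [0] if len(values) % 2 else values
    return bytes(
        pack_int4_pair(padded[i], padded[i + 1])
        for i in range(0, len(padded), 2)
    )


packed = pack_weights([1, -1, 7])
print(len(packed))  # 3 weights stored in 2 bytes
print(unpack_int4_pair(pack_int4_pair(-3, 5)))
```

Storage shrinks, but the arithmetic still happens on unpacked values, which is why packing alone gives no speedup.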
Furthermore, 8-bit is already often too small to be stable. Why go lower? If you want garbage outputs, you could always fit whatever task on a smaller model. It's easier to cut the model size in half and use 8-bit, or cut it to a quarter and use 16-bit, than to make 4-bit or lower work.
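The size trade-off is simple arithmetic. As a rough sketch, assuming a hypothetical 1-billion-parameter model and counting only weight storage (no activations, scales, or zero-points):

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store the weights alone, rounded up to whole bytes."""
    return (num_params * bits_per_weight + 7) // 8


base_params = 1_000_000_000  # hypothetical model size, for illustration only

full_int4 = weight_bytes(base_params, 4)           # full model at 4-bit
half_int8 = weight_bytes(base_params // 2, 8)      # half the model at 8-bit
quarter_fp16 = weight_bytes(base_params // 4, 16)  # a quarter of the model at 16-bit

print(full_int4, half_int8, quarter_fp16)
```

All three configurations need the same number of bytes for weights, so the choice comes down to which one is easier to get working.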
At this point in time, TensorRT seems to be the best you'll get for as little involvement as possible. Based on benchmarks, it also seems to outperform INT4 precision by a significant margin. The only drawback is its license, which implicitly prevents commercial use.
horselover_f4t t1_izibm6r wrote
Can I ask you what you mean by "implicitly prevents"?
https://github.com/NVIDIA/TensorRT/blob/main/LICENSE seems to permit commercial use. Are you referring to trademarks?
suflaj t1_izihg01 wrote
This is only the code license for the open source portion. TensorRT as a whole is proprietary software, so you also have to agree to its SDK license: https://docs.nvidia.com/deeplearning/tensorrt/sla/index.html
In there, ownership is phrased so ambiguously that a company's legal team would probably never greenlight using it.
horselover_f4t t1_izik5mz wrote
I will have to check that out, thank you!