
jakderrida t1_j300s6g wrote

Quantization-aware training: PyTorch provides a set of APIs for quantization-aware training, which inserts fake-quantization ops into the model during training so the weights learn to compensate for quantization error; this often results in higher-quality quantized models than quantizing after training. You can find more information about quantization-aware training in the PyTorch documentation (https://pytorch.org/docs/stable/quantization.html#quantization-aware-training).
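
As a rough sketch of the eager-mode QAT workflow (the toy `Net` module, layer sizes, and training loop below are illustrative assumptions, not anything specific from the docs):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Stubs mark where tensors enter/leave the quantized region.
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(128, 10)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net()
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)  # insert fake-quant ops

# Train as usual; fake quantization simulates int8 rounding so the
# weights adapt to it (toy loop with random data for illustration).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):
    x, y = torch.randn(8, 128), torch.randn(8, 10)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model.eval()
quantized_model = torch.quantization.convert(model)  # swap in int8 modules
```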

Post-training static quantization: PyTorch also provides APIs for post-training static quantization, which quantizes a model that has already been trained; it requires a calibration pass over representative data so that observers can record activation ranges. You can find more information about post-training static quantization in the PyTorch documentation (https://pytorch.org/docs/stable/quantization.html#post-training-static-quantization).
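
A minimal eager-mode sketch of that workflow (the same kind of toy module as above; the random calibration tensors just stand in for real data):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(128, 10)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net()
model.eval()  # static quantization is applied to an already-trained model
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)  # attach observers

# Calibration: run representative inputs so the observers can record
# activation ranges before conversion.
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(8, 128))

torch.quantization.convert(model, inplace=True)  # quantize weights + activations
```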

Dynamic quantization: PyTorch also supports dynamic quantization, which quantizes weights ahead of time but computes activation quantization parameters on the fly at runtime, so no calibration data is needed. This can be useful for applications where the model needs to be deployed on devices with limited memory or computational resources. You can find more information about dynamic quantization in the PyTorch documentation (https://pytorch.org/docs/stable/quantization.html#dynamic-quantization).
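
Dynamic quantization is the least invasive of the three; a short sketch (the model here is made up just to show the call):

```python
import torch
import torch.nn as nn

# Any trained model with nn.Linear (or LSTM/GRU) layers will do.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Weights of the listed module types are quantized to int8 up front;
# activations are quantized dynamically at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

output = quantized_model(torch.randn(1, 128))  # use like the original model
```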
