Awekonti t1_j0y7esq wrote

>Is quantization ultimately a kind of scaling

Not really. Quantization is about approximating (or, better, mapping) real-valued numbers onto a smaller set of discrete values, which is what imposes the limits. The model shrinks because computations and other model operations are executed at lower bit-width(s).
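A minimal sketch of what that mapping looks like, assuming simple affine (asymmetric) quantization to 8 bits; function names and the example values are just for illustration:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map float values onto unsigned integers of the given bit-width."""
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)   # step size between levels
    zero_point = round(qmin - x.min() / scale)    # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate recovery of the original floats."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.array([-1.2, 0.0, 0.5, 2.7], dtype=np.float32)
q, scale, zp = quantize(weights)
approx = dequantize(q, scale, zp)
# approx is close to weights, but off by up to scale/2 per value --
# that rounding error is the precision limit quantization introduces.
```

Storing `q` takes 1 byte per value instead of 4, and integer arithmetic at 8 bits is cheaper than float32 on most hardware.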

2

trnka t1_j1hcd0f wrote

Adding a practical example:

I worked on SDKs for mobile phone keyboards on Android devices. The phone manufacturers at the time didn't let us download language data so it needed to ship on the phones out of the box. One of the big parts of each language's data was the ngram model. Quantization allowed us to save the language model probabilities with less precision and we were able to shrink them down with minimal impact on the quality of the language model. That extra space allowed us to ship more languages and/or ship models with higher quality in the same space.

1