Sm0oth_kriminal t1_j7y6wv6 wrote
Reply to comment by avocadoughnut in [D] Using LLMs as decision engines by These-Assignment-936
This is probably only the case when there’s a very low “compression ratio” of model parameters to learned entropy.
Basically, if the model has “too many” parameters it can be distilled, but we’ve found that, empirically, until that point is hit, transformers scale extremely well and are generally better than any other known architecture.
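For reference, here’s a minimal distillation sketch in PyTorch (toy layer sizes; the temperature `T` and the `alpha` mix between soft and hard targets are illustrative placeholders, not values from any particular paper). The student is trained to match the teacher’s softened output distribution in addition to the true labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy teacher/student pair; sizes are arbitrary placeholders.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0       # softmax temperature (illustrative)
alpha = 0.5   # weighting between soft and hard targets (illustrative)

def distill_step(x, labels):
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # Soft-target loss: match the teacher's tempered output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target loss: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One dummy batch just to show the call.
x = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
distill_step(x, labels)
```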
Another topic is sparsification, which takes a trained model, cuts out some percentage of the weights that have minimal effect on the output, then fine-tunes the resulting model. You can check out Neural Magic online and their associated work… they can run models on CPUs that would normally require GPUs.
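And here’s a rough sketch of the basic idea, magnitude pruning plus fine-tuning, using PyTorch’s built-in `torch.nn.utils.prune` utilities (the toy model and the 80% sparsity level are just placeholders; Neural Magic’s actual pipeline and sparse CPU runtime go well beyond this):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; layer sizes are placeholders.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 80% of weights with the smallest magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)

# ... fine-tune here on the original task to recover accuracy ...
# The pruning masks keep the removed weights at zero during fine-tuning,
# so only the surviving connections are adapted.

# Make the sparsity permanent by folding the masks into the weights.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

sparsity = (model[0].weight == 0).float().mean().item()
print(f"layer-0 sparsity: {sparsity:.0%}")
```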