
_learn_faster_ (OP) wrote:

We have GPUs (e.g., A100s), but we can only use one GPU per request (no multi-GPU). We are also willing to take a small accuracy hit.

What do you think would be best for us?

When you say compression, do you mean things like pruning and distillation?