neoplastic_pleonasm t1_j6jk8gt wrote

Yep, now you only need a hundred thousand dollars more for a GPU cluster with enough VRAM to run inference with it.


NegotiationFew6680 t1_j6jmsiq wrote


Now imagine how slow that would be.

There’s a reason these models are run on distributed clusters. A single query to ChatGPT is likely processed by multiple GPUs across dozens of machines.


gmes78 t1_j6k6myq wrote

You need to fit it in GPU VRAM. So go ahead and show me a consumer GPU with 750GB of VRAM.
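A figure like 750GB falls out of a simple weights-times-bytes-per-parameter calculation. A minimal sketch, assuming a GPT-3-scale 175B-parameter model and a ~20% overhead factor for activations and KV cache (both figures are assumptions, not from the thread):

```python
def inference_vram_gb(num_params: int, bytes_per_param: int,
                      overhead: float = 1.2) -> float:
    """Rough VRAM needed just to hold the weights, plus ~20%
    overhead for activations/KV cache (a loose assumption)."""
    return num_params * bytes_per_param * overhead / 1024**3

params = 175_000_000_000  # GPT-3-scale model (assumed)
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: ~{inference_vram_gb(params, nbytes):.0f} GB")
```

Even at fp16 the weights alone run to hundreds of gigabytes, which is why quantization (int8 and below) is what makes single-machine inference even conceivable.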