neoplastic_pleonasm t1_j6jk8gt wrote

Yep, now you only need a hundred thousand dollars more for a GPU cluster with enough VRAM to run inference with it.
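
For a rough sense of scale (everything here is an assumption, not anyone's actual setup): a GPT-3-scale model of ~175B parameters in fp16 needs hundreds of GB just for the weights, so you're buying a stack of 80 GB datacenter cards before you've served a single token.

```python
# Back-of-the-envelope only: assumed parameter count, precision, and card
# price are illustrative, not any vendor's actual numbers.
import math

params = 175e9           # assume a GPT-3-scale model
bytes_per_param = 2      # fp16 weights
gpu_vram_gb = 80         # one A100/H100-class card
gpu_price_usd = 15_000   # rough per-card price

weights_gb = params * bytes_per_param / 1e9
num_gpus = math.ceil(weights_gb / gpu_vram_gb)

print(f"weights alone: ~{weights_gb:.0f} GB")               # ~350 GB
print(f"80 GB cards needed just for weights: {num_gpus}")   # 5
print(f"rough hardware cost: ${num_gpus * gpu_price_usd:,}")  # $75,000
```

And that ignores activations, the KV cache, redundancy, and the rest of the machines, so the real bill only goes up from there.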

7

NegotiationFew6680 t1_j6jmsiq wrote

Hahahaha

Now imagine how slow that would be.

There’s a reason these models are run on distributed clusters: a single query to ChatGPT is likely processed by multiple GPUs across dozens of machines.
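
For what splitting a model over several GPUs looks like in practice, here's a minimal sketch with Hugging Face Transformers + Accelerate; the checkpoint name is a placeholder and ChatGPT's actual serving stack isn't public:

```python
# Minimal sketch of sharded (model-parallel) inference with Transformers +
# Accelerate. The checkpoint name is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-org/very-large-causal-lm"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # shard layers across all visible GPUs (CPU as overflow)
    torch_dtype=torch.float16,  # halve the footprint vs fp32
)

inputs = tokenizer("Hello there", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` just spreads layers over whatever GPUs one box can see; production systems use proper tensor/pipeline parallelism across machines, which this doesn't capture, and that cross-GPU traffic is exactly why it gets slow.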

6

gmes78 t1_j6k6myq wrote

You need to fit it in GPU VRAM. So go ahead and show me a consumer GPU with 750GB of VRAM.
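
Quick math on where a number like that comes from (parameter count and precisions are assumptions; 24 GB is roughly the biggest consumer card you can buy):

```python
# Weight footprint of a ~175B-parameter model at different precisions,
# compared to a 24 GB consumer card. Illustrative numbers only.
params = 175e9
consumer_vram_gb = 24  # e.g. an RTX 4090

for dtype, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    weights_gb = params * bytes_per_param / 1e9
    print(f"{dtype}: ~{weights_gb:.0f} GB of weights "
          f"(~{weights_gb / consumer_vram_gb:.0f}x one {consumer_vram_gb} GB card)")
```

Even aggressively quantized, the weights alone are several times what any consumer card holds, before you account for activations and the KV cache.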

2