
SoylentRox t1_j2l8yg6 wrote

No.
To run GPT-3, here's what it actually takes.

GPT-3 is 175 billion parameters. Each parameter is a 32-bit floating point number.

So you need 700 gigabytes of memory.

For it not to run unusably slowly, you need thousands of teraflops, many times what an old server CPU is capable of.

One Nvidia A100 comes in an 80-gigabyte model, and they are $25,000 each. You cannot use consumer GPUs because they lack the interconnect you need to link multiple GPUs together.

Thus you need at least 9 of them, or $225,000 just for the GPUs.

The server that hosts them, cooling, racks, etc adds extra cost. Probably at least $300,000.

The power consumption is 400 watts per A100, so 3.6 kilowatts.
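The numbers above are just bytes-per-parameter arithmetic; a quick sketch, rounding up to whole cards (prices and wattage are the figures quoted here, not authoritative):

```python
# Back-of-envelope for running GPT-3 at fp32 on A100s.
import math

params = 175e9            # GPT-3 parameter count
bytes_per_param = 4       # 32-bit floats
mem_gb = params * bytes_per_param / 1e9

gpu_mem_gb = 80           # A100 80 GB model
gpus_needed = math.ceil(mem_gb / gpu_mem_gb)

cost = gpus_needed * 25_000          # $25k per A100 (figure from the comment)
power_kw = gpus_needed * 400 / 1000  # 400 W per A100

print(f"{mem_gb:.0f} GB -> {gpus_needed} GPUs, ${cost:,}, {power_kw} kW")
# 700 GB -> 9 GPUs, $225,000, 3.6 kW
```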

11

ElvinRath t1_j2np305 wrote

Well, today you can probably get it down to around 350 GB (fp16), so around $150,000.

And soon it might work well at around 175 GB with fp8, so around $75,000.
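Those sizes follow directly from bytes per parameter; a minimal sketch (weights only, ignoring activations and other runtime overhead):

```python
# Weight memory for a 175B-parameter model at different precisions.
params = 175e9
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("fp8", 1)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")
# fp32: 700 GB
# fp16: 350 GB
# fp8: 175 GB
```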

But yeah, for now it's too expensive. If fp8 works well with this, it might be possible to think about building a personal machine from second-hand parts in 3-5 years...

Anyway, this year we'll probably get open-source models with better performance than GPT-3 and far fewer parameters. Probably still too many for consumer GPUs anyway :(

It's time to double VRAM on consumer GPUs.

Twice.

Pretty please.

3

SoylentRox t1_j2nu1ph wrote

It doesn't work that way. You can't reduce precision like that without tradeoffs: reduced model accuracy, for one thing.

You can in some cases add more weights and retrain for fp16.

Int8 may be out of the question.

Also, chatGPT is like the Wright Brothers: nobody is going to settle for an AI that can't even see or control a robot. So it's only going to get heavier in weights and more computationally expensive.

1

ElvinRath t1_j2o33lj wrote

Sure, there is a tradeoff, but I think that for fp16 it isn't that terrible.

For fp8 I just don't know. There are people working with int8 to fit 20B parameters on a 3090/4090, but I have no idea at what price... I just wanted to say that the possibility exists.

I remember reading about fitting big models at low precision; it was focused on performance/memory usage, but it showed that it was a very useful technique...

Anyway, I can't find it now, but I found this while looking for it, haha:

https://twitter.com/thukeg/status/1579449491316674560

They claim almost no degradation with int4 & 130B parameters.

No idea how this would apply to bigger models, or even about the validity of the claim, but it does sound promising. We would be fitting 40B parameters in a 3090/4090...

Anyway, I think fp8 might not be out of the question at all, but we will see :P
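The 40B-in-a-consumer-card estimate is also just bytes-per-weight arithmetic; a rough sketch, assuming int4 really costs ~0.5 bytes per weight and ignoring activations and KV cache:

```python
# Rough check: do int4-quantized weights fit in a 24 GB consumer GPU?
def weight_gb(params_billion, bytes_per_param=0.5):
    # billions of params * bytes/param == gigabytes of weight memory
    return params_billion * bytes_per_param

print(weight_gb(40))    # 20.0 GB -> fits in a 24 GB 3090/4090
print(weight_gb(130))   # 65.0 GB -> still needs multiple GPUs
```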

I know you said "chatGPT is like the Wright Brothers. Nobody is going to settle for an AI that can't even see or control a robot. So it's only going to get heavier in weights and more computationally expensive".

And... sure, no one is going to settle for less. But consumer hardware is very far behind, and people are going to try to work with what they have, for now.

And there is some interest in it. You have NovelAI, DungeonAI and KoboldAI, and people play with them even though, frankly, they work quite poorly.

I hope that with the release of good open-source LLMs with RLHF (I'm looking at you, CarperAI and StabilityAI) and these kinds of techniques, we start to see this tech become more commonplace, maybe even used in some indie games, to start pushing for more VRAM on consumer hardware. (Because if there is a need, there is a way. VRAM is not that expensive anyway, given the prices of GPUs nowadays...)

2

SoylentRox t1_j2oeflk wrote

>And...Sure, no one is going to settle for less. But consumer hardware is very far behind and people is going to try and work with what they have, for now.

No they won't. They are just going to rent access to the proper hardware. It's not that expensive.

1

aperrien t1_j2lktm2 wrote

Interestingly enough, that's not unaffordable for a great many businesses.

1

SoylentRox t1_j2los27 wrote

Nobody will give you the weights so you can run a SOTA model locally. These academic/test models, sure. But the advanced ones that are built for profit/high-end use will not be given out that way.

You'll have to pay for usage. I mean, if $1 gets you what would take someone with a college degree an hour of work, it's easily worth paying.

Not sure what the rates will turn out to be, but the current chatGPT can slam out in 30 seconds what would have taken me several hours.

2

aperrien t1_j2lp9sl wrote

You could download any of the larger GPT-J models. Or even GPT-NEO. Part of the point of Huggingface is to give their models away freely. There are other entities out there that do so too. Kind of like the old Stone Soup concept.

3

SoylentRox t1_j2lrwii wrote

>But the advanced ones that are built for profit/high end will not be given out that way.

1