Submitted by MrEloi t3_100q6a8 in singularity

Just for fun, I did a quick check to see what a PC with enough RAM to run GPT or a similar AI standalone would cost.

It seems that used server hardware can be found for around £10,000 or below.

Not cheap .. but not impossible for some individuals and many firms.

(If 10 x 128GB RAM PCs could be configured to work together, then at roughly £1,000 each, around £10,000 in total would get you to the 1TB RAM level!)

AI systems are likely to spread like wildfire over the next year or two, especially if true open source standalone systems appear.

UPDATE:
Many thanks for the great comments.

  • It seems that GPT-3 needs under 1TB of VRAM (that is, RAM on the GPU cards).
  • GPT-3 itself claims that it needs up to 32 GPUs to run.
  • GPT-3 can run across multiple GPUs, both for training and for inference.

In view of these points, it looks like 'private' versions are doable, possibly for around $20-30k ... maybe a lot less. (A rough sketch of the arithmetic follows below.)
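Here's a quick back-of-the-envelope in Python. This is a sketch only: the fp32 assumption and the 24GB consumer-card size are my illustrative choices, not confirmed figures.

```python
# Back-of-the-envelope: how many GPUs does GPT-3 need just to hold its
# weights? All figures are rough assumptions for illustration.
import math

params = 175e9        # GPT-3 parameter count
bytes_per_param = 4   # fp32; halves at fp16
gpu_vram_gb = 24      # e.g. one consumer 24GB card

weights_gb = params * bytes_per_param / 1e9        # 700 GB
gpus_needed = math.ceil(weights_gb / gpu_vram_gb)  # 30 cards

print(f"{weights_gb:.0f} GB of weights -> at least {gpus_needed} x {gpu_vram_gb}GB GPUs")
```

That lands in the same ballpark as the "up to 32 GPUs" figure above, which is reassuring.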

26

Comments


XoxoForKing t1_j2j7ptv wrote

Ex home server owner here. The problem with used servers is that they can require a lot of maintenance, you hardly ever have any warranty left, and you can easily end up with components reaching the end of their lifespan. (To be fair, one of the strong points of servers is their redundancy and fail-safety, but that gets expensive.)

24

MrEloi OP t1_j2j9s19 wrote

Good point.

Any idea what a NEW 1TB-RAM PC with, say, a 50TB hard disk would cost?

3

XoxoForKing t1_j2jbm8x wrote

Totally clueless, honestly.

I tried a quick check with this site: adding only the most basic compatible CPU and getting to 1TB RAM + 50TB of 10k HDDs, it comes to ~$52k.

But again, I've never worked with new components or servers with this sort of high-end spec.

3

MrEloi OP t1_j2jf5wa wrote

Good find.

At least we now know that a working AI platform can be built for $50k at most.

That could be a worthwhile investment for someone starting their career .. not much different from a typical student loan .. and probably much more useful!

2

ElvinRath t1_j2kkjuc wrote

I think that it would be a very bad idea in most cases.

But I'm confused about why you're talking about RAM as something very important for AI.

VRAM is what you really need, I would say. (You need some RAM too, of course, but 16 GB is probably enough... 32 at the most.)

And you need that VRAM in sufficiently powerful GPUs, of course; it's not just about having enough VRAM, it's also about compute.

10

CeFurkan t1_j2jri17 wrote

Make no mistake: that RAM would not benefit you. Almost all AI algorithms run on the GPU; anything else is just too slow.

For example, an RTX 3060 is 22 times faster than my Core i7-10700F:

How Good is RTX 3060 for ML AI Deep Learning Tasks and Comparison With GTX 1050 Ti and i7 10700F CPU

16

iNstein t1_j2l04ge wrote

Maybe OP should look at multiple 4080s running together, like the rigs used for crypto mining? I know you can get special motherboards designed to connect multiple cards; you just need a honking great power supply.

3

MrEloi OP t1_j2wlzzz wrote

Good idea.

I have checked the pricing of 12GB & 24GB GPUs ... not too bad currently.

I suspect that say 5 x 24GB GPUs would come to under $10k.

Add in the special motherboard and maybe we are looking at around $15k for a complete hardware system.

With careful purchasing that could possibly come down to under $10k total.
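A quick tally of that estimate (a sketch only; the per-part prices below are hypothetical guesses, not quotes):

```python
# Hypothetical cost tally for a 5 x 24GB consumer-GPU build.
# Prices are illustrative guesses, not real quotes.
n_gpus = 5
gpu_price = 1_800     # assumed price of one 24GB card
base_system = 3_000   # assumed motherboard, PSU, CPU, RAM, case

total = n_gpus * gpu_price + base_system
vram_total = n_gpus * 24
print(f"{vram_total} GB VRAM for about ${total:,}")  # 120 GB VRAM for about $12,000
```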

And, yes, GPT can run across multiple GPUs, both for training and for inference.

1

SoylentRox t1_j2l8yg6 wrote

No. To run GPT-3, here's what it actually takes:

GPT-3 is 175 billion parameters. Each parameter is a 32-bit floating point number.

So you need 700 gigabytes of memory.

For it not to run unusably slowly, you need thousands of teraflops, many times what an old server CPU is capable of.

The Nvidia A100 comes in an 80-gigabyte model, at about $25,000 each. You cannot use consumer GPUs, because multiple GPUs have to be linked by a high-speed interconnect that consumer cards lack.

Thus you need at least 9 of them (700 / 80 = 8.75, rounded up), or $225,000 just for the GPUs.

The server that hosts them, cooling, racks, etc adds extra cost. Probably at least $300,000.

The power consumption is 400 watts per A100, so about 3.6 kilowatts for nine of them.
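The same arithmetic as a quick script (weights only; activations and serving overhead ignored):

```python
# GPT-3 memory math, fp32 weights only, ignoring activations.
import math

params = 175e9                      # GPT-3 parameters
mem_gb = params * 4 / 1e9           # 4 bytes each -> 700 GB

a100_vram, a100_price = 80, 25_000  # 80GB A100 at ~$25k
n_gpus = math.ceil(mem_gb / a100_vram)

print(n_gpus)                       # 9
print(f"${n_gpus * a100_price:,}")  # $225,000 for the GPUs alone
print(n_gpus * 400 / 1000, "kW")    # 3.6 kW at 400 W per card
```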

11

ElvinRath t1_j2np305 wrote

Well, today you can probably get it down to around 350 GB (fp16), so around $150,000.

And soon it might work well at around 175 GB with fp8, so... around $75,000. (The arithmetic is sketched below.)

But yeah, for now it's too expensive. If fp8 works well with this, it might be possible to think about building a personal machine from second-hand parts in 3-5 years...

Anyway, this year we'll probably get open-source models with better performance than GPT-3 and far fewer parameters. Probably still too many for consumer GPUs, though :(

It's time to double VRAM on consumer GPUs.

Twice.

Pretty please.
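For reference, here's what the same 175B weights come to at each precision (a sketch; weights only, and it assumes the model quantizes cleanly, which is exactly the tradeoff in question):

```python
# Memory footprint of 175B weights at different precisions (weights only).
params = 175e9
for name, bytes_per in [("fp32", 4), ("fp16", 2), ("fp8", 1), ("int4", 0.5)]:
    print(f"{name}: {params * bytes_per / 1e9:,.0f} GB")
# fp32: 700 GB   fp16: 350 GB   fp8: 175 GB   int4: 88 GB
```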

3

SoylentRox t1_j2nu1ph wrote

It doesn't work that way. You can't reduce precision like that without tradeoffs: reduced model accuracy, for one thing.

You can in some cases add more weights and retrain for fp16.

Int8 may be out of the question.

Also, ChatGPT is like the Wright Brothers' flyer: nobody is going to settle for an AI that can't even see or control a robot. So models are only going to get heavier in weights and more computationally expensive.

1

ElvinRath t1_j2o33lj wrote

Sure, there's a tradeoff, but I think for fp16 it isn't that terrible.

For fp8 I just don't know. There are people working with int8 to fit 20B-parameter models on a 3090/4090, but I have no idea at what cost... I just wanted to say the possibility exists.

I remember reading about fitting big models at low precision; it focused on performance/memory usage, but it showed the technique is very useful...

Anyway I can't find it now, but I found this while looking for it, haha:

https://twitter.com/thukeg/status/1579449491316674560

They claim almost no degradation with int4 & 130B parameters.

No idea how this would apply to bigger models, or even whether the claim holds, but it sounds promising. We would be fitting 40B parameters in a 3090/4090 (there's a rough sketch at the end of this comment)...

Anyway, I think fp8 might not be out of the question at all, but we will see :P

I know you say "ChatGPT is like the Wright Brothers. Nobody is going to settle for an AI that can't even see or control a robot. So it's only going to get heavier in weights and more computationally expensive."

And... sure, no one is going to settle for less. But consumer hardware is very far behind, and people are going to work with what they have, for now.

And there is some interest in it. You have NovelAI, DungeonAI and KoboldAI, and people play with them even though, frankly, they work quite poorly.

I hope that with the release of good open-source LLMs with RLHF (I'm looking at you, CarperAI and StabilityAI) & these kinds of techniques, we start to see this tech become more commonplace, maybe even used in some indie games, to push for more VRAM on consumer hardware. (Because where there's a need, there's a way. VRAM isn't that expensive anyway, given the prices of GPUs nowadays...)
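The sketch promised above: what a single 24GB card can hold at int4, counting weights only.

```python
# Rough capacity of a single 24GB card at int4 (0.5 bytes per weight).
# Weights only; activations and overhead push the practical limit lower.
vram_gb = 24
max_params_b = vram_gb / 0.5               # billions of parameters
print(f"~{max_params_b:.0f}B parameters")  # ~48B, so "40B-ish" in practice
```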

2

SoylentRox t1_j2oeflk wrote

>And... sure, no one is going to settle for less. But consumer hardware is very far behind, and people are going to work with what they have, for now.

No they won't. They are just going to rent access to the proper hardware. It's not that expensive.

1

aperrien t1_j2lktm2 wrote

Interestingly enough, that's not unaffordable for a great many businesses.

1

SoylentRox t1_j2los27 wrote

Nobody will give you the weights to run a SOTA model locally. These academic/test models, sure, but the advanced ones built for profit/high-end use will not be given out that way.

You'll have to pay for usage. I mean, if $1 gets you what would take someone with a college degree an hour of work, it's easily worth paying.

Not sure what the rates will turn out to be, but the current ChatGPT can slam out in 30 seconds what would have taken me several hours.

2

aperrien t1_j2lp9sl wrote

You could download any of the larger GPT-J models, or even GPT-Neo. Part of the point of Hugging Face is to give models away freely, and other groups do the same. Kind of like the old Stone Soup concept.

3

SoylentRox t1_j2lrwii wrote

>But the advanced ones built for profit/high-end use will not be given out that way.

1

syfari t1_j2kt3qb wrote

Standard RAM would be mostly useless, as ML uses GPU memory. Older Teslas will consume a ton of power and won't be able to compete with cloud services on cost efficiency.

2