Viewing a single comment thread. View all comments

m98789 t1_j3wtx3g wrote

I think you may be underestimating the compute cost. It’s about $6M of compute (A100 servers) to train a GPT-3 level model from scratch. So with a billion dollars, that’s about 166 models. Considering experimentation, scaling upgrades, etc., that money will go quickly. Additionally, the cost to host the model to perform inference at scale is also very expensive. So it may be the case that the $10B investment isn’t all cash, but maybe partially paid in Azure compute credits. Considering they are already running on Azure.

2

All-DayErrDay t1_j3xttzb wrote

500k, actually (per MosaicML). Will likely drop to 100k soon with H100s being several times faster. Would probably be even lower if you added every efficiency gain currently available.

2

m98789 t1_j3xxyvm wrote

You are right that the trend is for costs to go down. It was originally reported that it took $12M in compute costs for a single training run of GPT-3 (source).

H100s will make a significant difference and all the optimization techniques. So I agree prices will drop a lot, but for the foreseeable future, still be out of reach for mere mortals.

2

starstruckmon t1_j3wvdqt wrote

>I think you may be underestimating the compute cost. It’s about $6M of compute (A100 servers) to train a GPT-3 level model from scratch. So with a billion dollars, that’s about 166 models.

I was actually overestimating the cost to train. I honestly don't see how these numbers don't further demonstrate my point. Even if it cost a whole billion ( that's a lot of experimental models ), that's still 10 times less than what they're paying.

>Considering experimentation, scaling upgrades, etc., that money will go quickly. Additionally, the cost to host the model to perform inference at scale is also very expensive. So it may be the case that the $10B investment isn’t all cash, but maybe partially paid in Azure compute credits. Considering they are already running on Azure.

I actually expect every last penny to go into the company. They definitely aren't buying anyone's shares ( other than maybe a partial amount of employee's vested shares ; this is not the bulk ). It's mostly for new shares created. But $10B for ~50% still gives you a pre-money valuation of ~10B. That's a lot.

1

Non-jabroni_redditor t1_j3yr9cx wrote

Time. The answer is time and risk for why they are spending 10x.

They can spend the next however many years attempting to build a model that is like gpt but is entirely possible it’s just not as good after all of that. The other option is pay a premium with money they have for a known product.

1