Submitted by Angry_Grandpa_ t3_y92cl1 in singularity
manOnPavementWaving t1_it393xz wrote
My man, you can't just scale cost with the number of tokens and not the number of parameters.
Way too many mostly false assumptions in these calculations.
Angry_Grandpa_ OP t1_it4qmjp wrote
It's based on the Chinchilla paper and not my personal opinion. You should read the paper.
If you think the assumptions are wrong, you should do your own projections based on the paper.
manOnPavementWaving t1_it4r8la wrote
I have read the paper, which is how I know that they scale data and parameters together: a 10x increase in data implies a 10x increase in parameters as well, and therefore roughly a 100x increase in compute required and hence a 100x increase in cost.
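A minimal sketch of that scaling arithmetic, assuming the standard C ≈ 6·N·D approximation for training compute (N = parameters, D = tokens) and a flat price per FLOP; the dollar rate and model sizes below are illustrative placeholders, not figures from the thread:

```python
# Rough Chinchilla-style cost scaling sketch (assumed C ~ 6*N*D, flat $/FLOP).
def training_cost(params, tokens, dollars_per_flop=1e-18):
    """Estimate training cost from the standard 6*N*D compute approximation."""
    flops = 6 * params * tokens
    return flops * dollars_per_flop

base = training_cost(70e9, 1.4e12)    # Chinchilla-scale run: 70B params, 1.4T tokens
bigger = training_cost(700e9, 14e12)  # 10x the data AND 10x the params, scaled together

print(f"base run:              ${base:,.0f}")
print(f"10x data + 10x params: ${bigger:,.0f}")
print(f"cost ratio:            {bigger / base:.0f}x")  # -> 100x, not 10x
```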
Assumptions-wise, I'm looking more at the number of words on YouTube; your estimate is likely wildly off.
You're also ignoring that the training time could very well be long enough that it would be a better strategy to wait for better GPUs to come out, as in the sketch below.
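A hedged sketch of that wait-or-train-now tradeoff. Every number here (FLOP budget, cluster size, GPU throughput, a 2x next-generation speedup, a one-year wait) is an illustrative assumption, not a figure from the thread, and perfect utilization is assumed:

```python
# Finish-date comparison: start training now vs. wait for faster GPUs.
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_train(total_flops, num_gpus, flops_per_gpu_per_sec):
    """Wall-clock training time in years at perfect utilization (an idealization)."""
    return total_flops / (num_gpus * flops_per_gpu_per_sec) / SECONDS_PER_YEAR

budget   = 1e26     # assumed total training FLOPs
cluster  = 10_000   # assumed number of GPUs
current  = 1e14     # assumed sustained FLOP/s per current-generation GPU
next_gen = 2e14     # assumed 2x-faster next-generation GPU
wait     = 1.0      # assumed years until the next generation ships

start_now  = years_to_train(budget, cluster, current)
wait_first = wait + years_to_train(budget, cluster, next_gen)

print(f"start now:       finishes in {start_now:.1f} years")
print(f"wait then train: finishes in {wait_first:.1f} years")
```

With these placeholder numbers, waiting a year for 2x-faster hardware still finishes sooner overall, which is the strategic point being made.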
Angry_Grandpa_ OP t1_it50h5j wrote
What is your estimate?
LeroyJanky80 t1_it6g9ub wrote
The data Google/Alphabet has is obviously its most powerful asset. My guess is they've already done this; they have the means, brain trust, wealth, and capacity for it. They can easily cover this in all domains where people, infrastructure, and content are concerned. It's a massive endeavour, but so was what they did with the entire internet many years ago, and at the time that was groundbreaking.