
zero_for_effort t1_itbx41r wrote

I'm certainly in no position to provide any insight as a complete outsider to tech and AI, but I do wonder if all the recent breakthroughs might've meant that they've restarted from scratch once or twice. What do you do if you realise your model will be obsolete before it's even fully trained?

56

visarga t1_itc1tbb wrote

U-PaLM and Flan PaLM come to mind.

The first shows we were noising the data suboptimally; a different mix of denoising objectives brings major benefits. The second shows that fine-tuning on thousands of tasks boosts the model's ability to follow instructions and also improves benchmark scores. So maybe OpenAI had to change their plans midway.
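To make the second point concrete, here's a rough sketch of what instruction-tuning data construction looks like. This is my own toy illustration, not Google's actual Flan pipeline: the task names, templates, and examples are made up. The core idea is to rewrite ordinary supervised examples under many different instruction phrasings, so the model learns to follow the instruction rather than a single fixed format.

```python
# Toy illustration of Flan-style instruction-tuning data construction.
# Task names, templates, and examples are invented for demonstration only.

import random

# A couple of hypothetical supervised tasks with (input, target) pairs.
TASKS = {
    "sentiment": [
        ("The movie was a delight from start to finish.", "positive"),
        ("I want those two hours of my life back.", "negative"),
    ],
    "translation_en_fr": [
        ("Where is the train station?", "Où est la gare ?"),
    ],
}

# Several phrasings per task, so the model sees the *instruction*, not one fixed format.
TEMPLATES = {
    "sentiment": [
        "Is the sentiment of this review positive or negative?\n\n{input}",
        "Review: {input}\nSentiment:",
    ],
    "translation_en_fr": [
        "Translate the following sentence to French:\n\n{input}",
        "English: {input}\nFrench:",
    ],
}

def build_instruction_examples():
    """Flatten all tasks into (prompt, target) pairs usable for seq2seq fine-tuning."""
    examples = []
    for task, pairs in TASKS.items():
        for inp, target in pairs:
            template = random.choice(TEMPLATES[task])
            examples.append({"prompt": template.format(input=inp), "target": target})
    return examples

if __name__ == "__main__":
    for ex in build_instruction_examples():
        print(ex["prompt"], "->", ex["target"])
```

Scale that up to ~1,800 tasks and you get the kind of mixture Flan was trained on.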

It's also possible that they don't want to scale up even further because it's impractical: too expensive to use, not just to train. And recent models like Flan match GPT-3 scores on many (not all) tasks with just 3B parameters.

There's also a question about training data: where can they get 10x or 100x more? I bet they transcribe videos, probably every video they can access. Another approach is to train on raw audio instead of text, which works well. I bet they have a large team just for dataset building. BLOOM was managed by an organisation of about 1,000 people, and a lot of their effort went into dataset sourcing and trying to reduce biases.
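For the curious, here's roughly what the "transcribe videos into training text" idea looks like in practice, using the open-source openai-whisper package (`pip install openai-whisper`, needs ffmpeg). This is just a sketch; the directory layout and output format are my own assumptions, not anyone's actual pipeline.

```python
# Rough sketch: turn a folder of downloaded audio into a text dataset with Whisper.
# File paths and the JSONL output format are hypothetical, for illustration only.

import json
import pathlib

import whisper

def transcribe_directory(audio_dir: str, out_path: str, model_size: str = "base") -> None:
    """Transcribe every .mp3 file in a directory and write one JSON line per file."""
    model = whisper.load_model(model_size)
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(pathlib.Path(audio_dir).glob("*.mp3")):
            result = model.transcribe(str(path))  # returns a dict with a "text" field
            out.write(json.dumps({"source": path.name, "text": result["text"]}) + "\n")

if __name__ == "__main__":
    transcribe_directory("downloaded_audio", "transcripts.jsonl")
```

Run that over enough audio and the transcripts become just another text corpus to mix into pretraining.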

33

DukkyDrake t1_itcdotk wrote

>recent breakthroughs might've meant that they've restarted from scratch once or twice

That is almost certainly not happening, given the cost of training. Probably more a case of "measure thrice, check twice, cut once".

5