f_max t1_j3hztd5 wrote
Reply to comment by allaboutthatparklife in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
Idk. Have a decent idea what's being worked on for the next year, but it gets fuzzy after that. Maybe we'll have another architectural breakthrough. AlexNet in 2012, transformers in 2017, something else in 2023 or 2024 maybe.
f_max t1_j3frqfb wrote
Reply to comment by singularpanda in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
They have a sequence of models ranging from 6B params up to the largest at 175B, so you can work on the smaller variants if you don't have many GPUs. There are definitely some papers working on inference efficiency and benchmarking their failure modes if you look around.
f_max t1_j3frhxs wrote
Reply to comment by singularpanda in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
Megawatts sounds right for training, but kilowatts for inference. Take a look at Tim Dettmers' work (he's at UW) on int8 to see some of this kind of efficiency work. There's definitely significant work happening in the open.
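To give a feel for it, here's a minimal sketch of int8 inference via Hugging Face `transformers` + `bitsandbytes` (my own example, not from Dettmers' papers; the model name and flags are just common choices, and it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed with a CUDA GPU available):

```python
# Hedged sketch: load a mid-sized open model with int8 weight quantization,
# which roughly halves GPU memory vs fp16 at inference time.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"  # one of the smaller open variants
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers across available GPUs/CPU automatically
    load_in_8bit=True,   # int8 quantized weights via bitsandbytes
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```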
f_max t1_j3eagrm wrote
Reply to comment by singularpanda in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
Right. So if you'd rather not shoot to join a big company, there's still work that can be done in academia with, say, a single A100. You might be a bit constrained at pushing the bleeding edge of capability, but there's much to do to characterize LLMs. They're black boxes we understand less than perhaps any previous machine learning model.
Edit: there are also open-source weights for GPT-3-type models with similar performance, e.g. BLOOM (via Hugging Face) or Meta's OPT.
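For example, a quick hedged sketch of trying one of the smaller open-weight variants on a single GPU (my example; the hub IDs shown are the public ones, and it assumes `transformers` and `torch` are installed):

```python
# Hedged sketch: run a small open-weight GPT-3-style model on one GPU
# using the transformers pipeline API.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="facebook/opt-1.3b",   # or e.g. "bigscience/bloom-1b7"
    torch_dtype=torch.float16,   # fp16 easily fits these sizes on a single A100
    device=0,
)

print(generator("In-context learning means", max_new_tokens=30)[0]["generated_text"])
```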
f_max t1_j3e2s3m wrote
I work at one of the big techs doing research on this. Frankly, LLMs will be the leading edge of the field for the next 2 years imo. Join one of the big techs and get access to tens of thousands of dollars of compute per week to train some LLMs. Or in academia, lots of work needs to be done to characterize inference-time capabilities, understand bias and failure modes, run smaller-scale experiments w/ architecture, etc.
f_max t1_izhg9j7 wrote
If you have more than 1 GPU and your model is small enough to fit on 1 GPU, distributed data parallel is the go-to. Basically multiple model instances training, with gradients synchronized at the end of each batch. PyTorch has it integrated, and probably so does TF.
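A minimal PyTorch DistributedDataParallel sketch (my example, with a toy linear model standing in for whatever you're training; launch it with something like `torchrun --nproc_per_node=4 train_ddp.py`):

```python
# One model replica per GPU; gradients are all-reduced across ranks
# automatically during backward().
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model; the point is it fits on a single GPU.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()          # gradients synchronized across ranks here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```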
f_max t1_jbze2pl wrote
Reply to [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
Speaking as someone working on scaling beyond GPT-3 sizes: I think if there were proof of existence of human-level AI at 100T parameters, people would put down the money today to do it. It's roughly $10M to train a 100B model. With cost scaling roughly linearly with param size, it's ~$10B to train this hypothetical 100T-param AI. That's the cost of buying a large tech startup. But a human-level AI is probably worth more than all of big tech combined. The main thing stopping people is that no one knows whether the scaling curves will bend and we'll hit a plateau in improvement with scale, so no one has the guts to put the money down.
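The back-of-envelope math, spelled out (these are just the rough figures from the comment, not real quotes):

```python
# Assume training cost scales roughly linearly with parameter count.
known_params = 100e9       # 100B-parameter model
known_cost_usd = 10e6      # ~$10M to train, per the rough estimate above

target_params = 100e12     # hypothetical 100T-parameter model
est_cost_usd = known_cost_usd * (target_params / known_params)

print(f"Estimated cost: ${est_cost_usd:,.0f}")  # ~$10,000,000,000, i.e. ~$10B
```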