mtocrat t1_j6zin88 wrote
Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Supervised fine-tuning seems inherently limited here. You regress to the best answer in the set, but that's it. RLHF can improve beyond that, up to the point where the generalization capabilities of the reward model fail.
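For context, reward models in RLHF are typically trained on ranked pairs with a Bradley-Terry style loss; here is a minimal numpy sketch (the function name and scores are illustrative, not any real API):

```python
import numpy as np

def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry style preference loss: -log sigmoid(r_chosen - r_rejected).
    The reward model is trained only on rankings within the sampled set,
    but because it outputs a scalar score it can, ideally, extrapolate
    and score answers better than any it was trained on."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

loss_agree = pairwise_reward_loss(2.0, -1.0)     # model agrees with the human ranking
loss_disagree = pairwise_reward_loss(-1.0, 2.0)  # model disagrees: much larger loss
```

Optimizing the policy against that scalar reward, rather than regressing to the ranked answers directly, is what lets RLHF move beyond the best answer in the training set.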
mtocrat t1_j4zecpm wrote
Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub
What you're describing is a general approach to RL that is used, in different forms, in many methods: sample actions, weight or rank them in some way by the estimated return, then regress to the weighted actions. So you're not suggesting something other than RL, but rather replacing one RL approach with a different RL approach.
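The sample-weight-regress pattern described above can be sketched in a toy 1-D setting, in the style of reward-weighted regression (the toy return function and temperature are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: sample actions from the current (unit Gaussian) policy.
actions = rng.normal(loc=0.0, scale=1.0, size=100)

# Step 2: score each action by its estimated return (toy return peaks at 0.7).
returns = -(actions - 0.7) ** 2

# Step 3: weight actions by exponentiated return, as in reward-weighted regression.
weights = np.exp(returns / 0.1)
weights /= weights.sum()

# "Regressing to the weighted actions" collapses here to a weighted mean:
# the updated policy mean moves from 0.0 toward the high-return region.
new_mean = np.sum(weights * actions)
```

Methods differ in how they weight (exponentiated return, top-k ranking, advantage) and how they regress, but the loop is the same RL loop.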
mtocrat t1_j2rxj5k wrote
Reply to comment by notyourregularnerd in [D] life advice to relatively late bloomer ML theory researcher. by notyourregularnerd
Fwiw, Germany has a portion of people who stay enrolled forever because it doesn't cost anything and they may have a somewhat decent job on the side that funds them. That's not the kind of person who pursues a PhD, so I wouldn't put too much stock in averages here.
mtocrat t1_j2rx2aq wrote
Reply to comment by ButchOfBlaviken in [D] life advice to relatively late bloomer ML theory researcher. by notyourregularnerd
If I remember correctly, the total time of Bachelor + Masters is supposed to be 5 years, but the split can vary. 3+2 seems typical
mtocrat t1_j2rwnzt wrote
Reply to comment by TaXxER in [D] life advice to relatively late bloomer ML theory researcher. by notyourregularnerd
In my program, in the US, the majority had a master's degree, and 27 would be a normal age to start, but there is a lot of variance.
mtocrat t1_j17ybgf wrote
Reply to [D] Hype around LLMs by Ayicikio
It's seen as something big because what we observe in the responses implies a level of reasoning that is much greater than what we expected. It seems that operating in the language domain allows you to make use of that in a way that currently can't be done with other methods. The older systems you describe cannot do this and were therefore less interesting, although it's worth noting that plenty of current users prefer to operate their devices with language-based inputs through Siri, Alexa, or Google Assistant.
mtocrat t1_j14di0f wrote
Reply to comment by blose1 in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
How is that relevant?
mtocrat t1_j0u7gh2 wrote
Reply to comment by tripple13 in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
Probably downvoted by people who don't consider themselves activists but take issue with Musk's handling of Twitter.
mtocrat t1_iymi8i7 wrote
Reply to comment by TrueBirch in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
You could already tape together a deep learning solution consisting of neural speech recognition, an LLM, and WaveNet; that counts as a deep learning solution in my book. I'm not sure anyone has built an end-to-end solution, and I expect it would be worse, but I'm sure that if someone put their mind and money to it you'd get decent results.
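The "taped together" pipeline would just be three separately trained models composed in sequence; a sketch, where every function is a hypothetical placeholder standing in for a real model, not an actual API:

```python
# Hypothetical glue code: each stage stands in for a separately
# trained deep model (ASR -> LLM -> neural vocoder such as WaveNet).

def speech_to_text(audio: bytes) -> str:
    # Placeholder for a neural speech recognition model.
    return "what is the weather tomorrow"

def llm_respond(text: str) -> str:
    # Placeholder for an LLM call.
    return "Expect light rain in the morning."

def text_to_speech(text: str) -> bytes:
    # Placeholder for a WaveNet-style vocoder; returns fake waveform bytes.
    return b"\x00" * len(text)

def assistant(audio: bytes) -> bytes:
    # The whole "deep learning solution" is just the composition.
    return text_to_speech(llm_respond(speech_to_text(audio)))
```

An end-to-end model would replace the three stages with one network trained jointly, which is exactly the part nobody seems to have demonstrated at quality yet.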
mtocrat t1_iyk1se1 wrote
Reply to comment by SrPinko in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
Even for univariate time series, when you have the data and the complexity, DL will obviously outperform simple methods. Show me the simple statistical method that can generate speech, a univariate time series.
mtocrat t1_iyk1n65 wrote
Reply to comment by bushrod in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
Consider spoken language, and you're back in the realm of time series. Obviously, simple statistical methods can't deal with those, though.
mtocrat t1_iycqxw4 wrote
Reply to comment by pm_me_your_pay_slips in [D] I'm at NeurIPS, AMA by ThisIsMyStonerAcount
Companies throwing money around like there's no tomorrow is a high point for people who like money and who work in, or aspire to work in, the industry.
mtocrat t1_j6zk1ka wrote
Reply to [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Let's say your initial model is quite racist and outputs only extremely or moderately racist choices. If you rank those against each other and do supervised training on that dataset, you train the model to mimic the moderately racist style. However, you might plausibly train a reward model from this data that can judge what racism is and extrapolate, judging answers free of it to be even better. Then you optimize with respect to that model to get that style.
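The extrapolation point can be made concrete with a toy example: a scalar reward model fit only on comparisons between bad and worse outputs can still assign its highest score to a clean output it never saw, and optimizing against it (here, simple best-of-n selection) picks that output. Everything below is an illustrative toy, not a real training setup:

```python
import numpy as np

# Toy setup: each candidate output is summarized by a 1-D "badness" score.
# Training pairs only ever compared moderately bad (1.0) vs extremely
# bad (3.0) outputs, and a monotone reward model fit to "less bad is
# preferred" extrapolates to outputs outside that range.

def reward(badness, w=-1.0):
    # Linear reward model consistent with the training rankings.
    return w * badness

# Best-of-n "optimization" against the reward model. The candidate set
# includes a clean output (badness 0.0) never seen during reward training.
candidates = np.array([3.0, 1.5, 1.0, 0.0])
best = candidates[np.argmax(reward(candidates))]
```

Supervised training on the ranked data would regress toward the moderately racist answers themselves; optimizing against the learned reward can select outputs better than anything in the training set, up to where the reward model's generalization breaks down.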