mtocrat t1_j6zin88 wrote
Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Supervised fine-tuning seems inherently limited here. You regress to the best answer in the set, but that's it. RLHF can improve beyond that, up to the point where the generalization capabilities of the reward model fail.
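For context, reward models in RLHF are typically trained on ranked pairs with a Bradley-Terry style loss; here is a minimal numpy sketch (the function name and scores are illustrative, not any real API):

```python
import numpy as np

def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry style preference loss: -log sigmoid(r_chosen - r_rejected).
    The reward model is trained only on rankings within the sampled set,
    but because it outputs a scalar score it can, ideally, extrapolate
    and score answers better than any it was trained on."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

loss_agree = pairwise_reward_loss(2.0, -1.0)     # model agrees with the human ranking
loss_disagree = pairwise_reward_loss(-1.0, 2.0)  # model disagrees: much larger loss
```

Optimizing the policy against that scalar reward, rather than regressing to the ranked answers directly, is what lets RLHF move beyond the best answer in the training set.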
mtocrat t1_j4zecpm wrote
Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub
What you're describing is a general approach to RL that is used, in different forms, in many methods: sample actions, weight or rank them in some way by the estimated return, then regress to the weighted actions. So you're not suggesting something other than RL, but rather replacing one RL approach with a different RL approach.
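The sample-weight-regress pattern described above can be sketched in a toy 1-D setting, in the style of reward-weighted regression (the toy return function and temperature are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: sample actions from the current (unit Gaussian) policy.
actions = rng.normal(loc=0.0, scale=1.0, size=100)

# Step 2: score each action by its estimated return (toy return peaks at 0.7).
returns = -(actions - 0.7) ** 2

# Step 3: weight actions by exponentiated return, as in reward-weighted regression.
weights = np.exp(returns / 0.1)
weights /= weights.sum()

# "Regressing to the weighted actions" collapses here to a weighted mean:
# the updated policy mean moves from 0.0 toward the high-return region.
new_mean = np.sum(weights * actions)
```

Methods differ in how they weight (exponentiated return, top-k ranking, advantage) and how they regress, but the loop is the same RL loop.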
mtocrat t1_j2rxj5k wrote
Reply to comment by notyourregularnerd in [D] life advice to relatively late bloomer ML theory researcher. by notyourregularnerd
Fwiw, Germany has a portion of people who stay enrolled forever because it doesn't cost anything and they may have a somewhat decent job on the side that funds them. That's not the kind of person who pursues a PhD, so I wouldn't put too much stock in averages here.
mtocrat t1_j2rx2aq wrote
Reply to comment by ButchOfBlaviken in [D] life advice to relatively late bloomer ML theory researcher. by notyourregularnerd
If I remember correctly, the total time of Bachelor + Masters is supposed to be 5 years, but the split can vary. 3+2 seems typical
mtocrat t1_j2rwnzt wrote
Reply to comment by TaXxER in [D] life advice to relatively late bloomer ML theory researcher. by notyourregularnerd
In my program, in the US, the majority had a master's degree, and 27 would be a normal age to start, but there is a lot of variance.
mtocrat t1_j17ybgf wrote
Reply to [D] Hype around LLMs by Ayicikio
It's seen as something big because what we observe in the responses implies a level of reasoning that is much greater than what we expected. It seems that operating in the language domain allows you to make use of that in a way that currently can't be done with other methods. The older systems you describe cannot do this and were therefore less interesting, although it's worth noting that plenty of current users prefer to operate their devices with language-based inputs through Siri, Alexa, or Google Assistant.
mtocrat t1_j14di0f wrote
Reply to comment by blose1 in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
How is that relevant?
mtocrat t1_j0u7gh2 wrote
Reply to comment by tripple13 in [D] Will there be a replacement for Machine Learning Twitter? by MrAcurite
Probably downvoted by people who don't consider themselves activists but take issue with Musk's handling of Twitter.
mtocrat t1_iymi8i7 wrote
Reply to comment by TrueBirch in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
You could already tape together a deep learning solution consisting of neural speech recognition, an LLM, and WaveNet; that counts as a deep learning solution in my book. I'm not sure anyone has built an end-to-end solution, and I expect it would be worse, but I'm sure that if someone put their mind and money to it you'd get decent results.
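The "taped together" pipeline would just be three separately trained models composed in sequence; a sketch, where every function is a hypothetical placeholder standing in for a real model, not an actual API:

```python
# Hypothetical glue code: each stage stands in for a separately
# trained deep model (ASR -> LLM -> neural vocoder such as WaveNet).

def speech_to_text(audio: bytes) -> str:
    # Placeholder for a neural speech recognition model.
    return "what is the weather tomorrow"

def llm_respond(text: str) -> str:
    # Placeholder for an LLM call.
    return "Expect light rain in the morning."

def text_to_speech(text: str) -> bytes:
    # Placeholder for a WaveNet-style vocoder; returns fake waveform bytes.
    return b"\x00" * len(text)

def assistant(audio: bytes) -> bytes:
    # The whole "deep learning solution" is just the composition.
    return text_to_speech(llm_respond(speech_to_text(audio)))
```

An end-to-end model would replace the three stages with one network trained jointly, which is exactly the part nobody seems to have demonstrated at quality yet.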
mtocrat t1_iyk1se1 wrote
Reply to comment by SrPinko in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
Even for univariate time series, when you have the data and the complexity, DL will obviously outperform simple methods. Show me the simple statistical method that can generate speech, a univariate time series.
mtocrat t1_iyk1n65 wrote
Reply to comment by bushrod in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
Consider spoken language, and you're back in the realm of time series. Obviously, simple statistical methods can't deal with those, though.
mtocrat t1_iycqxw4 wrote
Reply to comment by pm_me_your_pay_slips in [D] I'm at NeurIPS, AMA by ThisIsMyStonerAcount
Companies throwing money around like there's no tomorrow is a high point for people who like money and who work in, or aspire to work in, the industry.
mtocrat t1_j6zk1ka wrote
Reply to [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Let's say your initial model is quite racist and outputs only extremely or moderately racist choices. If you rank those against each other and do supervised training on that dataset, you train the model to mimic the moderately racist style. However, you might plausibly train a reward model from this data that can judge what racism is and extrapolate, judging answers free of it to be even better. Then you optimize with respect to that model to get that style.
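The extrapolation point can be made concrete with a toy example: a scalar reward model fit only on comparisons between bad and worse outputs can still assign its highest score to a clean output it never saw, and optimizing against it (here, simple best-of-n selection) picks that output. Everything below is an illustrative toy, not a real training setup:

```python
import numpy as np

# Toy setup: each candidate output is summarized by a 1-D "badness" score.
# Training pairs only ever compared moderately bad (1.0) vs extremely
# bad (3.0) outputs, and a monotone reward model fit to "less bad is
# preferred" extrapolates to outputs outside that range.

def reward(badness, w=-1.0):
    # Linear reward model consistent with the training rankings.
    return w * badness

# Best-of-n "optimization" against the reward model. The candidate set
# includes a clean output (badness 0.0) never seen during reward training.
candidates = np.array([3.0, 1.5, 1.0, 0.0])
best = candidates[np.argmax(reward(candidates))]
```

Supervised training on the ranked data would regress toward the moderately racist answers themselves; optimizing against the learned reward can select outputs better than anything in the training set, up to where the reward model's generalization breaks down.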