UnusualClimberBear
UnusualClimberBear t1_jd3mklf wrote
Reply to comment by tekktokk in [R] What do we think about Meta-Interpretive Learning? by tekktokk
Even for protein folding it has been superseded by deep models. It might be useful for critical tasks where errors are not allowed and everything is deterministic, but I'm not an expert in the field.
UnusualClimberBear t1_jd3gqap wrote
Reply to comment by tekktokk in [R] What do we think about Meta-Interpretive Learning? by tekktokk
Usually, the problem is the combinatorial number of rules that could apply. Here they seem able to find a subset of candidate rules with polynomial complexity, but since Table 7 of the second paper contains tiny (w.r.t. ML/RL data) problem instances, I would answer yes to your questions. ILP comes with strong guarantees, while ML comes with a statistical risk. These guarantees aren't free.
UnusualClimberBear t1_jd2myu1 wrote
Sounds like a rebranding of Inductive Logic Programming. It does not scale, while all recent advances are about scaling simple systems. Consider that for a vanilla transformer the bottleneck is often the attention, because it is O(N^2) in sequence length, and people are switching to linear attention.
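To make the N^2 point concrete, here is a minimal NumPy sketch (just an illustration, not from any of the papers discussed): the score matrix is N x N, which is exactly what linear-attention variants try to avoid materializing.

```python
import numpy as np

def vanilla_attention(Q, K, V):
    """Single-head attention: the score matrix is N x N, hence quadratic compute/memory."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # shape (N, N): the quadratic bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                  # shape (N, d)

N, d = 4096, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = vanilla_attention(Q, K, V)   # the (N, N) score matrix alone is ~16.8M floats (~67 MB in float32)
```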
UnusualClimberBear t1_jczl0bn wrote
Reply to comment by currentscurrents in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Better to light a candle than to buy an AMD GPU for anything close to cutting edge.
UnusualClimberBear t1_jcfd2jt wrote
Reply to comment by SomewhereAtWork in [D] Is it possible to train LLaMa? by New_Yak1645
Yes, doable on a low budget if you have no fear of legal actions...
UnusualClimberBear t1_jceked4 wrote
Pure PR. And please do not slow down other projects.
UnusualClimberBear t1_jbnoo8n wrote
Reply to comment by potatoandleeks in [D] Is it possible to train LLaMa? by New_Yak1645
You can rent some (but not thousands) on vast.ai for around $1.50 an hour.
UnusualClimberBear t1_jbngux4 wrote
Reply to [D] Is it possible to train LLaMa? by New_Yak1645
Training from scratch required 2048 A100s for 21 days, and that seems to be only the final run.
I guess you can fine-tune it with much lower resources; 16 A100s seems reasonable, since going lower will require quantization or partial loading of the model.
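If you go the quantization route, a sketch of what that typically looks like with the Hugging Face transformers + bitsandbytes stack (the checkpoint path is a placeholder, and you still need legitimate access to the weights):

```python
# Sketch: load the model in 8-bit so fine-tuning fits on far fewer GPUs.
# Assumes transformers and bitsandbytes are installed; the path below is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/llama-checkpoint"   # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,    # 8-bit quantization via bitsandbytes
    device_map="auto",    # spread layers across the available GPUs automatically
)
```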
UnusualClimberBear t1_j7pdue6 wrote
Reply to comment by EmbarrassedFuel in Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel
This is because the information is in the books.
(free online) http://www.cds.caltech.edu/~murray/amwiki/index.php/Main_Page
https://www.amazon.com/Modern-Control-Systems-12th-Edition/dp/0136024580
Yet nonlinearity breaks everything there. The usual approach is to linearize around well-chosen operating points and compute the control using the closest linearization.
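As a toy illustration of that linearize-then-control recipe (my own sketch, not taken from the books above): linearize an inverted pendulum around the upright equilibrium and compute an LQR gain with SciPy.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Pendulum dynamics: theta_ddot = -(g/l) sin(theta) + u / (m l^2)
g, l, m = 9.81, 1.0, 1.0

# Linearize around the upright equilibrium theta = pi (sin(pi + x) ~ -x):
A = np.array([[0.0, 1.0],
              [g / l, 0.0]])          # unstable: small deviations grow
B = np.array([[0.0],
              [1.0 / (m * l**2)]])

Q = np.diag([10.0, 1.0])              # penalize angle error more than angular velocity
R = np.array([[0.1]])

P = solve_continuous_are(A, B, Q, R)  # continuous-time Riccati equation
K = np.linalg.solve(R, B.T @ P)       # LQR gain: u = -K x, valid only near the linearization point
```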
UnusualClimberBear t1_j7opc2r wrote
Reply to comment by UnusualClimberBear in Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel
Also, if your world is deterministic but you cannot build a good model of it, you may be close to the situation of games such as Go, and Monte Carlo Tree Search algorithms are an option to consider (variants of UCT, with or without function approximation). A minimal sketch of the selection rule follows.
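The core of those methods is the UCT selection rule; a bare-bones sketch (node bookkeeping is assumed to live elsewhere):

```python
import math

def uct_score(value_sum, visits, parent_visits, c=1.414):
    """UCB for Trees: exploit the empirical mean, explore rarely visited children."""
    if visits == 0:
        return float("inf")                      # always expand unvisited children first
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits):
    """children maps action -> (value_sum, visits); pick the action maximizing the UCT score."""
    return max(children, key=lambda a: uct_score(*children[a], parent_visits))
```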
UnusualClimberBear t1_j7lvpz8 wrote
Reply to Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel
Looks like an optimal control problem rather than an RL one. RL is there for situations where no good model is available. If stochasticity is present but you still have a good model once the uncertainty is known, then model predictive control is a good way to go.
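A toy receding-horizon sketch of that idea (names and optimizer are my own choices; `f` is your one-step model, `cost` your stage cost):

```python
import numpy as np
from scipy.optimize import minimize

def mpc_action(x0, f, cost, horizon=10, u_dim=1):
    """Plan an open-loop control sequence with the model, execute only the first action, replan next step."""
    def total_cost(u_flat):
        u_seq = u_flat.reshape(horizon, u_dim)
        x, c = x0, 0.0
        for u in u_seq:
            c += cost(x, u)
            x = f(x, u)                # roll the (known or learned) model forward
        return c

    res = minimize(total_cost, np.zeros(horizon * u_dim), method="L-BFGS-B")
    return res.x.reshape(horizon, u_dim)[0]    # receding horizon: apply only the first action
```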
UnusualClimberBear t1_j7cjxu9 wrote
Reply to comment by AdFew4357 in Are PhDs in statistics useful for ML research? [D] by AdFew4357
Basically, current trends just ignore any reasonable practice, such as proper train/valid/test splits. For now, the bigger, the better. This requires quite a lot of engineering-support skills (parallelization and data pipelining in particular) rather than theory-related ones.
UnusualClimberBear t1_j7chhkq wrote
This is all about timing. Currently, stats/maths capabilities are not at their best.
UnusualClimberBear t1_izaa5xr wrote
Reply to comment by Ulfgardleo in [Discussion] Suggestions on Trust Region Methods For Natural Gradient by randomkolmogorov
For RL you also need to account for the uncertainty over the state-action pairs you almost ignored during data collection but would like to use more. A gradient on a policy behaves differently from a gradient on a supervised loss.
UnusualClimberBear t1_iza00fp wrote
Reply to comment by randomkolmogorov in [Discussion] Suggestions on Trust Region Methods For Natural Gradient by randomkolmogorov
TRPO is often too slow for applications because of that line search, and researchers often prefer PPO, which also comes with some guarantees in terms of KL on the state distribution and is faster. I'd be curious to hear about your problem if it turns out that TRPO is the best choice.
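For reference, the clipped surrogate that PPO optimizes instead of TRPO's constrained step, sketched in NumPy (notation: ratio of new to old policy probabilities; standard advantage estimates assumed):

```python
import numpy as np

def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO surrogate: clip the probability ratio so one update cannot move the policy too far."""
    ratio = np.exp(new_logp - old_logp)                        # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))            # maximize surrogate = minimize its negative
```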
UnusualClimberBear t1_iz9ohx8 wrote
TRPO follows the same direction as NPG, with a maximal step size chosen to still satisfy the quadratic approximation of the KL constraint (sketched below). I'm not sure what you would like to do better.
Nicolas Le Roux gave a nice talk on RL seen as an optimization problem: https://slideslive.com/38935818/policy-optimization-in-reinforcement-learning-rl-as-blackbox-optimization
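To make the step-size point concrete, a small sketch of the NPG/TRPO update, assuming you already have the policy gradient and the Fisher matrix (in practice the solve is done with conjugate gradient rather than explicitly):

```python
import numpy as np

def trpo_step(grad, fisher, max_kl=0.01):
    """Natural-gradient direction F^{-1} g, scaled so the quadratic KL approximation stays below max_kl."""
    direction = np.linalg.solve(fisher, grad)            # F^{-1} g
    quad = grad @ direction                              # g^T F^{-1} g
    step_size = np.sqrt(2.0 * max_kl / (quad + 1e-8))    # largest step with 0.5 * s^T F s <= max_kl
    return step_size * direction
```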
UnusualClimberBear t1_iyvpzci wrote
Reply to [D] Score 4.5 GNN paper from Muhan Zhang at Peking University was amazingly accepted by NeurIPS 2022 by Even_Stay3387
Because the area chair is the one making the recommendation, and he managed to convince his senior area chair. You can indeed suspect collusion, but without reading the paper, judging from the reviews it looks like a typical paper in the 10%-60% quality quantile, and at this level acceptance is pretty random.
UnusualClimberBear t1_iylwrfc wrote
ID3 classically builds a tree by maximizing information gain at each split, thus discarding some irrelevant variables.
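A minimal sketch of the information-gain computation ID3 greedily maximizes (my own illustration):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Entropy reduction from splitting on a categorical feature; ID3 picks the feature maximizing this."""
    n = len(labels)
    remainder = sum(
        (np.sum(feature_values == v) / n) * entropy(labels[feature_values == v])
        for v in np.unique(feature_values)
    )
    return entropy(labels) - remainder
```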
You may also be interested in energy-based models.
UnusualClimberBear t1_irzxolu wrote
It is unlikely that you will personally shine at DeepMind. If you think the startup is doing the right thing and that you can have an impact there, then it is probably the better career choice.
UnusualClimberBear t1_iqna2rw wrote
Reply to [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
The full gradient does not work well for NNs. Plus, Adam keeps a coarse estimate of the curvature, so it is closer to a second-order method, even if you can find functions where its estimates are poor.
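A stripped-down sketch of the Adam update, to show where that coarse curvature estimate lives (the second-moment term in the denominator):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; v tracks squared gradients, a cheap diagonal stand-in for curvature."""
    m = b1 * m + (1 - b1) * grad           # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (per-parameter scale, the "curvature" proxy)
    m_hat = m / (1 - b1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```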
UnusualClimberBear t1_jd9109w wrote
Reply to comment by [deleted] in [D] ICML 2023 Reviewer-Author Discussion by zy415
First, they know that publication is now a big circus and that most papers are clever solutions to problems that don't exist, or beautiful explanations that cannot be leveraged. Acceptance is random if your work is not in the top 2% but still in the top 60%.
Publication as proof of work is toxic