UnusualClimberBear

UnusualClimberBear t1_jd9109w wrote

First, they know publication is now a big circus and that most papers are clever solutions to problems that don't exist or beautiful explanations that cannot be leveraged. Acceptance is random if your work is not in the top 2% but still in the top 60%.

Publication as proof of work is toxic.

1

UnusualClimberBear t1_jd3gqap wrote

Usually, the problem is the combinatorial nature of the number of rules that could apply. Here they seem to be able to find a subset of possible rules with polynomial complexity, but as table 7 of the second paper contains only tiny (wrt ML/RL data) problem instances, I would answer yes to your questions. ILP comes with strong guarantees, while ML comes with a statistical risk. These guarantees aren't free.

1

UnusualClimberBear t1_jbngux4 wrote

Training from scratch required 2048 A100 for 21 days. And that seems to cover only the final run.

I guess you can start fine-tuning it with far fewer resources; 16 A100 seems reasonable, as going lower will require quantization or partial loading of the model.
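
To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It only uses the figures quoted above (2048 A100, 21 days, 16 A100); the helper name `gpu_hours` is mine, and it says nothing about how long a fine-tuning run would actually take.

```python
# Back-of-the-envelope compute budget from the figures quoted above.
PRETRAIN_GPUS = 2048   # A100s reported for the final training run
PRETRAIN_DAYS = 21
FINETUNE_GPUS = 16     # suggested fine-tuning setup


def gpu_hours(num_gpus, days):
    """Total accelerator-hours for a run using num_gpus for the given days."""
    return num_gpus * days * 24


pretrain_hours = gpu_hours(PRETRAIN_GPUS, PRETRAIN_DAYS)
gpu_ratio = PRETRAIN_GPUS // FINETUNE_GPUS

print(f"final pretraining run: {pretrain_hours:,} A100-hours")
print(f"16 A100 is 1/{gpu_ratio} of the pretraining cluster")
```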

7

UnusualClimberBear t1_j7pdue6 wrote

This is because the information is in the books.

(free online) http://www.cds.caltech.edu/~murray/amwiki/index.php/Main_Page

https://www.amazon.com/Modern-Control-Systems-12th-Edition/dp/0136024580

Yet nonlinearity breaks everything there. The usual approach is to linearize at well-chosen positions and compute the control using the closest linearization.
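
A minimal sketch of that idea, on a toy plant I made up for illustration: a torque-driven pendulum `theta'' = -(g/l) sin(theta) + u`. The operating points, the constants, and all function names are assumptions, not something from the books above. The controller picks the linearization closest to the current angle, cancels that local linear model, and imposes a second-order response. A textbook gain-scheduling design would more typically compute a separate gain set (for example LQR) per operating point.

```python
import math

G_OVER_L = 9.81  # hypothetical pendulum: g / length, with length = 1 m
OPERATING_POINTS = [k * math.pi / 4 for k in range(-2, 3)]  # -90 to 90 degrees


def closest_linearization(theta):
    """Return (theta0, f0, slope) of the linear model nearest to theta.

    Assumed plant: theta'' = -G_OVER_L * sin(theta) + u.
    Around theta0: -G_OVER_L * sin(theta) ~ f0 + slope * (theta - theta0).
    """
    theta0 = min(OPERATING_POINTS, key=lambda p: abs(p - theta))
    return theta0, -G_OVER_L * math.sin(theta0), -G_OVER_L * math.cos(theta0)


def control(theta, omega, theta_ref, wn=4.0, zeta=0.7):
    """Cancel the local linear model, then impose s^2 + 2*zeta*wn*s + wn^2."""
    theta0, f0, slope = closest_linearization(theta)
    model_accel = f0 + slope * (theta - theta0)
    desired_accel = -wn**2 * (theta - theta_ref) - 2.0 * zeta * wn * omega
    return desired_accel - model_accel


def simulate(theta_ref, seconds=5.0, dt=1e-3):
    """Explicit Euler simulation of the true nonlinear plant under control."""
    theta, omega = 0.0, 0.0
    for _ in range(int(seconds / dt)):
        u = control(theta, omega, theta_ref)
        accel = -G_OVER_L * math.sin(theta) + u
        theta, omega = theta + dt * omega, omega + dt * accel
    return theta


final_theta = simulate(theta_ref=math.pi / 4)
print(f"final angle: {final_theta:.3f} rad (target {math.pi / 4:.3f})")
```

The switch between linearizations happens halfway between two operating points, so a denser grid means a smaller model mismatch at the switch.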

2

UnusualClimberBear t1_iz9ohx8 wrote

TRPO follows the same direction as NPG, with the maximal step size that still satisfies the quadratic approximation of the KL constraint. I'm not sure what you would like to do better.
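
As a rough illustration of that statement, here is a sketch on a two-parameter toy problem: the direction is the natural gradient `F^{-1} g`, and the scale is the largest one keeping `0.5 * x^T F x <= delta`. The Fisher matrix, the gradient, and `delta` are made-up numbers, and the function names are mine. The actual TRPO algorithm solves for the direction with conjugate gradient and adds a backtracking line search on top, which this sketch leaves out.

```python
import math


def solve_2x2(F, g):
    """Solve F x = g for a 2x2 matrix F using Cramer's rule."""
    (a, b), (c, d) = F
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def quadratic_kl(F, x):
    """Second-order approximation of the KL divergence: 0.5 * x^T F x."""
    Fx = [F[0][0] * x[0] + F[0][1] * x[1], F[1][0] * x[0] + F[1][1] * x[1]]
    return 0.5 * (x[0] * Fx[0] + x[1] * Fx[1])


def trust_region_step(F, g, delta):
    """Natural-gradient direction scaled to the KL trust-region boundary."""
    direction = solve_2x2(F, g)                        # NPG direction F^{-1} g
    g_finv_g = g[0] * direction[0] + g[1] * direction[1]
    scale = math.sqrt(2.0 * delta / g_finv_g)          # largest feasible scale
    return [scale * d for d in direction]


F = [[2.0, 0.5], [0.5, 1.0]]  # made-up Fisher matrix (positive definite)
g = [1.0, -1.0]               # made-up policy gradient
delta = 0.01                  # made-up KL budget

step = trust_region_step(F, g, delta)
print("step:", step)
print("quadratic KL of step:", quadratic_kl(F, step))
```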

Nicolas Leroux gave a nice talk on RL seen as an optimization problem: https://slideslive.com/38935818/policy-optimization-in-reinforcement-learning-rl-as-blackbox-optimization

3

UnusualClimberBear t1_iyvpzci wrote

Because the area chair is the one making the recommendation. He managed to convince his senior area chair. You can indeed suspect collusion, but judging from the reviews alone, without reading the paper, it looks like a typical paper with quality in the 10%-60% quantile range, and at this level acceptance is pretty random.

44