serge_cell

serge_cell t1_jc1to7o wrote

There was a paper about it. The finding was a specific set of positions not encountered, or poorly represented, during self-play. Fully trained AlphaGo was failing on those positions. However, once they were explicitly added to the training set the problem was fixed and AlphaGo was able to play them well. This adversarial training seems to be just an automatic way of finding those positions.

PS The fitness landscape is not convex; it is separated by hills and valleys. Self-play may have a problem reaching all important states.
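For illustration, a minimal sketch of that loop: hunt for positions where the cheap network evaluation disagrees with a deeper search, then oversample them into the next training run. Everything here (`policy.value`, `policy.search_value`, `policy.fit`) is a hypothetical interface, not the actual AlphaGo pipeline.

```python
import random

def find_weak_positions(policy, candidates, threshold=0.3):
    """Keep positions where the network's value estimate disagrees
    with a stronger reference evaluation (e.g. a deep tree search)."""
    weak = []
    for pos in candidates:
        fast = policy.value(pos)          # cheap network evaluation
        slow = policy.search_value(pos)   # expensive search evaluation
        if abs(fast - slow) > threshold:  # network is wrong in this region
            weak.append(pos)
    return weak

def retrain_on_weak_positions(policy, self_play_data, weak):
    # Oversample the hard positions so the optimizer actually
    # visits that part of the state space.
    augmented = self_play_data + weak * 4
    random.shuffle(augmented)
    policy.fit(augmented)
```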

1

serge_cell t1_jbwt0s9 wrote

It's a question of training. AlphaGo was not trained against adversarial attacks. If it had been, this whole family of attacks wouldn't work, and finding new adversarial attacks would be an order of magnitude more difficult. It's shield and sword again.
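For the image-classification version of the same idea, a minimal adversarial training step (FGSM-style, in PyTorch) looks like the sketch below; for Go the perturbation would have to come from game-specific adversarial play rather than a gradient step on pixels. `model`, `optimizer`, `x`, `y` are assumed to exist.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    # The "sword": perturb the input along the loss gradient (FGSM).
    x = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    x_adv = (x + eps * grad.sign()).detach()

    # The "shield": optimize the model on the adversarial batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```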

6

serge_cell t1_jalnarf wrote

The notable difference between GA and other random searches is the crossover operator and, in its theory, the "building blocks" hypothesis. Neither was confirmed during years (dozens of years) of attempted use of GA.
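For reference, the operator in question is just recombination of two parents at a random cut point; the building-blocks hypothesis claims that short, high-fitness schemata survive and combine under it. A toy one-point crossover:

```python
import random

def one_point_crossover(parent_a, parent_b):
    point = random.randrange(1, len(parent_a))    # random cut position
    child1 = parent_a[:point] + parent_b[point:]  # swap the tails
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2

a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
print(one_point_crossover(a, b))
```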

3

serge_cell t1_j5sj24c wrote

Hessian-free second order will most likely not work. There are reasons why everyone is using gradient descent. The only working second-order method seems to be K-FAC (disclaimer: I have no first-hand experience), but as you will be using Julia you will have to implement it from scratch, and it's highly non-trivial (as you can expect from a method that works where others fail).
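To give a flavor of what is involved: K-FAC approximates the Fisher block of a fully-connected layer as a Kronecker product A ⊗ G of an input-covariance factor and a gradient-covariance factor, so the natural-gradient step inverts two small matrices instead of one huge one. A toy NumPy sketch of that single-layer update, without the running averages, damping schedules, and block bookkeeping a real implementation needs:

```python
import numpy as np

def kfac_update(grad_W, acts, grads, damping=1e-3):
    """grad_W: (out, in) weight gradient; acts: (batch, in) layer inputs;
    grads: (batch, out) back-propagated pre-activation gradients."""
    A = acts.T @ acts / len(acts)     # input-covariance factor
    G = grads.T @ grads / len(grads)  # gradient-covariance factor
    # Tikhonov damping keeps the small factors invertible.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # (A kron G)^-1 vec(grad_W) un-vectorizes to G^-1 @ grad_W @ A^-1.
    return G_inv @ grad_W @ A_inv
```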

3

serge_cell t1_j5akgwk wrote

Yes, for specific cases and mostly under overly strong assumptions. It was discussed a lot several years ago, in this same subreddit too. For example:

https://arxiv.org/abs/1810.02054

https://arxiv.org/abs/1811.03804

https://arxiv.org/abs/1811.03962

https://arxiv.org/abs/1811.08888

This is a recurring question; people ask it every year. Some papers should be made sticky :(

1

serge_cell t1_j05qcrj wrote

DL does not work well on low-dimensional sample data or on data with low correlation between sample elements, and it is especially bad for time-series prediction, which is both. Many people put that kind of senseless project (DL for time series) on their CV, and that is an instant black mark against a candidate, at least for me. They say "but that approach did work!" I ask "did you try anything else?" "No".
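A concrete version of the "did you try anything else?" test, with a toy series as a placeholder: compare the model against the naive last-value forecast. If DL cannot clearly beat this baseline, the project is decoration.

```python
import numpy as np

def naive_baseline_mae(series):
    preds = series[:-1]   # forecast "tomorrow = today"
    truth = series[1:]
    return np.mean(np.abs(truth - preds))

series = np.cumsum(np.random.randn(1000))  # toy random walk
print("naive MAE:", naive_baseline_mae(series))
# On a pure random walk no model beats this baseline in expectation.
```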

1

serge_cell t1_iyv2zag wrote

3D localization/registration/reconstruction are traditional areas of use for regularized Gauss-Newton, and all are highly non-convex. The trick is to start in a nearly-convex area, sometimes after several tries, and/or to convexify with regularizers and/or sensor fusion.
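A minimal sketch of such a regularized (damped) Gauss-Newton step, Levenberg-Marquardt style, with `residuals` and `jacobian` standing in for whatever the registration problem supplies:

```python
import numpy as np

def lm_step(params, residuals, jacobian, lam=1e-2):
    r = residuals(params)  # (m,) residual vector
    J = jacobian(params)   # (m, n) Jacobian at current params
    H = J.T @ J            # Gauss-Newton approximation of the Hessian
    # The damping term lam * I is the convexification: it pulls the
    # step toward plain gradient descent where H is ill-conditioned.
    delta = np.linalg.solve(H + lam * np.eye(H.shape[0]), -J.T @ r)
    return params + delta
```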

K-FAC seems stable enough but is quite complex to implement. It is identical to a low-dimensional block approximation of Gauss-Newton; the Fisher information is only decoration.

1