serge_cell
serge_cell t1_jcajql2 wrote
Reply to [D] Are modern generative AI models on a path to significantly improved truthfulness? by buggaby
"Truth" only exists in the context of verification. You would probably need some kind of RL to improve "truthfulness".
serge_cell t1_jc1tqaq wrote
Reply to comment by OptimizedGarbage in [N] Man beats machine at Go in human victory over AI : « It shows once again we’ve been far too hasty to ascribe superhuman levels of intelligence to machines. » by fchung
see previous response
serge_cell t1_jc1to7o wrote
Reply to comment by ertgbnm in [N] Man beats machine at Go in human victory over AI : « It shows once again we’ve been far too hasty to ascribe superhuman levels of intelligence to machines. » by fchung
There was a paper about it. There was a finding: a specific set of positions not encountered, or poorly represented, during self-play. Fully trained AlphaGo was failing on those positions. However, once they were explicitly added to the training set the problem was fixed and AlphaGo was able to play them well. This adversarial training seems to be just an automatic way to find such positions.
PS The fitness landscape is not convex; it is separated by hills and valleys. Self-play may have a problem reaching all important states.
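That loop can be caricatured in a few lines. A hypothetical toy sketch (the integer "positions", `policy_fails`, and `adversarial_search` are stand-ins I made up, not the paper's actual method): an adversary probes for positions the policy loses, and those positions are folded back into the training set.

```python
import random

random.seed(0)

# Toy stand-in: "positions" are integers; the policy "knows" only the
# positions that appeared in its (self-play) training data.
known_positions = set(range(0, 50))

def policy_fails(pos):
    # The policy fails on any position absent from its training data.
    return pos not in known_positions

def adversarial_search(n_probes=200):
    # The adversary probes random positions, keeping those the policy loses.
    probes = (random.randrange(100) for _ in range(n_probes))
    return {p for p in probes if policy_fails(p)}

# Mine failure positions and fold them back into the training set.
holes = adversarial_search()
known_positions |= holes  # "retraining" on the discovered positions

# After retraining, the same adversarial probes no longer find failures.
assert not any(policy_fails(p) for p in holes)
```

The point of the caricature: the adversary is just an automated search for under-represented states, and each round of it patches one more valley that self-play never crossed.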
serge_cell t1_jbwt0s9 wrote
Reply to comment by currentscurrents in [N] Man beats machine at Go in human victory over AI : « It shows once again we’ve been far too hasty to ascribe superhuman levels of intelligence to machines. » by fchung
It's a question of training. AlphaGo was not trained against adversarial attacks. If it had been, the whole family of attacks wouldn't work, and new adversarial training would be an order of magnitude more difficult. It's shield and sword again.
serge_cell t1_jalnarf wrote
Reply to [D] Are Genetic Algorithms Dead? by TobusFire
The notable difference between GA and other random searches is the cross-over operator and, in its theory, the "building blocks" hypothesis. Neither was confirmed during years (dozens of years) of attempted use of GA.
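For concreteness, here is a minimal GA sketch on the classic one-max problem, showing the cross-over operator in question (all names and parameters here are illustrative choices, not from any specific GA library):

```python
import random

random.seed(1)
LENGTH = 20  # one-max: maximize the number of 1-bits in the genome

def fitness(ind):
    return sum(ind)

def crossover(a, b):
    # Single-point crossover: splice a prefix of one parent onto a suffix
    # of the other. The "building blocks" hypothesis claims fit contiguous
    # runs of genes get combined this way.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.05):
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(30)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # truncation selection: keep the fittest third
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(20)
    ]

best = max(pop, key=fitness)
```

On a separable toy like one-max this works fine; the long-standing criticism is that on realistic problems the same search with crossover replaced by plain mutation often does just as well.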
serge_cell t1_j7op8l0 wrote
Reply to [D] What do you think about this 16 week curriculum for existing software engineers who want to pursue AI and ML? by Imaginary-General687
In my experience many software engineers have forgotten most of their linear algebra and calculus, if they knew them to begin with. Some have also forgotten probability/statistics. If there are no preliminary requirements for participants, the course should start by refreshing those areas.
serge_cell t1_j5sj24c wrote
Hessian-free second order will likely not work. There are reasons why everyone uses gradient descent. The only working second-order method seems to be K-FAC (disclaimer: I have no first-hand experience), but as you will use Julia you will have to implement it from scratch, and it's highly non-trivial (as you can expect from a method which works where others failed).
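One of those reasons is plain memory arithmetic, which a two-line calculation makes vivid:

```python
# Memory needed to store a dense Hessian for a model with n parameters,
# in float32 (4 bytes per entry): n^2 * 4 bytes.
def dense_hessian_bytes(n_params, bytes_per_float=4):
    return n_params ** 2 * bytes_per_float

# Even a small 1M-parameter model needs ~4 TB for its full Hessian,
# which is why practical second-order methods must approximate it
# (K-FAC uses per-layer Kronecker-factored blocks instead).
tb = dense_hessian_bytes(1_000_000) / 1e12
# tb == 4.0
```

Hessian-free methods avoid materializing that matrix via Hessian-vector products, but then pay in extra passes per step, which is the other half of why first-order methods keep winning in practice.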
serge_cell t1_j5akgwk wrote
Yes, for specific cases and mostly under overly strong assumptions. It was discussed a lot several years ago, in this same subreddit too. For example:
https://arxiv.org/abs/1810.02054
https://arxiv.org/abs/1811.03804
https://arxiv.org/abs/1811.03962
https://arxiv.org/abs/1811.08888
This is a recurring question; people ask it every year. Some papers should be made sticky :(
serge_cell t1_j4kv3aw wrote
> beyond just some toy experiment?
Compress models. See if you can fit an 8GB model into 1GB, capable of running on mobile, and at what cost.
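The entry point for that project is weight quantization. A minimal sketch of symmetric int8 quantization (my own toy version, not any particular framework's API) already gives a 4x size reduction with a bounded error:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric linear quantization: map floats in [-max|w|, max|w|]
    # onto int8 values in [-127, 127]; the scale is stored alongside.
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)

# 4x smaller (int8 vs float32), with rounding error at most scale / 2.
error = np.abs(dequantize(q, scale) - w).max()
assert q.nbytes == w.nbytes // 4
assert error <= scale / 2 + 1e-6
```

Getting from 4x to 8x (the 8GB-to-1GB target) is where it stops being a toy: 4-bit weights, pruning, or distillation, each trading accuracy for size in ways you have to measure.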
serge_cell t1_j3r2q99 wrote
If both papers have similar results that's actually good IMO. That means the approach actually works and is not just some hyperparameter fiddling.
serge_cell t1_j05qcrj wrote
DL does not work well on low-dimensional sample data, or on data with low correlation between sample elements, and is especially bad for time-series prediction, which is both. Many people put that kind of senseless project (DL for time series) on their CV, and that is an instant black mark for a candidate, at least for me. They say "but that approach did work!" I ask "did you try anything else?" "No".
serge_cell t1_iyv2zag wrote
Reply to comment by jarekduda in [R] SGD augmented with 2nd order information from seen sequence of gradients? by jarekduda
3D localization/registration/reconstruction are traditional areas of use for regularized Gauss-Newton, and all are highly non-convex. The trick is to start in a nearly-convex area, sometimes after several tries, and/or to convexify with regularizers and/or sensor fusion.
K-FAC seems stable enough but is quite complex to implement. It's identical to a low-dimensional-blocks approximation of Gauss-Newton; the Fisher information is only decoration.
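For readers unfamiliar with the method being discussed, here is regularized (damped) Gauss-Newton on the smallest possible example: a scalar curve fit. The damping constant plays the regularizer role; the problem and numbers are my own illustration.

```python
import numpy as np

# Toy damped Gauss-Newton: fit y = exp(a * x) for a single parameter a.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * x)  # ground truth a = 0.7, noiseless

a = 0.0      # starting point inside the nearly-convex basin
lam = 1e-3   # Levenberg-style damping (the "regularizer")
for _ in range(50):
    r = np.exp(a * x) - y   # residuals
    J = x * np.exp(a * x)   # Jacobian of residuals w.r.t. a
    # Damped normal equations: (J^T J + lam) * delta = -J^T r
    a += -(J @ r) / (J @ J + lam)

# a converges to ~0.7
```

The "start in a nearly-convex area" advice is visible even here: from a far-off initial `a` the linearization is poor and the damped step is what keeps the iteration from overshooting.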
serge_cell t1_iycp9ri wrote
Reply to comment by r_linux_mod_isahoe in Does anyone uses Intel Arc A770 GPU for machine learning? [D] by labloke11
For that, AMD first has to make a normal implementation of OpenCL. People complain all the time: slowdowns, crashes, lack of portability. This has been going on for 10 years already and it isn't getting better.
serge_cell t1_ive6evf wrote
Reply to comment by husmen93 in [D] NVIDIA RTX 4090 vs RTX 3090 Deep Learning Benchmarks by mippie_moe
Rule of thumb: bandwidth wins.
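A back-of-envelope roofline check shows why. Take an element-wise op (e.g. bias + ReLU): roughly 2 float32 reads + 1 write per element against ~2 FLOPs per element. The GPU numbers below are hypothetical round figures, not any specific card's spec:

```python
# Roofline-style check: is an op limited by compute or by memory traffic?
def bound(flops_per_s, bandwidth_bytes_per_s, flops_per_elem=2, bytes_per_elem=12):
    compute_time = flops_per_elem / flops_per_s       # seconds per element
    memory_time = bytes_per_elem / bandwidth_bytes_per_s
    return "memory-bound" if memory_time > compute_time else "compute-bound"

# Hypothetical GPU-like numbers: 80 TFLOP/s compute, 1 TB/s bandwidth.
# bound(80e12, 1e12) -> "memory-bound": bandwidth, not FLOPs, sets the speed.
```

Moving 12 bytes takes ~500x longer than doing 2 FLOPs at those rates, so for the many memory-bound ops in a training step, the card with more bandwidth tends to win regardless of its FLOP headline.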
serge_cell t1_isnc4q3 wrote
Reply to [D] Career advice: Can one make a career in building machine learning models and then selling the IP for them? by likeamanyfacedgod
No, but that doesn't mean you wouldn't be able to monetize it somewhat. Put it on GitHub and link it to your CV/LinkedIn. You would likely get better offers from potential employers, especially if your project gets some stars.
serge_cell t1_irivmoi wrote
overfitting
serge_cell t1_jcenssl wrote
Reply to [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
Let's fight the fire with gasoline.