serge_cell

serge_cell t1_jc1to7o wrote

There was a paper about it. The finding was a specific set of positions not encountered, or poorly represented, during self-play. Fully trained AlphaGo was failing on those positions. However, once they were explicitly added to the training set the problem was fixed and AlphaGo was able to play them well. This adversarial training seems to be just an automatic way of finding those positions.

PS The fitness landscape is not convex; it is separated by hills and valleys. Self-play may have a problem reaching all important states.
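For illustration, a minimal sketch of that loop: hunt for positions where the cheap network evaluation disagrees with a deeper search, then oversample them into the next training run. Everything here (`policy.value`, `policy.search_value`, `policy.fit`) is a hypothetical interface, not the actual AlphaGo pipeline.

```python
import random

def find_weak_positions(policy, candidates, threshold=0.3):
    """Keep positions where the network's value estimate disagrees
    with a stronger reference evaluation (e.g. a deep tree search)."""
    weak = []
    for pos in candidates:
        fast = policy.value(pos)          # cheap network evaluation
        slow = policy.search_value(pos)   # expensive search evaluation
        if abs(fast - slow) > threshold:  # network is wrong in this region
            weak.append(pos)
    return weak

def retrain_on_weak_positions(policy, self_play_data, weak):
    # Oversample the hard positions so the optimizer actually
    # visits that part of the state space.
    augmented = self_play_data + weak * 4
    random.shuffle(augmented)
    policy.fit(augmented)
```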

1

serge_cell t1_jbwt0s9 wrote

It's a question of training. AlphaGo was not trained against adversarial attacks. If it had been, this whole family of attacks wouldn't work, and finding new adversarial attacks would be an order of magnitude more difficult. It's shield and sword again.
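For the image-classification version of the same idea, a minimal adversarial training step (FGSM-style, in PyTorch) looks like the sketch below; for Go the perturbation would have to come from game-specific adversarial play rather than a gradient step on pixels. `model`, `optimizer`, `x`, `y` are assumed to exist.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    # The "sword": perturb the input along the loss gradient (FGSM).
    x = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    x_adv = (x + eps * grad.sign()).detach()

    # The "shield": optimize the model on the adversarial batch.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```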

6

serge_cell t1_jalnarf wrote

The notable difference between GA and other random searches is the crossover operator and, in its theory, the "building blocks" hypothesis. Neither was confirmed during years (dozens of years) of attempted use of GA.
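For reference, the operator in question is just recombination of two parents at a random cut point; the building-blocks hypothesis claims that short, high-fitness schemata survive and combine under it. A toy one-point crossover:

```python
import random

def one_point_crossover(parent_a, parent_b):
    point = random.randrange(1, len(parent_a))    # random cut position
    child1 = parent_a[:point] + parent_b[point:]  # swap the tails
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2

a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
print(one_point_crossover(a, b))
```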

3

serge_cell t1_j5sj24c wrote

Hessian-free second order will most likely not work. There are reasons why everyone is using gradient descent. The only working second-order method seems to be K-FAC (disclaimer: I have no first-hand experience), but as you will be using Julia you will have to implement it from scratch, and it's highly non-trivial (as you can expect from a method that works where others fail).
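To give a flavor of what is involved: K-FAC approximates the Fisher block of a fully-connected layer as a Kronecker product A ⊗ G of an input-covariance factor and a gradient-covariance factor, so the natural-gradient step inverts two small matrices instead of one huge one. A toy NumPy sketch of that single-layer update, without the running averages, damping schedules, and block bookkeeping a real implementation needs:

```python
import numpy as np

def kfac_update(grad_W, acts, grads, damping=1e-3):
    """grad_W: (out, in) weight gradient; acts: (batch, in) layer inputs;
    grads: (batch, out) back-propagated pre-activation gradients."""
    A = acts.T @ acts / len(acts)     # input-covariance factor
    G = grads.T @ grads / len(grads)  # gradient-covariance factor
    # Tikhonov damping keeps the small factors invertible.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # (A kron G)^-1 vec(grad_W) un-vectorizes to G^-1 @ grad_W @ A^-1.
    return G_inv @ grad_W @ A_inv
```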

3

serge_cell t1_j5akgwk wrote

Yes, for specific cases and mostly under overly strong assumptions. It was discussed a lot several years ago, in this same subreddit too. For example:

https://arxiv.org/abs/1810.02054

https://arxiv.org/abs/1811.03804

https://arxiv.org/abs/1811.03962

https://arxiv.org/abs/1811.08888

This is a recurring question; people ask it every year. Some papers should be made sticky :(

1

serge_cell t1_j05qcrj wrote

DL does not work well on low-dimensional sample data or on data with low correlation between sample elements, and it is especially bad for time-series prediction, which is both. Many people put that kind of senseless project (DL for time series) on their CV, and that is an instant black mark against a candidate, at least for me. They say "but that approach did work!" I ask "did you try anything else?" "No".
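A concrete version of the "did you try anything else?" test, with a toy series as a placeholder: compare the model against the naive last-value forecast. If DL cannot clearly beat this baseline, the project is decoration.

```python
import numpy as np

def naive_baseline_mae(series):
    preds = series[:-1]   # forecast "tomorrow = today"
    truth = series[1:]
    return np.mean(np.abs(truth - preds))

series = np.cumsum(np.random.randn(1000))  # toy random walk
print("naive MAE:", naive_baseline_mae(series))
# On a pure random walk no model beats this baseline in expectation.
```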

1

serge_cell t1_iyv2zag wrote

3D localization/registration/reconstruction are traditional areas of use for regularized Gauss-Newton, and all are highly non-convex. The trick is to start in a nearly-convex area, sometimes after several tries, and/or to convexify with regularizers and/or sensor fusion.
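A minimal sketch of such a regularized (damped) Gauss-Newton step, Levenberg-Marquardt style, with `residuals` and `jacobian` standing in for whatever the registration problem supplies:

```python
import numpy as np

def lm_step(params, residuals, jacobian, lam=1e-2):
    r = residuals(params)  # (m,) residual vector
    J = jacobian(params)   # (m, n) Jacobian at current params
    H = J.T @ J            # Gauss-Newton approximation of the Hessian
    # The damping term lam * I is the convexification: it pulls the
    # step toward plain gradient descent where H is ill-conditioned.
    delta = np.linalg.solve(H + lam * np.eye(H.shape[0]), -J.T @ r)
    return params + delta
```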

K-FAC seems stable enough but is quite complex to implement. It is identical to a low-dimensional block approximation of Gauss-Newton; the Fisher information is only decoration.

1