AlmightySnoo t1_jecum2v wrote

I think this sub should start enforcing an explicit "NOT FREE (AS IN FREEDOM)" mention in the title and/or flair when people use the word "open-source" for code with restrictions in place. Yes, technically it's not a lie, but it's still misleading (often intentionally so), since many conflate open-source with free software (see the comments here, where people are asking about exactly that). We should be discouraging this trend of "Smile! You should be happy I'm showing you the code, but you may only use it the way I tell you to" that OpenAI started. It's a huge regression, and it feels like we're back in the dark days before the GPL.

88

AlmightySnoo OP t1_j9c56bx wrote

>It’s not even using the generative model for anything useful.

Thank you, that's exactly what I meant in my second paragraph. They're literally training the GAN to learn Dirac distributions: the noise input serves no purpose, and the discriminator eventually ends up learning to do roughly the job of a simple squared loss.
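
To make this concrete, here's a toy sketch (mine, not from the paper): when the target y is a deterministic function of the input x, p(y|x) is a Dirac distribution and there is nothing stochastic for a conditional generator to model. The squared loss below stands in for what I'm claiming the discriminator effectively converges to; the trained "generator" learns to ignore its noise input z.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Deterministic target: p(y|x) is a Dirac delta at f(x)
f = lambda t: torch.sin(3 * t)
x = torch.rand(512, 1) * 2 - 1
y = f(x)

# Conditional "generator" taking (x, z); z is the usual GAN noise input
gen = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(gen.parameters(), lr=1e-2)

for _ in range(2000):
    z = torch.randn(x.shape[0], 1)                   # noise the generator is free to ignore
    pred = gen(torch.cat([x, z], dim=1))
    loss = ((pred - y) ** 2).mean()                  # squared loss standing in for the discriminator
    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained generator is (approximately) constant in z:
z1, z2 = torch.randn(512, 1), torch.randn(512, 1)
gap = (gen(torch.cat([x, z1], dim=1)) - gen(torch.cat([x, z2], dim=1))).abs().mean()
print(f"mean output gap across two noise draws: {gap:.4f}")  # ~0, i.e. z is dead weight
```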

−6

AlmightySnoo OP t1_j9c2trd wrote

>It doesn’t seem like plagiarism, since they do ample citation.

It is when you pretend to do things differently while in practice doing the exact same thing, adding a useless layer (the GAN) to give a false impression of novelty. Merely citing your sources in such cases doesn't shield you from accusations of plagiarism.

>As far as the justification goes, there are some generative based approaches for solving parametric PDEs even now.

Not disputing that there are papers out there where the use is justified; of course there are skilled researchers with academic integrity. But again, in this paper, and in the ones I'm talking about in general, the setting is exactly the one in my second paragraph, where the use of GANs is clearly not justified at all.

>but I don’t think it’s that bad

Again, in the context of my second paragraph (because that's literally what they're doing), it is bad.

−17

AlmightySnoo t1_j3fa93g wrote

Excerpt from pages 8 and 9:

>Unfortunately, the formal infinite-width limit, n → ∞, leads to a poor model of deep neural networks: not only is infinite width an unphysical property for a network to possess, but the resulting trained distribution also leads to a mismatch between theoretical description and practical observation for networks of more than one layer. In particular, it’s empirically known that the distribution over such trained networks does depend on the properties of the learning algorithm used to train them. Additionally, we will show in detail that such infinite-width networks cannot learn representations of their inputs: for any input x, its transformations in the hidden layers will remain unchanged from initialization, leading to random representations and thus severely restricting the class of functions that such networks are capable of learning. Since nontrivial representation learning is an empirically demonstrated essential property of multilayer networks, this really underscores the breakdown of the correspondence between theory and reality in this strict infinite-width limit.
>
>From the theoretical perspective, the problem with this limit is the washing out of the fine details at each neuron due to the consideration of an infinite number of incoming signals. In particular, such an infinite accumulation completely eliminates the subtle correlations between neurons that get amplified over the course of training for representation learning.
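
A rough empirical illustration of that "frozen representations" point (my own sketch, not from the book): train MLPs of increasing width on the same small task and measure how far the hidden features move from their values at initialization. As width grows, the relative drift should shrink toward zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 8)
y = torch.randn(256, 1)

for width in (16, 256, 4096):
    torch.manual_seed(1)
    net = nn.Sequential(nn.Linear(8, width), nn.Tanh(), nn.Linear(width, 1))
    feats = net[:2]                          # hidden representation: tanh(W1 x + b1)
    h0 = feats(x).detach().clone()           # features at initialization
    opt = torch.optim.SGD(net.parameters(), lr=0.05)
    for _ in range(500):
        opt.zero_grad()
        ((net(x) - y) ** 2).mean().backward()
        opt.step()
    h1 = feats(x).detach()
    drift = ((h1 - h0).norm() / h0.norm()).item()   # relative movement of the representation
    print(f"width={width:5d}  relative hidden-feature drift = {drift:.4f}")
```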

30

AlmightySnoo t1_j32s8ve wrote

Also this. $AMD still makes it explicit that they officially support ROCm only on CDNA GPUs, and even then only under Linux. That's an immediate turn-off for lots of beginner GPGPU programmers, who'll flock to CUDA instead since it works with any not-too-old gaming GPU from Nvidia. It's astonishing that Lisa Su still hasn't realized the gravity of this blunder.

36

AlmightySnoo t1_iywjf4u wrote

Mistakes like these can happen for a variety of reasons (a bug, a typo in the code, forgetting to disable some flag you were using for quick-and-dirty results during your trials, etc.), and it's actually a good thing that they rectified the results.

Why do you always have to assume malicious intent and rush to Reddit with a throwaway account to shame the authors? smh

23

AlmightySnoo t1_is5twx8 wrote

You're memory-bound on neural-network workloads because frameworks usually perform multiple loads/stores from/to the GPU's global memory at each layer/activation. Operator fusion, as done for example by PyTorch's JIT compiler, helps a bit, but it cannot fuse elementwise operators with a matrix multiplication, since the latter is usually delegated to cuBLAS. NN frameworks need to rethink this "okay, efficient matrix-multiplication algos aren't trivial, so let's delegate them to a black-box code like cuBLAS" mentality; I think it's a shameful waste of chip power and it caps the potential of GPUs.
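
To illustrate the fusion boundary (a minimal sketch of my own, assuming a CUDA-capable machine): TorchScript's fuser can merge the elementwise tail of a layer (bias add + ReLU) into a single kernel, but the matmul is dispatched to cuBLAS as a separate opaque call, so its result makes a round trip through global memory in between.

```python
import torch

@torch.jit.script
def mlp_layer(x: torch.Tensor, w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    y = x @ w                 # dispatched to cuBLAS; opaque to the fuser
    return torch.relu(y + b)  # bias add + ReLU can be fused into one elementwise kernel

x = torch.randn(1024, 1024, device="cuda")
w = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, device="cuda")
out = mlp_layer(x, w, b)  # y above is written to and re-read from global memory
```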

17